
Add multm_prev_ layer and enhance gemm() function for PLANE_WISE operations #3020

Merged: 36 commits, Dec 20, 2024

Conversation

@Cydral (Contributor) commented Sep 26, 2024

This pull request introduces a new layer, multm_prev_, and enhances the gemm() function to support PLANE_WISE operations. These changes aim to improve the flexibility and performance of matrix multiplications in deep learning models, particularly for attention mechanisms.

New layer: multm_prev_

The multm_prev_ layer performs matrix multiplication between the current layer's input and the previous layer's output, one corresponding 2D plane at a time. This new layer is particularly useful for implementing attention mechanisms and other operations that require such interactions between tensors; a minimal usage sketch follows the feature list below.

Key features of multm_prev_:

  • Supports PLANE_WISE matrix multiplication
  • Preserves sample and channel dimensions
  • Efficiently handles 4D tensors
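
For concreteness, here is a minimal sketch of how the new layer could be wired into a network definition. It assumes the PR exposes a multm_prev1 alias bound to tag1, following dlib's existing mult_prev1/tag1 convention; the alias names and the toy composition are illustrative, not the merged API.

#include <dlib/dnn.h>
using namespace dlib;

// Remember the input at tag1, run it through dlib's existing (channel-wise) softmax,
// then plane-wise matrix-multiply the result with the remembered tensor.  This is the
// "weights times values" structure of an attention block; the plane-wise product
// requires square planes here (nr == nc).
template <typename SUBNET>
using toy_attention = multm_prev1<softmax<tag1<SUBNET>>>;

// A network type using the block on single-channel float matrices.
using toy_net = toy_attention<input<matrix<float>>>;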

Enhancement to gemm() function:

The gemm() function has been updated to support two modes of operation: CHANNEL_WISE (default) and PLANE_WISE. This modification allows for more efficient and flexible matrix multiplications, especially when dealing with 4D tensors.

Key changes to gemm():

  1. Added a new parameter g_mode to specify the operation mode (0 for CHANNEL_WISE, 1 for PLANE_WISE)
  2. Implemented PLANE_WISE mode, which performs matrix multiplication for each corresponding 2D plane across all samples and channels (a reference sketch of these semantics follows this list)
  3. Updated documentation to reflect the new functionality and requirements for both modes
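
To make item 2 concrete, here is a reference loop that spells out the PLANE_WISE semantics for two 4D dlib tensors. It is only an illustration of the behavior described above (and assumes the usual requirement that the inner plane dimensions match); the actual gemm() implementation dispatches to BLAS/cuBLAS rather than looping like this.

#include <dlib/dnn.h>
#include <cassert>
using namespace dlib;

// Reference-only sketch of PLANE_WISE semantics: for every (sample, channel) pair,
// the nr x nc plane of lhs is matrix-multiplied with the matching plane of rhs.
void plane_wise_reference(resizable_tensor& dest, const tensor& lhs, const tensor& rhs)
{
    assert(lhs.num_samples() == rhs.num_samples() && lhs.k() == rhs.k());
    assert(lhs.nc() == rhs.nr());  // inner dimensions of each 2D plane must agree
    dest.set_size(lhs.num_samples(), lhs.k(), lhs.nr(), rhs.nc());

    const float* l = lhs.host();
    const float* r = rhs.host();
    float* d = dest.host();

    const long planes = lhs.num_samples() * lhs.k();
    for (long p = 0; p < planes; ++p)
    {
        const float* lp = l + p * lhs.nr() * lhs.nc();   // p-th plane of lhs
        const float* rp = r + p * rhs.nr() * rhs.nc();   // p-th plane of rhs
        float*       dp = d + p * dest.nr() * dest.nc(); // p-th plane of dest

        for (long row = 0; row < lhs.nr(); ++row)
            for (long col = 0; col < rhs.nc(); ++col)
            {
                float sum = 0;
                for (long i = 0; i < lhs.nc(); ++i)
                    sum += lp[row*lhs.nc() + i] * rp[i*rhs.nc() + col];
                dp[row*dest.nc() + col] = sum;
            }
    }
}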

These changes provide greater flexibility in implementing complex neural network architectures, particularly those involving attention mechanisms or other operations requiring element-wise interactions between tensors.

A new test function, test_multm_prev(), has been added to verify the correct functionality of the multm_prev layer and the enhanced gemm() function.
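
As a side note, dlib already ships a numerical gradient checker, test_layer(), which the existing dnn.cpp tests use for the other *_prev layers; a check for the new layer could look roughly like the sketch below. The multm_prev_<tag1> parameterization is an assumption modeled on how mult_prev_ is declared, not a confirmed signature from this PR.

#include <dlib/dnn.h>
#include <iostream>
using namespace dlib;

int main()
{
    // Assumed, by analogy with mult_prev_: the layer is parameterized by the tag
    // layer whose output it multiplies with.  test_layer() perturbs the inputs and
    // compares the layer's analytic gradients against numerical ones.
    multm_prev_<tag1> l;
    auto res = test_layer(l);
    std::cout << res << std::endl;
    return 0;
}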

@arrufat (Contributor) commented Sep 26, 2024

Nice, I was just wondering if matmul_prev wouldn't be a better name.

@Cydral (Contributor, Author) commented Sep 26, 2024

We can change the name without any problem. I am already dealing with compilation issues likely due to static uses of mult_prev_ (in the template part), and we will decide on the name to keep afterward.

@Cydral (Contributor, Author) commented Sep 26, 2024

On the other hand, I was thinking of using the same convention for the matrix-wise transformation to be applied to softmax, and thus having a dedicated layer named softmaxm. Or, following your suggestion, should we rather go with mat_softmax, or perhaps better, msoftmax?

@arrufat (Contributor) commented Sep 26, 2024

Would it be too difficult to have just an attention_ layer? I know that would mean doing the backpropagation by hand inside that layer, just like loss_barlowtwins does (but that one is just a bn_con).

@Cydral (Contributor, Author) commented Sep 26, 2024

It would be simpler for some people to use, but we would lose the flexibility to build attention in a potentially specific way (even though it currently follows fairly standard and structured steps). For instance, we can decide whether or not to mask, or whether to apply an additional filter to remove a pad token before applying softmax, and so on. I was thinking more of providing, as you did for ResNet, an external definition file that gives one particular definition of the network...

@arrufat (Contributor) commented Sep 26, 2024

Yes, we would lose flexibility, or maybe that layer could be initialized with a struct of options that control the behavior/features of the attention layer. But yes, it would still be less flexible.

@pfeatherstone (Contributor) commented:

It would be harder to implement something like flash attention without an explicit attention_ layer

@Cydral (Contributor, Author) commented Sep 27, 2024

Indeed, I can add a high-level declaration in the layer definition file, similar to what was done for the inception layer, like:

template <int embedding_dim, int nb_heads, typename SUBNET>
using attention_ = (...)

@davisking (Owner) commented:

Sorry, I'm just catching up on these threads. Seems like this PR is still being worked on? There are conflicts with master in any case. Let me know when I should look it over :)

@Cydral (Contributor, Author) commented Sep 30, 2024

@davis, no, you can do the merging. I think the conflicts with master come from the fact that I created several branches from my own dlib fork so I could work on several layers in parallel. The new layers currently being created are finished and can be integrated, please.
Technically, I still have a new layer to release, but I'm going to wait until all the changes have been merged into the master branch to avoid any further conflicts... let me know if that's OK with you.

@davisking (Owner) commented:

> @davis, no, you can do the merging.

You please merge them :)

I'll review them once there aren't merge conflicts.

> Technically, I still have a new layer to release, but I'm going to wait until all the changes have been merged into the master branch to avoid any further conflicts... let me know if that's OK with you.

Yeah that's fine :D

@@ -0,0 +1,5 @@
{
@davisking (Owner) commented:

Do you intend to add this? Like does this make life easier for vscode users in some way? I'm not necessarily averse to it, but it seems like the sort of thing that would normally be purely local to individual users.

@Cydral (Contributor, Author) replied:

No, of course not. I don't know where that came from... it was obviously added by the GitHub tooling when I made fixes.

@Cydral (Contributor, Author) commented Nov 4, 2024

@davis, could you please review this PR?

Review thread on dlib/cuda/tensor_tools.h (outdated, resolved)
@Cydral requested a review from davisking on November 19, 2024
@Cydral (Contributor, Author) commented Dec 10, 2024

@davis, I think I'm on the right track now (it was quite difficult, for example, to find an enum shared by all the classes and accessible from both the CPU and CUDA code), and everything is working on my side now (including the dnn.cpp test). As indicated in the closed PR, to simplify the review, I've merged all the modifications (i.e. for the two new layers) into this single PR.
However, when I run dtest, I get the error below, which seems to have nothing to do with my new layers (it might have something to do with convolution). Does this sound familiar?
Running test_dnn \

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! TEST FAILED: test_dnn !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Failure message from test:

Error occurred at line 1518.
Error occurred in file C:\Users\Shadow\source\repos\multm-prev-layer\dlib\test\dnn.cpp.
Failing expression was max(abs(mat(filter_gradient1)-mat(filter_gradient2))) < 1e-3.
0.00897217

Testing Finished
Total number of individual testing statements executed: 955
Number of failed tests: 1
Number of passed tests: 0

@Cydral (Contributor, Author) commented Dec 12, 2024

> (quoting the Dec 10 comment and its test output above)

@arrufat, could you please pull this development branch on your side and check whether you can reproduce what I noticed, not during compilation but during test program execution? There is indeed an error, at least on my validation platform, in a Conv1D operation that is clearly unrelated to the new DNN layers I am adding to formalize the attention mechanism. Just to make sure you observe the same behavior. Thanks in advance.

@arrufat (Contributor) commented Dec 15, 2024

I've run the tests on my GPU machine (dtest --runall) and they all pass.

@Cydral (Contributor, Author) commented Dec 15, 2024

> I've run the tests on my GPU machine (dtest --runall) and they all pass.

OK, so maybe it's due to the GPU I'm using (an A4500). Could you review again and merge everything now? There are still other elements to work on ;-)

@davisking (Owner) left a review:

Nice, this is great. Fix up the minor stuff I commented on and I'll merge it right away :D

Review threads (outdated, resolved): dlib/cuda/cpu_dlib.cpp, dlib/cuda/cublas_dlibapi.h, dlib/cuda/cudnn_dlibapi.cpp (two threads), dlib/cuda/cudnn_dlibapi.h, dlib/cuda/tensor_tools.h
@Cydral (Contributor, Author) commented Dec 19, 2024

> Nice, this is great. Fix up the minor stuff I commented on and I'll merge it right away :D

Done.

Review threads (outdated, resolved): dlib/dnn/layers.h (two threads)
@davisking merged commit 230c0b0 into davisking:master on Dec 20, 2024
7 of 10 checks passed
@Cydral deleted the multm-prev-layer branch on January 3, 2025