[C][PyTorch]Make pytorch extensions pure cpp #1754

ksivaman · 2025-05-07T07:06:58Z

Description

This is the last part in a series of PRs making the framework extensions pure C++.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Move the remaining CUDA attention functionality from the PyTorch extensions to the core library, introducing C APIs as necessary.
Fix a bug in the numerics tests.
Fix a bug in the fused attention tests.
Convert the PyTorch extension from CUDA to C++, which improves compilation speed.
mha_fill_kernel has been removed and replaced with nvte_memset.
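The layering described in the Changes list can be sketched in plain C++. This is a minimal, host-only sketch under stated assumptions, not the actual Transformer Engine code: `example_core_memset` is a hypothetical stand-in for a core-library C API such as `nvte_memset` (whose real signature is not shown in this PR and may differ, for example by taking a CUDA stream), and `zero_padding_rows` is a hypothetical extension-side caller. It illustrates why a dedicated fill kernel can give way to a generic memset: when the fill value is zero, all-zero bytes already encode `0.0f`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// --- Core-library side (hypothetical, host-only stand-in) ---
// A real core library would implement this in a .cu file and enqueue the fill
// on the GPU; this sketch uses std::memset so it builds with a plain C++ compiler.
// C linkage and plain parameter types mirror the idea of a flat C API.
extern "C" void example_core_memset(void* ptr, int value, std::size_t size_in_bytes) {
  if (size_in_bytes == 0) return;  // nothing to fill
  assert(ptr != nullptr);
  // Sets every byte in [ptr, ptr + size_in_bytes) to (unsigned char)value.
  std::memset(ptr, value, size_in_bytes);
}

// --- Framework-extension side (pure C++, no device code) ---
// Hypothetical helper: zeroes every row at index >= valid_rows in a row-major
// buffer, the kind of job a dedicated fill kernel might otherwise do.
void zero_padding_rows(std::vector<float>& buf, std::size_t row_elems,
                       std::size_t valid_rows) {
  const std::size_t start = valid_rows * row_elems;
  if (start >= buf.size()) return;  // no padding rows to clear
  // All-zero bytes encode +0.0f in IEEE-754, so a byte-wise memset is enough.
  example_core_memset(buf.data() + start, 0, (buf.size() - start) * sizeof(float));
}
```

For example, `zero_padding_rows(out, 3, 2)` on a 12-element buffer (4 rows of 3) leaves the first six values untouched and sets the last six to `0.0f`. Because the extension side only calls a function with plain C types, it defines no kernels and can be compiled as ordinary C++; this shows the general idea only, and the real API may be structured differently.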

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ksivaman · 2025-05-09T01:47:22Z

/te-ci L0 L1

cyanguwa

LGTM. Thanks for responding so quickly.

ksivaman · 2025-05-10T00:07:26Z

/te-ci L0 L1

ksivaman added 6 commits May 7, 2025 00:30

First pass refactor

e9c868e

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Merge branch 'NVIDIA:main' into make_pytorch_extensions_pure_cpp

86e15cf

first pass

87495f1

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

core compiles

ff32002

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Include cuda dirs

7e4feb4

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Compiles

f368408

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ksivaman added the 2.4.0 label May 7, 2025

ksivaman marked this pull request as draft May 7, 2025 07:07

Merge branch 'main' into make_pytorch_extensions_pure_cpp

131fd48

ksivaman marked this pull request as ready for review May 7, 2025 17:59

ksivaman requested a review from cyanguwa May 7, 2025 17:59

ksivaman added 7 commits May 7, 2025 20:31

Fix

4524fc7

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Fix test

b3ca75c

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Merge branch 'NVIDIA:main' into make_pytorch_extensions_pure_cpp

7e5b392

Move grad outside autocast

69b6998

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Merge branch 'main' into make_pytorch_extensions_pure_cpp

ca24b14

Fix kv cache

2e69054

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Merge branch 'main' into make_pytorch_extensions_pure_cpp

5ca2d0e

cyanguwa reviewed May 9, 2025

transformer_engine/common/CMakeLists.txt (resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/common/fused_attn/flash_attn.cu (outdated, resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/pytorch/csrc/extensions/attention.cpp (outdated, resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/common/include/transformer_engine/fused_attn.h (resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/common/fused_attn/thd.cu (outdated, resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/common/include/transformer_engine/fused_attn.h (outdated, resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/pytorch/csrc/extensions/attention.cpp (resolved)

Address review comments

5397033

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ksivaman force-pushed the make_pytorch_extensions_pure_cpp branch from d103923 to 5397033 on May 9, 2025 23:12

pre-commit-ci bot and others added 2 commits May 9, 2025 23:13

[pre-commit.ci] auto fixes from pre-commit.com hooks

5113e86

for more information, see https://pre-commit.ci

Change src file name in cmake

0aa469c

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Merge branch 'main' into make_pytorch_extensions_pure_cpp

c4d09ce

cyanguwa reviewed May 9, 2025

transformer_engine/common/fused_attn/context_parallel.cu (outdated, resolved)

cyanguwa reviewed May 9, 2025

transformer_engine/common/fused_attn/kv_cache.cu (resolved)

move the kernels too

02ccb75

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

cyanguwa reviewed May 9, 2025

transformer_engine/common/include/transformer_engine/fused_attn.h (resolved)

ksivaman added 5 commits May 9, 2025 23:38

fix

5b2a7a0

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Move comment

833825f

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Move comments around

f3fe837

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

more movement

709af47

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

move

c31afad

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

cyanguwa approved these changes May 10, 2025

ksivaman merged commit 51cd441 into NVIDIA:main May 11, 2025
29 of 39 checks passed