Enhance recipe compatibility #1724
base: main
Conversation
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
recipe = self.fp8_meta["recipe"]
expected_tensor_class = None
if recipe.delayed() or recipe.float8_current_scaling():
Maybe add a field in the recipe that has this expected tensor class? Also, I guess for completeness we should not only deal with QuantizedTensor, but also with the *Base (e.g. Float8TensorBase) classes (I know that they are not used for weights in fp8_model_init).
Maybe add a field in the recipe that has this expected tensor class?
That is the approach I also prefer. I did not do it originally because of a circular import.
Now I have fixed it with a local import, if that is acceptable.
We probably need some redesign to properly resolve this circular import issue (in another PR).
In the future, I would also propose adding a field in the recipe for the quantizer class, etc.
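An aside for readers on the local-import workaround discussed here — a minimal, self-contained sketch of the pattern, using hypothetical `recipe`/`tensor` module names rather than TE's actual layout. The import inside the property runs at attribute access, after both modules have finished initializing, so no cycle forms at import time.

```python
import sys
import types

# Simulate two modules that would deadlock with top-level mutual imports:
# the recipe module needs a tensor class, the tensor module needs Recipe.
recipe_mod = types.ModuleType("recipe")
tensor_mod = types.ModuleType("tensor")
sys.modules["recipe"] = recipe_mod
sys.modules["tensor"] = tensor_mod

exec(
    """
class Recipe:
    @property
    def expected_tensor_class(self):
        # Local import breaks the cycle: it resolves when the property
        # is accessed, not when the recipe module itself is imported.
        from tensor import QuantizedTensor
        return QuantizedTensor
""",
    recipe_mod.__dict__,
)

exec(
    """
from recipe import Recipe  # the tensor module may import recipe freely

class QuantizedTensor:
    pass
""",
    tensor_mod.__dict__,
)

print(recipe_mod.Recipe().expected_tensor_class.__name__)  # QuantizedTensor
```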
Also, I guess for completeness we should not only deal with QuantizedTensor, but also with the *Base (e.g. Float8TensorBase) classes (I know that they are not used for weights in fp8_model_init).
Fixed
See 98799f7
else:
    raise RuntimeError(f"Unsupported recipe type: {recipe.__class__.__name__}")
weight_tensors = [getattr(self, name) for name in self.weight_names] |
@timmoon10 I vaguely recall that getattr from nn.Module was slow and that you created a faster function for it at some point; do you remember the details?
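Context for readers on why that lookup can be slow: `getattr` on an `nn.Module` misses the instance `__dict__` for parameters and falls back to the Python-level `nn.Module.__getattr__`, which searches `_parameters`, `_buffers`, and `_modules` in turn. A sketch of that mechanism and the usual direct-dict workaround, modeled here without torch — the class is illustrative, not the TE helper being recalled:

```python
class ModuleLike:
    """Toy stand-in for nn.Module's parameter storage."""

    def __init__(self):
        # nn.Module keeps parameters in a side dict, not as plain attributes.
        self._parameters = {"weight": [1.0, 2.0]}

    def __getattr__(self, name):
        # Fallback path, analogous to nn.Module.__getattr__: only reached
        # when normal attribute lookup fails, then searches the param dict.
        params = self.__dict__["_parameters"]
        if name in params:
            return params[name]
        raise AttributeError(name)


m = ModuleLike()
w_slow = getattr(m, "weight")     # goes through the __getattr__ fallback
w_fast = m._parameters["weight"]  # direct dict lookup, skips the fallback
assert w_slow is w_fast
```

A faster accessor in this spirit would simply read the backing dict directly instead of paying for the generic attribute-lookup miss on every call.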
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
Force-pushed from ecb8b96 to e64188b
for more information, see https://pre-commit.ci
/te-ci L0
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
TE1.x did not save the recipe in the state. With this fix, a TE1.x checkpoint can be loaded in ToT TE2.x (for both TE versions, the same ToT Megatron-LM was used).
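A sketch of what such backward-compatible loading can look like — hypothetical helper and key names, not TE's actual code: when the checkpoint predates recipe serialization, fall back to the currently active recipe instead of failing on the missing key.

```python
import warnings

ACTIVE_RECIPE = "DelayedScaling"  # stand-in for the recipe currently in effect


def load_fp8_meta(state_dict):
    """Return the recipe from a checkpoint, tolerating pre-recipe formats."""
    recipe = state_dict.get("recipe")
    if recipe is None:
        # TE1.x-style checkpoint: no "recipe" entry was ever saved.
        warnings.warn(
            "Checkpoint predates recipe serialization; using the active recipe."
        )
        recipe = ACTIVE_RECIPE
    return recipe


print(load_fp8_meta({"scale": 1.0}))                   # → DelayedScaling
print(load_fp8_meta({"recipe": "MXFP8BlockScaling"}))  # → MXFP8BlockScaling
```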
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
for more information, see https://pre-commit.ci
/te-ci L0
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/te-ci L0
@@ -57,6 +57,12 @@
_NUM_MAX_UB_STREAMS = 3
_MIN_STREAM_PRIORITY, _MAX_STREAM_PRIORITY = None, None
layers_atomic_ring_exchange = []
_QUANTIZED_WEIGHT_TENSOR_TYPES = (
We recently merged the change to introduce the QuantizedTensorBase class, so you should be able to remove this.
Fixed in 7c2f5eb
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
Description
Enables recipe updates on the fly; a user warning is emitted when this happens.
Enables TE1.x checkpoint loading.
Implemented a check that catches a user error when there is a mismatch between the recipes in fp8_model_init and fp8_autocast. Example case: the recipe is DelayedScaling (DelayedScaling is set in fp8_autocast()), but the weight tensor is MXFP8Tensor (MXFP8BlockScaling is set in fp8_model_init()).
Type of change
Changes
Please list the changes introduced in this PR:
Checklist:
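As an appendix for readers, the recipe/weight-class mismatch check described in this PR could look roughly like the following — a hedged sketch with stand-in classes and a hypothetical mapping, not TE's implementation:

```python
# Stand-ins for TE's recipe and quantized-tensor classes.
class DelayedScaling: ...
class MXFP8BlockScaling: ...
class Float8Tensor: ...
class MXFP8Tensor: ...


# Assumed mapping from recipe type to the weight tensor class it produces
# (hypothetical; the PR discussion suggests storing this on the recipe).
_EXPECTED_TENSOR_CLASS = {
    DelayedScaling: Float8Tensor,
    MXFP8BlockScaling: MXFP8Tensor,
}


def check_recipe_matches_weights(recipe, weight_tensors):
    """Raise if the autocast recipe disagrees with the weight tensor class."""
    expected = _EXPECTED_TENSOR_CLASS[type(recipe)]
    for w in weight_tensors:
        if not isinstance(w, expected):
            raise RuntimeError(
                f"Recipe {type(recipe).__name__} expects weights of type "
                f"{expected.__name__}, but got {type(w).__name__}. Check that "
                "fp8_model_init and fp8_autocast use the same recipe."
            )


# The mismatch from the description: DelayedScaling autocast, MXFP8 weights.
try:
    check_recipe_matches_weights(DelayedScaling(), [MXFP8Tensor()])
except RuntimeError as e:
    print("caught:", e)
```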