Add 16A8W support and test for sigmoid operation #13569

Conversation
**Summary:** This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecutorTorch ARM backend, following the feedback from D79746479.

## Key Changes

**1. New Quantization Configuration Function**
- Add `get_symmetric_a16w8_quantization_config()` in `fbcode/executorch/backends/arm/quantizer/arm_quantizer.py`
- Provides 16-bit activations with `HistogramObserver` (better precision than 8A8W)
- Maintains 8-bit weights with `MinMaxObserver`/`PerChannelMinMaxObserver` (memory-efficient)
- **Technically supported by TOSA through the [EXT-INT16 extension/profile](https://www.mlplatform.org/tosa/tosa_spec.html#_conv2d)**

## Benefits
- **Better Precision**: 16-bit activations provide higher precision than 8-bit activations, which is useful for carrying precision through recurrent neural networks.

Reviewed By: 3l1

Differential Revision: D79763381
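The summary names the observers and bit-widths but not the code. Below is a minimal sketch of what such a config could look like, built only from `torch.ao.quantization` primitives; the spec objects and their exact parameters are illustrative assumptions, not the actual body of `get_symmetric_a16w8_quantization_config()`.

```python
import torch
from torch.ao.quantization.observer import HistogramObserver, PerChannelMinMaxObserver
from torch.ao.quantization.quantizer import QuantizationSpec

# 16-bit symmetric activation spec: HistogramObserver tracks the value
# distribution during calibration, giving tighter ranges than raw min/max.
act_qspec = QuantizationSpec(
    dtype=torch.int16,
    quant_min=-(2**15),
    quant_max=2**15 - 1,
    qscheme=torch.per_tensor_symmetric,
    observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
)

# 8-bit symmetric per-channel weight spec: one scale per output channel
# keeps weights compact while limiting per-channel quantization error.
weight_qspec = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-127,
    quant_max=127,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    observer_or_fake_quant_ctr=PerChannelMinMaxObserver.with_args(eps=2**-12),
)
```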
**Summary:**
- Adds a linear-ops test using the 16A8W config in the INT16 profile.
- Adds INT16 dtype support to view-ops validation.
- Validated with the TOSA pipeline test.
- Confirmed that tests previously marked flaky are no longer flaky and removed the markers.

Note: not yet verified with a TOSA reference model run.

Differential Revision: D80308822
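A minimal sketch of the shape of that validation change: the point is that `torch.int16` joins the set of accepted element types. The function name and dtype set here are illustrative, not the backend's actual view-op code.

```python
import torch

# Illustrative allow-list: torch.int16 is now accepted alongside the
# previously supported element types.
_SUPPORTED_VIEW_DTYPES = {torch.int8, torch.int16, torch.int32, torch.float32}

def _validate_view_dtype(dtype: torch.dtype) -> None:
    if dtype not in _SUPPORTED_VIEW_DTYPES:
        raise ValueError(f"view: dtype {dtype} is not supported in this profile")
```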
**Summary:** Add 16A8W quantization support and a test for the add operation in the ExecutorTorch ARM backend. This follows the pattern established for linear operations, extending int16 support to add operations.

Changes:
- Add INT16 dtype validation support in `op_add.py`
- Add a `test_add_tensor_16a8w_tosa_INT` test function
- Enable `test_add.py` in the test targets configuration

The 16A8W configuration uses 16-bit activations with 8-bit weights, enabling higher precision for activations while maintaining weight efficiency.

Differential Revision: D80510463
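As an end-to-end illustration of how a 16A8W config is consumed, here is a hedged PT2E sketch for the add case. The backend-specific wiring (`TOSAQuantizer` construction and `set_global`) is left as comments because its exact API is an assumption here; the export and calibration steps use only standard `torch` APIs.

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

class AddModule(torch.nn.Module):
    def forward(self, x, y):
        return x + y

example_inputs = (torch.randn(1, 8), torch.randn(1, 8))
model = torch.export.export_for_training(AddModule(), example_inputs).module()

# Assumed backend wiring (constructor details may differ):
# quantizer = TOSAQuantizer(tosa_spec)
# quantizer.set_global(get_symmetric_a16w8_quantization_config())
# prepared = prepare_pt2e(model, quantizer)
# prepared(*example_inputs)           # calibration pass over sample data
# quantized = convert_pt2e(prepared)  # int16 activations, int8 weights
```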
**Summary:** Add 16A8W quantization support and a test for the mul operation in the ExecutorTorch ARM backend. This follows the pattern established for linear operations, extending int16 support to mul operations.

Changes:
- Add INT16 dtype validation support in `op_mul.py`
- Add a `test_mul_tensor_16a8w_tosa_INT` test function
- Enable `test_mul.py` in the test targets configuration

The 16A8W configuration uses 16-bit activations with 8-bit weights, enabling higher precision for activations while maintaining weight efficiency.

Differential Revision: D80510628
**Summary:** Add 16A8W quantization support and a test for the sigmoid operation in the ExecutorTorch ARM backend. This follows the pattern established for linear and mul operations, extending int16 support to sigmoid operations.

Changes:
- Add INT16 dtype validation support in `op_sigmoid.py`
- Add a `test_sigmoid_tensor_16a8w_tosa_INT` test function
- Enable `test_sigmoid.py` in the test targets configuration

The 16A8W configuration uses 16-bit activations with 8-bit weights, enabling higher precision for activations while maintaining weight efficiency.

Differential Revision: D80510729
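The test body itself isn't shown in the summary, but the property it guards can be sketched: after PT2E conversion with the 16A8W config, activation quantize/dequantize nodes should carry `torch.int16`. The helper below is an illustrative assertion, not the verbatim body of `test_sigmoid_tensor_16a8w_tosa_INT`.

```python
import torch

def assert_16a8w_activations(graph_module: torch.fx.GraphModule) -> None:
    """Collect the dtypes attached to quantize/dequantize nodes and check
    that activations were lowered to int16."""
    seen = set()
    for node in graph_module.graph.nodes:
        if node.op == "call_function" and "quantize_per" in str(node.target):
            seen.add(node.args[-1])  # dtype is the last positional arg of q/dq ops
    assert torch.int16 in seen, f"expected int16 activations, saw {seen}"
```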
This pull request was exported from Phabricator. Differential Revision: D80510729