Below are the steps required to integrate QAT from NNCF into a training script written with PyTorch:
Quantize the model using the :doc:`Post-Training Quantization <../quantizing-models-post-training/basic-quantization-flow>` method.
.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
   :language: python
   :fragment: [quantize]
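For readers viewing this page without the rendered snippet, here is a minimal sketch of this step. The model, calibration data, and ``transform_fn`` below are placeholders for illustration; only ``nncf.Dataset`` and ``nncf.quantize`` are the actual NNCF API.

.. code-block:: python

   import nncf
   import torch

   # Placeholder model and calibration data; substitute your own.
   model = torch.nn.Sequential(torch.nn.Linear(16, 8))
   calibration_loader = [torch.randn(1, 16) for _ in range(10)]

   def transform_fn(data_item):
       # Map a data-loader item to the model input.
       return data_item

   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

   # Insert fake-quantization operations into the model.
   quantized_model = nncf.quantize(model, calibration_dataset)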
Fine-tune the model the same way you would fine-tune the baseline model. For QAT, it is required to train the model for a few epochs with a small learning rate, for example, 1e-5. During fine-tuning, quantized models perform all computations in floating-point precision while modeling quantization errors in both the forward and backward passes.
.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
   :language: python
   :fragment: [tune_model]
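Continuing the sketch above, fine-tuning is a standard PyTorch training loop with a small learning rate; the optimizer choice, epoch count, and zero targets below are placeholder assumptions.

.. code-block:: python

   import torch

   # Standard PyTorch fine-tuning loop with the small learning rate
   # recommended for QAT. Loss targets are placeholders.
   optimizer = torch.optim.Adam(quantized_model.parameters(), lr=1e-5)
   loss_fn = torch.nn.MSELoss()

   quantized_model.train()
   for epoch in range(3):  # a few epochs are usually sufficient
       for inputs in calibration_loader:
           optimizer.zero_grad()
           loss = loss_fn(quantized_model(inputs), torch.zeros(1, 8))
           loss.backward()
           optimizer.step()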
.. note::

   The precision of weights transitions to INT8 only after the model is converted to OpenVINO Intermediate Representation. You can expect a reduction in the model footprint only for that format.
These steps outline the basics of applying the QAT method from NNCF. However, in some cases it is necessary to save and load model checkpoints during training. Because NNCF wraps the original model in its own object, it provides an API for this purpose.
To save a model checkpoint, use the following API:
.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
   :language: python
   :fragment: [save_checkpoint]
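As a rough sketch of what such a checkpoint may look like, assuming a recent NNCF release that exposes ``nncf.torch.get_config`` (the checkpoint path and dictionary keys are illustrative assumptions):

.. code-block:: python

   import nncf.torch
   import torch

   checkpoint = {
       # Fine-tuned weights of the quantized model.
       "state_dict": quantized_model.state_dict(),
       # NNCF configuration needed to re-create the quantized model structure.
       "nncf_config": nncf.torch.get_config(quantized_model),
   }
   torch.save(checkpoint, "qat_checkpoint.pth")  # hypothetical path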
To restore the model from a checkpoint, use the following API:
.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
   :language: python
   :fragment: [load_checkpoint]
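A matching restore sketch, under the same assumption that ``nncf.torch.load_from_config`` is available in your NNCF version; ``model`` is the original floating-point model and ``example_input`` is a representative input, both placeholders here:

.. code-block:: python

   import nncf.torch
   import torch

   model = torch.nn.Sequential(torch.nn.Linear(16, 8))  # original FP32 model
   example_input = torch.randn(1, 16)

   resuming_checkpoint = torch.load("qat_checkpoint.pth")  # hypothetical path

   # Re-create the quantized model structure, then restore fine-tuned weights.
   quantized_model = nncf.torch.load_from_config(
       model, resuming_checkpoint["nncf_config"], example_input
   )
   quantized_model.load_state_dict(resuming_checkpoint["state_dict"])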
If needed, the model can be converted to OpenVINO Intermediate Representation (IR), compiled, and run with OpenVINO without any additional steps.
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
   :language: python
   :fragment: [inference]
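For illustration, a minimal conversion and inference sketch using the OpenVINO Python API; ``quantized_model`` and ``example_input`` carry over from the sketches above, and the ``"CPU"`` device is an assumption.

.. code-block:: python

   import openvino as ov
   import torch

   example_input = torch.randn(1, 16)

   # Convert the fine-tuned quantized model to OpenVINO IR; weights
   # become INT8 at this point.
   ov_model = ov.convert_model(quantized_model.cpu(), example_input=example_input)

   # Compile for a target device and run inference.
   compiled_model = ov.compile_model(ov_model, device_name="CPU")
   result = compiled_model(example_input.numpy())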
For more details, see the corresponding :doc:`documentation <../../running-inference>`.