
[PT FE] Add ipex to GPTQ supported quantization types #30042


Open
notsyncing wants to merge 1 commit into master

Conversation

notsyncing

Details:

Hello, I'm trying to convert a Qwen2.5 model loaded with gptqmodel, and it complains:

INFO  ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.                                                       
INFO  ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.                                                                               
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
INFO   Kernel: Auto-selection: adding candidate `IPEXQuantLinear`                                                                                   
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
2025-04-09 20:31:41,266 - openvino.frontend.pytorch.ts_decoder - WARNING - Failed patching of AutoGPTQ model. Error message:
Tracing of the model will likely be unsuccessful or incorrect
Traceback (most recent call last):
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/openvino/frontend/pytorch/ts_decoder.py", line 150, in _get_scripted_model
    quantized.patch_quantized(pt_module)
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/openvino/frontend/pytorch/quantized.py", line 57, in patch_quantized
    gptq.patch_model(model)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/openvino/frontend/pytorch/gptq.py", line 120, in patch_model
    raise ValueError(f'Unsupported QUANT_TYPE == {m.QUANT_TYPE} is discovered for '
ValueError: Unsupported QUANT_TYPE == ipex is discovered for AutoGPTQ model, only the following types are supported: ['triton', 'exllama', 'exllamav2', 'cuda-old']
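For context, a minimal reproducer sketch of the conversion path that hits this check. The model id, loading call, and example input are assumptions for illustration, not taken from this PR:

```python
# Hypothetical reproducer (sketch): load a GPTQ-quantized Qwen2.5 checkpoint with
# transformers (gptqmodel backend) and pass it to openvino.convert_model, which goes
# through the PyTorch frontend's ts_decoder and gptq patching shown in the traceback.
import torch
import openvino as ov
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4"  # assumed checkpoint, for illustration only
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_id)

example = tokenizer("Hello", return_tensors="pt")
# When the quantized linears report QUANT_TYPE == "ipex", gptq.patch_model raises
# the ValueError above, which ts_decoder reports as the "Failed patching" warning.
ov_model = ov.convert_model(model, example_input={"input_ids": example["input_ids"]})
```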

It turns out that gptqmodel uses IPEXQuantLinear, which has QUANT_TYPE = "ipex", so the frontend rejects it.
I added "ipex" to supported_quant_types in src/bindings/python/src/openvino/frontend/pytorch/gptq.py, and it works.
Tested on a Qwen 2.5 model.
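For clarity, a sketch of the change described above; the exact surrounding code in gptq.py may differ slightly:

```python
# src/bindings/python/src/openvino/frontend/pytorch/gptq.py (sketch):
# patch_model() compares each quantized module's QUANT_TYPE against this list and
# raises the ValueError quoted above otherwise; "ipex" is appended so gptqmodel's
# IPEXQuantLinear modules are patched instead of rejected.
supported_quant_types = ["triton", "exllama", "exllamav2", "cuda-old", "ipex"]
```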

Tickets:

None

notsyncing requested a review from a team as a code owner on April 9, 2025 12:44
github-actions bot added the category: Python API (OpenVINO Python bindings) and category: PyTorch FE (OpenVINO PyTorch Frontend) labels on Apr 9, 2025
sys-openvino-ci added the ExternalPR (External contributor) label on Apr 9, 2025
mlukasze requested review from a team, rkazants and cavusmustafa, and removed the request for a team, on April 9, 2025 13:08
mvafin (Contributor) commented on Apr 9, 2025

@notsyncing Did you verify the accuracy of the model?

notsyncing (Author) replied:

> @notsyncing Did you verify the accuracy of the model?

Not yet, just some conversations with it. Is there any guide to do this?
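One illustrative way to sanity-check this, assuming `model`, `tokenizer`, and `ov_model` from the reproducer sketch above; this is not an official OpenVINO accuracy-validation procedure:

```python
# Smoke test: compare logits of the original PyTorch model and the converted
# OpenVINO model on the same prompt. Quantization plus conversion will not match
# bit-exactly, so only look for a reasonably small difference.
import numpy as np
import torch
import openvino as ov

compiled = ov.compile_model(ov_model, "CPU")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    ref_logits = model(input_ids=inputs["input_ids"]).logits.numpy()

# Output index 0 is assumed to be the logits tensor of the converted model.
ov_logits = compiled({"input_ids": inputs["input_ids"].numpy()})[0]
print("max abs logit diff:", np.abs(ref_logits - ov_logits).max())
```

For a more thorough check, a perplexity or downstream-task comparison between the original and converted models (e.g. with lm-evaluation-harness) would be more informative.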

Labels
category: Python API (OpenVINO Python bindings), category: PyTorch FE (OpenVINO PyTorch Frontend), ExternalPR (External contributor)
3 participants