
[NPU][LNL] Run LLM inference on LNL NPU is very very slow #1563

Open

johnysh opened this issue Jan 16, 2025 · 0 comments

[OS]: Win11
[Platform]: Intel(R) Core(TM) Ultra 7 258V 2.20 GHz
[RAM]: 32GB
[NPU driver]: 32.0.100.3104

ENV:

https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html

pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
pip install openvino==2024.6 openvino-tokenizers==2024.6 openvino-genai==2024.6
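
A quick sanity check that the NPU device is actually visible to OpenVINO (a standard device query, not part of the original report):

import openvino as ov

core = ov.Core()
# Expect something like ['CPU', 'GPU', 'NPU']; if 'NPU' is missing,
# the driver install is the first thing to recheck.
print(core.available_devices)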

PIP LIST:
openvino 2024.6.0
openvino-genai 2024.6.0.0
openvino-telemetry 2024.5.0
openvino-tokenizers 2024.6.0.0
optimum 1.23.3
optimum-intel 1.19.0

Code:
https://github.com/openvinotoolkit/openvino.genai

CMD:
optimum-cli export openvino -m TheBloke/Llama-2-7B-Chat-GPTQ Llama-2-7B-Chat-GPTQ

python\benchmark_genai>python ./benchmark_genai.py -m Llama-2-7B-Chat-GPTQ -d NPU
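
For reference, a minimal sketch of what the benchmark drives under the hood, assuming the exported model directory from the optimum-cli step above (the real benchmark_genai.py adds argument parsing, warm-up runs, and metric averaging):

import openvino_genai

# Load the exported model on the NPU device.
pipe = openvino_genai.LLMPipeline("Llama-2-7B-Chat-GPTQ", "NPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128

res = pipe.generate("The Sun is yellow because", config)
print(res)

# perf_metrics carries the latency/throughput numbers the benchmark reports.
metrics = res.perf_metrics
print(f"TTFT: {metrics.get_ttft().mean:.2f} ms")
print(f"Throughput: {metrics.get_throughput().mean:.2f} tokens/s")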

Result:

[Image: screenshot of the benchmark output]
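
Worth noting: the NPU guide linked above recommends exporting LLMs with symmetric, channel-wise INT4 weights for the NPU. The GPTQ checkpoint used here is group-quantized, which may not hit that optimized path; that is an assumption, not something confirmed in this issue. The documented export command looks like:

optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --ratio 1.0 --group-size -1 llama-2-7b-chat-int4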
