NPUW: LLMInferRequest - not copy kvcache for last generated token #28489


TolyaTalamanov
Contributor

Details:

  • item1
  • ...

Tickets:

  • ticket-id

@TolyaTalamanov TolyaTalamanov requested review from a team as code owners January 16, 2025 13:36
@TolyaTalamanov TolyaTalamanov requested review from kblaszczak-intel and removed request for a team January 16, 2025 13:36
@github-actions github-actions bot added category: inference (OpenVINO Runtime library - Inference), category: Python API (OpenVINO Python bindings), category: docs (OpenVINO documentation), category: CPP API (OpenVINO CPP API bindings), category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin) labels Jan 16, 2025
@TolyaTalamanov TolyaTalamanov force-pushed the at/npuw-llm-pipeline-stop-when-kvcache-is-full branch from 935550f to 11fb7d4 Compare January 16, 2025 13:37
@github-actions github-actions bot removed category: inference (OpenVINO Runtime library - Inference), category: Python API (OpenVINO Python bindings), category: docs (OpenVINO documentation), category: CPP API (OpenVINO CPP API bindings) labels Jan 16, 2025
@TolyaTalamanov TolyaTalamanov added this to the 2025.0 milestone Jan 16, 2025
@TolyaTalamanov TolyaTalamanov added this pull request to the merge queue Jan 16, 2025
Merged via the queue into openvinotoolkit:master with commit 9f0a52b Jan 16, 2025
161 checks passed
@TolyaTalamanov TolyaTalamanov deleted the at/npuw-llm-pipeline-stop-when-kvcache-is-full branch January 16, 2025 19:43
Labels
category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), Code Freeze

2 participants