add Gaudi support for TEI Embedding service in ChatQnA to reduce latency on > 16 concurrent user requests. #1780
base: main
Conversation
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Dependency Review: ✅ No vulnerabilities or license issues found. Scanned files: none.
Branch updated from 5ce9281 to 8616032.
Hi @louie-tsai, thanks for your contribution. OPEA v1.3 is now feature-frozen, so could we target this feature for v1.4 instead?
Sure, thanks for the reminder.
Pull Request Overview
This PR adds Gaudi support to the TEI Embedding service in ChatQnA to help reduce latency under heavy concurrent load, and renames the CPU deployment container from tei-embedding-gaudi-server to tei-embedding-server. Key changes include:
- Updating container names in multiple docker-compose files for CPU deployments.
- Introducing a new compose file (compose.tei-embedding-gaudi.yaml) for Gaudi-based embedding.
- Updating documentation and Prometheus target configurations for the renamed service.
Reviewed Changes
Copilot reviewed 11 out of 17 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
ChatQnA/docker_compose/intel/hpu/gaudi/prometheus.yaml | Updates the Prometheus target container name. |
ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.md | Updates container name reference in documentation. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_without_rerank.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_guardrails.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen_tgi.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose.tei-embedding-gaudi.yaml | New compose file for Gaudi-based TEI embedding. |
ChatQnA/docker_compose/intel/hpu/gaudi/README.md | Updates references to the container name. |
ChatQnA/README.md | Documents how to enable the Gaudi embedding service. |
Files not reviewed (6)
- ChatQnA/tests/test_compose_faqgen_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_faqgen_tgi_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_guardrails_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_tgi_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_without_rerank_on_gaudi.sh: Language not supported
@@ -20,7 +20,7 @@ scrape_configs:
   - job_name: "tei-embedding"
     metrics_path: /metrics
     static_configs:
-      - targets: ["tei-embedding-gaudi-server:80"]
+      - targets: ["tei-embedding-server:80"]
The updated Prometheus target now points to 'tei-embedding-server', which is correct for CPU deployments. However, when using the Gaudi compose file (compose.tei-embedding-gaudi.yaml) the container remains 'tei-embedding-gaudi-server', potentially causing a mismatch in metric scraping. Consider making the Prometheus configuration conditional or updating it to account for both deployment scenarios.
- targets: ["tei-embedding-server:80"] | |
- targets: ["${TEI_EMBEDDING_TARGET:-tei-embedding-server:80}"] |
Do you have a test for tei-embedding-gaudi-server, or is it used as the default?
Description
Benchmarking was done on Gaudi3 using GenAIEval with an output token length of 128.

The blue line shows embedding on CPU, and the orange line shows embedding on Gaudi.
Once the number of concurrent user requests reaches 16, running embedding on Gaudi reduces overall ChatQnA latency, so this PR also enables Gaudi support for embedding in ChatQnA.
By default, embedding still runs on CPU. Adding compose.tei-embedding-gaudi.yaml to the docker compose up command makes embedding run on Gaudi instead:
docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d
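For reference, here is a minimal sketch of the kind of service override such a file supplies when merged via the second -f flag. The service name, image tag, and environment settings below are assumptions based on how other Gaudi services in ChatQnA are typically defined; see the actual compose.tei-embedding-gaudi.yaml in this PR for the real definition.

```yaml
# Illustrative sketch only -- not the file shipped in this PR.
# Overrides the embedding service defined in compose.yaml so it runs on Gaudi.
services:
  tei-embedding-service:                        # assumed service name; must match compose.yaml
    image: ghcr.io/huggingface/tei-gaudi:1.5.0  # assumed Gaudi-enabled TEI image
    container_name: tei-embedding-gaudi-server
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
```

Because later -f files override earlier ones, the rest of compose.yaml stays as-is and only the embedding service is redirected to Gaudi.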
The CPU embedding container is also renamed from tei-embedding-gaudi-server to tei-embedding-server to avoid confusion.
Issues
n/a
Type of change
- New feature (non-breaking): adds an optional Gaudi deployment for the TEI embedding service, with CPU remaining the default.
Dependencies
NA
Tests
Manually tested on a Gaudi machine.