add Gaudi support for TEI Embedding service in ChatQnA to reduce latency on > 16 concurrent user requests. #1780
base: main
Conversation
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Dependency Review: ✅ No vulnerabilities or license issues found. Scanned files: none.
Branch updated from 5ce9281 to 8616032.
Hi @louie-tsai, thanks for your contribution. OPEA v1.3 is now feature-frozen, so could we target this feature for v1.4 instead?
Sure, thanks for the reminder.
Pull Request Overview
This PR adds Gaudi support to the TEI Embedding service in ChatQnA to help reduce latency under heavy concurrent load, and renames the CPU deployment container from tei-embedding-gaudi-server to tei-embedding-server. Key changes include:
- Updating container names in multiple docker-compose files for CPU deployments.
- Introducing a new compose file (compose.tei-embedding-gaudi.yaml) for Gaudi-based embedding.
- Updating documentation and Prometheus target configurations for the renamed service.
Reviewed Changes
Copilot reviewed 11 out of 17 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
ChatQnA/docker_compose/intel/hpu/gaudi/prometheus.yaml | Updates the Prometheus target container name. |
ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.md | Updates container name reference in documentation. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_without_rerank.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_guardrails.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen_tgi.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml | Renames the container for CPU embedding service. |
ChatQnA/docker_compose/intel/hpu/gaudi/compose.tei-embedding-gaudi.yaml | New compose file for Gaudi-based TEI embedding. |
ChatQnA/docker_compose/intel/hpu/gaudi/README.md | Updates references to the container name. |
ChatQnA/README.md | Documents how to enable the Gaudi embedding service. |
Files not reviewed (6)
- ChatQnA/tests/test_compose_faqgen_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_faqgen_tgi_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_guardrails_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_tgi_on_gaudi.sh: Language not supported
- ChatQnA/tests/test_compose_without_rerank_on_gaudi.sh: Language not supported
@@ -20,7 +20,7 @@ scrape_configs:
   - job_name: "tei-embedding"
     metrics_path: /metrics
     static_configs:
-      - targets: ["tei-embedding-gaudi-server:80"]
+      - targets: ["tei-embedding-server:80"]
The updated Prometheus target now points to 'tei-embedding-server', which is correct for CPU deployments. However, when using the Gaudi compose file (compose.tei-embedding-gaudi.yaml) the container remains 'tei-embedding-gaudi-server', potentially causing a mismatch in metric scraping. Consider making the Prometheus configuration conditional or updating it to account for both deployment scenarios.
- targets: ["tei-embedding-server:80"] | |
- targets: ["${TEI_EMBEDDING_TARGET:-tei-embedding-server:80}"] |
Do you have a test for tei-embedding-gaudi-server, or is it used as the default?
Description
Benchmarking was done on Gaudi3 using GenAIEval with an output token length of 128.

The blue line shows embedding on CPU, and the orange line shows embedding on Gaudi.
Once the number of concurrent user requests reaches 16, running embedding on Gaudi reduces overall ChatQnA latency, so this PR also enables Gaudi support for embedding in ChatQnA.
By default, embedding still runs on CPU. Adding compose.tei-embedding-gaudi.yaml to the docker compose up command makes embedding run on Gaudi instead:
docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d
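For reference, here is a minimal sketch of the kind of service override such a file supplies when merged via the second -f flag. The service name, image tag, and environment settings below are assumptions based on how other Gaudi services in ChatQnA are typically defined; see the actual compose.tei-embedding-gaudi.yaml in this PR for the real definition.

```yaml
# Illustrative sketch only -- not the file shipped in this PR.
# Overrides the embedding service defined in compose.yaml so it runs on Gaudi.
services:
  tei-embedding-service:                        # assumed service name; must match compose.yaml
    image: ghcr.io/huggingface/tei-gaudi:1.5.0  # assumed Gaudi-enabled TEI image
    container_name: tei-embedding-gaudi-server
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
```

Because later -f files override earlier ones, the rest of compose.yaml stays as-is and only the embedding service is redirected to Gaudi.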
The CPU embedding container is also renamed from tei-embedding-gaudi-server to tei-embedding-server to avoid confusion.
Issues
n/a
Type of change
- New feature (non-breaking): adds an optional Gaudi deployment for the TEI embedding service, with CPU remaining the default.
Dependencies
NA
Tests
Manually tested on a Gaudi machine.