Skip to content

add Gaudi support for TEI Embedding service in ChatQnA to reduce latency on > 16 concurrent user requests. #1780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

louie-tsai
Copy link
Collaborator

@louie-tsai louie-tsai commented Apr 9, 2025

Description

Benchmarking on Gaudi3 using GenAIEval with output token : 128.
blue line is for embedding on CPU, and orange line is for embedding on Gaudi.
once the number of user requests goes up to 16 at the same time, embedding on Gaudi helps to reduce overall ChatQnA latency.
image

Therefore, we also enable Gaudi support for embedding on ChatQnA.
By default, embedding is still on CPU.
Adding an additional compose.tei-embedding-gaudi.yaml during docker compose up will make embedding run on Gaudi instead.
docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d

also rename the embedding docker name on CPU from tei-embedding-gaudi-server to tei-embedding-server to avoid confusion.

Issues

n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

NA

Tests

manually testing on Gaudi machine

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Copy link

github-actions bot commented Apr 9, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

@louie-tsai louie-tsai force-pushed the chatqna_service_on_both_hpu_cpu branch 2 times, most recently from 5ce9281 to 8616032 Compare April 9, 2025 01:55
@louie-tsai louie-tsai requested review from xiguiw and yinghu5 April 9, 2025 01:56
@louie-tsai louie-tsai added this to the v1.3 milestone Apr 9, 2025
@joshuayao
Copy link
Collaborator

Hi @louie-tsai Thanks for your contribution. OPEA v1.3 is now feature-frozen, so could we target this feature for v1.4 instead?

@louie-tsai louie-tsai modified the milestones: v1.3, v1.4 Apr 9, 2025
@louie-tsai
Copy link
Collaborator Author

Hi @louie-tsai Thanks for your contribution. OPEA v1.3 is now feature-frozen, so could we target this feature for v1.4 instead?

sure. thanks for reminding.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds Gaudi support to the TEI Embedding service in ChatQnA to help reduce latency under heavy concurrent load while renaming the CPU deployment container from tei-embedding-gaudi-server to tei-embedding-server. Key changes include:

  • Updating container names in multiple docker-compose files for CPU deployments.
  • Introducing a new compose file (compose.tei-embedding-gaudi.yaml) for Gaudi-based embedding.
  • Updating documentation and Prometheus target configurations for the renamed service.

Reviewed Changes

Copilot reviewed 11 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
ChatQnA/docker_compose/intel/hpu/gaudi/prometheus.yaml Updates the Prometheus target container name.
ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.md Updates container name reference in documentation.
ChatQnA/docker_compose/intel/hpu/gaudi/compose_without_rerank.yaml Renames the container for CPU embedding service.
ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml Renames the container for CPU embedding service.
ChatQnA/docker_compose/intel/hpu/gaudi/compose_guardrails.yaml Renames the container for CPU embedding service.
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen_tgi.yaml Renames the container for CPU embedding service.
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen.yaml Renames the container for CPU embedding service.
ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml Renames the container for CPU embedding service.
ChatQnA/docker_compose/intel/hpu/gaudi/compose.tei-embedding-gaudi.yaml New compose file for Gaudi-based TEI embedding.
ChatQnA/docker_compose/intel/hpu/gaudi/README.md Updates references to the container name.
ChatQnA/README.md Documents how to enable the Gaudi embedding service.
Files not reviewed (6)
  • ChatQnA/tests/test_compose_faqgen_on_gaudi.sh: Language not supported
  • ChatQnA/tests/test_compose_faqgen_tgi_on_gaudi.sh: Language not supported
  • ChatQnA/tests/test_compose_guardrails_on_gaudi.sh: Language not supported
  • ChatQnA/tests/test_compose_on_gaudi.sh: Language not supported
  • ChatQnA/tests/test_compose_tgi_on_gaudi.sh: Language not supported
  • ChatQnA/tests/test_compose_without_rerank_on_gaudi.sh: Language not supported

@@ -20,7 +20,7 @@ scrape_configs:
- job_name: "tei-embedding"
metrics_path: /metrics
static_configs:
- targets: ["tei-embedding-gaudi-server:80"]
- targets: ["tei-embedding-server:80"]
Copy link
Preview

Copilot AI Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated Prometheus target now points to 'tei-embedding-server', which is correct for CPU deployments. However, when using the Gaudi compose file (compose.tei-embedding-gaudi.yaml) the container remains 'tei-embedding-gaudi-server', potentially causing a mismatch in metric scraping. Consider making the Prometheus configuration conditional or updating it to account for both deployment scenarios.

Suggested change
- targets: ["tei-embedding-server:80"]
- targets: ["${TEI_EMBEDDING_TARGET:-tei-embedding-server:80}"]

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@yinghu5 yinghu5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have test for tei-embedding-gaudi-server or use it as default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request r1.4
Projects
Status: Ready
Development

Successfully merging this pull request may close these issues.

4 participants