Add Gaudi support for the TEI Embedding service in ChatQnA to reduce latency at >16 concurrent user requests. #1780


Open · wants to merge 1 commit into base: main
7 changes: 7 additions & 0 deletions ChatQnA/README.md
@@ -261,6 +261,13 @@ cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```

To run the TEI embedding service on Gaudi for workloads with many concurrent user requests, merge the `compose.tei-embedding-gaudi.yaml` file with the default `compose.yaml` file.

```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d
```
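When multiple `-f` files are supplied, Compose merges services that share a name, with later files taking precedence on conflicting keys. As a rough sketch (simplified, not the literal `docker compose config` output), the effective `tei-embedding-service` becomes the CPU definition from `compose.yaml` with the Gaudi image, container name, and Habana runtime from the override applied on top:

```yaml
# Simplified sketch of the merged tei-embedding-service definition
# (compose.yaml overridden by compose.tei-embedding-gaudi.yaml).
services:
  tei-embedding-service:
    image: ghcr.io/huggingface/tei-gaudi:1.5.0  # override replaces the cpu-1.6 image
    container_name: tei-embedding-gaudi-server  # override replaces tei-embedding-server
    runtime: habana                             # present only in the override
    ports:
      - "8090:80"                               # identical in both files
    environment:
      HABANA_VISIBLE_DEVICES: all               # merged in from the override
```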

Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.

### Deploy ChatQnA on Xeon
6 changes: 3 additions & 3 deletions ChatQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -66,7 +66,7 @@ The ChatQnA docker images should automatically be downloaded from the `OPEA regi
✔ Network gaudi_default Created 0.1s
✔ Container tei-reranking-gaudi-server Started 0.7s
✔ Container vllm-gaudi-server Started 0.7s
✔ Container tei-embedding-gaudi-server Started 0.3s
✔ Container tei-embedding-server Started 0.3s
✔ Container redis-vector-db Started 0.6s
✔ Container retriever-redis-server Started 1.1s
✔ Container dataprep-redis-server Started 1.1s
@@ -95,7 +95,7 @@ d560c232b120 opea/retriever:latest
a1d7ca2d3787 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server
9a9f3fd4fd4c opea/vllm-gaudi:latest "python3 -m vllm.ent…" 2 minutes ago Exited (1) 2 minutes ago vllm-gaudi-server
1ab9bbdf5182 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
9ee0789d819e ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-gaudi-server
9ee0789d819e ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-server
```

### Test the Pipeline
@@ -129,7 +129,7 @@ docker compose -f compose.yaml down
✔ Container vllm-gaudi-server Removed 0.0s
✔ Container retriever-redis-server Removed 10.4s
✔ Container tei-reranking-gaudi-server Removed 2.0s
✔ Container tei-embedding-gaudi-server Removed 1.2s
✔ Container tei-embedding-server Removed 1.2s
✔ Container redis-vector-db Removed 0.4s
✔ Network gaudi_default Removed 0.4s
```
26 changes: 26 additions & 0 deletions ChatQnA/docker_compose/intel/hpu/gaudi/compose.tei-embedding-gaudi.yaml
@@ -0,0 +1,26 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
tei-embedding-service:
image: ghcr.io/huggingface/tei-gaudi:1.5.0
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
volumes:
- "${MODEL_CACHE:-./data}:/data"
shm_size: 1g
runtime: habana
cap_add:
- SYS_NICE
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
2 changes: 1 addition & 1 deletion ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -34,7 +34,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: tei-embedding-gaudi-server
container_name: tei-embedding-server
ports:
- "8090:80"
volumes:
@@ -28,7 +28,7 @@ services:
LOGFLAG: ${LOGFLAG}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: tei-embedding-gaudi-server
container_name: tei-embedding-server
ports:
- "8090:80"
volumes:
@@ -28,7 +28,7 @@ services:
LOGFLAG: ${LOGFLAG}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: tei-embedding-gaudi-server
container_name: tei-embedding-server
ports:
- "8090:80"
volumes:
@@ -66,7 +66,7 @@ services:
restart: unless-stopped
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: tei-embedding-gaudi-server
container_name: tei-embedding-server
ports:
- "8090:80"
volumes:
2 changes: 1 addition & 1 deletion ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml
@@ -27,7 +27,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: tei-embedding-gaudi-server
container_name: tei-embedding-server
ports:
- "8090:80"
volumes:
@@ -27,7 +27,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: tei-embedding-gaudi-server
container_name: tei-embedding-server
ports:
- "8090:80"
volumes:
@@ -48,7 +48,7 @@ f810f3b4d329 opea/embedding:latest "python embed
2fa17d84605f opea/dataprep:latest "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007->6007/tcp, :::6007->5000/tcp dataprep-redis-server
69e1fb59e92c opea/retriever:latest "/home/user/comps/re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
313b9d14928a opea/reranking-tei:latest "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server
174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.3.1 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
74084469aa33 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
88399dbc9e43 ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server
2 changes: 1 addition & 1 deletion ChatQnA/docker_compose/intel/hpu/gaudi/prometheus.yaml
@@ -20,7 +20,7 @@ scrape_configs:
- job_name: "tei-embedding"
metrics_path: /metrics
static_configs:
- targets: ["tei-embedding-gaudi-server:80"]
- targets: ["tei-embedding-server:80"]

Copilot AI Apr 22, 2025


The updated Prometheus target now points to 'tei-embedding-server', which is correct for CPU deployments. However, when using the Gaudi compose file (compose.tei-embedding-gaudi.yaml) the container remains 'tei-embedding-gaudi-server', potentially causing a mismatch in metric scraping. Consider making the Prometheus configuration conditional or updating it to account for both deployment scenarios.

Suggested change
- targets: ["tei-embedding-server:80"]
- targets: ["${TEI_EMBEDDING_TARGET:-tei-embedding-server:80}"]
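The suggestion relies on `${VAR:-default}` parameter expansion, where the fallback is used only when the variable is unset or empty; whether Prometheus ever sees the substituted value depends on how prometheus.yaml is templated into the container, which is outside this diff. The expansion itself behaves like this in a POSIX shell (the exported value below is illustrative):

```shell
# CPU deployment: variable unset, so the default target is used
unset TEI_EMBEDDING_TARGET
echo "${TEI_EMBEDDING_TARGET:-tei-embedding-server:80}"   # -> tei-embedding-server:80

# Gaudi deployment: export the override before rendering the config
export TEI_EMBEDDING_TARGET="tei-embedding-gaudi-server:80"
echo "${TEI_EMBEDDING_TARGET:-tei-embedding-server:80}"   # -> tei-embedding-gaudi-server:80
```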


- job_name: "tei-reranking"
metrics_path: /metrics
static_configs:
2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_faqgen_on_gaudi.sh
@@ -120,7 +120,7 @@ function validate_microservices() {
"${ip_address}:8090/embed" \
"[[" \
"tei-embedding" \
"tei-embedding-gaudi-server" \
"tei-embedding-server" \
'{"inputs":"What is Deep Learning?"}'

sleep 1m # retrieval can't curl as expected, try to wait for more time
2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_faqgen_tgi_on_gaudi.sh
@@ -116,7 +116,7 @@ function validate_microservices() {
"${ip_address}:8090/embed" \
"[[" \
"tei-embedding" \
"tei-embedding-gaudi-server" \
"tei-embedding-server" \
'{"inputs":"What is Deep Learning?"}'

sleep 1m # retrieval can't curl as expected, try to wait for more time
2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_guardrails_on_gaudi.sh
@@ -113,7 +113,7 @@ function validate_microservices() {
"${ip_address}:8090/embed" \
"[[" \
"tei-embedding" \
"tei-embedding-gaudi-server" \
"tei-embedding-server" \
'{"inputs":"What is Deep Learning?"}'

sleep 1m # retrieval can't curl as expected, try to wait for more time
2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_on_gaudi.sh
@@ -100,7 +100,7 @@ function validate_microservices() {
"${ip_address}:8090/embed" \
"\[\[" \
"tei-embedding" \
"tei-embedding-gaudi-server" \
"tei-embedding-server" \
'{"inputs":"What is Deep Learning?"}'
echo "::endgroup::"

2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_tgi_on_gaudi.sh
@@ -110,7 +110,7 @@ function validate_microservices() {
"${ip_address}:8090/embed" \
"[[" \
"tei-embedding" \
"tei-embedding-gaudi-server" \
"tei-embedding-server" \
'{"inputs":"What is Deep Learning?"}'

sleep 1m # retrieval can't curl as expected, try to wait for more time
2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_without_rerank_on_gaudi.sh
@@ -109,7 +109,7 @@ function validate_microservices() {
"${ip_address}:8090/embed" \
"[[" \
"tei-embedding" \
"tei-embedding-gaudi-server" \
"tei-embedding-server" \
'{"inputs":"What is Deep Learning?"}'

sleep 1m # retrieval can't curl as expected, try to wait for more time