Commit 5ce9281

add Gaudi support for TEI Embedding service in ChatQnA
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
1 parent 7b7728c commit 5ce9281

17 files changed: +50, -17 lines changed

ChatQnA/README.md

+7
@@ -261,6 +261,13 @@ cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
 docker compose -f compose.yaml -f compose.telemetry.yaml up -d
 ```

+To enable TEI Embedding on Gaudi for many concurrent user requests, merge the compose.tei-embedding-gaudi.yaml file with the default compose.yaml file.
+
+```bash
+cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
+docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d
+```
+
 Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.

 ### Deploy ChatQnA on Xeon
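
Review note: one way to confirm that the overlay takes effect is to render the merged configuration and read back the container name of `tei-embedding-service`. The sketch below is not part of this commit. `container_name_for` is a hypothetical helper, and it assumes the two-space indentation that `docker compose config` prints.

```bash
# Hypothetical helper (not in the repository): print the container_name of one
# service from rendered Compose YAML read on stdin.
# Assumption: services are indented by two spaces and their keys by four.
container_name_for() {
  awk -v svc="$1" '
    $0 ~ ("^  " svc ":$")             { in_svc = 1; next }  # entered the service block
    in_svc && /^(  )?[^ ]/            { in_svc = 0 }        # sibling service or top-level key
    in_svc && $1 == "container_name:" { print $2; exit }    # first match inside the block
  '
}
```

From the gaudi directory, `docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml config | container_name_for tei-embedding-service` should then report `tei-embedding-gaudi-server`, because a later `-f` file overrides single-value keys of a service with the same name.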

ChatQnA/docker_compose/intel/hpu/gaudi/README.md

+3-3
@@ -66,7 +66,7 @@ The ChatQnA docker images should automatically be downloaded from the `OPEA regi
 ✔ Network gaudi_default Created 0.1s
 ✔ Container tei-reranking-gaudi-server Started 0.7s
 ✔ Container vllm-gaudi-server Started 0.7s
-✔ Container tei-embedding-gaudi-server Started 0.3s
+✔ Container tei-embedding-server Started 0.3s
 ✔ Container redis-vector-db Started 0.6s
 ✔ Container retriever-redis-server Started 1.1s
 ✔ Container dataprep-redis-server Started 1.1s
@@ -95,7 +95,7 @@ d560c232b120 opea/retriever:latest
 a1d7ca2d3787 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server
 9a9f3fd4fd4c opea/vllm-gaudi:latest "python3 -m vllm.ent…" 2 minutes ago Exited (1) 2 minutes ago vllm-gaudi-server
 1ab9bbdf5182 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
-9ee0789d819e ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-gaudi-server
+9ee0789d819e ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-server
 ```

 ### Test the Pipeline
@@ -129,7 +129,7 @@ docker compose -f compose.yaml down
 ✔ Container vllm-gaudi-server Removed 0.0s
 ✔ Container retriever-redis-server Removed 10.4s
 ✔ Container tei-reranking-gaudi-server Removed 2.0s
-✔ Container tei-embedding-gaudi-server Removed 1.2s
+✔ Container tei-embedding-server Removed 1.2s
 ✔ Container redis-vector-db Removed 0.4s
 ✔ Network gaudi_default Removed 0.4s
 ```

ChatQnA/docker_compose/intel/hpu/gaudi/compose.tei-embedding-gaudi.yaml

+26
@@ -0,0 +1,26 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+services:
+  tei-embedding-service:
+    image: ghcr.io/huggingface/tei-gaudi:1.5.0
+    container_name: tei-embedding-gaudi-server
+    ports:
+      - "8090:80"
+    volumes:
+      - "${MODEL_CACHE:-./data}:/data"
+    shm_size: 1g
+    runtime: habana
+    cap_add:
+      - SYS_NICE
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      HF_HUB_DISABLE_PROGRESS_BARS: 1
+      HF_HUB_ENABLE_HF_TRANSFER: 0
+      HABANA_VISIBLE_DEVICES: all
+      OMPI_MCA_btl_vader_single_copy_mechanism: none
+      MAX_WARMUP_SEQUENCE_LENGTH: 512
+    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate

ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml

+1-1
@@ -34,7 +34,7 @@ services:
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
   tei-embedding-service:
     image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
-    container_name: tei-embedding-gaudi-server
+    container_name: tei-embedding-server
     ports:
       - "8090:80"
     volumes:

ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen.yaml

+1-1
@@ -28,7 +28,7 @@ services:
       LOGFLAG: ${LOGFLAG}
   tei-embedding-service:
     image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
-    container_name: tei-embedding-gaudi-server
+    container_name: tei-embedding-server
     ports:
       - "8090:80"
     volumes:

ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen_tgi.yaml

+1-1
@@ -28,7 +28,7 @@ services:
       LOGFLAG: ${LOGFLAG}
   tei-embedding-service:
     image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
-    container_name: tei-embedding-gaudi-server
+    container_name: tei-embedding-server
     ports:
       - "8090:80"
     volumes:

ChatQnA/docker_compose/intel/hpu/gaudi/compose_guardrails.yaml

+1-1
@@ -66,7 +66,7 @@ services:
     restart: unless-stopped
   tei-embedding-service:
     image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
-    container_name: tei-embedding-gaudi-server
+    container_name: tei-embedding-server
     ports:
       - "8090:80"
     volumes:

ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml

+1-1
@@ -27,7 +27,7 @@ services:
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
   tei-embedding-service:
     image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
-    container_name: tei-embedding-gaudi-server
+    container_name: tei-embedding-server
     ports:
       - "8090:80"
     volumes:

ChatQnA/docker_compose/intel/hpu/gaudi/compose_without_rerank.yaml

+1-1
@@ -27,7 +27,7 @@ services:
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
   tei-embedding-service:
     image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
-    container_name: tei-embedding-gaudi-server
+    container_name: tei-embedding-server
     ports:
       - "8090:80"
     volumes:

ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.md

+1-1
@@ -48,7 +48,7 @@ f810f3b4d329 opea/embedding:latest "python embed
 2fa17d84605f opea/dataprep:latest "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007->6007/tcp, :::6007->5000/tcp dataprep-redis-server
 69e1fb59e92c opea/retriever:latest "/home/user/comps/re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
 313b9d14928a opea/reranking-tei:latest "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server
-174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
+174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-server
 05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.3.1 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
 74084469aa33 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
 88399dbc9e43 ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server

ChatQnA/docker_compose/intel/hpu/gaudi/prometheus.yaml

+1-1
@@ -20,7 +20,7 @@ scrape_configs:
   - job_name: "tei-embedding"
     metrics_path: /metrics
     static_configs:
-      - targets: ["tei-embedding-gaudi-server:80"]
+      - targets: ["tei-embedding-server:80"]
   - job_name: "tei-reranking"
     metrics_path: /metrics
     static_configs:
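
Review note: because the rename touches many files (compose files, docs, tests, and this Prometheus target), a quick search for leftover references to the old name can help catch a missed spot. The following is a minimal sketch, not part of this commit. `find_stale_refs` is a hypothetical helper, and it assumes the Gaudi overlay file is the only place where the old container name is meant to remain.

```bash
# Hypothetical helper (not in the repository): list files under a directory
# that still mention the old container name.
# Assumption: compose.tei-embedding-gaudi.yaml keeps the old name on purpose,
# so it is filtered out of the report.
find_stale_refs() {
  dir="$1"
  old_name="$2"
  grep -rlF -- "$old_name" "$dir" | grep -v 'compose\.tei-embedding-gaudi\.yaml$'
}
```

A call such as `find_stale_refs ChatQnA tei-embedding-gaudi-server` prints one path per line and returns a non-zero status when nothing is left to fix.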

ChatQnA/tests/test_compose_faqgen_on_gaudi.sh

+1-1
@@ -120,7 +120,7 @@ function validate_microservices() {
         "${ip_address}:8090/embed" \
         "[[" \
         "tei-embedding" \
-        "tei-embedding-gaudi-server" \
+        "tei-embedding-server" \
         '{"inputs":"What is Deep Learning?"}'

     sleep 1m # retrieval can't curl as expected, try to wait for more time
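
Review note: the test arguments above pair the `/embed` URL with the expected string `[[`, which matches a response that is a JSON array of arrays. A minimal sketch of the same check for manual use follows. `is_embedding_response` is a hypothetical helper rather than a function from the test scripts, and it only inspects the first two characters of the body.

```bash
# Hypothetical helper (not in the test scripts): succeed when a response body
# begins with "[[", the prefix the tests expect from the /embed endpoint.
is_embedding_response() {
  case "$1" in
    "[["*) return 0 ;;  # looks like a list of embedding vectors
    *)     return 1 ;;  # error payload, empty body, or anything else
  esac
}
```

It could be fed with the body of a request such as `curl -s -X POST "${ip_address}:8090/embed" -H 'Content-Type: application/json' -d '{"inputs":"What is Deep Learning?"}'`, which works the same way whether the CPU or the Gaudi embedding container is serving port 8090.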

ChatQnA/tests/test_compose_faqgen_tgi_on_gaudi.sh

+1-1
@@ -116,7 +116,7 @@ function validate_microservices() {
         "${ip_address}:8090/embed" \
         "[[" \
         "tei-embedding" \
-        "tei-embedding-gaudi-server" \
+        "tei-embedding-server" \
         '{"inputs":"What is Deep Learning?"}'

     sleep 1m # retrieval can't curl as expected, try to wait for more time

ChatQnA/tests/test_compose_guardrails_on_gaudi.sh

+1-1
@@ -113,7 +113,7 @@ function validate_microservices() {
         "${ip_address}:8090/embed" \
         "[[" \
         "tei-embedding" \
-        "tei-embedding-gaudi-server" \
+        "tei-embedding-server" \
         '{"inputs":"What is Deep Learning?"}'

     sleep 1m # retrieval can't curl as expected, try to wait for more time

ChatQnA/tests/test_compose_on_gaudi.sh

+1-1
@@ -100,7 +100,7 @@ function validate_microservices() {
         "${ip_address}:8090/embed" \
         "\[\[" \
         "tei-embedding" \
-        "tei-embedding-gaudi-server" \
+        "tei-embedding-server" \
         '{"inputs":"What is Deep Learning?"}'
     echo "::endgroup::"


ChatQnA/tests/test_compose_tgi_on_gaudi.sh

+1-1
@@ -110,7 +110,7 @@ function validate_microservices() {
         "${ip_address}:8090/embed" \
         "[[" \
         "tei-embedding" \
-        "tei-embedding-gaudi-server" \
+        "tei-embedding-server" \
         '{"inputs":"What is Deep Learning?"}'

     sleep 1m # retrieval can't curl as expected, try to wait for more time

ChatQnA/tests/test_compose_without_rerank_on_gaudi.sh

+1-1
@@ -109,7 +109,7 @@ function validate_microservices() {
         "${ip_address}:8090/embed" \
         "[[" \
         "tei-embedding" \
-        "tei-embedding-gaudi-server" \
+        "tei-embedding-server" \
         '{"inputs":"What is Deep Learning?"}'

     sleep 1m # retrieval can't curl as expected, try to wait for more time
