support for answerdotai/ModernBERT-base #457
Comments
I tried launching TEI-gRPC with Docker, using a fine-tuned ModernBERT model, and I got this error:
And when I tried setting dtype to bfloat16:
transformers 4.47.1. Upgrading to the latest …
@mhillebrand @mathcass hi. I implemented the …
Thank you, @kozistr. I've never built TEI from source. Any pointers on getting your branch to work with gRPC?
You could try it with Docker.
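For reference, a minimal sketch of the Docker route from a local checkout of the branch. The Dockerfile name and image tag are taken from later in this thread, and any gRPC-specific build options should be checked against the repository; treat this as an illustration rather than the exact build steps.

```bash
# From a checkout of the feature/modernbert branch, build a CUDA image from source
docker build -f Dockerfile-cuda -t tei-modernbert:latest .

# Run it against a local model directory (port and paths are illustrative)
docker run --gpus all -p 8080:80 \
  -v /opt/models:/models \
  --rm tei-modernbert:latest \
  --model-id /models/ModernBERT-base
```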
+1
@kozistr I built a Docker image from your latest feature/modernbert repo, but I'm seeing some errors.

`markdown_classifier.sh`:

```bash
docker run \
  --runtime=nvidia \
  --gpus device=0 \
  -p 8120:80 \
  -v /opt/markdown/model:/model \
  --rm tei-modernbert:latest \
  --model-id /model/final \
  --dtype float16
```

Error:

```
$ /opt/boot/markdown_classifier.sh
2025-02-28T19:04:43.751161Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 20, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "6269c46b1d6b", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

I initially saw an error about missing …
It seems like there's no tokenizer under the path (error message: …)
I also tried to convert ModernBERT's tokenizer to a fast tokenizer, but I failed. I added …

```
Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "49f60a98aedc", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
stack backtrace:
   0: 0x58e788d860e9 - <unknown>
   1: 0x58e7889561c3 - <unknown>
   2: 0x58e788d859d2 - <unknown>
   3: 0x58e788d85f43 - <unknown>
   4: 0x58e788d8580b - <unknown>
   5: 0x58e788dc2a68 - <unknown>
   6: 0x58e788dc29c9 - <unknown>
   7: 0x58e788dc2e6c - <unknown>
   8: 0x58e78825616f - <unknown>
   9: 0x58e7882564e5 - <unknown>
  10: 0x58e78865117b - <unknown>
  11: 0x58e78879e627 - <unknown>
  12: 0x58e78879abee - <unknown>
  13: 0x58e7886c6723 - <unknown>
  14: 0x58e78879edc4 - <unknown>
  15: 0x7986f6cecd90 - <unknown>
  16: 0x7986f6cece40 - __libc_start_main
  17: 0x58e7885e1855 - <unknown>
  18: 0x0 - <unknown>
```
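For what it's worth, one way to re-export a fast tokenizer.json from a fine-tuned checkpoint is to load it with transformers and save it back out. A minimal sketch, assuming a Python environment with transformers on the host and using the paths from this thread (the output directory is illustrative):

```bash
# Load the tokenizer and write it back out; saving a fast tokenizer
# produces a fresh tokenizer.json next to tokenizer_config.json.
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('/opt/markdown/model/final')
print('is fast tokenizer:', tok.is_fast)
tok.save_pretrained('/opt/markdown/model/final-fast')  # illustrative output path
"
```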
Could you please double-check the directory inside the Docker container to verify the files are correctly located? It looks like you mounted your local path (…
```
(infer) /opt/markdown/model$ l
total 8.0K
drwxrwxr-x 7 matt matt 4.0K Feb 28 14:23 ModernBERT-base
lrwxrwxrwx 1 matt matt   31 Feb 28 09:55 final -> ModernBERT-base/checkpoint-2100
drwxrwxr-x 3 matt matt 4.0K Feb 28 09:55 qwen

(infer) /opt/markdown/model$ ll final/*
-rw-rw-r-- 1 matt matt 1.5K Feb 28 22:23 final/config.json
-rw-rw-r-- 1 matt matt 286M Feb 28 09:32 final/model.safetensors
-rw-rw-r-- 1 matt matt 571M Feb 28 09:32 final/optimizer.pt
-rw-rw-r-- 1 matt matt  14K Feb 28 09:32 final/rng_state.pth
-rw-rw-r-- 1 matt matt 1.1K Feb 28 09:32 final/scheduler.pt
-rw-rw-r-- 1 matt matt  694 Feb 28 09:32 final/special_tokens_map.json
-rw-rw-r-- 1 matt matt 3.5M Feb 28 14:22 final/tokenizer.json
-rw-rw-r-- 1 matt matt  21K Feb 28 14:22 final/tokenizer_config.json
-rw-rw-r-- 1 matt matt 3.5K Feb 28 09:32 final/trainer_state.json
-rw-rw-r-- 1 matt matt 5.3K Feb 28 09:32 final/training_args.bin
```
Well, as far as I know, when mounting a local path into a Docker container, symbolic links within the mounted directory are not automatically resolved or followed, so it'd be good to check the mounted directory from inside the container.
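A quick way to see exactly what the container sees is to override the image entrypoint with `ls`. A minimal sketch, reusing the mount and image tag from this thread:

```bash
# List the mounted model directory from inside the container to confirm
# that the symlink resolves and tokenizer.json is visible there.
docker run --rm \
  -v /opt/markdown/model:/model \
  --entrypoint ls \
  tei-modernbert:latest -la /model/final
```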
Symbolic links have always worked for me when using Docker. I just did away with the symbolic link, and I got the same bogus error.

```bash
docker run \
  --gpus device=0 \
  -p 8120:80 \
  -v /opt/markdown/model:/model \
  -e RUST_BACKTRACE=full \
  --rm tei-modernbert:latest \
  --model-id /model/ModernBERT-base/checkpoint-2100
```

```
2025-03-01T06:49:31.005543Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**********-****/**********-*100", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "49770c8319fe", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
stack backtrace:
   0: 0x6491206bd0e9 - <unknown>
   1: 0x64912028d1c3 - <unknown>
   2: 0x6491206bc9d2 - <unknown>
   3: 0x6491206bcf43 - <unknown>
   4: 0x6491206bc80b - <unknown>
   5: 0x6491206f9a68 - <unknown>
   6: 0x6491206f99c9 - <unknown>
   7: 0x6491206f9e6c - <unknown>
   8: 0x64911fb8d16f - <unknown>
   9: 0x64911fb8d4e5 - <unknown>
  10: 0x64911ff8817b - <unknown>
  11: 0x6491200d5627 - <unknown>
  12: 0x6491200d1bee - <unknown>
  13: 0x64911fffd723 - <unknown>
  14: 0x6491200d5dc4 - <unknown>
  15: 0x7243705bdd90 - <unknown>
  16: 0x7243705bde40 - __libc_start_main
  17: 0x64911ff18855 - <unknown>
  18: 0x0 - <unknown>
```
That's weird. I've just tested the …
If I use …

```
2025-03-01T07:17:12.442165Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "ans********/**********-*ase", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "28fa9a8e0b07", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-01T07:17:12.442285Z INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2025-03-01T07:17:12.514974Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-03-01T07:17:12.514982Z INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-03-01T07:17:12.654557Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/answerdotai/ModernBERT-base/resolve/main/1_Pooling/config.json)
2025-03-01T07:17:13.411914Z INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-03-01T07:17:13.501716Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/answerdotai/ModernBERT-base/resolve/main/config_sentence_transformers.json)
2025-03-01T07:17:13.501737Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-03-01T07:17:13.731007Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-03-01T07:17:14.047773Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.532796807s
2025-03-01T07:17:14.047890Z WARN text_embeddings_router: router/src/lib.rs:390: The `--pooling` arg is not set and we could not find a pooling configuration (`1_Pooling/config.json`) for this model but the model is a BERT variant. Defaulting to `CLS` pooling.
...
2025-03-01T07:17:14.113331Z WARN text_embeddings_router: router/src/lib.rs:184: Could not find a Sentence Transformers config
2025-03-01T07:17:14.113336Z INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 8192
2025-03-01T07:17:14.113347Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 64 tokenization workers
2025-03-01T07:17:14.511380Z INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-01T07:17:14.511403Z INFO text_embeddings_backend: backends/src/lib.rs:486: Downloading `model.safetensors`
2025-03-01T07:17:21.334627Z INFO text_embeddings_backend: backends/src/lib.rs:370: Model weights downloaded in 6.823220759s
2025-03-01T07:17:21.334749Z ERROR text_embeddings_backend: backends/src/lib.rs:381: Could not start Candle backend: Could not start backend: Model is not supported
Caused by:
    unknown variant `modernbert`, expected one of `bert`, `xlm-roberta`, `camembert`, `roberta`, `distilbert`, `nomic_bert`, `mistral`, `new`, `qwen2`, `mpnet` at line 32 column 28
Error: Could not create backend
Caused by:
    Could not start backend: Could not start a suitable backend
```
@kozistr I see you're using …
Looks like you didn't build from my branch.
You can clone the specific branch directly by …
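Presumably something along these lines, assuming the branch lives on kozistr's fork (the exact repository URL is an assumption), followed by rebuilding the image from that checkout:

```bash
# Clone only the feature branch of the fork (repository URL assumed)
git clone --branch feature/modernbert --single-branch \
  https://github.com/kozistr/text-embeddings-inference.git
cd text-embeddings-inference

# Rebuild so the locally tagged image actually contains the ModernBERT backend
docker build -f Dockerfile-cuda -t tei-modernbert:latest .
```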
@kozistr Ha, that explains a lot! :) I've correctly cloned …

Here's my docker run command:

```bash
docker run \
  --gpus all \
  -p 8080:80 \
  -v /opt/models:/models \
  --rm tei-modernbert:latest \
  --model-id /models/ModernBERT-base \
  --dtype float32
```

I tried upgrading my Nvidia driver and CUDA toolkit to 12.8, and I tried adjusting the CUDA versions in Dockerfile-cuda and recompiling. I also upgraded my Rust compiler. No luck, though. I should point out that my machine has Nvidia RTX A6000 GPUs (Ampere), so …
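If the CUDA build is part of the problem on the A6000s, one thing worth checking is which compute capability the image was built for. A minimal sketch, assuming Dockerfile-cuda accepts a CUDA_COMPUTE_CAP build argument as in the upstream text-embeddings-inference build instructions (the RTX A6000 is Ampere, compute capability 8.6):

```bash
# Rebuild targeting Ampere (sm_86); the build-arg name is an assumption
# based on the upstream TEI Dockerfile-cuda.
docker build -f Dockerfile-cuda \
  --build-arg CUDA_COMPUTE_CAP=86 \
  -t tei-modernbert:latest .
```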
Model description
I tried running this on AWS SageMaker with the config,
but it failed with the following error message from the AWS console,
Open source status
Provide useful links for the implementation
Released today, https://huggingface.co/blog/modernbert