support for answerdotai/ModernBERT-base #457

Open · 2 tasks done
mathcass opened this issue Dec 19, 2024 · 18 comments · May be fixed by #459

Comments

mathcass commented Dec 19, 2024

Model description

I tried running this on AWS SageMaker with the following config:

    config = {
        "HF_MODEL_ID": "answerdotai/ModernBERT-base",
        "POOLING": "mean",
    }

but it failed with the following error message in the AWS console:

Error: Could not create backend
Caused by:
    Could not start backend: Model is not supported: unknown variant `modernbert`, expected one of `bert`, `xlm-roberta`, `camembert`, `roberta`, `distilbert`, `nomic_bert` at line 32 column 28

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Released today: https://huggingface.co/blog/modernbert

mhillebrand commented Dec 20, 2024

I tried launching TEI-gRPC with Docker, using a fine-tuned ModernBERT model, and I got this error:

2024-12-20T20:05:51.464907Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 4000, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "81f1e6a2b3e2", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251498, column: 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

And when I tried setting dtype to bfloat16:

error: invalid value 'bfloat16' for '--dtype <DTYPE>'
  [possible values: float16, float32]

transformers 4.47.1
tokenizers 0.21.0
TEI 1.6.0

Upgrading to the latest main branch of transformers didn't help; I hit the same problem.

kozistr linked a pull request on Dec 25, 2024 that will close this issue

kozistr commented Dec 26, 2024

Hi @mhillebrand @mathcass. I've implemented the ModernBert model; you can use it by building from source from #459. Please feel free to leave a comment if you run into any issues :)

mhillebrand commented Jan 3, 2025

> Hi @mhillebrand @mathcass. I've implemented the ModernBert model; you can use it by building from source from #459. […]

Thank you, @kozistr

I've never built TEI from source. Any pointers on getting your branch to work with gRPC?

kozistr commented Jan 6, 2025

> I've never built TEI from source. Any pointers on getting your branch to work with gRPC?

You could try it with Docker:

  1. Clone the kozistr/text-embeddings-inference repo, feature/modernbert branch.
  2. Build the Docker image, e.g. docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap (docs: https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker-build); a full sketch follows below.
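
Putting those two steps together, a minimal end-to-end sketch (the compute capability, model path, and host port here are illustrative; adjust them for your GPU and model):

# clone the branch with ModernBERT support directly
git clone -b feature/modernbert https://github.com/kozistr/text-embeddings-inference.git
cd text-embeddings-inference

# 86 = Ampere (e.g. RTX A6000); pick your GPU's compute capability
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=86 -t tei-modernbert:latest

# serve a local model directory (illustrative path) on host port 8080
docker run --gpus all -p 8080:80 \
    -v /path/to/model:/model \
    --rm tei-modernbert:latest \
    --model-id /model \
    --dtype float32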

touhi99 commented Jan 8, 2025

+1

mhillebrand commented Feb 28, 2025

@kozistr I built a docker image from your latest feature/modernbert repo, but I'm seeing some errors.

Bash commands:

git clone git@github.com:kozistr/text-embeddings-inference.git .
git checkout -b feature/modernbert
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=86 -t tei-modernbert:latest

markdown_classifier.sh:

docker run \
    --runtime=nvidia \
    --gpus device=0 \
    -p 8120:80 \
    -v /opt/markdown/model:/model \
    --rm tei-modernbert:latest \
    --model-id /model/final \
    --dtype float16

Error:

$ /opt/boot/markdown_classifier.sh 
2025-02-28T19:04:43.751161Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 20, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "6269c46b1d6b", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I initially saw an error about missing id2label and label2id in config.json, so Cursor + Claude Sonnet 3.7 added this:

  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  }

kozistr commented Mar 1, 2025

> @kozistr I built a docker image from your latest feature/modernbert repo, but I'm seeing some errors. […]

It seems like there's no valid tokenizer under that path; the error says: "tokenizer.json not found. text-embeddings-inference only supports fast tokenizers". Placing the tokenizer-related files (e.g. tokenizer.json, tokenizer_config.json, special_tokens_map.json) alongside the model files may resolve the issue. Also, fp16 (w/ FA) is currently not supported, so please use fp32.
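
As a quick sanity check (paths taken from the commands above), you can confirm on the host that the tokenizer files sit next to the weights:

ls /opt/markdown/model/final
# should list, among others:
#   config.json  model.safetensors  special_tokens_map.json
#   tokenizer.json  tokenizer_config.json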

Heh, I definitely checked for the existence of tokenizer.json. It's there. BTW, I'm getting the same error with --dtype float32.

[Image: screenshot of the model directory listing]

mhillebrand commented Mar 1, 2025

I also tried to convert ModernBERT's tokenizer to a fast tokenizer, but I failed. I added -e RUST_BACKTRACE=full, but it wasn't much help:

Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "49f60a98aedc", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
stack backtrace:
   0:     0x58e788d860e9 - <unknown>
   1:     0x58e7889561c3 - <unknown>
   2:     0x58e788d859d2 - <unknown>
   3:     0x58e788d85f43 - <unknown>
   4:     0x58e788d8580b - <unknown>
   5:     0x58e788dc2a68 - <unknown>
   6:     0x58e788dc29c9 - <unknown>
   7:     0x58e788dc2e6c - <unknown>
   8:     0x58e78825616f - <unknown>
   9:     0x58e7882564e5 - <unknown>
  10:     0x58e78865117b - <unknown>
  11:     0x58e78879e627 - <unknown>
  12:     0x58e78879abee - <unknown>
  13:     0x58e7886c6723 - <unknown>
  14:     0x58e78879edc4 - <unknown>
  15:     0x7986f6cecd90 - <unknown>
  16:     0x7986f6cece40 - __libc_start_main
  17:     0x58e7885e1855 - <unknown>
  18:                0x0 - <unknown>

kozistr commented Mar 1, 2025

> I also tried to convert ModernBERT's tokenizer to a fast tokenizer, but I failed.

Could you please double-check the directory inside the Docker container to verify the files are correctly located? It looks like you mounted your local path (/opt/markdown/model) to /model and loaded the model from /model/final.

final is a symbolic link to the directory in the screenshot above.

(infer) /opt/markdown/model$ l
total 8.0K
drwxrwxr-x 7 matt matt 4.0K Feb 28 14:23 ModernBERT-base
lrwxrwxrwx 1 matt matt   31 Feb 28 09:55 final -> ModernBERT-base/checkpoint-2100
drwxrwxr-x 3 matt matt 4.0K Feb 28 09:55 qwen
(infer) /opt/markdown/model$ ll final/*
-rw-rw-r-- 1 matt matt 1.5K Feb 28 22:23 final/config.json
-rw-rw-r-- 1 matt matt 286M Feb 28 09:32 final/model.safetensors
-rw-rw-r-- 1 matt matt 571M Feb 28 09:32 final/optimizer.pt
-rw-rw-r-- 1 matt matt  14K Feb 28 09:32 final/rng_state.pth
-rw-rw-r-- 1 matt matt 1.1K Feb 28 09:32 final/scheduler.pt
-rw-rw-r-- 1 matt matt  694 Feb 28 09:32 final/special_tokens_map.json
-rw-rw-r-- 1 matt matt 3.5M Feb 28 14:22 final/tokenizer.json
-rw-rw-r-- 1 matt matt  21K Feb 28 14:22 final/tokenizer_config.json
-rw-rw-r-- 1 matt matt 3.5K Feb 28 09:32 final/trainer_state.json
-rw-rw-r-- 1 matt matt 5.3K Feb 28 09:32 final/training_args.bin

kozistr commented Mar 1, 2025

> final is a symbolic link to the directory in the screenshot above. […]

Well, as far as I know, when mounting a local path into a Docker container, symbolic links inside the mounted directory are not always resolved or followed (in particular when a link's target lies outside the mount), so it'd be good to check the mounted directory from inside the container, as sketched below.
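
One minimal way to inspect the mount from inside the container is to override the image's entrypoint (paths taken from the docker run command above):

docker run --rm \
    --entrypoint ls \
    -v /opt/markdown/model:/model \
    tei-modernbert:latest \
    -la /model/final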

Symbolic links have always worked for me when using Docker. I just did away with the symbolic link, and I got the same bogus error.

docker run \
    --gpus device=0 \
    -p 8120:80 \
    -v /opt/markdown/model:/model \
    -e RUST_BACKTRACE=full \
    --rm tei-modernbert:latest \
    --model-id /model/ModernBERT-base/checkpoint-2100
2025-03-01T06:49:31.005543Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**********-****/**********-*100", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "49770c8319fe", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
stack backtrace:
   0:     0x6491206bd0e9 - <unknown>
   1:     0x64912028d1c3 - <unknown>
   2:     0x6491206bc9d2 - <unknown>
   3:     0x6491206bcf43 - <unknown>
   4:     0x6491206bc80b - <unknown>
   5:     0x6491206f9a68 - <unknown>
   6:     0x6491206f99c9 - <unknown>
   7:     0x6491206f9e6c - <unknown>
   8:     0x64911fb8d16f - <unknown>
   9:     0x64911fb8d4e5 - <unknown>
  10:     0x64911ff8817b - <unknown>
  11:     0x6491200d5627 - <unknown>
  12:     0x6491200d1bee - <unknown>
  13:     0x64911fffd723 - <unknown>
  14:     0x6491200d5dc4 - <unknown>
  15:     0x7243705bdd90 - <unknown>
  16:     0x7243705bde40 - __libc_start_main
  17:     0x64911ff18855 - <unknown>
  18:                0x0 - <unknown>

kozistr commented Mar 1, 2025

> Symbolic links have always worked for me when using Docker. I just did away with the symbolic link, and I got the same bogus error. […]

That's weird. I've just tested the ModernBERT-base from here and it worked:

2025-03-01T07:02:55.572364Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "./Mod*******-*ase", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8888, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-01T07:02:55.573023Z  WARN text_embeddings_router: router/src/lib.rs:392: The `--pooling` arg is not set and we could not find a pooling configuration (`1_Pooling/config.json`) for this model but the model is a BERT variant. Defaulting to `CLS` pooling.
2025-03-01T07:02:55.639180Z  WARN text_embeddings_router: router/src/lib.rs:184: Could not find a Sentence Transformers config
2025-03-01T07:02:55.639210Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 8192
2025-03-01T07:02:55.639377Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
2025-03-01T07:02:55.714197Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-01T07:02:55.715244Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:239: Starting ModernBert model on Cpu
2025-03-01T07:02:56.509923Z  WARN text_embeddings_router: router/src/lib.rs:258: Backend does not support a batch size > 4
2025-03-01T07:02:56.509950Z  WARN text_embeddings_router: router/src/lib.rs:259: forcing `max_batch_requests=4`
2025-03-01T07:02:56.510797Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1812: Starting HTTP server: 0.0.0.0:8888
2025-03-01T07:02:56.510826Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1813: Ready

If I use --model-id answerdotai/ModernBERT-base, I get this error:

2025-03-01T07:17:12.442165Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "ans********/**********-*ase", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "28fa9a8e0b07", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-01T07:17:12.442285Z  INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2025-03-01T07:17:12.514974Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-03-01T07:17:12.514982Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-03-01T07:17:12.654557Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/answerdotai/ModernBERT-base/resolve/main/1_Pooling/config.json)
2025-03-01T07:17:13.411914Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-03-01T07:17:13.501716Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/answerdotai/ModernBERT-base/resolve/main/config_sentence_transformers.json)
2025-03-01T07:17:13.501737Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-03-01T07:17:13.731007Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-03-01T07:17:14.047773Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.532796807s
2025-03-01T07:17:14.047890Z  WARN text_embeddings_router: router/src/lib.rs:390: The `--pooling` arg is not set and we could not find a pooling configuration (`1_Pooling/config.json`) for this model but the model is a BERT variant. Defaulting to `CLS` pooling.
...    
2025-03-01T07:17:14.113331Z  WARN text_embeddings_router: router/src/lib.rs:184: Could not find a Sentence Transformers config
2025-03-01T07:17:14.113336Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 8192
2025-03-01T07:17:14.113347Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 64 tokenization workers
2025-03-01T07:17:14.511380Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-01T07:17:14.511403Z  INFO text_embeddings_backend: backends/src/lib.rs:486: Downloading `model.safetensors`
2025-03-01T07:17:21.334627Z  INFO text_embeddings_backend: backends/src/lib.rs:370: Model weights downloaded in 6.823220759s
2025-03-01T07:17:21.334749Z ERROR text_embeddings_backend: backends/src/lib.rs:381: Could not start Candle backend: Could not start backend: Model is not supported

Caused by:
    unknown variant `modernbert`, expected one of `bert`, `xlm-roberta`, `camembert`, `roberta`, `distilbert`, `nomic_bert`, `mistral`, `new`, `qwen2`, `mpnet` at line 32 column 28
Error: Could not create backend

Caused by:
    Could not start backend: Could not start a suitable backend

mhillebrand commented Mar 1, 2025

@kozistr I see you're using OlivierDehaene/candle in your Cargo.toml file, which is pretty far behind huggingface/candle. It looks like ModernBERT support was added to huggingface/candle on January 12th. However, if I switch it, I get this error when running docker build:
13.61 the package `text-embeddings-backend-candle` depends on `candle-core`, with features: `accelerate` but `candle-core` does not have these features.

kozistr commented Mar 2, 2025

> If I use --model-id answerdotai/ModernBERT-base, I get this error: […]

Looks like you didn't build from my branch.

git checkout -b feature/modernbert -> this created a new local branch named feature/modernbert off your current HEAD; it did not check out the existing remote branch.

You can clone the specific branch directly with git clone -b <branch> <remote_repo>, or fix the existing clone in place as sketched below.
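
A sketch of fixing the existing clone in place (assuming the repo's default branch is main):

git checkout main                  # leave the misnamed local branch
git branch -D feature/modernbert   # delete the local branch that shadows the remote one
git fetch origin
git checkout feature/modernbert    # creates a local branch tracking origin/feature/modernbert
git branch --show-current          # should print: feature/modernbert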

mhillebrand commented Mar 3, 2025

@kozistr Ha, that explains a lot! :) I've correctly cloned feature/modernbert and recompiled, but docker run now yields a new error.

Error: Model backend is not healthy

Caused by:
    DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading uabs_i64

Here's my docker run command:

docker run \
    --gpus all \
    -p 8080:80 \
    -v /opt/models:/models \
    --rm tei-modernbert:latest \
    --model-id /models/ModernBERT-base \
    --dtype float32

I tried upgrading my Nvidia driver and CUDA toolkit to 12.8, adjusting the CUDA versions in Dockerfile-cuda, and recompiling. I also upgraded my Rust compiler. No luck, though. I should point out that my machine has Nvidia RTX A6000 GPUs (Ampere), so CUDA_COMPUTE_CAP=86 is indeed correct; see the check below.
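
For reference, a quick way to confirm a GPU's compute capability (the compute_cap query field requires a reasonably recent Nvidia driver):

nvidia-smi --query-gpu=name,compute_cap --format=csv
# name, compute_cap
# NVIDIA RTX A6000, 8.6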
