support for answerdotai/ModernBERT-base #457

Open · 2 tasks done
mathcass opened this issue Dec 19, 2024 · 18 comments · May be fixed by #459

Comments

mathcass commented Dec 19, 2024

Model description

I tried running this on AWS SageMaker with the following config:

    config = {
        "HF_MODEL_ID": "answerdotai/ModernBERT-base",
        "POOLING": "mean",
    }

but it failed with the following error message in the AWS console:

Error: Could not create backend
Caused by:
    Could not start backend: Model is not supported: unknown variant `modernbert`, expected one of `bert`, `xlm-roberta`, `camembert`, `roberta`, `distilbert`, `nomic_bert` at line 32 column 28

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Released today: https://huggingface.co/blog/modernbert

mhillebrand commented Dec 20, 2024

I tried launching TEI-gRPC with Docker, using a fine-tuned ModernBERT model, and I got this error:

2024-12-20T20:05:51.464907Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 4000, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "81f1e6a2b3e2", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251498, column: 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

And when I tried setting dtype to bfloat16:

error: invalid value 'bfloat16' for '--dtype <DTYPE>'
  [possible values: float16, float32]

transformers 4.47.1
tokenizers 0.21.0
TEI 1.6.0

Upgrading to the latest main branch of transformers didn't help; I hit the same problem.

kozistr linked a pull request on Dec 25, 2024 that will close this issue

kozistr commented Dec 26, 2024

Hi @mhillebrand @mathcass. I've implemented the ModernBert model; you can use it by building from source from #459. Please feel free to leave a comment if you run into any issues :)

mhillebrand commented Jan 3, 2025

> Hi @mhillebrand @mathcass. I've implemented the ModernBert model; you can use it by building from source from #459. […]

Thank you, @kozistr

I've never built TEI from source. Any pointers on getting your branch to work with gRPC?

kozistr commented Jan 6, 2025

> I've never built TEI from source. Any pointers on getting your branch to work with gRPC?

You could try it with Docker:

  1. Clone the kozistr/text-embeddings-inference repo, feature/modernbert branch.
  2. Build the Docker image, e.g. docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap (docs: https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker-build); a full sketch follows below.
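
Putting those two steps together, a minimal end-to-end sketch (the compute capability, model path, and host port here are illustrative; adjust them for your GPU and model):

# clone the branch with ModernBERT support directly
git clone -b feature/modernbert https://github.com/kozistr/text-embeddings-inference.git
cd text-embeddings-inference

# 86 = Ampere (e.g. RTX A6000); pick your GPU's compute capability
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=86 -t tei-modernbert:latest

# serve a local model directory (illustrative path) on host port 8080
docker run --gpus all -p 8080:80 \
    -v /path/to/model:/model \
    --rm tei-modernbert:latest \
    --model-id /model \
    --dtype float32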

touhi99 commented Jan 8, 2025

+1

mhillebrand commented Feb 28, 2025

@kozistr I built a docker image from your latest feature/modernbert repo, but I'm seeing some errors.

Bash commands:

git clone git@github.com:kozistr/text-embeddings-inference.git .
git checkout -b feature/modernbert
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=86 -t tei-modernbert:latest

markdown_classifier.sh:

docker run \
    --runtime=nvidia \
    --gpus device=0 \
    -p 8120:80 \
    -v /opt/markdown/model:/model \
    --rm tei-modernbert:latest \
    --model-id /model/final \
    --dtype float16

Error:

$ /opt/boot/markdown_classifier.sh 
2025-02-28T19:04:43.751161Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 20, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "6269c46b1d6b", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I initially saw an error about missing id2label and label2id in config.json, so Cursor + Claude Sonnet 3.7 added this:

  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  }

kozistr commented Mar 1, 2025

> @kozistr I built a docker image from your latest feature/modernbert repo, but I'm seeing some errors. […]

It seems like there's no valid tokenizer under that path; the error says: "tokenizer.json not found. text-embeddings-inference only supports fast tokenizers". Placing the tokenizer-related files (e.g. tokenizer.json, tokenizer_config.json, special_tokens_map.json) alongside the model files may resolve the issue. Also, fp16 (w/ FA) is currently not supported, so please use fp32.
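
As a quick sanity check (paths taken from the commands above), you can confirm on the host that the tokenizer files sit next to the weights:

ls /opt/markdown/model/final
# should list, among others:
#   config.json  model.safetensors  special_tokens_map.json
#   tokenizer.json  tokenizer_config.json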

Heh, I definitely checked for the existence of tokenizer.json. It's there. BTW, I'm getting the same error with --dtype float32.

[Image: screenshot of the model directory listing]

mhillebrand commented Mar 1, 2025

I also tried to convert ModernBERT's tokenizer to a fast tokenizer, but I failed. I added -e RUST_BACKTRACE=full, but it wasn't much help:

Args { model_id: "/mod**/**nal", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "49f60a98aedc", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
stack backtrace:
   0:     0x58e788d860e9 - <unknown>
   1:     0x58e7889561c3 - <unknown>
   2:     0x58e788d859d2 - <unknown>
   3:     0x58e788d85f43 - <unknown>
   4:     0x58e788d8580b - <unknown>
   5:     0x58e788dc2a68 - <unknown>
   6:     0x58e788dc29c9 - <unknown>
   7:     0x58e788dc2e6c - <unknown>
   8:     0x58e78825616f - <unknown>
   9:     0x58e7882564e5 - <unknown>
  10:     0x58e78865117b - <unknown>
  11:     0x58e78879e627 - <unknown>
  12:     0x58e78879abee - <unknown>
  13:     0x58e7886c6723 - <unknown>
  14:     0x58e78879edc4 - <unknown>
  15:     0x7986f6cecd90 - <unknown>
  16:     0x7986f6cece40 - __libc_start_main
  17:     0x58e7885e1855 - <unknown>
  18:                0x0 - <unknown>

kozistr commented Mar 1, 2025

> I also tried to convert ModernBERT's tokenizer to a fast tokenizer, but I failed.

Could you please double-check the directory inside the Docker container to verify the files are correctly located? It looks like you mounted your local path (/opt/markdown/model) to /model and loaded the model from /model/final.

final is a symbolic link to the directory in the screenshot above.

(infer) /opt/markdown/model$ l
total 8.0K
drwxrwxr-x 7 matt matt 4.0K Feb 28 14:23 ModernBERT-base
lrwxrwxrwx 1 matt matt   31 Feb 28 09:55 final -> ModernBERT-base/checkpoint-2100
drwxrwxr-x 3 matt matt 4.0K Feb 28 09:55 qwen
(infer) /opt/markdown/model$ ll final/*
-rw-rw-r-- 1 matt matt 1.5K Feb 28 22:23 final/config.json
-rw-rw-r-- 1 matt matt 286M Feb 28 09:32 final/model.safetensors
-rw-rw-r-- 1 matt matt 571M Feb 28 09:32 final/optimizer.pt
-rw-rw-r-- 1 matt matt  14K Feb 28 09:32 final/rng_state.pth
-rw-rw-r-- 1 matt matt 1.1K Feb 28 09:32 final/scheduler.pt
-rw-rw-r-- 1 matt matt  694 Feb 28 09:32 final/special_tokens_map.json
-rw-rw-r-- 1 matt matt 3.5M Feb 28 14:22 final/tokenizer.json
-rw-rw-r-- 1 matt matt  21K Feb 28 14:22 final/tokenizer_config.json
-rw-rw-r-- 1 matt matt 3.5K Feb 28 09:32 final/trainer_state.json
-rw-rw-r-- 1 matt matt 5.3K Feb 28 09:32 final/training_args.bin

kozistr commented Mar 1, 2025

> final is a symbolic link to the directory in the screenshot above. […]

Well, as far as I know, when mounting a local path into a Docker container, symbolic links inside the mounted directory are not always resolved or followed (in particular when a link's target lies outside the mount), so it'd be good to check the mounted directory from inside the container, as sketched below.
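
One minimal way to inspect the mount from inside the container is to override the image's entrypoint (paths taken from the docker run command above):

docker run --rm \
    --entrypoint ls \
    -v /opt/markdown/model:/model \
    tei-modernbert:latest \
    -la /model/final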

Symbolic links have always worked for me when using Docker. I just did away with the symbolic link, and I got the same bogus error.

docker run \
    --gpus device=0 \
    -p 8120:80 \
    -v /opt/markdown/model:/model \
    -e RUST_BACKTRACE=full \
    --rm tei-modernbert:latest \
    --model-id /model/ModernBERT-base/checkpoint-2100
2025-03-01T06:49:31.005543Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/mod**/**********-****/**********-*100", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "49770c8319fe", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 251510, column: 1)
stack backtrace:
   0:     0x6491206bd0e9 - <unknown>
   1:     0x64912028d1c3 - <unknown>
   2:     0x6491206bc9d2 - <unknown>
   3:     0x6491206bcf43 - <unknown>
   4:     0x6491206bc80b - <unknown>
   5:     0x6491206f9a68 - <unknown>
   6:     0x6491206f99c9 - <unknown>
   7:     0x6491206f9e6c - <unknown>
   8:     0x64911fb8d16f - <unknown>
   9:     0x64911fb8d4e5 - <unknown>
  10:     0x64911ff8817b - <unknown>
  11:     0x6491200d5627 - <unknown>
  12:     0x6491200d1bee - <unknown>
  13:     0x64911fffd723 - <unknown>
  14:     0x6491200d5dc4 - <unknown>
  15:     0x7243705bdd90 - <unknown>
  16:     0x7243705bde40 - __libc_start_main
  17:     0x64911ff18855 - <unknown>
  18:                0x0 - <unknown>

kozistr commented Mar 1, 2025

> Symbolic links have always worked for me when using Docker. I just did away with the symbolic link, and I got the same bogus error. […]

That's weird. I've just tested the ModernBERT-base from here and it worked:

2025-03-01T07:02:55.572364Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "./Mod*******-*ase", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8888, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-01T07:02:55.573023Z  WARN text_embeddings_router: router/src/lib.rs:392: The `--pooling` arg is not set and we could not find a pooling configuration (`1_Pooling/config.json`) for this model but the model is a BERT variant. Defaulting to `CLS` pooling.
2025-03-01T07:02:55.639180Z  WARN text_embeddings_router: router/src/lib.rs:184: Could not find a Sentence Transformers config
2025-03-01T07:02:55.639210Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 8192
2025-03-01T07:02:55.639377Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
2025-03-01T07:02:55.714197Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-01T07:02:55.715244Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:239: Starting ModernBert model on Cpu
2025-03-01T07:02:56.509923Z  WARN text_embeddings_router: router/src/lib.rs:258: Backend does not support a batch size > 4
2025-03-01T07:02:56.509950Z  WARN text_embeddings_router: router/src/lib.rs:259: forcing `max_batch_requests=4`
2025-03-01T07:02:56.510797Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1812: Starting HTTP server: 0.0.0.0:8888
2025-03-01T07:02:56.510826Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1813: Ready

If I use --model-id answerdotai/ModernBERT-base, I get this error:

2025-03-01T07:17:12.442165Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "ans********/**********-*ase", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "28fa9a8e0b07", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-01T07:17:12.442285Z  INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2025-03-01T07:17:12.514974Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-03-01T07:17:12.514982Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-03-01T07:17:12.654557Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/answerdotai/ModernBERT-base/resolve/main/1_Pooling/config.json)
2025-03-01T07:17:13.411914Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-03-01T07:17:13.501716Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/answerdotai/ModernBERT-base/resolve/main/config_sentence_transformers.json)
2025-03-01T07:17:13.501737Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-03-01T07:17:13.731007Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-03-01T07:17:14.047773Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.532796807s
2025-03-01T07:17:14.047890Z  WARN text_embeddings_router: router/src/lib.rs:390: The `--pooling` arg is not set and we could not find a pooling configuration (`1_Pooling/config.json`) for this model but the model is a BERT variant. Defaulting to `CLS` pooling.
...    
2025-03-01T07:17:14.113331Z  WARN text_embeddings_router: router/src/lib.rs:184: Could not find a Sentence Transformers config
2025-03-01T07:17:14.113336Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 8192
2025-03-01T07:17:14.113347Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 64 tokenization workers
2025-03-01T07:17:14.511380Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-01T07:17:14.511403Z  INFO text_embeddings_backend: backends/src/lib.rs:486: Downloading `model.safetensors`
2025-03-01T07:17:21.334627Z  INFO text_embeddings_backend: backends/src/lib.rs:370: Model weights downloaded in 6.823220759s
2025-03-01T07:17:21.334749Z ERROR text_embeddings_backend: backends/src/lib.rs:381: Could not start Candle backend: Could not start backend: Model is not supported

Caused by:
    unknown variant `modernbert`, expected one of `bert`, `xlm-roberta`, `camembert`, `roberta`, `distilbert`, `nomic_bert`, `mistral`, `new`, `qwen2`, `mpnet` at line 32 column 28
Error: Could not create backend

Caused by:
    Could not start backend: Could not start a suitable backend

mhillebrand commented Mar 1, 2025

@kozistr I see you're using OlivierDehaene/candle in your Cargo.toml file, which is pretty far behind huggingface/candle. It looks like ModernBERT support was added to huggingface/candle on January 12th. However, if I switch it, I get this error when running docker build:
13.61 the package `text-embeddings-backend-candle` depends on `candle-core`, with features: `accelerate` but `candle-core` does not have these features.

kozistr commented Mar 2, 2025

> If I use --model-id answerdotai/ModernBERT-base, I get this error: […]

Looks like you didn't build from my branch.

git checkout -b feature/modernbert -> this created a new local branch named feature/modernbert off your current HEAD; it did not check out the existing remote branch.

You can clone the specific branch directly with git clone -b <branch> <remote_repo>, or fix the existing clone in place as sketched below.
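
A sketch of fixing the existing clone in place (assuming the repo's default branch is main):

git checkout main                  # leave the misnamed local branch
git branch -D feature/modernbert   # delete the local branch that shadows the remote one
git fetch origin
git checkout feature/modernbert    # creates a local branch tracking origin/feature/modernbert
git branch --show-current          # should print: feature/modernbert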

mhillebrand commented Mar 3, 2025

@kozistr Ha, that explains a lot! :) I've correctly cloned feature/modernbert and recompiled, but docker run now yields a new error.

Error: Model backend is not healthy

Caused by:
    DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading uabs_i64

Here's my docker run command:

docker run \
    --gpus all \
    -p 8080:80 \
    -v /opt/models:/models \
    --rm tei-modernbert:latest \
    --model-id /models/ModernBERT-base \
    --dtype float32

I tried upgrading my Nvidia driver and CUDA toolkit to 12.8, adjusting the CUDA versions in Dockerfile-cuda, and recompiling. I also upgraded my Rust compiler. No luck, though. I should point out that my machine has Nvidia RTX A6000 GPUs (Ampere), so CUDA_COMPUTE_CAP=86 is indeed correct; see the check below.
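
For reference, a quick way to confirm a GPU's compute capability (the compute_cap query field requires a reasonably recent Nvidia driver):

nvidia-smi --query-gpu=name,compute_cap --format=csv
# name, compute_cap
# NVIDIA RTX A6000, 8.6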
