The server is failing to run #591

Open
1 of 4 tasks
u650080 opened this issue Aug 27, 2024 · 0 comments

Comments


u650080 commented Aug 27, 2024

System Info

I am using the Docker image ghcr.io/predibase/lorax:main, pulled two days ago.
This is my host NVIDIA info (nvidia-smi output):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:55:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:68:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:D2:00.0 Off | 0 |
| N/A 37C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Just run the Docker image with these args:

docker run --gpus '"device=0,1,2,3"' \
  -p 8800:80 \
  --shm-size=150gb \
  -d \
  --name lorax \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400
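The NCCL error further down suggests rerunning with NCCL_DEBUG=INFO for details. A sketch of the same command with that variable added (untested on my side; the container name lorax-debug is just chosen to avoid clashing with the container above):

# Same reproduction command, with NCCL debug logging enabled via the standard -e flag
docker run --gpus '"device=0,1,2,3"' \
  -e NCCL_DEBUG=INFO \
  -p 8800:80 \
  --shm-size=150gb \
  -d \
  --name lorax-debug \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400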

Expected behavior

2024-08-26T19:59:50.323362Z INFO lorax_launcher: Args { model_id: "/app/Mixtral-8x7B-v0.1", adapter_id: Some("/app/adapters/vprn_adapter"), source: "hub", default_adapter_source: None, adapter_source: "local", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, preloaded_adapter_ids: [], dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, eager_prefill: None, prefix_caching: None, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "2203c0c58385", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29400, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false, tokenizer_config_path: None }
2024-08-26T19:59:50.323401Z INFO lorax_launcher: Sharding model on 4 processes
2024-08-26T19:59:50.323517Z INFO download: lorax_launcher: Starting download process.
2024-08-26T19:59:57.513571Z INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.

2024-08-26T19:59:57.513621Z INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.

2024-08-26T19:59:58.333244Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-08-26T19:59:58.333887Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-08-26T19:59:58.333887Z INFO shard-manager: lorax_launcher: Starting shard rank=1
2024-08-26T19:59:58.333952Z INFO shard-manager: lorax_launcher: Starting shard rank=2
2024-08-26T19:59:58.334032Z INFO shard-manager: lorax_launcher: Starting shard rank=3
2024-08-26T20:00:08.347181Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=0
2024-08-26T20:00:08.347898Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=1
2024-08-26T20:00:08.348047Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=2
2024-08-26T20:00:08.349334Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=3
2024-08-26T20:00:14.389465Z ERROR lorax_launcher: server.py:287 Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 87, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 408, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)

File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 274, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/init.py", line 221, in get_model
return FlashMixtral(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_mixtral.py", line 65, in init
torch.distributed.barrier(group=self.process_group)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3938, in barrier
work = group.barrier(opts=opts)
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1720538438429/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 3 'initialization error'
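"Cuda failure 3 'initialization error'" looks like CUDA failing to initialize inside the container at all, so a minimal sanity check I could run (assuming the container is still up after the shard failure and that python with torch is on PATH in the image, which the /opt/conda paths in the traceback suggest) would be:

# Hypothetical check: confirm torch can see and initialize the GPUs inside the running container.
# "lorax" is the container name from the docker run command above.
docker exec -it lorax python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"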
