System Info

I am using the Docker image ghcr.io/predibase/lorax:main, pulled 2 days ago.

This is my host NVIDIA info:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:55:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:68:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:D2:00.0 Off | 0 |
| N/A 37C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Information

- Docker
- The CLI directly

Tasks

- An officially supported command
- My own modifications
Reproduction

Just run the Docker image with these args:
docker run --gpus '"device=0,1,2,3"' \
  -p 8800:80 \
  --shm-size=150gb \
  -d \
  --name lorax \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400
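As a sanity check on the GPU setup inside this image (separate from the reproduction itself), something like the following can confirm whether CUDA initializes in the container at all. This is only a sketch: it assumes python is on the image's PATH, which the traceback below suggests since torch is installed under /opt/conda.

# Hypothetical sanity check: probe CUDA from torch inside the same image.
docker run --rm --gpus '"device=0,1,2,3"' \
  --entrypoint python \
  ghcr.io/predibase/lorax:main \
  -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

The launcher output from the actual run follows.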
2024-08-26T19:59:50.323362Z INFO lorax_launcher: Args { model_id: "/app/Mixtral-8x7B-v0.1", adapter_id: Some("/app/adapters/vprn_adapter"), source: "hub", default_adapter_source: None, adapter_source: "local", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, preloaded_adapter_ids: [], dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, eager_prefill: None, prefix_caching: None, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "2203c0c58385", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29400, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false, tokenizer_config_path: None }
2024-08-26T19:59:50.323401Z INFO lorax_launcher: Sharding model on 4 processes
2024-08-26T19:59:50.323517Z INFO download: lorax_launcher: Starting download process.
2024-08-26T19:59:57.513571Z INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.
2024-08-26T19:59:57.513621Z INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.
2024-08-26T19:59:58.333244Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-08-26T19:59:58.333887Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-08-26T19:59:58.333887Z INFO shard-manager: lorax_launcher: Starting shard rank=1
2024-08-26T19:59:58.333952Z INFO shard-manager: lorax_launcher: Starting shard rank=2
2024-08-26T19:59:58.334032Z INFO shard-manager: lorax_launcher: Starting shard rank=3
2024-08-26T20:00:08.347181Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=0
2024-08-26T20:00:08.347898Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=1
2024-08-26T20:00:08.348047Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=2
2024-08-26T20:00:08.349334Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=3
2024-08-26T20:00:14.389465Z ERROR lorax_launcher: server.py:287 Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 87, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 408, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 274, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 221, in get_model
return FlashMixtral(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_mixtral.py", line 65, in __init__
torch.distributed.barrier(group=self.process_group)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3938, in barrier
work = group.barrier(opts=opts)
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1720538438429/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 3 'initialization error'
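The error message suggests rerunning with NCCL_DEBUG=INFO for details. A minimal sketch of the same launch with NCCL debug logging enabled (an assumption-laden example: it assumes the environment variable passed via -e is inherited by the shard processes, uses a hypothetical container name lorax-debug, and drops -d so the logs stream to the terminal):

# Hypothetical rerun with NCCL debug logging, as the error message suggests.
docker run --gpus '"device=0,1,2,3"' \
  -p 8800:80 \
  --shm-size=150gb \
  -e NCCL_DEBUG=INFO \
  --name lorax-debug \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400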