The server is failing to run #591

Open
1 of 4 tasks
u650080 opened this issue Aug 27, 2024 · 0 comments

Comments


u650080 commented Aug 27, 2024

System Info

I am using the Docker image ghcr.io/predibase/lorax:main, pulled two days ago.
This is my host NVIDIA info (nvidia-smi output):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:55:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:68:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:D2:00.0 Off | 0 |
| N/A 37C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 37C P0 72W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Just run the Docker image with these args:

docker run --gpus '"device=0,1,2,3"' \
  -p 8800:80 \
  --shm-size=150gb \
  -d \
  --name lorax \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400
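The NCCL error further down suggests rerunning with NCCL_DEBUG=INFO for details. A sketch of the same command with that variable added (untested on my side; the container name lorax-debug is just chosen to avoid clashing with the container above):

# Same reproduction command, with NCCL debug logging enabled via the standard -e flag
docker run --gpus '"device=0,1,2,3"' \
  -e NCCL_DEBUG=INFO \
  -p 8800:80 \
  --shm-size=150gb \
  -d \
  --name lorax-debug \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400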

Expected behavior

2024-08-26T19:59:50.323362Z INFO lorax_launcher: Args { model_id: "/app/Mixtral-8x7B-v0.1", adapter_id: Some("/app/adapters/vprn_adapter"), source: "hub", default_adapter_source: None, adapter_source: "local", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, preloaded_adapter_ids: [], dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, eager_prefill: None, prefix_caching: None, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "2203c0c58385", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29400, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false, tokenizer_config_path: None }
2024-08-26T19:59:50.323401Z INFO lorax_launcher: Sharding model on 4 processes
2024-08-26T19:59:50.323517Z INFO download: lorax_launcher: Starting download process.
2024-08-26T19:59:57.513571Z INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.

2024-08-26T19:59:57.513621Z INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.

2024-08-26T19:59:58.333244Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-08-26T19:59:58.333887Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-08-26T19:59:58.333887Z INFO shard-manager: lorax_launcher: Starting shard rank=1
2024-08-26T19:59:58.333952Z INFO shard-manager: lorax_launcher: Starting shard rank=2
2024-08-26T19:59:58.334032Z INFO shard-manager: lorax_launcher: Starting shard rank=3
2024-08-26T20:00:08.347181Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=0
2024-08-26T20:00:08.347898Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=1
2024-08-26T20:00:08.348047Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=2
2024-08-26T20:00:08.349334Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=3
2024-08-26T20:00:14.389465Z ERROR lorax_launcher: server.py:287 Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 87, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 408, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)

File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 274, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/init.py", line 221, in get_model
return FlashMixtral(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_mixtral.py", line 65, in init
torch.distributed.barrier(group=self.process_group)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3938, in barrier
work = group.barrier(opts=opts)
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1720538438429/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 3 'initialization error'
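"Cuda failure 3 'initialization error'" looks like CUDA failing to initialize inside the container at all, so a minimal sanity check I could run (assuming the container is still up after the shard failure and that python with torch is on PATH in the image, which the /opt/conda paths in the traceback suggest) would be:

# Hypothetical check: confirm torch can see and initialize the GPUs inside the running container.
# "lorax" is the container name from the docker run command above.
docker exec -it lorax python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"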
