Using a single A100 (80G) server, I tried running it, and it exits abnormally after about 10 minutes. Are two A100 servers required?
The launch command is:
torchrun --nproc_per_node 8 test/single_req_test.py request.max_new_tokens=64 models=DeepSeek-R1 models.ckpt_dir=/mnt/ds/models/DeepSeek-R1 infer.pp_size=1 infer.tp_size=8
Watching with top also shows the process reported as Killed:
Server monitoring metrics:
top output while chitu was running:
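As a rough check on whether one 8-GPU node could be enough, the sketch below compares the size of the model weights against the node's total GPU memory. It is only an estimate under stated assumptions: DeepSeek-R1 is taken as about 671B total parameters, the 8 GPUs come from `--nproc_per_node 8` in the launch command, and the bytes per parameter (1 for FP8, 2 for BF16) is a guess, since the precision chitu actually loads on A100 is not stated here. The helper name `fits_in_gpu_memory` is hypothetical and not part of chitu. KV cache, activations, and host RAM used while loading the checkpoint are ignored, so the real requirement is higher.

```python
def fits_in_gpu_memory(num_params, bytes_per_param, num_gpus, gpu_mem_gb):
    """Compare weight size against aggregate GPU memory (both in GB, 1 GB = 1e9 bytes).

    Hypothetical helper for illustration; ignores KV cache, activations,
    and framework overhead, so a True result is necessary but not sufficient.
    """
    weights_gb = num_params * bytes_per_param / 1e9
    total_gb = num_gpus * gpu_mem_gb
    return weights_gb, total_gb, weights_gb <= total_gb


# Assumption: ~671e9 parameters, stored at 1 byte each (FP8).
weights_gb, total_gb, fits = fits_in_gpu_memory(
    num_params=671e9, bytes_per_param=1, num_gpus=8, gpu_mem_gb=80
)
print(f"1 node:  weights ~{weights_gb:.0f} GB vs {total_gb} GB GPU memory, fits={fits}")

# Same weights spread over two such nodes (16 GPUs).
_, total_gb_2, fits_2 = fits_in_gpu_memory(671e9, 1, 16, 80)
print(f"2 nodes: weights ~{weights_gb:.0f} GB vs {total_gb_2} GB GPU memory, fits={fits_2}")
```

Under these assumptions the weights alone exceed the memory of one 8 x 80G node, while two nodes leave some headroom. Note that a Killed status seen in top usually points at host RAM rather than GPU memory, which this estimate does not cover.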