Single-node multi-GPU pre-training bug #7604
Labels
solved
This problem has already been solved
Comments
I'm hitting the same problem: LoRA fine-tuning works fine with Unsloth disabled, but enabling it raises the error below.
Please restructure your imports with 'import unsloth' at the top of your file.
from unsloth import FastLanguageModel # type: ignore
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
/root/Smile_L/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py:51: UserWarning: WARNING: Unsloth should be imported before trl, transformers, peft to ensure all optimizations are applied. Your code may run slower or encounter memory issues without these optimizations.
Please restructure your imports with 'import unsloth' at the top of your file.
from unsloth import FastLanguageModel # type: ignore
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
🦥 Unsloth Zoo will now patch everything to make training faster!
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank0]: launch()
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank0]: model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/model/loader.py", line 136, in load_model
[rank0]: model = load_unsloth_pretrained_model(config, model_args)
[rank0]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py", line 51, in load_unsloth_pretrained_model
[rank0]: from unsloth import FastLanguageModel # type: ignore
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/__init__.py", line 219, in <module>
[rank0]: from .models import *
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
[rank0]: from .llama import FastLlamaModel
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/llama.py", line 2748, in <module>
[rank0]: PatchFastRL(FastLanguageModel = FastLlamaModel)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/rl.py", line 742, in PatchFastRL
[rank0]: patch_trl_rl_trainers()
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/rl.py", line 735, in patch_trl_rl_trainers
[rank0]: _patch_trl_rl_trainers(trainer)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/rl.py", line 555, in _patch_trl_rl_trainers
[rank0]: created_module = create_new_function(
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth_zoo/compiler.py", line 329, in create_new_function
[rank0]: compile_folder, UNSLOTH_COMPILE_USE_TEMP = get_compile_folder(use_tempfile = False)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth_zoo/compiler.py", line 265, in get_compile_folder
[rank0]: location, UNSLOTH_COMPILE_USE_TEMP = distributed_function(2, _get_compile_folder, use_tempfile)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth_zoo/utils.py", line 82, in distributed_function
[rank0]: torch.distributed.broadcast_object_list(object_list, src = 0, device = "cpu")
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3479, in broadcast_object_list
[rank0]: broadcast(object_sizes_tensor, src=global_src, group=group)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2726, in broadcast
[rank0]: work = group.broadcast([tensor], opts)
[rank0]: RuntimeError: No backend type associated with device type cpu
[rank1]: Traceback (most recent call last):
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank1]: launch()
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank1]: run_exp()
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp
[rank1]: _training_function(config={"args": args, "callbacks": callbacks})
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function
[rank1]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank1]: model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/model/loader.py", line 136, in load_model
[rank1]: model = load_unsloth_pretrained_model(config, model_args)
[rank1]: File "/root/Smile_L/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py", line 51, in load_unsloth_pretrained_model
[rank1]: from unsloth import FastLanguageModel # type: ignore
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/__init__.py", line 219, in <module>
[rank1]: from .models import *
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
[rank1]: from .llama import FastLlamaModel
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/llama.py", line 2748, in <module>
[rank1]: PatchFastRL(FastLanguageModel = FastLlamaModel)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/rl.py", line 742, in PatchFastRL
[rank1]: patch_trl_rl_trainers()
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/rl.py", line 735, in patch_trl_rl_trainers
[rank1]: _patch_trl_rl_trainers(trainer)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth/models/rl.py", line 555, in _patch_trl_rl_trainers
[rank1]: created_module = create_new_function(
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth_zoo/compiler.py", line 329, in create_new_function
[rank1]: compile_folder, UNSLOTH_COMPILE_USE_TEMP = get_compile_folder(use_tempfile = False)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth_zoo/compiler.py", line 265, in get_compile_folder
[rank1]: location, UNSLOTH_COMPILE_USE_TEMP = distributed_function(2, _get_compile_folder, use_tempfile)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/unsloth_zoo/utils.py", line 82, in distributed_function
[rank1]: torch.distributed.broadcast_object_list(object_list, src = 0, device = "cpu")
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3479, in broadcast_object_list
[rank1]: broadcast(object_sizes_tensor, src=global_src, group=group)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2726, in broadcast
[rank1]: work = group.broadcast([tensor], opts)
[rank1]: RuntimeError: No backend type associated with device type cpu
[rank0]:[W407 03:46:19.275497536 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0407 03:46:21.143000 384969 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 385068 closing signal SIGTERM
E0407 03:46:21.207000 384969 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 385067) of binary: /root/anaconda3/envs/llama_factory/bin/python
Traceback (most recent call last):
File "/root/anaconda3/envs/llama_factory/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
run(args)
File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
elastic_launch(
File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/root/Smile_L/LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
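For context on the failure: the traceback shows that load_unsloth_pretrained_model imports unsloth, whose import chain calls unsloth_zoo's distributed_function, which in turn calls torch.distributed.broadcast_object_list(..., device="cpu"). That broadcast needs a CPU-capable backend (Gloo) on the process group; when the group was initialized with NCCL only, PyTorch raises "No backend type associated with device type cpu". The snippet below is a minimal sketch of the failing call and of one possible workaround (registering Gloo for CPU alongside NCCL for CUDA). It is illustrative only and is not the actual LLaMA-Factory or Unsloth initialization code; the cache-path payload is a made-up placeholder.

# Minimal sketch; assumes launch via `torchrun --nproc_per_node=2 repro.py`.
import os
import torch
import torch.distributed as dist

def main():
    # Failing pattern: an NCCL-only group has no backend for CPU tensors, so a
    # broadcast with device="cpu" raises
    # "RuntimeError: No backend type associated with device type cpu".
    # dist.init_process_group(backend="nccl")

    # Possible workaround: register Gloo for CPU and NCCL for CUDA so that
    # CPU-side object broadcasts have a backend to run on.
    dist.init_process_group(backend="cpu:gloo,cuda:nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    obj = [None]
    if dist.get_rank() == 0:
        obj[0] = "/tmp/unsloth_compiled_cache"  # hypothetical payload
    # Mirrors the call made in unsloth_zoo/utils.py::distributed_function.
    dist.broadcast_object_list(obj, src=0, device="cpu")
    print(f"rank {dist.get_rank()} got: {obj[0]}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Even if the broadcast is made to work this way, the open-source Unsloth release has mainly targeted single-GPU training, which is consistent with the maintainer's reply below.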
Disable Unsloth.
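Applied to the configuration in the original report below, that suggestion is a one-line change (sketch only; everything else stays as posted). Note that use_unsloth defaults to false in LLaMA-Factory, so removing the line has the same effect:

use_unsloth: false  # was: true; Unsloth is what triggers the CPU broadcast above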
Reminder
System Info
After enabling use_unsloth: [rank2]: RuntimeError: No backend type associated with device type cpu
A question from before this error: the optim: paged_adamw_8bit setting does not seem to take effect. (In principle, 8×H100 should not run out of GPU memory on a 7B model even at the maximum sequence length, but memory currently does blow up, so I tried switching on use_unsloth and hit the error above.)
1. Launch with FSDP (an illustrative Accelerate FSDP config is sketched after the configuration in step 2):
export NCCL_DEBUG=INFO
export USE_MODELSCOPE_HUB=1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train.py examples/qwen/qwen_cpt_full.yaml
2. The full configuration is as follows:
### model
model_name_or_path: Qwen/Qwen2.5-7B-Instruct
trust_remote_code: true
packing: False
### method
stage: pt
do_train: true
finetuning_type: full
flash_attn: fa2
use_unsloth: True
### dataset
dataset: grout_en_1
template: qwen
cutoff_len: 32768
max_samples: 50000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/QwQ-32B/pt
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
disable_gradient_checkpointing: False
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000
optim: paged_adamw_8bit
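For completeness: step 1 above launches with examples/accelerate/fsdp_config.yaml, but that file itself is not shown. The copy shipped with the repo may differ from this; the sketch below is only an illustration of what a single-node, 8-GPU Accelerate FSDP config typically contains, with assumed values rather than the poster's actual settings:

# Illustrative Accelerate FSDP config (assumed values, not the actual
# examples/accelerate/fsdp_config.yaml from the report).
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_machines: 1
num_processes: 8  # one process per GPU on the single node
main_training_function: main
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_offload_params: false
  fsdp_use_orig_params: true
rdzv_backend: static
same_network: true
use_cpu: false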
Reproduction
Others
No response