LoRA fine-tuned GLM-4 model does not generate answers #4568
Unanswered
RyanCcc114 asked this question in Q&A
Replies: 2 comments, 1 reply
-
Overfitting. Tune the hyperparameters so the learned features are not so concentrated.
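A minimal sketch of that suggestion against the YAML config quoted below (the exact values are illustrative assumptions, not prescribed by the reply):

```yaml
# Hypothetical anti-overfitting adjustments: lower the learning rate,
# shrink the LoRA rank, raise dropout, and train for fewer epochs so the
# adapter memorizes the small dataset less aggressively.
learning_rate: 0.0001   # down from 0.0003
lora_rank: 4            # down from 8
lora_dropout: 0.2       # up from 0.1
num_train_epochs: 1.0   # down from 2.0
```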
1 reply
-
Has this problem been solved?
0 replies
-
Reminder
System Info
pytorch:2.1.0-cuda11.8
Reproduction
bf16: true
cutoff_len: 1024
dataset: EE_instruction_message
dataset_dir: data
ddp_timeout: 180000000
do_train: true
eval_steps: 100
eval_strategy: steps
finetuning_type: lora
flash_attn: fa2
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 0.0003
logging_steps: 5
lora_alpha: 32
lora_dropout: 0.1
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: src/llamafactory/model/model/glm4-chat
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/GLM-4-9B-Chat/lora/train_2024-06-24-23-30-00
packing: false
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: glm4
val_size: 0.2
warmup_steps: 0.01
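For reference, a few quantities implied by the values above can be derived directly; this is standard LoRA/trainer arithmetic, not anything specific to LLaMA-Factory:

```python
# Quantities implied by the training config above (single GPU assumed).
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
lora_alpha = 32
lora_rank = 8

# Effective batch size per optimizer step: gradients from 8 micro-batches
# of size 1 are accumulated before each update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

# LoRA scaling factor applied to the adapter output: alpha / rank.
lora_scaling = lora_alpha / lora_rank

print(effective_batch)  # 8
print(lora_scaling)     # 4.0
```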
Expected behavior
After loading the fine-tuned GLM-4 model, it does not generate answers: no matter what input is given, the model returns blank content.
The loss in the training log decreases steadily, and the evaluation loss decreases as well, but the results produced by the predict script are all empty.
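One way to quantify the blank outputs is to scan the predict stage's output file. A sketch, assuming a JSON-lines prediction dump with a `predict` field per record (the file layout and key name are assumptions about the predict script's output, not confirmed by this report):

```python
import json

def count_empty_predictions(path):
    """Return (empty, total) over a JSONL file of generation records,
    counting records whose generated text is empty or whitespace-only."""
    empty = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            total += 1
            if not record.get("predict", "").strip():
                empty += 1
    return empty, total
```

If every record comes back empty while train and eval loss both fall, one common cause to rule out is a mismatch between the chat template used at training time and at inference time, rather than overfitting alone.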
Others
No response