Add DPO support for DeepSpeed-Chat #828

Merged: 11 commits, Jan 6, 2025
528 changes: 528 additions & 0 deletions applications/DeepSpeed-Chat/training/step2_dpo_finetuning/main.py

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
### 💁 In each folder, the bash scripts are examples for the "facebook/opt" family.

If you want to use a different model, such as EleutherAI/gpt-neo-125m, simply replace
``--model_name_or_path facebook/opt-350m`` with ``--model_name_or_path EleutherAI/gpt-neo-125m``.

For the models we support, please see [our landing page](./../../../README.md#-supported-models-)
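
The substitution described above can also be scripted. The following is a minimal sketch, not part of this PR: it writes a small stand-in for one of the example scripts to a temporary file (the stand-in content is an assumption for illustration) and rewrites only the `--model_name_or_path` value with `sed`. Whether a given model actually trains depends on the supported-models list linked above.

```shell
#!/bin/bash
# Hypothetical stand-in for one of the example training scripts.
script=$(mktemp)
printf '%s\n' \
    'deepspeed main.py \' \
    '   --model_name_or_path facebook/opt-350m \' \
    '   --deepspeed' > "$script"

# Replace only the value that follows --model_name_or_path;
# every other flag is left untouched.
new_model="EleutherAI/gpt-neo-125m"
swapped=$(sed "s#--model_name_or_path [^ ]*#--model_name_or_path ${new_model}#" "$script")

echo "$swapped"
rm -f "$script"
```

Using `#` as the `sed` delimiter avoids having to escape the `/` in the model name.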
@@ -0,0 +1,35 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output_step2_llama_7b_epoch1_lr9.65e-6
fi
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=3
fi
mkdir -p $OUTPUT

deepspeed main.py \
--data_path Dahoas/rm-static \
--data_split 2,4,4 \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--max_seq_len 512 \
--learning_rate 9.65e-6 \
--weight_decay 0.1 \
--num_train_epochs 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--gradient_checkpointing \
--zero_stage $ZERO_STAGE \
--deepspeed \
--offload \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
@@ -0,0 +1,37 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output_step2_llama_7b_epoch1_lr9.65e-6
fi
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=3
fi
mkdir -p $OUTPUT

deepspeed main.py \
--data_path Dahoas/rm-static \
--data_split 2,4,4 \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--max_seq_len 512 \
--learning_rate 9.65e-6 \
--weight_decay 0.1 \
--num_train_epochs 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--gradient_checkpointing \
--zero_stage $ZERO_STAGE \
--deepspeed \
--offload \
--lora_dim 128 \
--lora_module_name "layers." \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
@@ -0,0 +1,34 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output
fi
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
mkdir -p $OUTPUT

deepspeed main.py \
--data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
--data_split 2,4,4 \
--model_name_or_path facebook/opt-350m \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--max_seq_len 512 \
--learning_rate 5e-5 \
--weight_decay 0.1 \
--dropout 0.0 \
--num_train_epochs 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--zero_stage $ZERO_STAGE \
--deepspeed \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
@@ -0,0 +1,20 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output
fi
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
mkdir -p $OUTPUT

deepspeed --num_gpus 1 main.py --model_name_or_path facebook/opt-350m \
--weight_decay 0.1 --dropout 0.0 --gradient_accumulation_steps 4 --zero_stage $ZERO_STAGE \
--enable_tensorboard \
--tensorboard_path $OUTPUT \
--deepspeed --output_dir $OUTPUT &> $OUTPUT/training.log
@@ -0,0 +1,34 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output
fi
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
mkdir -p $OUTPUT

deepspeed main.py \
--data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
--data_split 2,4,4 \
--model_name_or_path facebook/opt-350m \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--max_seq_len 512 \
--learning_rate 5e-5 \
--weight_decay 0.1 \
--num_train_epochs 1 \
--dropout 0.0 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--zero_stage $ZERO_STAGE \
--deepspeed \
--output_dir $OUTPUT \
&> $OUTPUT/training.log
@@ -0,0 +1,20 @@
# DeepSpeed Characterization Script

# Contents
* [Introduction](#introduction)
* [Usage](#usage)

# Introduction
The step 2 characterization script sweeps across various training parameters. Currently, the following parameters are swept:
<pre>
Zero Stage: 2, 3
Offload: True, False
</pre>

The `run_step2_sweep.sh` script passes configuration arguments to `run_single.sh`, which can be extended to sweep over parameters beyond those listed above (e.g. learning rate, weight decay, etc.).

# Usage
The sweep script can be run as follows:
<pre>
DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning$ bash training_scripts/opt/single_node/sweep/run_step2_sweep.sh
</pre>
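
As a rough sketch of how the sweep might be extended, the loop below adds a learning-rate dimension and only prints the commands it would run (a dry run). The fourth positional argument and the `_lr...` suffix on the output name are hypothetical: `run_single.sh` as written reads only three arguments, so it would need to be changed to accept and forward a learning rate before these commands would have any effect. The learning-rate values are illustrative.

```shell
#!/bin/bash
# Dry run: build and print sweep commands without launching training.
# NOTE: the 4th positional argument (learning rate) is hypothetical;
# run_single.sh would have to be extended to read and forward it.
cmds=()
for z in 2 3; do
    for offload in true false; do
        for lr in 5e-5 1e-5; do
            cmds+=("bash training_scripts/opt/single_node/sweep/run_single.sh ${z} ${offload} z${z}_offload_${offload}_lr${lr} ${lr}")
        done
    done
done

# One command per line, in the order they would be executed.
printf '%s\n' "${cmds[@]}"
```

Collecting the commands in an array first makes it easy to inspect the sweep size before committing GPU time to it.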
@@ -0,0 +1,46 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
ZERO_STAGE=$1
OFFLOAD=$2
OUTPUT=$3
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
if [ "$OFFLOAD" == true ]; then
OFFLOAD="--offload"
else
OFFLOAD=""
fi
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output
fi
mkdir -p $OUTPUT

cmd="deepspeed main.py \
--data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
--data_split 2,4,4 \
--model_name_or_path facebook/opt-350m \
--num_padding_at_beginning 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--max_seq_len 512 \
--learning_rate 5e-5 \
--weight_decay 0.1 \
--num_train_epochs 1 \
--dropout 0.0 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--zero_stage $ZERO_STAGE \
--deepspeed \
--output_dir $OUTPUT \
$OFFLOAD"

echo "----------------------------- DS COMMAND -----------------------------"
echo $cmd

$cmd &> $OUTPUT/${OUTPUT}.log
@@ -0,0 +1,21 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
for z in {2..3}
do
for offload in true false
do
cmd="bash training_scripts/opt/single_node/sweep/run_single.sh \
${z} \
${offload} \
z${z}_offload_${offload}"
echo "----------------------------- CALLING SHELL SCRIPT -----------------------------"
echo $cmd
$cmd
pkill -9 python
sleep 60
echo ""
done
done