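#!/bin/bash
# Fine-tune ModelScope's Chinese GPT-3 1.3B model on the chinese-poetry-collection
# dataset, combining data parallelism with Megatron tensor model parallelism.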

DATA_PARALLEL_SIZE=2
TENSOR_MODEL_PARALLEL_SIZE=2
WORLD_SIZE=$(($DATA_PARALLEL_SIZE * $TENSOR_MODEL_PARALLEL_SIZE))
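# Optional sanity check (not part of the original recipe; assumes nvidia-smi
# is available): torchrun spawns one process per GPU, so at least WORLD_SIZE
# GPUs must be visible on this node.
NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)
if [ "$NUM_GPUS" -lt "$WORLD_SIZE" ]; then
    echo "Need $WORLD_SIZE GPUs but only $NUM_GPUS are visible" >&2
    exit 1
fi

# Launch WORLD_SIZE processes on this node; the trainer replicates the model
# across DATA_PARALLEL_SIZE ranks and shards it across
# TENSOR_MODEL_PARALLEL_SIZE ranks via Megatron.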
PYTHONPATH=. torchrun --nproc_per_node $WORLD_SIZE examples/pytorch/text_generation/finetune_text_generation.py \
--trainer 'nlp-gpt3-trainer' \
--work_dir './tmp' \
--model 'damo/nlp_gpt3_text-generation_1.3B' \
--train_dataset_name 'chinese-poetry-collection' \
--val_dataset_name 'chinese-poetry-collection' \
--train_split 'train' \
--val_split 'test' \
--preprocessor 'text-gen-jieba-tokenizer' \
--src_txt 'text1' \
--tgt_txt 'text2' \
--max_epochs 3 \
--per_device_train_batch_size 16 \
--lr 3e-4 \
--lr_scheduler 'noam' \
--eval_metrics 'ppl' \
--world_size $WORLD_SIZE \
--tensor_model_parallel_size $TENSOR_MODEL_PARALLEL_SIZE \
--use_megatron true \
--use_model_config true

# Alternative training set:
# --train_dataset_name 'DuReader_robust-QG'  # input & output
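
# Usage (assuming the script is saved as finetune_gpt3.sh and run from the
# ModelScope repository root, so that PYTHONPATH=. resolves the package):
#   bash finetune_gpt3.sh
# Checkpoints and logs are written to the --work_dir directory (./tmp).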