Mirror of https://github.com/modelscope/modelscope.git (synced 2025-12-20 10:09:23 +01:00)
1. Add getting labels from the dataset in examples/pytorch/text_classification/finetune_text_classification.py to simplify the user's workflow in flex training. The --num_labels and --labels parameters were removed from run_train.sh.
2. In chatglm6b/finetune.py, building the dataset from a file is necessary to support flex training.

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13382745

* support getting labels from the dataset in sbert text classification and building the dataset from a file in chatglm-6b
* remove repetitive labels concisely by using a set
* reserve the parameter labels in finetune_text_classification
* Merge branch 'master' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib
* Merge branch 'support_text_cls_labels_chatglm_json'
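The first half of the change can be sketched as follows: collect the distinct label values from the training data and count them, so the user no longer has to pass --num_labels or --labels by hand. Below is a minimal Python sketch, assuming the dataset yields dict-like records with a 'label' column; the function and column names are illustrative, not the exact finetune_text_classification.py implementation.

def infer_labels(train_dataset, label_column='label'):
    """Collect each distinct label once; a set removes repetitions concisely."""
    labels = sorted({str(example[label_column]) for example in train_dataset})
    return labels, len(labels)

# Illustrative usage with an in-memory dataset:
samples = [{'sentence': 'a', 'label': 'news_tech'},
           {'sentence': 'b', 'label': 'news_sports'},
           {'sentence': 'c', 'label': 'news_tech'}]
labels, num_labels = infer_labels(samples)  # (['news_sports', 'news_tech'], 2)

With the labels inferred this way, the run_train.sh invocation below needs no label flags at all.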
24 lines · 784 B · Bash
PYTHONPATH=. python examples/pytorch/text_classification/finetune_text_classification.py \
    --task 'text-classification' \
    --model 'damo/nlp_structbert_backbone_base_std' \
    --train_dataset_name 'clue' \
    --val_dataset_name 'clue' \
    --train_subset_name 'tnews' \
    --val_subset_name 'tnews' \
    --train_split 'train' \
    --val_split 'validation' \
    --first_sequence 'sentence' \
    --label 'label' \
    --preprocessor 'sen-cls-tokenizer' \
    --use_model_config True \
    --max_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --eval_interval 100 \
    --eval_strategy by_step \
    --work_dir './tmp' \
    --train_data_worker 0 \
    --eval_data_worker 0 \
    --lr 1e-5 \
    --eval_metrics 'seq-cls-metric'
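For the chatglm6b/finetune.py half of the change, building the dataset from a local file could look like the sketch below. This is an assumption-laden illustration: the record layout (e.g. 'prompt'/'response' pairs) and the supported file formats are guesses, not the actual finetune.py loader.

import json

def build_dataset_from_file(path):
    """Load training records from a local JSON or JSON-lines file.

    Hypothetical helper; the real chatglm6b/finetune.py may read
    the file differently.
    """
    with open(path, 'r', encoding='utf-8') as f:
        if path.endswith('.jsonl'):
            return [json.loads(line) for line in f if line.strip()]
        return json.load(f)

# e.g. records like {'prompt': '...', 'response': '...'}:
# train_data = build_dataset_from_file('train.json')

Reading from a plain file rather than a hub dataset is what lets flex training point at arbitrary user-supplied data.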