250 Commits

Author SHA1 Message Date
suluyan.sly
1c6f5fe775 Merge branch 'master-github' into master-merge-github-230727
Conflicts:
       examples/pytorch/baichuan/finetune_baichuan.py
       examples/pytorch/chatglm6b/finetune.py
2023-07-27 17:29:27 +08:00
Wang Qiang
66cf72a75c Merge pull request #376 from XDUWQ/custom_diffusion
Custom method for finetuning stable diffusion
2023-07-27 10:41:38 +08:00
zeyinzi.jzyz
672c4899e9 add sd swift tuner
SD-Tuner based on Swift (LoRA/Adapter/Prompt)
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13380798

* sd swift tuner

* fix pre-checker
2023-07-25 19:00:49 +08:00
XDUWQ
8e00d85317 fix bugs 2023-07-24 19:46:22 +08:00
XDUWQ
66795aa3ff change tests level 2023-07-19 09:41:21 +08:00
Wang Qiang
0b85979f2e Update diffusers version to 0.18.0 (#377)
* update diffusers to 0.18.0

* fix bugs
2023-07-14 19:02:52 +08:00
XDUWQ
34ab717393 custom_diffusion 2023-07-12 19:47:32 +08:00
XDUWQ
1caa45422c custom diffusion 2023-07-11 20:46:32 +08:00
wenmeng.zwm
0271b9c256 Merge branch 'master-github' into merge_master_github_0628 2023-06-28 20:27:34 +08:00
Wang Qiang
a018cd6107 Dreambooth method for finetuning stable diffusions (#339)
* Copyright

* dreambooth

* dreambooth test trainer

* fix bugs

* pre-commit

---------

Co-authored-by: 翊靖 <yijing.wq@alibaba-inc.com>
2023-06-28 20:10:28 +08:00
yuze.zyz
8f18274f75 Add teardown for tests
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12643554

* add teardown for tests

* add teardown for dialog_modeling_trainer,document_grounded_dialog_generate_trainer,document_grounded_dialog_rerank_trainer,document_grounded_dialog_retrieval_trainer,training_args,translation_evaluation_trainer,translation_trainer
2023-06-28 09:44:44 +08:00
yuze.zyz
a58be34384 Add Lora/Adapter/Prompt and support for chatglm6B and chatglm2-6B
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12770413

* add prompt and lora

* add adapter

* add prefix

* add tests

* adapter smoke test passed

* prompt test passed

* support model id in petl

* migrate chatglm6b

* add train script for chatglm6b

* move gen_kwargs to finetune.py

* add chatglm2

* add model definition
2023-06-27 14:38:18 +08:00
xingjun.wxj
0db0ec5586 Merge code from github
1. Merge(add) daily regression from github PR (daily_regression.yaml)
2. Add lora stable diffusion from github PR
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13010802
* fix: device arg not work, rename device to ngpu (#272)

* Correcting the lora stable diffusion example script (#300)

* add vad model and punc model in README.md 

add vad model and punc model

* Merge pull request #302 from modelscope/langgz-patch-1

add vad model and punc model in README.md

* add 1.6

* modify ignore

* Merge pull request #307 from modelscope/dev_rs_16

Merge release 1.6

* undo datetime to 2099

* Merge pull request #311 from modelscope/fix_master_version

undo datetime to 2099

* add daily regression workflow

* modify workflow name

* fix cron format issue

* lora trainer

* Merge pull request #315 from liuyhwangyh/add_regression_workflow

add daily regression workflow
2023-06-21 10:22:06 +08:00
yuze.zyz
003c33648a Fix CI: test merge dataset failed
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12771691
* fix CI

* add teardown
2023-05-29 10:32:33 +08:00
yuze.zyz
febc0365de Support FlexTrain and update the structure of trainer
1. Refactor training_args
2. Refactor hooks
3. Add train_id for push_to_hub
4. Support both output_dir/output_sub_dir for checkpoint_hooks
5. Support copy when hardlink fails when checkpointing
6. Support mixed dataset config file as a CLI argument
7. Add eval txt in output folder
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12384253
* support ignoring file patterns
2023-05-13 12:12:04 +08:00
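Item 5 of commit febc0365de above ("Support copy when hardlink fails when checkpointing") describes a common fallback pattern. The following is a minimal sketch of that idea using only the Python standard library. The helper name `link_or_copy` and its return values are hypothetical and chosen for illustration; this is not the code from the commit.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst; fall back to a plain copy if linking fails.

    Hypothetical helper, not the actual ModelScope implementation.
    Returns "hardlink" or "copy" to report which path was taken.
    """
    # Remove a stale destination so os.link does not fail with FileExistsError.
    if os.path.lexists(dst):
        os.remove(dst)
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Hard links can fail across filesystems or where they are unsupported.
        shutil.copyfile(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "iter_100.pth")
        dst = os.path.join(tmp, "latest.pth")
        with open(src, "wb") as f:
            f.write(b"checkpoint-bytes")
        print(link_or_copy(src, dst))
```

A hard link avoids duplicating large checkpoint files on disk, while the copy branch keeps checkpointing working on filesystems where linking is not possible.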
wanyu.wy
58df448182 Add trainer for UniTE 2023-05-11 14:41:08 +08:00
jiangnana.jnn
46072898da remove easycv codes, plugin access
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11965727

* remove easycv codes

* fix custom msdatasets import and remove metainfo

* fix pipeline imports

* fix pre-check

* fix models import

* fix pre-check

* merge master
2023-05-09 17:58:01 +08:00
bin.xue
dd4299e0c8 [to #42322933] add DFSMN model to far field KWS
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12334627
* feat: set stream mode in constructor

* fix: commit model file

* doc: add more information about DFSMN

* fix: separation model evaluate returns None in speechbrain whose version < 0.5.12
2023-04-20 22:45:07 +08:00
hemu
672f25266f Merge branch 'master-github' into master-merge-github-0413 2023-04-13 10:32:01 +08:00
slin000111
92d7eae5b9 add token classification example and gpt3 one layer test (#268) 2023-04-13 10:21:00 +08:00
hemu
305006e18a Merge branch 'master-github' into master-merge-github-0412 2023-04-12 15:51:20 +08:00
chaojie.mcj
283517de08 add efficient tuner modules 2023-04-11 22:26:13 +08:00
ly261666
5f6cdf6814 [to #42322933] add damofd model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12261061
2023-04-11 11:08:59 +08:00
tastelikefeet
6dd94ff2bc add first case for gpt3 test (#236) 2023-04-09 21:46:48 +08:00
kangxiaoyang.kxy
d2fde9248c update image colorization metric
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12073952

* update image colorization metric

* update documentation strings
2023-03-28 15:14:10 +08:00
xingjun.wxj
4e77f654f5 backward compatible with to_task_dataset function in adaseq repo
1. backward compatible with to_task_dataset function for DefaultTrainer in adaseq repo
2. fix registry issue for RedsImageDeblurringDataset and GoproImageDeblurringDataset
3. add ut TestCustomDatasetsCompatibility
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11981956
2023-03-14 18:34:44 +08:00
lllcho.lc
38bcd54ee4 fix bug when installing detectron2
Installing the library from GitHub sometimes fails because of network issues, so switch to Gitee.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11945563
2023-03-10 15:14:21 +08:00
kangxiaoyang.kxy
681afac803 add finetune for ddcolor image colorization
Add finetune support for the image colorization model; local test passed: python tests/test_image_colorization_trainer.py

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11874339
2023-03-10 14:00:48 +08:00
xingjun.wxj
e02a260c93 Refactor the task_datasets module
Refactor the task_datasets module:

1. Add new module modelscope.msdatasets.dataset_cls.custom_datasets.
2. Add new function: modelscope.msdatasets.ms_dataset.MsDataset.to_custom_dataset().
2. Add calling to_custom_dataset() func in MsDataset.load() to adapt new custom_datasets module.
3. Refactor the pipeline for loading custom dataset: 
	1) Only use MsDataset.load() function to load the custom datasets.
	2) Combine MsDataset.load() with class EpochBasedTrainer.
4. Add new entry func for building datasets in EpochBasedTrainer: see modelscope.trainers.trainer.EpochBasedTrainer.build_dataset()
5. Add new func to build the custom dataset from model configuration, see: modelscope.trainers.trainer.EpochBasedTrainer.build_dataset_from_cfg()
6. Add new registry function for building custom datasets, see: modelscope.msdatasets.dataset_cls.custom_datasets.builder.build_custom_dataset()
7. Refine the class SiameseUIETrainer to adapt the new custom_datasets module.
8. Add class TorchCustomDataset as a superclass for custom datasets classes.
9. To move modules/classes/functions:
	1) Move module msdatasets.audio to custom_datasets
	2) Move module msdatasets.cv to custom_datasets
	3) Move module bad_image_detecting to custom_datasets
	4) Move module damoyolo to custom_datasets
	5) Move module face_2d_keypoints to custom_datasets
	6) Move module hand_2d_keypoints to custom_datasets
	7) Move module human_wholebody_keypoint to custom_datasets
	8) Move module image_classification to custom_datasets
	9) Move module image_inpainting to custom_datasets
	10) Move module image_portrait_enhancement to custom_datasets
	11) Move module image_quality_assessment_degradation to custom_datasets
	12) Move module image_quality_assmessment_mos to custom_datasets
	13) Move class LanguageGuidedVideoSummarizationDataset to custom_datasets
	14) Move class MGeoRankingDataset to custom_datasets
	15) Move module movie_scene_segmentation to custom_datasets
	16) Move module object_detection to custom_datasets
	17) Move module referring_video_object_segmentation to custom_datasets
	18) Move module sidd_image_denoising to custom_datasets
	19) Move module video_frame_interpolation to custom_datasets
	20) Move module video_stabilization to custom_datasets
	21) Move module video_super_resolution to custom_datasets
	22) Move class GoproImageDeblurringDataset to custom_datasets
	23) Move class EasyCVBaseDataset to custom_datasets
	24) Move class ImageInstanceSegmentationCocoDataset to custom_datasets
	25) Move class RedsImageDeblurringDataset to custom_datasets
	26) Move class TextRankingDataset to custom_datasets
	27) Move class VecoDataset to custom_datasets
	28) Move class VideoSummarizationDataset to custom_datasets
10. To delete modules/functions/classes:
	1) Del module task_datasets
	2) Del to_task_dataset() in EpochBasedTrainer
	3) Del build_dataset() in EpochBasedTrainer and add a new function with the same name.
11. Rename class Datasets to CustomDatasets in metainfo.py

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11872747
2023-03-10 09:03:32 +08:00
hemu.zp
2b1af959d5 Convert cfg during training
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11900238
2023-03-09 22:27:44 +08:00
yuze.zyz
a0bc5549a1 trainer support parallel_groups
Design doc: https://yuque.alibaba-inc.com/suluyan.sly/yh1rvu/yx0owblyebpa2b3l?singleDoc#flU3s

1. Add parallel_group field in trainer to support DP, TP, PP.
2. Move the construction of common hooks (except optimizer/lrscheduler hook) to trainer's init method to support after_init stage.
	after_init is to support DP, TP, PP's initializing
         https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
3. Add before_eval/after_eval stage to support model wrapping.
	to solve the order problem of apex amp & ddp wrapping.
         https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
4. Exporter supports lazy importing.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48122780
5. Fold all megatron imports to megatron hook.
         https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
6. Add compile method to TorchModel, Pipeline, and Trainer to support torch2.0
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=46869415
7. Fix bug: Lrscheduler builder does not support torch2.0
8. Add callbacks for trainer
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48210342
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11849932
2023-03-09 21:33:35 +08:00
yuanzhi.zyz
2ce41d4974 add ocr_reco_crnntrain
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11824395
2023-03-09 21:14:09 +08:00
bin.xue
281d3c630e [to #42322933] feat: kws support continue training from a checkpoint
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11940446

* feat: kws support continue training from a checkpoint

* log: add loading model log
2023-03-09 21:07:34 +08:00
lllcho.lc
3e70ecfb37 add action detection finetune
Add finetune support for the action-detection model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11873651
2023-03-09 16:54:19 +08:00
xixing.tj
46799325c8 add ocr_detection_db training module
Add training code for ocr_detection DBNet

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11892455
2023-03-09 15:11:27 +08:00
zeyinzi.jzyz
bf3a2b6c09 support vision efficient tuning finetune
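## Changes ↓↓↓
### vision efficient tuning finetune
- Adapt the Model module for training
- The Model module supports training while staying backward compatible with previously released modelcards
- Pipeline is compatible with preprocessors loaded from the modelcard or defined directly
- Add ImageClassificationPreprocessor (non-mmcv version)
- Add VisionEfficientTuningTrainer
- ~~Add opencv_transforms==0.0.6~~ (the necessary parts are included as source code)

### Modelcard
- Merge the pipeline and trainer tests together
- Add tests for 3 models
- Add demo service

### Common components
- ms_dataset.py: fix warning, [UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or xxx]
- Add common preprocessors: ToNumpy, Rename, Identity
- Common preprocessors check that a key exists in the dict before reading its value.
- ~~Fix the logic for changing the learning rate at iteration level.~~ (not done in this change)
- ~~Fix the bug where train data was not shuffled in non-distributed mode.~~ (already fixed on master by someone else)
- Fix the exception raised during training when util imports the non-CV package zhconv.

### Other
- To prevent the newly introduced preprocessor module from being loaded by the existing code through the config, which would cause errors in other people's CI runs, no new tag has been added for now; once the CR is complete, the tag will be added and CI rerun.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11762108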

* support vision efficient tuning finetune

* update test case

* update shuffle on IterableDataset

* update bitfit & sidetuning

* compatible with base trainer
2023-03-08 16:42:23 +08:00
xingjun.wxj
8298a3c31d add force redownload for nerf dataset in ut
add force redownload for nerf dataset in ut

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11915271
2023-03-08 16:23:58 +08:00
fubang.zfb
dfd1058d45 siamese uie finetune support
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11706625
2023-03-02 20:14:31 +08:00
ryan.yy
4197f72124 Add trainer module for the accelerated NeRF reconstruction model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11794296
2023-03-01 12:05:57 +08:00
hemu.zp
ceeb85f10f [to #42322933] Support multi-machine data and tensor parallel finetuning
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11682479
2023-02-28 18:47:17 +08:00
myf272609
51cf0b16a8 [to #42322933] add finetune support for cartoon task
Add training support for the portrait cartoonization model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11675597

* add finetune support for cartoon
2023-02-28 17:01:34 +08:00
yuze.zyz
7181e667f6 Refactor hooks
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11651547
2023-02-28 13:51:01 +08:00
lee.lcy
a0877216a4 fix(damoyolo): fix FileNotFoundError when using trainer.evaluate() && add work_dir and exp_name to kwargs
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11793714

* fix(damoyolo): fix FileNotFoundError when using trainer.evaluate() && add work_dir and exp_name to kwargs

* style(damoyolo): add code annotation to ImageDetectionDamoyoloTrainer
2023-02-27 19:26:12 +08:00
yuze.zyz
90af43f749 [to #47563396]Fix bug: two ckpt hooks save in the same dir
1. Support two checkpoint hooks saving final checkpoints in two different folders
2. Remove the check of checkpoint hooks
3. Fix an incorrect modification in UT
4. Fix bug: Checkpoint.load_checkpoint has been moved out
5. Add UT for new style configuration
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11630170
2023-02-13 16:54:02 +00:00
fuhaomin.fhm
2fced1c06f [to #42322933] Doc2Bot documentation with retrieval rerank, generation 2023-02-12 03:09:33 +00:00
shimin.ysm
9b0e302a66 refine cv_image_defrcn trainer to avoid failures
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11622570
2023-02-10 07:10:59 +00:00
yuze.zyz
ca1321f53f Support trainer prediction and fix some bugs
1. Support trainer prediction
2. Fix bug in text classification metric
3. Move load checkpoint out of checkpointhook
4. Fix bug in training progress (inner_iter variable not correct)

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11560269
2023-02-10 06:19:37 +00:00
zhangyanzhao.zyz
e6c05a2931 sentence-embedding support finetune
Add finetune support for the sentence-embedding model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11537009
2023-02-10 06:07:38 +00:00
hemu.zp
82482b3e96 update training args
Based on feat/0131/nlp_args branch, the original code review: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11408570

Support running finetuning from the command line with training args; compatible with the configuration optimization.
2023-02-10 05:32:21 +00:00
yuze.zyz
4dca4773db Support csanmt exporting and refactor some code
1. Support csanmt exporting to savedmodel format
2. Create a new base class for text-ranking preprocessors, and move some parameters of mgeo_ranking_preprocessor to init method
3. Avoid Model & Preprocessor classes coupled with pytorch
4. Regression test supports comparing only model output
5. Support zero-shot exporting to onnx and torchscript

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11522461
2023-02-10 05:15:04 +00:00