Commit Graph

232 Commits

Author SHA1 Message Date
hemu
672f25266f Merge branch 'master-github' into master-merge-github-0413 2023-04-13 10:32:01 +08:00
slin000111
92d7eae5b9 add token classification example and gpt3 one layer test (#268) 2023-04-13 10:21:00 +08:00
hemu
305006e18a Merge branch 'master-github' into master-merge-github-0412 2023-04-12 15:51:20 +08:00
chaojie.mcj
283517de08 add efficient tuner modules 2023-04-11 22:26:13 +08:00
ly261666
5f6cdf6814 [to #42322933] add damofd model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12261061
2023-04-11 11:08:59 +08:00
tastelikefeet
6dd94ff2bc add first case for gpt3 test (#236) 2023-04-09 21:46:48 +08:00
kangxiaoyang.kxy
d2fde9248c update image colorization metric
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12073952

* update image colorization metric

* update documentation strings
2023-03-28 15:14:10 +08:00
xingjun.wxj
4e77f654f5 backward compatible with to_task_dataset function in adaseq repo
1. backward compatible with to_task_dataset function for DefaultTrainer in adaseq repo
2. fix registry issue for RedsImageDeblurringDataset and GoproImageDeblurringDataset
3. add ut TestCustomDatasetsCompatibility
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11981956
2023-03-14 18:34:44 +08:00
lllcho.lc
38bcd54ee4 fix bug in detectron2 installation
Installing the library from GitHub sometimes fails because of network issues, so install it from Gitee instead.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11945563
2023-03-10 15:14:21 +08:00
kangxiaoyang.kxy
681afac803 add finetune for ddcolor image colorization
Add finetune support for the image colorization model; the local test passes: python tests/test_image_colorization_trainer.py

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11874339
2023-03-10 14:00:48 +08:00
xingjun.wxj
e02a260c93 Refactor the task_datasets module
Refactor the task_datasets module:

1. Add new module modelscope.msdatasets.dataset_cls.custom_datasets.
2. Add new function: modelscope.msdatasets.ms_dataset.MsDataset.to_custom_dataset().
3. Call to_custom_dataset() in MsDataset.load() to adapt to the new custom_datasets module.
4. Refactor the pipeline for loading custom datasets:
	1) Only use the MsDataset.load() function to load custom datasets.
	2) Combine MsDataset.load() with class EpochBasedTrainer.
5. Add a new entry function for building datasets in EpochBasedTrainer, see: modelscope.trainers.trainer.EpochBasedTrainer.build_dataset()
6. Add a new function to build the custom dataset from the model configuration, see: modelscope.trainers.trainer.EpochBasedTrainer.build_dataset_from_cfg()
7. Add a new registry function for building custom datasets, see: modelscope.msdatasets.dataset_cls.custom_datasets.builder.build_custom_dataset()
8. Refine class SiameseUIETrainer to adapt to the new custom_datasets module.
9. Add class TorchCustomDataset as a superclass for custom dataset classes.
10. Move modules/classes/functions:
	1) Move module msdatasets.audio to custom_datasets
	2) Move module msdatasets.cv to custom_datasets
	3) Move module bad_image_detecting to custom_datasets
	4) Move module damoyolo to custom_datasets
	5) Move module face_2d_keypoints to custom_datasets
	6) Move module hand_2d_keypoints to custom_datasets
	7) Move module human_wholebody_keypoint to custom_datasets
	8) Move module image_classification to custom_datasets
	9) Move module image_inpainting to custom_datasets
	10) Move module image_portrait_enhancement to custom_datasets
	11) Move module image_quality_assessment_degradation to custom_datasets
	12) Move module image_quality_assmessment_mos to custom_datasets
	13) Move class LanguageGuidedVideoSummarizationDataset to custom_datasets
	14) Move class MGeoRankingDataset to custom_datasets
	15) Move module movie_scene_segmentation to custom_datasets
	16) Move module object_detection to custom_datasets
	17) Move module referring_video_object_segmentation to custom_datasets
	18) Move module sidd_image_denoising to custom_datasets
	19) Move module video_frame_interpolation to custom_datasets
	20) Move module video_stabilization to custom_datasets
	21) Move module video_super_resolution to custom_datasets
	22) Move class GoproImageDeblurringDataset to custom_datasets
	23) Move class EasyCVBaseDataset to custom_datasets
	24) Move class ImageInstanceSegmentationCocoDataset to custom_datasets
	25) Move class RedsImageDeblurringDataset to custom_datasets
	26) Move class TextRankingDataset to custom_datasets
	27) Move class VecoDataset to custom_datasets
	28) Move class VideoSummarizationDataset to custom_datasets
11. Delete modules/functions/classes:
	1) Delete module task_datasets
	2) Delete to_task_dataset() in EpochBasedTrainer
	3) Delete build_dataset() in EpochBasedTrainer and replace it with a new function of the same name.
12. Rename class Datasets to CustomDatasets in metainfo.py

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11872747
2023-03-10 09:03:32 +08:00
hemu.zp
2b1af959d5 Convert cfg during training
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11900238
2023-03-09 22:27:44 +08:00
yuze.zyz
a0bc5549a1 trainer support parallel_groups
Design doc: https://yuque.alibaba-inc.com/suluyan.sly/yh1rvu/yx0owblyebpa2b3l?singleDoc#flU3s

1. Add a parallel_group field in the trainer to support DP, TP and PP.
2. Move the construction of common hooks (except the optimizer/LR scheduler hooks) to the trainer's init method to support the after_init stage.
	The after_init stage supports initialization for DP, TP and PP.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
3. Add before_eval/after_eval stages to support model wrapping.
	This solves the ordering problem between apex AMP and DDP wrapping.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
4. Exporter supports lazy importing.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48122780
5. Fold all megatron imports into the megatron hook.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
6. Add a compile method to TorchModel, Pipeline and Trainer to support torch 2.0.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=46869415
7. Fix bug: the LR scheduler builder does not support torch 2.0.
8. Add callbacks for the trainer.
	https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48210342
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11849932
2023-03-09 21:33:35 +08:00
yuanzhi.zyz
2ce41d4974 add ocr_reco_crnntrain
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11824395
2023-03-09 21:14:09 +08:00
bin.xue
281d3c630e [to #42322933] feat: kws support continue training from a checkpoint
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11940446

* feat: kws support continue training from a checkpoint

* log: add loading model log
2023-03-09 21:07:34 +08:00
lllcho.lc
3e70ecfb37 add action detection finetune
Add finetune support for the action-detection model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11873651
2023-03-09 16:54:19 +08:00
xixing.tj
46799325c8 add ocr_detection_db training module
Add DBNet training code for ocr_detection

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11892455
2023-03-09 15:11:27 +08:00
zeyinzi.jzyz
bf3a2b6c09 support vision efficient tuning finetune
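## Changes ↓↓↓
### vision efficient tuning finetune
- Adapt the Model module for training
- The Model module supports training while staying backward compatible with previously published model cards
- The pipeline accepts either a preprocessor loaded from the model card or one defined directly
- Add ImageClassificationPreprocessor (non-mmcv version)
- Add VisionEfficientTuningTrainer
- ~~Add opencv_transforms==0.0.6~~ (the necessary parts are included as source code instead)

### Modelcard
- Merge the pipeline and trainer tests
- Add tests for 3 models
- Add a demo service

### Common components
- ms_dataset.py: fix warning, [UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or xxx]
- Add common preprocessors: ToNumpy, Rename, Identity
- Common preprocessors check that a key exists in a dict before reading its value.
- ~~Fix the per-iteration learning rate update logic.~~ (dropped from this change)
- ~~Fix the bug where training data is not shuffled in non-distributed mode.~~ (already fixed on master by someone else)
- Fix the exception raised during training when util calls zhconv, a non-CV package.

### Other
- To prevent the newly added preprocessor modules from being loaded from config by existing code and breaking other people's CI runs, no new tag has been added yet; once code review is complete, the tag will be added and CI rerun.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11762108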

* support vision efficient tuning finetune

* update test case

* update shuffle on IterableDataset

* update bitfit & sidetuning

* compatible with base trainer
2023-03-08 16:42:23 +08:00
xingjun.wxj
8298a3c31d add force redownload for nerf dataset in ut
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11915271
2023-03-08 16:23:58 +08:00
fubang.zfb
dfd1058d45 siamese uie finetune support
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11706625
2023-03-02 20:14:31 +08:00
ryan.yy
4197f72124 add trainer module for the accelerated NeRF reconstruction model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11794296
2023-03-01 12:05:57 +08:00
hemu.zp
ceeb85f10f [to #42322933] Support multi-machine data and tensor parallel finetuning
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11682479
2023-02-28 18:47:17 +08:00
myf272609
51cf0b16a8 [to #42322933] add finetune support for cartoon task
Add training support for the portrait cartoonization model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11675597

* add finetune support for cartoon
2023-02-28 17:01:34 +08:00
yuze.zyz
7181e667f6 Refactor hooks
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11651547
2023-02-28 13:51:01 +08:00
lee.lcy
a0877216a4 fix(damoyolo): fix FileNotFoundError when using trainer.evaluate() && add work_dir and exp_name to kwargs
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11793714

* fix(damoyolo): fix FileNotFoundError when using trainer.evaluate() && add work_dir and exp_name to kwargs

* style(damoyolo): add code annotation to ImageDetectionDamoyoloTrainer
2023-02-27 19:26:12 +08:00
yuze.zyz
90af43f749 [to #47563396] Fix bug: two ckpt hooks save in the same dir
1. Support two checkpoint hooks saving final checkpoints in two different folders
2. Remove the check of checkpoint hooks
3. Fix an incorrect modification in UT
4. Fix bug: Checkpoint.load_checkpoint has been moved out
5. Add UT for new-style configuration
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11630170
2023-02-13 16:54:02 +00:00
fuhaomin.fhm
2fced1c06f [to #42322933] Doc2Bot documentation with retrieval rerank, generation 2023-02-12 03:09:33 +00:00
shimin.ysm
9b0e302a66 refine cv_image_defrcn trainer to avoid failures
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11622570
2023-02-10 07:10:59 +00:00
yuze.zyz
ca1321f53f Support trainer prediction and fix some bugs
1. Support trainer prediction
2. Fix bug in text classification metric
3. Move load checkpoint out of checkpointhook
4. Fix bug in training progress (incorrect inner_iter variable)

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11560269
2023-02-10 06:19:37 +00:00
zhangyanzhao.zyz
e6c05a2931 sentence-embedding support finetune
Add finetune support for the sentence-embedding model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11537009
2023-02-10 06:07:38 +00:00
hemu.zp
82482b3e96 update training args
Based on the feat/0131/nlp_args branch; original code review: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11408570

Support running finetuning from the command line with training args, compatible with the configuration optimization.
2023-02-10 05:32:21 +00:00
yuze.zyz
4dca4773db Support csanmt exporting and refactor some code
1. Support csanmt exporting to savedmodel format
2. Create a new base class for text-ranking preprocessors, and move some parameters of mgeo_ranking_preprocessor to init method
3. Avoid Model & Preprocessor classes coupled with pytorch
4. Regression test supports comparing only model output
5. Support zero-shot exporting to onnx and torchscript

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11522461
2023-02-10 05:15:04 +00:00
mulin.lyh
fd7fd38da0 fix failed case 2023-02-10 10:14:24 +08:00
shimin.ysm
2535866443 cv/image-fewshot-detection-defrcn support finetune and evaluation
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11486763

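* defrcn: improve evaluation and support the COCO format

* fix formatting issues

* optimize model loading

* optimize the training test scripts

* remove the dependency on the dataset during inference

* specify the model version

* specify the model revision

* address review comments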
2023-02-09 10:43:08 +00:00
lanjinpeng.ljp
cffc1ba0e5 support DINO detection using EasyCV
Support the DINO high-accuracy object detection model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11518805
2023-02-09 09:39:08 +00:00
leyuan.hjy
2684111bd7 Real-time object detection finetune support using easycv
Add EasyCV-based finetune support for real-time object detection

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11554870

* add finetune support

* implementation of trainer and pipeline switched to easycv

* remove old yolox code
2023-02-09 08:45:05 +00:00
tanfan.zjh
bb174351b3 refactor faq model and add MGIMN model
Refactor the FAQ model code and add the FAQ MGIMN model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11595371
2023-02-09 08:29:19 +00:00
hemu.zp
ce4199a783 Fix data parallel bug for mgeo evaluation
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11584808
2023-02-09 08:26:52 +00:00
wenmeng.zwm
d5ae8ae43b remove tensorboard hook as default
TensorBoard has been removed from the framework.txt requirements, so the TensorBoard hook is removed from the default config.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11519980

* remove tensorboard hook as default

* Merge branch 'master' into fix/remove_default_tensorboard_hook
2023-02-08 10:07:07 +00:00
xianzhe.xxz
0967ece5a0 fix damoyolo evaluator loading a mismatched checkpoint
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11583722
2023-02-08 06:50:47 +00:00
ada.drx
7298bd2bb4 mgeo fix finetune for rerank test case and reduce UT time
* reduce UT time 
* fix finetune for rerank test case

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11563740
2023-02-07 02:55:33 +00:00
dawei.fdw
310e9c7dbf add plug mental model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11549696

* add plug mental model code

* add test pipeline and fix annotation format bugs
2023-02-06 10:57:20 +00:00
mulin.lyh
e54694690f [to #46993990] feat: run CI cases based on code diff to reduce CI test time 2023-02-06 08:00:19 +00:00
pengteng.spt
e502e89c61 Split training and evaluating code for nearfield kws trainer
* fix judgement of fa case for certain keywords in det
* split the code so that train and evaluate can be used separately
* fix pre-commit errors

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11453810
2023-01-31 09:43:19 +00:00
shouzhou.bx
f6c884b5ec [to #42322933][BUG FIX]bug fix for hand detect ft
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11439551
2023-01-16 05:07:25 +00:00
bin.xue
854c1e6cbf [to #42322933] bugfix: separation.evaluate() failed
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11426908
2023-01-13 09:19:31 +00:00
shimin.ysm
f7930c23a0 add cv/image-defrcn-fewshot-detection
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11364804

* add model defrcn-fewshot-detection

* add requirements check
2023-01-12 12:48:38 +00:00
ada.drx
2309596161 add mgeo finetune and pipeline
MGeo is a multi-modal, multi-task geographic language model.
We support 5 pipeline tasks and 1 pretrained model, MGeo, on MaaS.
At the same time, we propose GeoGLUE, a geographic evaluation benchmark. MGeo can be finetuned on GeoGLUE tasks.

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11273012


* add prov city dist feature to gis encoder

* finish mgeo finetune and pipeline

* text classification add token type id

* to_device support ModelOutput class

* update token classification model label mask logic
2023-01-12 17:55:14 +08:00
jiangyu.xzy
c8c1b7f1a8 add asr finetune & change inference
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11403205

* support asr new models & vad-punc models
2023-01-12 16:01:54 +08:00
hemu.zp
06296c1819 [to #42322933] Fix evaluation oom
Add a merge method to all metrics so that per-process metrics can be merged when using data parallelism. The evaluation process no longer stores all data, which avoids OOM.

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11399082
2023-01-12 13:02:54 +08:00
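The metric-merging idea in the "Fix evaluation oom" commit above can be illustrated with a small sketch. This is not ModelScope's metric API: the class `AccuracyState` and the helper `merge_all` are hypothetical names, and accuracy is used only as a simple example. Each data-parallel process keeps constant-size running counters instead of storing every prediction, and the per-process states are summed at the end.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AccuracyState:
    """Hypothetical running state for an accuracy metric (not a ModelScope class).

    Only two integers are kept per process instead of every prediction,
    so memory stays constant regardless of the evaluation-set size.
    """
    correct: int = 0
    total: int = 0

    def add(self, preds: Iterable[int], labels: Iterable[int]) -> None:
        # Accumulate counts batch by batch; the batch itself is not retained.
        for p, y in zip(preds, labels):
            self.correct += int(p == y)
            self.total += 1

    def merge(self, other: "AccuracyState") -> None:
        # Combining two partial states is just summing their counters.
        self.correct += other.correct
        self.total += other.total

    def evaluate(self) -> float:
        # Guard against an empty state to avoid division by zero.
        return self.correct / self.total if self.total else 0.0


def merge_all(states: List[AccuracyState]) -> AccuracyState:
    """Hypothetical helper: fold the per-process states into one."""
    merged = AccuracyState()
    for state in states:
        merged.merge(state)
    return merged


if __name__ == "__main__":
    rank0, rank1 = AccuracyState(), AccuracyState()
    rank0.add([1, 0, 1], [1, 1, 1])  # 2 of 3 correct
    rank1.add([0, 0], [0, 1])        # 1 of 2 correct
    print(merge_all([rank0, rank1]).evaluate())  # 3 / 5 = 0.6
```

In a real distributed run, the per-rank states would first have to be collected on one process (for example with `torch.distributed.all_gather_object`) before merging; that step is left out here to keep the sketch dependency-free.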