78 Commits

Author SHA1 Message Date
co63oc
c30bfeb777 Fix typos (#1328) 2025-05-08 16:10:54 +08:00
wenmeng zhou
88c60f114a remove necessary dependency of transformer (#857)
* remove necessary dependency of transformer

* fix import error

* fix typo
2024-05-15 20:44:17 +08:00
rujiao.lrj
6b09cb3d7a add model for card correction
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14049168
2023-09-19 19:20:19 +08:00
tastelikefeet
f2e9ea4199 Replace code with swift wheel (#467) 2023-08-29 17:27:18 +08:00
hemu.zp
612f0ebbc4 fix eval RecursionError
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13099203
2023-06-29 10:47:54 +08:00
wanyu.wy
58df448182 Add trainer for UniTE 2023-05-11 14:41:08 +08:00
chaojie.mcj
283517de08 add efficient tuner modules 2023-04-11 22:26:13 +08:00
kangxiaoyang.kxy
d2fde9248c update image colorization metric
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12073952

* update image colorization metric

* update documentation strings
2023-03-28 15:14:10 +08:00
Zhicheng Zhang
ab0d41afe2 add task chat for all chat models, like chatglm-6B (#213)
* add task chat

* update output format

---------

Co-authored-by: Zhicheng Zhang <zhangzhicheng.zzc@alibaba-inc.com>
2023-03-20 21:19:29 +08:00
kangxiaoyang.kxy
681afac803 add finetune for ddcolor image colorization
Add finetune support for the image colorization model; local test passed: python tests/test_image_colorization_trainer.py

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11874339
2023-03-10 14:00:48 +08:00
yuanzhi.zyz
2ce41d4974 add ocr_reco_crnntrain
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11824395
2023-03-09 21:14:09 +08:00
lllcho.lc
3e70ecfb37 add action detection finetune
Add finetune support for the action-detection model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11873651
2023-03-09 16:54:19 +08:00
wenmeng.zwm
99892d0812 Merge branch merge_master_github_0224 into master
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11782747
2023-02-24 17:16:38 +08:00
zhongning.hzn
8411645524 bad_image_detecting model supports validation on datasets
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11761935
2023-02-24 14:23:57 +08:00
chenxujun
adb4de753d Fix some words (#129) 2023-02-22 10:01:18 +08:00
haolan.xhl
e537f9df1d [to #42322933] add vqa-degradation
add new model for image_quality_assessment_degradation
2023-02-10 11:59:59 +00:00
yuze.zyz
ca1321f53f Support trainer prediction and fix some bugs
1. Support trainer prediction
2. Fix bug in text classification metric
3. Move load checkpoint out of checkpointhook
4. Fix bug in train progressing (inner_iter variable not correct)

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11560269
2023-02-10 06:19:37 +00:00
tanfan.zjh
bb174351b3 refactor faq model and add MGIMN model
Refactor the FAQ model code and add the FAQ MGIMN model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11595371
2023-02-09 08:29:19 +00:00
hemu.zp
ce4199a783 Fix data parallel bug for mgeo evaluation
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11584808
2023-02-09 08:26:52 +00:00
zhongning.hzn
6ecf76314f add image quality assessment MOS (mean opinion score) model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11469039
2023-02-07 14:18:27 +00:00
ada.drx
2309596161 add mgeo finetune and pipeline
MGeo is a multi-modal multi-task geographic language model.
We support 5 pipeline tasks and 1 pretrained model MGeo on maas.
At the same time, we propose GeoGLUE, a geographic evaluation benchmark. MGeo can be finetuned on GeoGLUE tasks.

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11273012


* add prov city dist feature to gis encoder

* finish mgeo finetune and pipeline

* text classification add token type id

* to_device support ModelOutput class

* update token classification model label mask logic
2023-01-12 17:55:14 +08:00
hemu.zp
06296c1819 [to #42322933] Fix evaluation oom
Add a merge method to all metrics so that per-process metrics can be merged when using data parallel. The evaluation process no longer saves all data, which avoids OOM.

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11399082
2023-01-12 13:02:54 +08:00
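The merge approach described in the commit above can be illustrated with a minimal sketch. This is not the repository's code: the class name AccuracyMetric and its add/merge/evaluate methods are hypothetical and only show the idea of keeping running counts per process and combining them, rather than storing every prediction.

```python
from dataclasses import dataclass


@dataclass
class AccuracyMetric:
    """Hypothetical metric that keeps running counts instead of raw samples."""
    correct: int = 0
    total: int = 0

    def add(self, predictions, labels):
        # Update the counters for one batch; the batch itself is not stored.
        for pred, label in zip(predictions, labels):
            self.correct += int(pred == label)
            self.total += 1

    def merge(self, other):
        # Combine the state accumulated by another data-parallel worker.
        self.correct += other.correct
        self.total += other.total

    def evaluate(self):
        accuracy = self.correct / self.total if self.total else 0.0
        return {"accuracy": accuracy}


# Two workers each evaluate their own shard of the data.
rank0 = AccuracyMetric()
rank0.add([1, 0, 1], [1, 1, 1])
rank1 = AccuracyMetric()
rank1.add([0, 0], [0, 1])

# One worker merges the other's counters and reports the global result.
rank0.merge(rank1)
result = rank0.evaluate()
```

Because only two integers are kept per worker, memory use stays constant regardless of the size of the evaluation set.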
huizheng.hz
466200f355 NAFNet Image Deblurring pipeline and finetune support
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11300932

* fix psnr/ssim metrics for NAFNet (image denoise)

* add subset_name when loading dataset (NAFNet image denoising)
2023-01-11 22:18:03 +08:00
wenmeng.zwm
9ce750f4a9 merge master-github and fix conflict 2023-01-10 11:12:37 +08:00
liaojie.laj
fcf6e6431f submit video frame interpolation model
Add a video frame interpolation model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11188339
2023-01-10 06:57:19 +08:00
zhicheng.sc
2cb89609f0 Add video stabilization model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11204574

* add video stabilization model
2023-01-09 21:23:26 +08:00
hemu.zp
e4be1dde72 [to #42322933] add ppl metric
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11303248
2023-01-07 22:52:55 +08:00
yichang.zyc
0c79b57fcc support batch infer
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11170755
2022-12-28 12:17:36 +08:00
huizheng.hz
cb9f1bfb8d 1230-video-super-resolution
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11090960
2022-12-28 06:18:50 +08:00
wenmeng.zwm
5a3d58ad49 Merge branch 'master-gitlab' into merge_master_internal_1207 2022-12-07 19:59:07 +08:00
hemu.zp
941dbe75cf [to #42322933] Add GPT-3 tensor parallel finetuning
Add GPT-3 tensor parallel finetuning, adjust some distributed codes to make tensor and data parallel compatible.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10949507
2022-12-05 10:01:32 +08:00
chenxujun
99507a5cc6 Fix some words 2022-12-03 14:39:55 +08:00
yuze.zyz
bb5512d1ab [to #42322933] Refactor NLP and fix some user feedbacks
1. Abstract keys of dicts needed by nlp metric classes into the init method
2. Add Preprocessor.save_pretrained to save preprocessor information
3. Abstract the config saving function, so that the config is saved correctly both when from_pretrained is called directly and when the cfg is modified step by step during training.
4. Remove SbertTokenizer and VecoTokenizer, use transformers' tokenizers instead
5. Use model/preprocessor's from_pretrained in all nlp pipeline classes.
6. Add model_kwargs and preprocessor_kwargs in all nlp pipeline classes
7. Add base classes for fill-mask and text-classification preprocessor, as a demo for later changes
8. Fix user feedback: Re-train the model in continue training scenario
9. Fix user feedback: Too many checkpoints saved
10. Simplify the nlp-trainer
11. Fix user feedback: Split the default trainer's __init__ method, making it easier for users to override
12. Add safe_get to Config class

----------------------------  Another refactor from version 36 -------------------------

13. Name all nlp transformers' preprocessors from TaskNamePreprocessor to TaskNameTransformersPreprocessor, for example:
      TextClassificationPreprocessor -> TextClassificationTransformersPreprocessor
14. Add a base class per task for all nlp tasks' preprocessors which has at least two sub-preprocessors
15. Add output classes of nlp models
16. Refactor the logic for token-classification
17. Fix bug: checkpoint_hook does not support pytorch_model.pt
18. Fix bug: Pipeline name does not match with task name, so inference will not succeed after training
       NOTE: This is just a stopgap fix; the root cause is the unclear relationship between models and pipelines
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10723513

* add save_pretrained to preprocessor

* save preprocessor config in hook

* refactor label-id mapping fetching logic

* test ok on sentence-similarity

* run on finetuning

* fix bug

* pre-commit passed

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/preprocessors/nlp/nlp_base.py

* add params to init

* 1. support max ckpt num 2. support ignoring others but bin file in continue training 3. add arguments to some nlp metrics

* Split trainer init impls to overridable methods

* remove some obsolete tokenizers

* unfinished

* support input params in pipeline

* fix bugs

* fix ut bug

* fix bug

* fix ut bug

* fix ut bug

* fix ut bug

* add base class for some preprocessors

* Merge commit '379867739548f394d0fa349ba07afe04adf4c8b6' into feat/refactor_config

* compatible with old code

* fix ut bug

* fix ut bugs

* fix bug

* add some comments

* fix ut bug

* add a requirement

* fix pre-commit

* Merge commit '0451b3d3cb2bebfef92ec2c227b2a3dd8d01dc6a' into feat/refactor_config

* fixbug

* Support function type in registry

* fix ut bug

* fix bug

* Merge commit '5f719e542b963f0d35457e5359df879a5eb80b82' into feat/refactor_config

# Conflicts:
#	modelscope/pipelines/nlp/multilingual_word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/utils/hub.py

* remove obsolete file

* rename init args

* rename params

* fix merge bug

* add default preprocessor config for ner-model

* move a method to a util file

* remove unused config

* Fix a bug in pbar

* bestckptsaver:change default ckpt numbers to 1

* 1. Add assert to max_epoch 2. split init_dist and get_device 3. change cmp func name

* Fix bug

* fix bug

* fix bug

* unfinished refactoring

* unfinished

* uw

* uw

* uw

* uw

* Merge branch 'feat/refactor_config' into feat/refactor_trainer

# Conflicts:
#	modelscope/preprocessors/nlp/document_segmentation_preprocessor.py
#	modelscope/preprocessors/nlp/faq_question_answering_preprocessor.py
#	modelscope/preprocessors/nlp/relation_extraction_preprocessor.py
#	modelscope/preprocessors/nlp/text_generation_preprocessor.py

* uw

* uw

* unify nlp task outputs

* uw

* uw

* uw

* uw

* change the order of text cls pipeline

* refactor t5

* refactor tg task preprocessor

* fix

* unfinished

* temp

* refactor code

* unfinished

* unfinished

* unfinished

* unfinished

* uw

* Merge branch 'feat/refactor_config' into feat/refactor_trainer

* smoke test pass

* ut testing

* pre-commit passed

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/models/nlp/bert/document_segmentation.py
#	modelscope/pipelines/nlp/__init__.py
#	modelscope/pipelines/nlp/document_segmentation_pipeline.py

* merge master

* unfinished

* Merge branch 'feat/fix_bug_pipeline_name' into feat/refactor_config

* fix bug

* fix ut bug

* support ner batch inference

* fix ut bug

* fix bug

* support batch inference on three nlp tasks

* unfinished

* fix bug

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/models/base/base_model.py
#	modelscope/pipelines/nlp/conversational_text_to_sql_pipeline.py
#	modelscope/pipelines/nlp/dialog_intent_prediction_pipeline.py
#	modelscope/pipelines/nlp/dialog_modeling_pipeline.py
#	modelscope/pipelines/nlp/dialog_state_tracking_pipeline.py
#	modelscope/pipelines/nlp/document_segmentation_pipeline.py
#	modelscope/pipelines/nlp/faq_question_answering_pipeline.py
#	modelscope/pipelines/nlp/feature_extraction_pipeline.py
#	modelscope/pipelines/nlp/fill_mask_pipeline.py
#	modelscope/pipelines/nlp/information_extraction_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/sentence_embedding_pipeline.py
#	modelscope/pipelines/nlp/summarization_pipeline.py
#	modelscope/pipelines/nlp/table_question_answering_pipeline.py
#	modelscope/pipelines/nlp/text2text_generation_pipeline.py
#	modelscope/pipelines/nlp/text_classification_pipeline.py
#	modelscope/pipelines/nlp/text_error_correction_pipeline.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/pipelines/nlp/text_ranking_pipeline.py
#	modelscope/pipelines/nlp/token_classification_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/zero_shot_classification_pipeline.py
#	modelscope/trainers/nlp_trainer.py

* pre-commit passed

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/preprocessors/__init__.py

* fix bug

* fix bug

* fix bug

* fix bug

* fix bug

* fixbug

* pre-commit passed

* fix bug

* fixbug

* fix bug

* fix bug

* fix bug

* fix bug

* self review done

* fixbug

* fix bug

* fix bug

* fix bugs

* remove sub-token offset mapping

* fix name bug

* add some tests

* 1. support batch inference of text-generation,text2text-generation,token-classification,text-classification 2. add corresponding UTs

* add old logic back

* tmp save

* add tokenize by words logic back

* move outputs file back

* revert veco token-classification back

* fix typo

* Fix description

* Merge commit '4dd99b8f6e4e7aefe047c68a1bedd95d3ec596d6' into feat/refactor_config

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/pipelines/builder.py
2022-11-30 23:52:17 +08:00
hemu.zp
0f0fdcae6f [to #42322933] Fix bug for mplug evaluation
Fix mplug evaluation using the wrong metrics, move some Chinese text processing code into utils, and add a trainer for mplug
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10612875
2022-11-08 17:58:03 +08:00
翎航
0418786cbe add five task finetune 2022-11-07 20:23:17 +08:00
yichang.zyc
e2d35fbb14 [to #42322933] add finetune support for clip
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10572842
2022-10-30 21:51:11 +08:00
hemu.zp
53e9f02561 [to #42322933] Fix bug for bleu in text generation metrics.
Fix BLEU-4 values being computed too low because the wrong algorithm was used
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10558494
2022-10-28 09:28:15 +08:00
yuze.zyz
212cf53318 [to #42322933] Fix some bugs
1. Add F1 score to sequence classification metric
2. Fix a bug that the evaluate method in trainer does not support a pure pytorch_model.bin
3. Fix a bug in evaluation of veco trainer 
4. Add some tips if lr_scheduler in the trainer needs a higher version torch
5. Add some comments
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10532230
2022-10-27 19:49:21 +08:00
shuying.shu
ddcb57440d [to #42322933]add fine-tune code for referring video object segmentation
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10539423
2022-10-27 19:43:54 +08:00
hemu.zp
69104c0f8a [to #42322933] Refactor text generation model outputs and fix some bugs
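1. Separate the model.forward part of single_gpu_test and multi_gpu_test into evaluation_step in EpochBasedTrainer, giving more flexibility to models that do not call forward during evaluation
2. Refactor the Model layer of text generation models so that inputs and outputs are uniformly Tensors; the Tensor-to-str decode step moves into the pipeline
3. Pipeline post-processing now handles spaces where Chinese text and Chinese punctuation are mixed with English, so decoded mixed Chinese-English output is correct
4. Add TextGenerationTrainer, fixing the issue where, for some models, forward output a single token during evaluation and metrics were computed on it
5. Fix rouge failing to accept empty strings
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10473768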
2022-10-27 09:52:05 +08:00
liugao.lg
0605376135 [to #42322933]add ofa finetune
Add finetune capability for ofa
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10540701
2022-10-27 09:29:06 +08:00
yichang.zyc
62339161cd revert args of metric init
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10521235
2022-10-25 19:26:44 +08:00
yichang.zyc
6ddafb3218 [to #42322933] caption finetune done, add bleu metric
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10318299
2022-10-25 12:55:41 +08:00
yuze.zyz
605cd7f44a [to #42322933] NLP 1030 Refactor
Features:
1. Refactor the directory structure of nlp models. All model files are placed into either the model folder or the task_model folder
2. Refactor all the comments to Google style
3. Add detailed comments to important tasks and nlp models, describing each model and its preprocessor and trainer
4. Model exporting now supports a direct call to TorchModelExporter (no need to derive from it)
5. Refactor the model save_pretrained method to support running directly (independent of the trainer)
6. Remove the Model type check in the pipeline base class, to support running externally registered models in our pipelines
7. The nlp trainer now has an NLPTrainingArguments class; users can pass arguments into the dataclass and use it as a normal cfg_modify_fn, which simplifies modifying the cfg.
8. Merge BACKBONES and MODELS, so users can get a backbone with the Model.from_pretrained call
9. Model.from_pretrained now supports a task argument, so users can take a backbone and load it with a specific task class.
10. Support Preprocessor.from_pretrained method
11. Add standard return classes to important nlp tasks, so some of the pipelines and models are now independent: model return values are always tensors, and the pipelines handle the conversion to numpy and the subsequent steps.
12. Split the nlp preprocessors file to make the directory structure clearer.

Bug Fixes:
1. Fix a bug where lr_scheduler could be called before the optimizer's step
2. Fix a bug where calling Pipelines directly (not via pipeline(xxx)) throws an error
3. Fix a bug where the trainer does not call the correct TaskDataset class
4. Fix a bug where the internal loading of the dataset throws an error in the trainer class
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10490585
2022-10-25 12:26:25 +08:00
baiguan.yt
533ab3df63 [to #42322933]update msdatasets for image-portrait-enhancement training
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10453584
2022-10-21 14:54:24 +08:00
yuze.zyz
acba1786b0 [to #42322933] Fix bug in UT daily
1. Fix bugs in daily test
2. Fix a bug where lr is updated before the optimizer's first update
    TODO this will still cause warnings when GA is above 1
3. Remove the mode check in text-classification's preprocessor to fit the base trainer (Bug)
     Update some regression bins to fit the preprocessor
4. Update the regression tool to let external code modify atol and rtol
5. Add the default metric for text-classification task
6. Remove the useless ckpt conversion method in bert to avoid requiring tf when loading modeling_bert
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10430764
2022-10-20 15:29:34 +08:00
huizheng.hz
c5c14ad60a [to #42322933]fix psnr/ssim metrics for NAFNet (image denoise)
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10403246
2022-10-13 22:25:57 +08:00
bin.xue
6818ffdc8e [to #42322933] feat: optimize ANS metric value
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10399100
2022-10-13 19:42:19 +08:00
huizheng.hz
922f4c589b [to #42322933] image denoising: use msdataset to load dataset
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10338265
2022-10-11 22:46:30 +08:00
ashui.cbh
69da8f91ac [to #42322933] support image inpainting
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10111615
2022-10-11 20:49:13 +08:00