Commit Graph

45 Commits

Author SHA1 Message Date
suluyana
b68b90ba15 skip plugin 2023-07-30 00:30:30 +08:00
suluyana
9ece90ee84 skip plugin test case 2023-07-29 21:35:21 +08:00
wenmeng zhou
64203e89ee Compatibility for huggingface transformers (#391) 2023-07-24 20:53:27 +08:00
yuze.zyz
a58be34384 Add Lora/Adapter/Prompt and support for chatglm6B and chatglm2-6B
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12770413

* add prompt and lora

* add adapter

* add prefix

* add tests

* adapter smoke test passed

* prompt test passed

* support model id in petl

* migrate chatglm6b

* add train script for chatglm6b

* move gen_kwargs to finetune.py

* add chatglm2

* add model definition
2023-06-27 14:38:18 +08:00
mulin.lyh
698c794070 [to #50537864]fix: fix select case issue
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13055098
2023-06-25 22:44:29 +08:00
hemu.zp
96c2d42f09 Add StreamingMixin
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12445731
* StreamingMixin poc

* update design

* Merge branch 'master' into feat/StreamingMixin

* add docstring

* make postprocessor input consistent
2023-06-08 19:40:14 +08:00
mulin.lyh
7b14a0e11f Pipeline input, output and parameter normalization. 2023-05-11 11:20:01 +08:00
zhangzhicheng.zzc
04e8ddc41e fix update ast not remove origin information
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12319197
2023-04-13 16:10:07 +08:00
hemu.zp
aa561a1818 Support split and merge for megatron_base model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12288423
2023-04-12 16:23:35 +08:00
yzhao
99e94bc2c2 Merge branch 'master-github' into master-merge-github20230310 2023-03-10 13:52:31 +08:00
zhangzhicheng.zzc
8a19e9645d [to #47860410]plugin with cli tool
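1. Support integrating external repos, GitHub repos, and local repos as plugins, with external plugin management
2. Support integrating modelhub repos via allow_remote; this type falls under the model category, so no additional plugin management is applied
3. Support plugin installation and related operations via the CLI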

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11775456
2023-03-09 23:07:13 +08:00
chenxujun
20b3a679e7 Fix some words (#141) 2023-03-02 11:06:56 +08:00
wenmeng.zwm
677e49eaf3 update api doc
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11582587
2023-02-10 07:48:11 +00:00
zhangzhicheng.zzc
5c73ee9f6f skip ast update test
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11609057
2023-02-09 08:36:44 +00:00
mulin.lyh
71f832da35 [to #47671666]fix: diff based ci optimize
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11574741

    * [to #47671666]fix: diff based ci optimize
2023-02-07 10:45:52 +00:00
mulin.lyh
e54694690f [to #46993990]feat: run ci cases based on code diff to reduce ci test time 2023-02-06 08:00:19 +00:00
zhangzhicheng.zzc
e20a72be07 remove function level imports index
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11537482
2023-02-04 10:26:00 +00:00
zhangzhicheng.zzc
42898badf7 [to #42322933] update ast_index logic 2023-01-11 10:43:56 +08:00
pangda
346af6773f support plugin mechanism for second-party/third-party modules 2023-01-11 10:35:09 +08:00
zhangzhicheng.zzc
a318f27247 [to #42322933] speed up the ast indexing during editing
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10907357
2022-12-02 10:06:24 +08:00
yuze.zyz
bb5512d1ab [to #42322933] Refactor NLP and fix some user feedbacks
1. Abstract keys of dicts needed by nlp metric classes into the init method
2. Add Preprocessor.save_pretrained to save preprocessor information
3. Abstract the config saving function, which can lead to normally saving in the direct call of from_pretrained, and the modification of cfg one by one when training.
4. Remove SbertTokenizer and VecoTokenizer, use transformers' tokenizers instead
5. Use model/preprocessor's from_pretrained in all nlp pipeline classes.
6. Add model_kwargs and preprocessor_kwargs in all nlp pipeline classes
7. Add base classes for fill-mask and text-classification preprocessor, as a demo for later changes
8. Fix user feedback: Re-train the model in continue training scenario
9. Fix user feedback: Too many checkpoints saved
10. Simplify the nlp-trainer
11. Fix user feedback: Split the default trainer's __init__ method, which makes it easier for users to override
12. Add safe_get to Config class

----------------------------  Another refactor from version 36 -------------------------

13. Name all nlp transformers' preprocessors from TaskNamePreprocessor to TaskNameTransformersPreprocessor, for example:
      TextClassificationPreprocessor -> TextClassificationTransformersPreprocessor
14. Add a base class per task for all nlp tasks' preprocessors which has at least two sub-preprocessors
15. Add output classes of nlp models
16. Refactor the logic for token-classification
17. Fix bug: checkpoint_hook does not support pytorch_model.pt
18. Fix bug: Pipeline name does not match with task name, so inference will not succeed after training
       NOTE: This is just a stop bleeding solution, the root cause is the uncertainty of the relationship between models and pipelines
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10723513

    * add save_pretrained to preprocessor

* save preprocessor config in hook

* refactor label-id mapping fetching logic

* test ok on sentence-similarity

* run on finetuning

* fix bug

* pre-commit passed

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/preprocessors/nlp/nlp_base.py

* add params to init

* 1. support max ckpt num 2. support ignoring others but bin file in continue training 3. add arguments to some nlp metrics

* Split trainer init impls to overridable methods

* remove some obsolete tokenizers

* unfinished

* support input params in pipeline

* fix bugs

* fix ut bug

* fix bug

* fix ut bug

* fix ut bug

* fix ut bug

* add base class for some preprocessors

* Merge commit '379867739548f394d0fa349ba07afe04adf4c8b6' into feat/refactor_config

* compatible with old code

* fix ut bug

* fix ut bugs

* fix bug

* add some comments

* fix ut bug

* add a requirement

* fix pre-commit

* Merge commit '0451b3d3cb2bebfef92ec2c227b2a3dd8d01dc6a' into feat/refactor_config

* fixbug

* Support function type in registry

* fix ut bug

* fix bug

* Merge commit '5f719e542b963f0d35457e5359df879a5eb80b82' into feat/refactor_config

# Conflicts:
#	modelscope/pipelines/nlp/multilingual_word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/utils/hub.py

* remove obsolete file

* rename init args

* rename params

* fix merge bug

* add default preprocessor config for ner-model

* move a method to a util file

* remove unused config

* Fix a bug in pbar

* bestckptsaver:change default ckpt numbers to 1

* 1. Add assert to max_epoch 2. split init_dist and get_device 3. change cmp func name

* Fix bug

* fix bug

* fix bug

* unfinished refactoring

* unfinished

* uw

* uw

* uw

* uw

* Merge branch 'feat/refactor_config' into feat/refactor_trainer

# Conflicts:
#	modelscope/preprocessors/nlp/document_segmentation_preprocessor.py
#	modelscope/preprocessors/nlp/faq_question_answering_preprocessor.py
#	modelscope/preprocessors/nlp/relation_extraction_preprocessor.py
#	modelscope/preprocessors/nlp/text_generation_preprocessor.py

* uw

* uw

* unify nlp task outputs

* uw

* uw

* uw

* uw

* change the order of text cls pipeline

* refactor t5

* refactor tg task preprocessor

* fix

* unfinished

* temp

* refactor code

* unfinished

* unfinished

* unfinished

* unfinished

* uw

* Merge branch 'feat/refactor_config' into feat/refactor_trainer

* smoke test pass

* ut testing

* pre-commit passed

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/models/nlp/bert/document_segmentation.py
#	modelscope/pipelines/nlp/__init__.py
#	modelscope/pipelines/nlp/document_segmentation_pipeline.py

* merge master

* unfinished

* Merge branch 'feat/fix_bug_pipeline_name' into feat/refactor_config

* fix bug

* fix ut bug

* support ner batch inference

* fix ut bug

* fix bug

* support batch inference on three nlp tasks

* unfinished

* fix bug

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/models/base/base_model.py
#	modelscope/pipelines/nlp/conversational_text_to_sql_pipeline.py
#	modelscope/pipelines/nlp/dialog_intent_prediction_pipeline.py
#	modelscope/pipelines/nlp/dialog_modeling_pipeline.py
#	modelscope/pipelines/nlp/dialog_state_tracking_pipeline.py
#	modelscope/pipelines/nlp/document_segmentation_pipeline.py
#	modelscope/pipelines/nlp/faq_question_answering_pipeline.py
#	modelscope/pipelines/nlp/feature_extraction_pipeline.py
#	modelscope/pipelines/nlp/fill_mask_pipeline.py
#	modelscope/pipelines/nlp/information_extraction_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/sentence_embedding_pipeline.py
#	modelscope/pipelines/nlp/summarization_pipeline.py
#	modelscope/pipelines/nlp/table_question_answering_pipeline.py
#	modelscope/pipelines/nlp/text2text_generation_pipeline.py
#	modelscope/pipelines/nlp/text_classification_pipeline.py
#	modelscope/pipelines/nlp/text_error_correction_pipeline.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/pipelines/nlp/text_ranking_pipeline.py
#	modelscope/pipelines/nlp/token_classification_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/zero_shot_classification_pipeline.py
#	modelscope/trainers/nlp_trainer.py

* pre-commit passed

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/preprocessors/__init__.py

* fix bug

* fix bug

* fix bug

* fix bug

* fix bug

* fixbug

* pre-commit passed

* fix bug

* fixbug

* fix bug

* fix bug

* fix bug

* fix bug

* self review done

* fixbug

* fix bug

* fix bug

* fix bugs

* remove sub-token offset mapping

* fix name bug

* add some tests

* 1. support batch inference of text-generation,text2text-generation,token-classification,text-classification 2. add corresponding UTs

* add old logic back

* tmp save

* add tokenize by words logic back

* move outputs file back

* revert veco token-classification back

* fix typo

* Fix description

* Merge commit '4dd99b8f6e4e7aefe047c68a1bedd95d3ec596d6' into feat/refactor_config

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/pipelines/builder.py
2022-11-30 23:52:17 +08:00
hemu.zp
0f0fdcae6f [to #42322933] Fix bug for mplug evaluation
Fixed mplug evaluation using the wrong metrics, moved some Chinese-text processing code into utils, and added a trainer for mplug
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10612875
2022-11-08 17:58:03 +08:00
wenmeng.zwm
535acaef5b [to #42322933]add test case to check xtcocotools availability
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10462622

    * add test case to check xtcocotools availability
2022-10-20 12:13:19 +08:00
hemu.zp
271e2a2a99 [to #42322933] Add gpt_neo model
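1. Add the gpt_neo model; the checkpoint belongs to Langboat and has not yet been uploaded to the model hub, so testing was completed offline
2. Add text-generation task models and head; released text-generation models such as gpt3 and palm will later be unified into task models with a backbone + head structure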
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10404249
2022-10-17 20:54:29 +08:00
zhangzhicheng.zzc
d721fabb34 [to #42322933]bert with sequence classification / token classification/ fill mask refactor
1. Add support for the original bert model (not the easynlp backbone-prefix version)
2. Support the backbone + head form of bert for sequence classification / fill mask / token classification
3. Unify the pipelines of the several sequence classification tasks into one class
4. fill mask supports the backbone + head form
5. Unify the preprocessors of the token classification subtasks (ner, word seg, part of speech) into TokenClassificationPreprocessor
6. Unify the preprocessors of the sequence classification subtasks (single classification, pair classification) into SequenceClassificationPreprocessor
7. Change where the group_key of cls is assigned in register; previously, with multiple decorators, group_key was overwritten and the group_key information of obj_cls was incorrect
8. Based on the backbone + head form, try to adjust cases where group_key and module originally had the same name, for example in modelscope/pipelines/nlp/sequence_classification_pipeline.py:
Before:
@PIPELINES.register_module(
    Tasks.sentiment_classification, module_name=Pipelines.sentiment_classification)
After:
@PIPELINES.register_module(
    Tasks.text_classification, module_name=Pipelines.sentiment_classification)
The corresponding configuration.json is changed as well; this better reflects the relationship between a task and its pipelines (subtasks).
9. Other corresponding changes to support the features above
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10041463
2022-09-27 23:08:33 +08:00
wenmeng.zwm
6808e9a301 [to #44902099] add license for framework files
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10189613
2022-09-20 17:49:31 +08:00
wenmeng.zwm
fabb4716d4 [to #44610931] fix: add device usage when device is None or empty
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10039848

    * add device usage when device is None or empty

    * update docker env
2022-09-06 21:47:59 +08:00
jiangnana.jnn
930d55d9ad support EasyCV framework and add Segformer model
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9781849

    * support EasyCV
2022-08-26 13:58:50 +08:00
wenmeng.zwm
c72e5f4ae8 [to #43878347] skip device placement test
skip this test which will result in too much debug log for placement although debug level is canceled after this test case

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9875987
2022-08-24 15:08:22 +08:00
zhangzhicheng.zzc
5b0b54633b [to #42322933]compatible with windows path on only core parts
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9855254
2022-08-24 13:35:42 +08:00
wenmeng.zwm
aaa604cb16 [to #43878347] device placement support certain gpu
1. add device util to verify, create and place device
2. pipeline and trainer support update
3. fix pipelines that use tf models not placing the model on the right device

usage

```python
pipe = pipeline('damo/xxx', device='cpu')
pipe = pipeline('damo/xxx', device='gpu')
pipe = pipeline('damo/xxx', device='gpu:0')
pipe = pipeline('damo/xxx', device='gpu:2')
pipe = pipeline('damo/xxx', device='cuda')
pipe = pipeline('damo/xxx', device='cuda:1')
```
 Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9800672
2022-08-22 15:32:00 +08:00
wenmeng.zwm
7798a6250a [to #43112692] standardized task name and output
1. task name and output definition: [link](https://alidocs.dingtalk.com/i/nodes/KOEmgBoGwD78vd2bAry3VndLerP9b30a?nav=spaces&navQuery=spaceId%3Dnb9XJNlZxbgrOXyA&iframeQuery=utm_source%3Dportal%26utm_medium%3Dportal_space_file_tree)

2. rearrange task definition and add more outputs definition for tasks

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9585469
2022-08-02 20:21:05 +08:00
wenmeng.zwm
d55525bfb6 [to #43112771] requirements check and lazy import support 2022-07-27 17:29:16 +08:00
wenmeng.zwm
4814b198f0 [to #43112534] taskdataset refine and auto placement for data and model
* refine taskdataset interface
 * add device placement for trainer
 * add device placement for pipeline
 * add config checker and fix model placement bug
 * fix cycling import
 * refactor model init for translation_pipeline
 * cv pipelines support kwargs


Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9463076
2022-07-23 11:08:43 +08:00
jiangnana.jnn
f3d739bea7 [to #43105545] add default config and new hooks 2022-07-19 17:41:25 +08:00
zhangzhicheng.zzc
407337fbf3 [to #42322933] add model profiling 2022-07-07 20:21:52 +08:00
wenmeng.zwm
8e51a073a6 [to #42966122] requirements enhancement and self-host repo support
* add self-hosted repo:
* add extra requirements for different field and reduce necessary requirements
* update docker file with so required by audio
* add requirements checker which will be used later when implement lazy import
* remove repeated requirements and replace opencv-python-headless with opencv-python

example usage:
```shell
pip install model_scope[all] -f https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/release/maas/repo.html
pip install model_scope[cv] -f https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/release/maas/repo.html
pip install model_scope[nlp] -f https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/release/maas/repo.html
pip install model_scope[audio] -f https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/release/maas/repo.html
pip install model_scope[multi-modal] -f https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/release/maas/repo.html

```
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9211383
2022-07-01 16:38:06 +08:00
mulin.lyh
76c6ff6329 [to #42675838]merge model hub code
Merge model hub code
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9096493
2022-06-21 20:04:25 +08:00
yingda.chen
5786b9a0a1 [to #42322933]formalize image matting
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9019685
2022-06-15 14:06:53 +08:00
wenmeng.zwm
8a030ead72 [to #42362853] feat: rename config to configuration and remove repeated task fields
1. rename maas_config to configuration
2. remove task field image and video, using cv instead

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9010802
2022-06-13 19:44:34 +08:00
yingda.chen
3c1ec035fd [to #42322933] refine cartoon model and add model op utility
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8993758
2022-06-10 12:56:44 +08:00
wenmeng.zwm
1f6b376599 [to #42373878] refactor maaslib to modelscope
1.  refactor maaslib to modelscope
2.  fix UT error
3.  support pipeline which does not register default model

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8988388
2022-06-09 20:16:26 +08:00
Yingda Chen
3cbdcb1d3e fix typo 2022-05-20 06:24:18 +08:00
wenmeng.zwm
5e469008fd [to #41401401] add preprocessor, model and pipeline
* add preprocessor module
 * add model base and builder
 * update task constant
 * add load image preprocessor and its dependency
 * add pipeline interface and UT covered
 * support default pipeline for task
 * add image matting pipeline
 * refine nlp tokenize interface
 * add nlp pipeline 
 * fix UT failed
 * add test for Compose

Link: https://code.aone.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8769235

* add preprocessor module

* add test for Compose

* fix citest error

* fix abs class error

* add model base and builder

* update task constant

* add load image preprocessor and its dependency

* add pipeline interface and UT covered

* support default pipeline for task

* refine models and pipeline interface

* add pipeline folder structure

* add image matting pipeline

* refine nlp tokenize interface

* add nlp pipeline 

1. add preprocessor, model, and pipeline for nlp text classification
2. add corresponding test

Link: https://code.aone.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8757371

* new nlp pipeline

* format pre-commit code

* update easynlp pipeline

* update model_name for easynlp pipeline; add test for maas_lib/utils/typeassert.py

* update test_typeassert.py

* refactor code

1. rename typeassert to type_assert
2. use lazy import to make easynlp dependency optional
3. refine image matting UT

* fix linter test failed

* update requirements.txt

* fix UT failed

* fix citest script to update requirements
2022-05-19 22:18:35 +08:00
wenmeng.zwm
0a756f6a0d [to #41402703] add basic modules
* add constant
 * add logger module
 * add registry and builder module
 * add fileio module
 * add requirements and setup.cfg
 * add config module and tests
 * add citest script

Link: https://code.aone.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8718998
2022-05-17 10:15:00 +08:00