Commit Graph

16 Commits

Author SHA1 Message Date
xingjun.wang
48c0d2a9af add 1.6 2023-05-22 10:53:18 +08:00
zhangzhicheng.zzc
f88b03ab72 [to #47307896] update backbone-head
The original backbone-head abstraction was not well architected: the input and output parameters of the backbone and head were passed as **kwargs, which was implicit and could cause confusion, and many features could not be extended. Therefore, the following adjustments were made:

1. Divide the basic models by structure into encoder-only, decoder-only, single-stage, two-stage models, etc. The encoder-only model is implemented; the others are still being designed.
2. Derive the structured task models from the basic model structures above. A structured task model mainly parses the backbone/head cfg in order to apply the correct backbone or head components; some models may adjust the forward method of the basic model.
3. Add initialization parameters and input/output parameters to the head and backbone classes, to make them easier to understand.
4. Remove the original nncrf class and change it to the backbone-head form with an lstm backbone and a crf head.
5. Support `model = Model.from_pretrained('bert-based-fill-mask', task='text-classification')`; this correctly loads the backbone even when the task differs from the one in the original configuration.
6. Support loading the model through transformers' AutoModel, so that a backbone model can be integrated quickly without coding.
7. Unify the original task classes in each nlp model with the structured task-model classes; the structured task models largely remove the redundant code in the original task classes. Still under refactoring.
8. Support loading the model configuration from hf transformers' config.json if the model-related configuration is missing. Only NLP models are supported.
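
The idea behind items 2 and 3 can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this commit: the registries, `ToyEncoder`, `ToyClassificationHead`, `EncoderOnlyTaskModel`, and the cfg layout are all hypothetical names. It shows a task model that parses a backbone/head cfg and builds components whose init and forward parameters are explicit rather than **kwargs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registries mapping a cfg "type" string to a component class.
BACKBONES: Dict[str, type] = {}
HEADS: Dict[str, type] = {}


def register(registry: Dict[str, type], name: str) -> Callable[[type], type]:
    """Class decorator that stores the class in the given registry."""
    def decorator(cls: type) -> type:
        registry[name] = cls
        return cls
    return decorator


@dataclass
class BackboneOutput:
    pooled: List[float]  # a single feature vector for the input


@register(BACKBONES, 'toy-encoder')
class ToyEncoder:
    """Stand-in backbone with explicit init and forward parameters."""

    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size

    def forward(self, input_ids: List[int]) -> BackboneOutput:
        # Toy features: the mean token id repeated hidden_size times.
        mean = sum(input_ids) / len(input_ids)
        return BackboneOutput(pooled=[mean] * self.hidden_size)


@register(HEADS, 'toy-classification')
class ToyClassificationHead:
    """Stand-in head with explicit init and forward parameters."""

    def __init__(self, hidden_size: int, num_labels: int):
        self.hidden_size = hidden_size
        self.num_labels = num_labels

    def forward(self, pooled: List[float]) -> List[float]:
        # Toy logits: the feature sum scaled by (label index + 1).
        total = sum(pooled)
        return [total * (i + 1) for i in range(self.num_labels)]


class EncoderOnlyTaskModel:
    """Parses the backbone/head cfg and wires the two components together."""

    def __init__(self, cfg: dict):
        backbone_cfg = dict(cfg['backbone'])
        head_cfg = dict(cfg['head'])
        self.backbone = BACKBONES[backbone_cfg.pop('type')](**backbone_cfg)
        self.head = HEADS[head_cfg.pop('type')](**head_cfg)

    def forward(self, input_ids: List[int]) -> List[float]:
        return self.head.forward(self.backbone.forward(input_ids).pooled)


cfg = {
    'backbone': {'type': 'toy-encoder', 'hidden_size': 4},
    'head': {'type': 'toy-classification', 'hidden_size': 4, 'num_labels': 2},
}
model = EncoderOnlyTaskModel(cfg)
logits = model.forward([1, 2, 3])
```

Swapping the head entry in the cfg while keeping the backbone entry is the kind of reuse that item 5 appears to aim at, although the actual loading logic is not shown in this log.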
2023-02-10 06:46:47 +00:00
dingkun.ldk
000976836f 1130: add lightweight LSTM models for Chinese word segmentation / part-of-speech tagging
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10931263
2022-12-03 13:15:33 +08:00
yuze.zyz
bb5512d1ab [to #42322933] Refactor NLP and fix some user feedbacks
1. Abstract keys of dicts needed by nlp metric classes into the init method
2. Add Preprocessor.save_pretrained to save preprocessor information
3. Abstract the config-saving function, so that the config is saved correctly both when from_pretrained is called directly and when the cfg is modified step by step during training.
4. Remove SbertTokenizer and VecoTokenizer, use transformers' tokenizers instead
5. Use model/preprocessor's from_pretrained in all nlp pipeline classes.
6. Add model_kwargs and preprocessor_kwargs in all nlp pipeline classes
7. Add base classes for fill-mask and text-classification preprocessor, as a demo for later changes
8. Fix user feedback: Re-train the model in the continue-training scenario
9. Fix user feedback: Too many checkpoints saved
10. Simplify the nlp-trainer
11. Fix user feedback: Split the default trainer's __init__ method, making it easier for users to override
12. Add safe_get to Config class
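
The log does not describe how safe_get behaves. One plausible reading is a nested lookup that returns a default instead of raising when a key is missing. The sketch below assumes that reading; the `Config` class here is a hypothetical stand-in built on `dict`, and the dotted-key format and the `default` parameter are assumptions rather than the signature added by this commit.

```python
from typing import Any


class Config(dict):
    """Hypothetical stand-in for a nested configuration object."""

    def safe_get(self, key: str, default: Any = None) -> Any:
        """Look up a dotted key such as 'train.max_epochs'.

        Returns `default` if any segment along the path is missing.
        """
        node: Any = self
        for part in key.split('.'):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node


cfg = Config({'train': {'max_epochs': 3}})
max_epochs = cfg.safe_get('train.max_epochs', 1)          # key exists
hook_interval = cfg.safe_get('train.hooks.interval', 10)  # falls back to default
```

A helper of this shape lets trainer code read optional settings without a chain of `in` checks.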

----------------------------  Another refactor from version 36 -------------------------

13. Rename all nlp transformers' preprocessors from TaskNamePreprocessor to TaskNameTransformersPreprocessor, for example:
      TextClassificationPreprocessor -> TextClassificationTransformersPreprocessor
14. Add a base class per task for all nlp tasks whose preprocessors have at least two sub-preprocessors
15. Add output classes of nlp models
16. Refactor the logic for token-classification
17. Fix bug: checkpoint_hook does not support pytorch_model.pt
18. Fix bug: Pipeline name does not match with task name, so inference will not succeed after training
       NOTE: This is just a stopgap fix; the root cause is the unclear relationship between models and pipelines
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10723513

* add save_pretrained to preprocessor

* save preprocessor config in hook

* refactor label-id mapping fetching logic

* test ok on sentence-similarity

* run on finetuning

* fix bug

* pre-commit passed

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/preprocessors/nlp/nlp_base.py

* add params to init

* 1. support max ckpt num 2. support ignoring others but bin file in continue training 3. add arguments to some nlp metrics

* Split trainer init impls to overridable methods

* remove some obsolete tokenizers

* unfinished

* support input params in pipeline

* fix bugs

* fix ut bug

* fix bug

* fix ut bug

* fix ut bug

* fix ut bug

* add base class for some preprocessors

* Merge commit '379867739548f394d0fa349ba07afe04adf4c8b6' into feat/refactor_config

* compatible with old code

* fix ut bug

* fix ut bugs

* fix bug

* add some comments

* fix ut bug

* add a requirement

* fix pre-commit

* Merge commit '0451b3d3cb2bebfef92ec2c227b2a3dd8d01dc6a' into feat/refactor_config

* fixbug

* Support function type in registry

* fix ut bug

* fix bug

* Merge commit '5f719e542b963f0d35457e5359df879a5eb80b82' into feat/refactor_config

# Conflicts:
#	modelscope/pipelines/nlp/multilingual_word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/utils/hub.py

* remove obsolete file

* rename init args

* rename params

* fix merge bug

* add default preprocessor config for ner-model

* move a method to a util file

* remove unused config

* Fix a bug in pbar

* bestckptsaver:change default ckpt numbers to 1

* 1. Add assert to max_epoch 2. split init_dist and get_device 3. change cmp func name

* Fix bug

* fix bug

* fix bug

* unfinished refactoring

* unfinished

* uw

* uw

* uw

* uw

* Merge branch 'feat/refactor_config' into feat/refactor_trainer

# Conflicts:
#	modelscope/preprocessors/nlp/document_segmentation_preprocessor.py
#	modelscope/preprocessors/nlp/faq_question_answering_preprocessor.py
#	modelscope/preprocessors/nlp/relation_extraction_preprocessor.py
#	modelscope/preprocessors/nlp/text_generation_preprocessor.py

* uw

* uw

* unify nlp task outputs

* uw

* uw

* uw

* uw

* change the order of text cls pipeline

* refactor t5

* refactor tg task preprocessor

* fix

* unfinished

* temp

* refactor code

* unfinished

* unfinished

* unfinished

* unfinished

* uw

* Merge branch 'feat/refactor_config' into feat/refactor_trainer

* smoke test pass

* ut testing

* pre-commit passed

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/models/nlp/bert/document_segmentation.py
#	modelscope/pipelines/nlp/__init__.py
#	modelscope/pipelines/nlp/document_segmentation_pipeline.py

* merge master

* unfinished

* Merge branch 'feat/fix_bug_pipeline_name' into feat/refactor_config

* fix bug

* fix ut bug

* support ner batch inference

* fix ut bug

* fix bug

* support batch inference on three nlp tasks

* unfinished

* fix bug

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/models/base/base_model.py
#	modelscope/pipelines/nlp/conversational_text_to_sql_pipeline.py
#	modelscope/pipelines/nlp/dialog_intent_prediction_pipeline.py
#	modelscope/pipelines/nlp/dialog_modeling_pipeline.py
#	modelscope/pipelines/nlp/dialog_state_tracking_pipeline.py
#	modelscope/pipelines/nlp/document_segmentation_pipeline.py
#	modelscope/pipelines/nlp/faq_question_answering_pipeline.py
#	modelscope/pipelines/nlp/feature_extraction_pipeline.py
#	modelscope/pipelines/nlp/fill_mask_pipeline.py
#	modelscope/pipelines/nlp/information_extraction_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/sentence_embedding_pipeline.py
#	modelscope/pipelines/nlp/summarization_pipeline.py
#	modelscope/pipelines/nlp/table_question_answering_pipeline.py
#	modelscope/pipelines/nlp/text2text_generation_pipeline.py
#	modelscope/pipelines/nlp/text_classification_pipeline.py
#	modelscope/pipelines/nlp/text_error_correction_pipeline.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/pipelines/nlp/text_ranking_pipeline.py
#	modelscope/pipelines/nlp/token_classification_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/zero_shot_classification_pipeline.py
#	modelscope/trainers/nlp_trainer.py

* pre-commit passed

* fix bug

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/preprocessors/__init__.py

* fix bug

* fix bug

* fix bug

* fix bug

* fix bug

* fixbug

* pre-commit passed

* fix bug

* fixbug

* fix bug

* fix bug

* fix bug

* fix bug

* self review done

* fixbug

* fix bug

* fix bug

* fix bugs

* remove sub-token offset mapping

* fix name bug

* add some tests

* 1. support batch inference of text-generation,text2text-generation,token-classification,text-classification 2. add corresponding UTs

* add old logic back

* tmp save

* add tokenize by words logic back

* move outputs file back

* revert veco token-classification back

* fix typo

* Fix description

* Merge commit '4dd99b8f6e4e7aefe047c68a1bedd95d3ec596d6' into feat/refactor_config

* Merge branch 'master' into feat/refactor_config

# Conflicts:
#	modelscope/pipelines/builder.py
2022-11-30 23:52:17 +08:00
yzhao
3f75fcdb79 fix bug 2022-11-02 20:02:18 +08:00
Yingda Chen
e0ef60ca9b [to #42322933] skip demo test by default 2022-09-09 14:56:33 +08:00
lingcai.wl
7a49fa1cc6 [to #44657982] add unittest for demo and demotest utils
unittest for demo service
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10006180
2022-09-08 14:08:51 +08:00
yuze.zyz
fbde374659 [to #42322933] add regress tests
Add regression test for some unit tests.
First, run a baseline test to create a pickle file that contains the inputs and outputs of modules; changes between the latest version and the baseline file can then be observed.
Some baseline files are submitted in the data/test/regression folder
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9814693
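
The baseline-then-compare flow described in this commit can be sketched as follows. This is a minimal illustration, not the utility added here: `check_regression`, the file name, and the two toy tokenizers are hypothetical, and exact equality is assumed as the comparison rule.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def check_regression(fn: Callable[..., Any], args: tuple, baseline_path: str) -> bool:
    """Return True if fn(*args) matches the stored baseline.

    On the first run (no baseline file yet) the inputs and outputs are
    recorded and the check trivially passes.
    """
    outputs = fn(*args)
    if not os.path.exists(baseline_path):
        with open(baseline_path, 'wb') as f:
            pickle.dump({'inputs': args, 'outputs': outputs}, f)
        return True
    with open(baseline_path, 'rb') as f:
        baseline = pickle.load(f)
    return baseline['inputs'] == args and baseline['outputs'] == outputs


def tokenize(text: str) -> list:
    """Baseline behaviour: split on whitespace."""
    return text.split()


def tokenize_changed(text: str) -> list:
    """Changed behaviour: lowercases before splitting."""
    return text.lower().split()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tokenize.bin')
    first_run = check_regression(tokenize, ('Hello World',), path)         # records baseline
    same_behaviour = check_regression(tokenize, ('Hello World',), path)    # matches baseline
    changed_behaviour = check_regression(tokenize_changed, ('Hello World',), path)
```

For numeric model outputs, a tolerance-based comparison would likely be needed instead of `==`.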
2022-08-30 23:17:07 +08:00
yuze.zyz
794e277270 [to #42322933] Fix problem in ws
1. remove comments
2. fix a bug where ws hit an assertion failure on English input
3. add an English input test for ws
4. remove a test case whose dataset cannot be accessed from external networks
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9657140
2022-08-05 16:18:29 +08:00
yuze.zyz
21fa71baf0 [to #42322933] add/refactor nlp models source code and finetune
1. add sbert,veco,palm,space source code
2. support sbert sequence classification, token classification finetune
3. support veco sequence classification finetune
4. support palm nlg finetune
evaluation result: https://sheet.alibaba-inc.com/#/sheet/f7fdcc7f22bd5105 sheet:Maas
5. add ut for finetunes
6. add veco's taskdataset processor
7. add a common trainer for nlp, and a specific trainer for veco
8. merge some duplicate codes of models, preprocessors, pipelines
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9574105

* add basic class of hook&metrics

* pre-commit passed

* change some comments

* pre commit passed

* 1. remove accuracy's groups 2. remove useless hooks 3. simplify priorities

* pre-commit passed

* fix a comment

* Merge branch 'master' into finetune_hooks_metrics

# Conflicts:
#	modelscope/metainfo.py

* pre-commit passed

* add basic class of hook&metrics

* pre-commit passed

* change some comments

* pre commit passed

* 1. remove accuracy's groups 2. remove useless hooks 3. simplify priorities

* pre-commit passed

* fix a comment

* Merge branch 'feat/finetune' of gitlab.alibaba-inc.com:Ali-MaaS/MaaS-lib into feat/finetune

* mv hooks related to modelscope/trainers/hooks

* mv priority back

* add torch model base and test

* update hooks, trainer, import_util

* add torch epoch based trainer and dis utils

* add hooks

* fix warmup

* format code style, fix warmup, and add warmup unittest

* fix impls

* pre-commit check passed

* update hook and add EpochBasedTrainer

* add trainer unittest

* Merge branch 'feat/add_hooks' into feat/add_task

# Conflicts:
#	modelscope/models/base_torch.py
#	modelscope/trainers/hooks/hook.py
#	modelscope/trainers/trainer.py

* update unittest name

* rewrite taskdataset to trainer

* fix trainer and add unittest

* add unittest

* code: run to forward

* run through... but ugly code

* arrange some cls

* fix some errs

* revert some mistakes

* init check in

* Merge branch 'feat/add_hooks' into feat/add_task

# Conflicts:
#	modelscope/trainers/trainer.py

* test with bigger epoch and size

* add the default metrics class

* move build metrics code to a method

* merge add_task

* merge origin add_task

* add device initialization

* remove preprocessor arg for bool

* add task models

* move metric collect logic to metrics class

* pre-commit passed

* fix cr comments

* precommit passed

* add task models

* Merge remote-tracking branch 'origin/feat/add_task' into feat/backbone_head

* add comment

* change comment formats.

* fix comments

* fix ut bug

* fix comments

* add wrapper check

* fix comments

* pre commit passed

* fix cr comments

* solve a loop import problem

* fix ut bug

* fix ut errors

* change dummydataset to msdataset

* precommit passed

* merge add task

* backbone-head is build, model is not correctly loaded

* model load states matched

* result matched

* lint

* add veco/palm_v2 code

* merge master

* merge master success running

* add repr model name level

* Merge branch 'feat/veco_palm' into feat/finetune_sbert_veco

* model test for training

* add token-classification metric add formal ut

* fix running bug

* finetune and pipeline are working with backbone-head

* add nli

* add missing code

* finetune and pipeline are working with backbone-head

* Merge branch 'feat/backbone_head' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib into feat/backbone_head

* add a test repo for pr

* remove merge conflicted file

* remove merge conflicted file 1

* lint check

* import error

* none type bug fix

* forward input unpacking or dict bug

* move head into models, add build_backbone with registry, no base method

* merge master

* feat: 1. add interleave dataset method 2. support multiple dataset in trainer.build_dataset 3. support 3 sub tasks in sequence_classification task

* unfinished

* update the task model structure in NLP field

* merge master

* update by comments

* keep the default model id as current on production

* unfinished

* unfinished

* veco can run

* Merge remote-tracking branch 'origin/master' into feat/backbone_head

* add taskmodel for module management

* remove forward_input_is_dict

* unfinished

* token classification started

* update base model structure

* move space to backbone

* remove 'type' in build_from_cfg method

* test update

* bug fix

* in testing, messy code

* Merge branch 'feat/backbone_head' into feat/refactor_nlp_730

# Conflicts:
#	modelscope/metrics/builder.py
#	modelscope/models/__init__.py
#	modelscope/models/nlp/__init__.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py
#	requirements/multi-modal.txt

* add missing merge

* add sofa source code

* refactor

* add veco task dataset

* add veco task dataset

* pre-commit passed

* fix bug of log

* add some features

* merge master

* bug fix

* refine nlp models

* fix the training error

* unfinished

* refactor pipeline

* Merge branch 'feat/backbone_head' into feat/refactor_nlp_730

# Conflicts:
#	modelscope/metrics/builder.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/structbert/modeling_sbert.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/preprocessors/base.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py

* Merge commit 'ab04ceafc5453ce7daa9aa09e37a55f703072a10' into feat/refactor_nlp_730

# Conflicts:
#	modelscope/metainfo.py
#	modelscope/metrics/builder.py
#	modelscope/models/__init__.py
#	modelscope/models/base/base_torch_model.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/space/model/intent_unified_transformer.py
#	modelscope/models/nlp/backbones/space/model/model_base.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sequence_classification.py
#	modelscope/models/nlp/space/__init__.py
#	modelscope/models/nlp/space_for_dialog_intent_prediction.py
#	modelscope/models/nlp/space_for_dialog_modeling.py
#	modelscope/models/nlp/space_for_dialog_state_tracking.py
#	modelscope/models/nlp/task_model.py
#	modelscope/pipelines/nlp/sentiment_classification_pipeline.py
#	modelscope/preprocessors/base.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py

* revert changes

* unify sentence classification postprocess

* revert some changes, move some model files

* pipeline first case run through

* ws pipeline passed

* Merge branch 'feat/refactor_nlp_730' into feat/finetune_sbert_veco

* finetune

* revert code

* revert some code

* ws finetune started, only the accuracy is weird

* Merge branch 'feat/veco_taskdataset' into feat/finetune_sbert_veco

# Conflicts:
#	modelscope/task_datasets/veco_dataset.py
#	tests/taskdataset/test_veco_dataset.py

* veco+nli finetune started

* Merge branch 'master' into feat/finetune_sbert_veco

# Conflicts:
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sbert_for_token_classification.py
#	modelscope/models/nlp/sbert_for_zero_shot_classification.py
#	modelscope/models/nlp/space/space_for_dialog_intent_prediction.py
#	modelscope/models/nlp/space/space_for_dialog_modeling.py
#	modelscope/trainers/trainer.py

* add trainer for nlp

* trainer: dataset params passed into preprocessor

* test passed by nlptrainer

* fix some bugs

* fix some bugs

* add backbone/head subclass

* fix regression bugs

* fix bug in token-cls finetune

* support cfg modification

* fix bug

* fix bug

* update requirements

* add some comments and fix some t

* add some comments and revert a argument

* split to two test files

* revert code

* fixbug in preprocessor

(cherry picked from commit 7a648d096ef8500c694d3255dabe29e6f4bfc3e5)

* fix ut bug

* support sbert models

* unfinished

* Merge branch 'feat/finetune_sbert_veco' into sly_tmp_veco_finetune

# Conflicts:
#	tests/trainers/test_finetune_sequence_classification.py

* fixbug in veco

* fix bug

* fixbug

* correct running params

* remove useless files

* add palm finetuning with cnn_dailymail dataset

* copy space model from sofa

* Merge branch 'feat/finetune_sbert_veco' of gitlab.alibaba-inc.com:Ali-MaaS/MaaS-lib into feat/finetune_sbert_veco

* Merge branch 'master' into feat/finetune_sbert_veco

# Conflicts:
#	modelscope/metrics/__init__.py
#	modelscope/models/__init__.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/__init__.py
#	modelscope/models/nlp/backbones/structbert/modeling_sbert.py
#	modelscope/models/nlp/heads/__init__.py
#	modelscope/models/nlp/masked_language.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/models/nlp/sbert_for_nli.py
#	modelscope/models/nlp/sbert_for_sentence_similarity.py
#	modelscope/models/nlp/sbert_for_sentiment_classification.py
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sbert_for_token_classification.py
#	modelscope/models/nlp/sbert_for_zero_shot_classification.py
#	modelscope/models/nlp/sequence_classification.py
#	modelscope/models/nlp/space/space_for_dialog_intent_prediction.py
#	modelscope/models/nlp/space/space_for_dialog_modeling.py
#	modelscope/models/nlp/space/space_for_dialog_state_tracking.py
#	modelscope/models/nlp/structbert/adv_utils.py
#	modelscope/models/nlp/structbert/configuration_sbert.py
#	modelscope/models/nlp/task_models/task_model.py
#	modelscope/pipelines/__init__.py
#	modelscope/pipelines/nlp/__init__.py
#	modelscope/pipelines/nlp/fill_mask_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/nli_pipeline.py
#	modelscope/pipelines/nlp/sentence_similarity_pipeline.py
#	modelscope/pipelines/nlp/sentiment_classification_pipeline.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/zero_shot_classification_pipeline.py
#	modelscope/preprocessors/nlp.py
#	modelscope/task_datasets/__init__.py
#	modelscope/trainers/trainer.py
#	modelscope/trainers/utils/inference.py
#	modelscope/utils/file_utils.py
#	requirements/nlp.txt
#	tests/pipelines/test_nli.py
#	tests/pipelines/test_sentence_similarity.py
#	tests/pipelines/test_sentiment_classification.py

* fix imports

* mark backbone in their own modeling

* pre-commit check passed

* pre-commit passed, remove roberta model

* fix a bug in ast import

* skip all finetune uts

* fix bugs

* pre-commit passed

* bug fixed

* bug fixed

* bug fixed

* bug fixed

* fix ut bug

* fix bug

* fix ut bug

* fix bug

* fix bug

* fixbugs

* fixbug

* revert veco

* revert veco because of core dump

* fix palm bug

* revert veco

* revert mistaken code

* add a test print

* pre-commit check

* test exception

* add test code

* for test

* fix bug and test

* remove test code

* remove useless file

* 1. fix some bugs 2. add backbone ut

* Merge branch 'master' into feat/finetune_refactor_730

# Conflicts:
#	modelscope/metainfo.py
#	modelscope/metrics/sequence_classification_metric.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/task_models/task_model.py
#	modelscope/preprocessors/__init__.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py
#	modelscope/trainers/utils/inference.py
#	modelscope/utils/file_utils.py
#	tests/trainers/test_trainer_with_nlp.py

* pre-commit passed

* revert files

* increase test level

* unregister models

* fix bugs

* fix cr comments

* fix bug in backbone-head

* add sbert backbone

* fix bug

* add test for token-cls-metric

* pre-commit passed

* fix ut comments

* revert normal tokenizer to fast tokenizer

* Merge branch 'master' into feat/finetune_refactor_730

# Conflicts:
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/__init__.py
#	modelscope/models/nlp/backbones/structbert/__init__.py
#	modelscope/models/nlp/masked_language.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sbert_for_token_classification.py
#	modelscope/models/nlp/sbert_for_zero_shot_classification.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py
#	modelscope/trainers/utils/inference.py

* fix merge bugs

* pre commit passed

* fix bug

* fix bug

* fix bug

* fix bug from master

* add print

* fix ut bug

* fix bug

* Merge branch 'master' into feat/finetune_refactor_730

* skip task model test
2022-08-03 18:38:41 +08:00
wenmeng.zwm
d55525bfb6 [to #43112771] requirements check and lazy import support 2022-07-27 17:29:16 +08:00
zhangzhicheng.zzc
cf194ef6cd [to #42322933] nlp preprocessor refactor
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9269314

* init

* token to ids

* add model

* model forward ready

* add intent

* intent preprocessor ready

* intent success

* merge master

* test with model hub

* add flake8

* update

* update

* update

* Merge branch 'master' into nlp/space/gen

* delete file about gen

* init

* fix flake8 bug

* [to #42322933] init

* bug fix

* [to #42322933] init

* update pipeline registry info

* Merge remote-tracking branch 'origin/master' into feat/nli

* [to #42322933] init

* [to #42322933] init

* modify forward

* [to #42322933] init

* generation ready

* init

* Merge branch 'master' into feat/zero_shot_classification

# Conflicts:
#	modelscope/preprocessors/__init__.py

* [to #42322933] bugfix

* [to #42322933] pre commit fix

* fill mask

* registry multi models on model and pipeline

* add tests

* test level >= 0

* local gen ready

* merge with master

* dialog modeling ready

* fix comments: rename and refactor AliceMindMLM; adjust pipeline

* space intent and modeling(generation) are ready

* bug fix

* add dep

* add dep

* support dst data processor

* merge with nlp/space/dst

* merge with master

* Merge remote-tracking branch 'origin' into feat/fill_mask

Conflicts:
	modelscope/models/nlp/__init__.py
	modelscope/pipelines/builder.py
	modelscope/pipelines/outputs.py
	modelscope/preprocessors/nlp.py
	requirements/nlp.txt

* merge with master

* merge with master 2/2

* fix comments

* fix isort for pre-commit check

* allow params pass to pipeline's __call__ method

* Merge remote-tracking branch 'origin/master' into feat/zero_shot_classification

* merge with nli task

* merge with sentiment_classification

* merge with zero_shot_classfication

* merge with fill_mask

* merge with space

* merge with master head

* Merge remote-tracking branch 'origin' into feat/fill_mask

Conflicts:
	modelscope/utils/constant.py

* fix: pipeline module_name from model_type to 'fill_mask' & fix merge bug

* unfinished change

* fix bug

* unfinished

* unfinished

* revise modelhub dependency

* Merge branch 'feat/nlp_refactor' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib into feat/nlp_refactor

* add eval() to pipeline call

* add test level

* ut run passed

* add default args

* tmp

* merge master

* all ut passed

* remove a useless enum

* revert a mis modification

* revert a mis modification

* Merge commit 'ace8af92465f7d772f035aebe98967726655f12c' into feat/nlp

* commit 'ace8af92465f7d772f035aebe98967726655f12c':
  [to #42322933] Add cv-action-recongnition-pipeline to maas lib
  [to #42463204]  support Pil.Image for image_captioning_pipeline
  [to #42670107] restore pydataset test
  [to #42322933] add create if not exist and add(back) create model example         Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9130661
  [to #41474818]fix: fix errors in task name definition

# Conflicts:
#	modelscope/pipelines/builder.py
#	modelscope/utils/constant.py

* Merge branch 'feat/nlp' into feat/nlp_refactor

* feat/nlp:
  [to #42322933] Add cv-action-recongnition-pipeline to maas lib
  [to #42463204]  support Pil.Image for image_captioning_pipeline
  [to #42670107] restore pydataset test
  [to #42322933] add create if not exist and add(back) create model example         Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9130661
  [to #41474818]fix: fix errors in task name definition

# Conflicts:
#	modelscope/pipelines/builder.py

* fix compile bug

* refactor space

* Merge branch 'feat/nlp_refactor' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib into feat/nlp_refactor

* Merge remote-tracking branch 'origin' into feat/fill_mask

* fix

* pre-commit lint

* lint file

* lint file

* lint file

* update modelhub dependency

* lint file

* ignore dst_processor temporary

* solve comment: 1. change MaskedLMModelBase to MaskedLanguageModelBase 2. remove a useless import

* recommit

* remove MaskedLanguageModel from __all__

* Merge commit '1a0d4af55a2eee69d89633874890f50eda8f8700' into feat/nlp_refactor

* commit '1a0d4af55a2eee69d89633874890f50eda8f8700':
  [to #42322933] test level check         Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9143809
  [to #42322933] update nlp models name in metainfo         Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9134657

# Conflicts:
#	modelscope/metainfo.py

* update

* revert pipeline params update

* remove zeroshot

* update sequence classification outputs

* merge with fill mask

* Merge remote-tracking branch 'origin' into feat/fill_mask

* fix

* fix flake8 warning of dst

* Merge remote-tracking branch 'origin/feat/fill_mask' into feat/nlp

* merge with master

* remove useless test.py

* Merge remote-tracking branch 'origin/master' into feat/nlp

* remove unformatted space trainer

* revise based on comments, except the Chinese comments

* skip ci blocking

* translation pipeline

* csanmt model for translation pipeline

* update

* update

* update builder.py

* change Chinese notes of space3.0 into English

* translate chinese comment to english

* add space to metainfo

* update casnmt_translation

* update csanmt transformer

* merge with master

* update csanmt translation

* update lint

* update metainfo.py

* Update translation_pipeline.py

* Update builder.py

* fix: 1. make csanmt derived from Model 2. add kwargs to prevent from call error

* pre-commit check

* temp exclude flake8

* temp ignore translation files

* fix bug

* pre-commit passed

* fixbug

* fixbug

* revert pre commit ignorance

* pre-commit passed

* fix bug

* merge with master

* add missing setting

* merge with master

* add outputs

* modify test level

* modify chinese comment

* remove useless doc

* space outputs normalization

* Merge remote-tracking branch 'origin/master' into nlp/translation

* update translation_pipeline.py

* Merge remote-tracking branch 'origin/master' into feat/nlp

* Merge remote-tracking branch 'origin/master' into nlp/translation

* add new __init__ method

* add new __init__ method

* update output format

* Merge remote-tracking branch 'origin/master' into feat/nlp

* update output format

* merge with master

* merge with nlp/translate

* update the translation comment

* update the translation comment

* Merge branch 'nlp/translation' into feat/nlp

* Merge remote-tracking branch 'origin/master' into feat/nlp

* Merge remote-tracking branch 'origin/master' into feat/nlp

* nlp preprocessor refactor

* add get_model_type in util.hub

* update the default preprocessor args

* update the fill mask preprocessor

* bug typo fixed
2022-07-05 20:40:48 +08:00
zhangzhicheng.zzc
5b98cc1513 [to #42322933] Add: nli, sentiment_classification, dialog_intent, dialog_modeling
Added the nli, sentiment_classification, dialog_intent, and dialog_modeling pipelines, together with some simple abstractions for sequence classification in nlp.
Removed zero_shot_classification.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9159089
2022-07-01 23:13:03 +08:00
yingda.chen
1a0d4af55a [to #42322933] test level check
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9143809
2022-06-23 16:55:48 +08:00
mulin.lyh
76c6ff6329 [to #42675838]merge model hub code
Merge the model hub code
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9096493
2022-06-21 20:04:25 +08:00
zhangzhicheng.zzc
eb3209a79a [to #42322933] Chinese word segmentation
chinese word segmentation
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9051491

* add word segmentation

* Merge branch 'master' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib

* test with model hub

* merge with master

* update some description and test levels

* adding purge logic in test

* merge with master

* update variables definition

* generic word segmentation model as token classification model

* add output check
2022-06-17 14:00:31 +08:00