modelscope

mirror of https://github.com/modelscope/modelscope.git synced 2025-12-17 16:57:42 +01:00

Author	SHA1	Message	Date
xingjun.wang	48c0d2a9af	add 1.6	2023-05-22 10:53:18 +08:00
yuze.zyz	bb5512d1ab	[to #42322933 ] Refactor NLP and fix some user feedbacks 1. Abstract keys of dicts needed by nlp metric classes into the init method 2. Add Preprocessor.save_pretrained to save preprocessor information 3. Abstract the config saving function, which can lead to normally saving in the direct call of from_pretrained, and the modification of cfg one by one when training. 4. Remove SbertTokenizer and VecoTokenizer, use transformers' tokenizers instead 5. Use model/preprocessor's from_pretrained in all nlp pipeline classes. 6. Add model_kwargs and preprocessor_kwargs in all nlp pipeline classes 7. Add base classes for fill-mask and text-classification preprocessor, as a demo for later changes 8. Fix user feedback: Re-train the model in continue training scenario 9. Fix user feedback: Too many checkpoint saved 10. Simplify the nlp-trainer 11. Fix user feedback: Split the default trainer's __init__ method, which makes user easier to override 12. Add safe_get to Config class ---------------------------- Another refactor from version 36 ------------------------- 13. Name all nlp transformers' preprocessors from TaskNamePreprocessor to TaskNameTransformersPreprocessor, for example: TextClassificationPreprocessor -> TextClassificationTransformersPreprocessor 14. Add a base class per task for all nlp tasks' preprocessors which has at least two sub-preprocessors 15. Add output classes of nlp models 16. Refactor the logic for token-classification 17. Fix bug: checkpoint_hook does not support pytorch_model.pt 18. Fix bug: Pipeline name does not match with task name, so inference will not succeed after training NOTE: This is just a stop bleeding solution, the root cause is the uncertainty of the relationship between models and pipelines Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10723513 * add save_pretrained to preprocessor * save preprocessor config in hook * refactor label-id mapping fetching logic * test ok on sentence-similarity * run on finetuning * fix bug * pre-commit passed * fix bug * Merge branch 'master' into feat/refactor_config # Conflicts: # modelscope/preprocessors/nlp/nlp_base.py * add params to init * 1. support max ckpt num 2. support ignoring others but bin file in continue training 3. add arguments to some nlp metrics * Split trainer init impls to overridable methods * remove some obsolete tokenizers * unfinished * support input params in pipeline * fix bugs * fix ut bug * fix bug * fix ut bug * fix ut bug * fix ut bug * add base class for some preprocessors * Merge commit '379867739548f394d0fa349ba07afe04adf4c8b6' into feat/refactor_config * compatible with old code * fix ut bug * fix ut bugs * fix bug * add some comments * fix ut bug * add a requirement * fix pre-commit * Merge commit '0451b3d3cb2bebfef92ec2c227b2a3dd8d01dc6a' into feat/refactor_config * fixbug * Support function type in registry * fix ut bug * fix bug * Merge commit '5f719e542b963f0d35457e5359df879a5eb80b82' into feat/refactor_config # Conflicts: # modelscope/pipelines/nlp/multilingual_word_segmentation_pipeline.py # modelscope/pipelines/nlp/named_entity_recognition_pipeline.py # modelscope/pipelines/nlp/word_segmentation_pipeline.py # modelscope/utils/hub.py * remove obsolete file * rename init args * rename params * fix merge bug * add default preprocessor config for ner-model * move a method a util file * remove unused config * Fix a bug in pbar * bestckptsaver:change default ckpt numbers to 1 * 1. Add assert to max_epoch 2. split init_dist and get_device 3. change cmp func name * Fix bug * fix bug * fix bug * unfinished refactoring * unfinished * uw * uw * uw * uw * Merge branch 'feat/refactor_config' into feat/refactor_trainer # Conflicts: # modelscope/preprocessors/nlp/document_segmentation_preprocessor.py # modelscope/preprocessors/nlp/faq_question_answering_preprocessor.py # modelscope/preprocessors/nlp/relation_extraction_preprocessor.py # modelscope/preprocessors/nlp/text_generation_preprocessor.py * uw * uw * unify nlp task outputs * uw * uw * uw * uw * change the order of text cls pipeline * refactor t5 * refactor tg task preprocessor * fix * unfinished * temp * refactor code * unfinished * unfinished * unfinished * unfinished * uw * Merge branch 'feat/refactor_config' into feat/refactor_trainer * smoke test pass * ut testing * pre-commit passed * Merge branch 'master' into feat/refactor_config # Conflicts: # modelscope/models/nlp/bert/document_segmentation.py # modelscope/pipelines/nlp/__init__.py # modelscope/pipelines/nlp/document_segmentation_pipeline.py * merge master * unifnished * Merge branch 'feat/fix_bug_pipeline_name' into feat/refactor_config * fix bug * fix ut bug * support ner batch inference * fix ut bug * fix bug * support batch inference on three nlp tasks * unfinished * fix bug * fix bug * Merge branch 'master' into feat/refactor_config # Conflicts: # modelscope/models/base/base_model.py # modelscope/pipelines/nlp/conversational_text_to_sql_pipeline.py # modelscope/pipelines/nlp/dialog_intent_prediction_pipeline.py # modelscope/pipelines/nlp/dialog_modeling_pipeline.py # modelscope/pipelines/nlp/dialog_state_tracking_pipeline.py # modelscope/pipelines/nlp/document_segmentation_pipeline.py # modelscope/pipelines/nlp/faq_question_answering_pipeline.py # modelscope/pipelines/nlp/feature_extraction_pipeline.py # modelscope/pipelines/nlp/fill_mask_pipeline.py # modelscope/pipelines/nlp/information_extraction_pipeline.py # modelscope/pipelines/nlp/named_entity_recognition_pipeline.py # modelscope/pipelines/nlp/sentence_embedding_pipeline.py # modelscope/pipelines/nlp/summarization_pipeline.py # modelscope/pipelines/nlp/table_question_answering_pipeline.py # modelscope/pipelines/nlp/text2text_generation_pipeline.py # modelscope/pipelines/nlp/text_classification_pipeline.py # modelscope/pipelines/nlp/text_error_correction_pipeline.py # modelscope/pipelines/nlp/text_generation_pipeline.py # modelscope/pipelines/nlp/text_ranking_pipeline.py # modelscope/pipelines/nlp/token_classification_pipeline.py # modelscope/pipelines/nlp/word_segmentation_pipeline.py # modelscope/pipelines/nlp/zero_shot_classification_pipeline.py # modelscope/trainers/nlp_trainer.py * pre-commit passed * fix bug * Merge branch 'master' into feat/refactor_config # Conflicts: # modelscope/preprocessors/__init__.py * fix bug * fix bug * fix bug * fix bug * fix bug * fixbug * pre-commit passed * fix bug * fixbug * fix bug * fix bug * fix bug * fix bug * self review done * fixbug * fix bug * fix bug * fix bugs * remove sub-token offset mapping * fix name bug * add some tests * 1. support batch inference of text-generation,text2text-generation,token-classification,text-classification 2. add corresponding UTs * add old logic back * tmp save * add tokenize by words logic back * move outputs file back * revert veco token-classification back * fix typo * Fix description * Merge commit '4dd99b8f6e4e7aefe047c68a1bedd95d3ec596d6' into feat/refactor_config * Merge branch 'master' into feat/refactor_config # Conflicts: # modelscope/pipelines/builder.py	2022-11-30 23:52:17 +08:00
yuze.zyz	605cd7f44a	[to #42322933 ] NLP 1030 Refactor Features: 1. Refactor the directory structure of nlp models. All model files are placed into either the model folder or the task_model folder 2. Refactor all the comments to google style 3. Add detailed comments to important tasks and nlp models, to list the description of the model, and its preprocessor&trainer 4. Model Exporting now supports a direct call to TorchModelExporter (no need to derive from it) 5. Refactor model save_pretrained method to support direct running (independent from trainer) 6. Remove the judgement of Model in the pipeline base class, to support outer register models running in our pipelines 7. Nlp trainer now has a NLPTrainingArguments class, user can pass arguments into the dataclass, and use it as a normal cfg_modify_fn, to simplify the operation of modifying cfg. 8. Merge the BACKBONES and the MODELS, so user can get a backbone with the Model.from_pretrained call 9. Model.from_pretrained now supports a task argument, so user can use a backbone and load it with a specific task class. 10. Support Preprocessor.from_pretrained method 11. Add standard return classes to important nlp tasks, so some of the pipelines and the models are independent now, the return values of the models will always be tensors, and the pipelines will take care of the conversion to numpy and the following stuffs. 12. Split the file of the nlp preprocessors, to make the dir structure more clear. Bugs Fixing: 1. Fix a bug that lr_scheduler can be called earlier than the optimizer's step 2. Fix a bug that the direct call of Pipelines (not from pipeline(xxx)) throws an error 3. Fix a bug that the trainer will not call the correct TaskDataset class 4. Fix a bug that the internal loading of dataset throws an error in the trainer class Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10490585	2022-10-25 12:26:25 +08:00
yuze.zyz	707cbef013	[to #42322933 ]Fix bug in daily UT Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10491891	2022-10-22 23:25:18 +08:00
zhangzhicheng.zzc	d721fabb34	[to #42322933 ]bert with sequence classification / token classification/ fill mask refactor 1. Add support for the original bert model (as opposed to the easynlp backbone prefix version) 2. Support the backbone head form of bert for sequence classification / fill mask / token classification 3. Unify the pipelines of the several sequence classification tasks into one class 4. fill mask supports the backbone head form 5. Unify the preprocessors of the token classification subtasks (ner, word seg, part of speech) into TokenClassificationPreprocessor 6. Unify the preprocessors of the sequence classification subtasks (single classification, pair classification) into SequenceClassificationPreprocessor 7. Move where the group_key of cls is assigned in register; previously, with multiple decorators, group_key was overwritten, so the group_key information of obj_cls was incorrect 8. Based on the backbone head form, try to adjust the cases where group_key and module had the same name; for example, in modelscope/pipelines/nlp/sequence_classification_pipeline.py the original @PIPELINES.register_module( Tasks.sentiment_classification, module_name=Pipelines.sentiment_classification) is changed to @PIPELINES.register_module( Tasks.text_classification, module_name=Pipelines.sentiment_classification) The corresponding configuration.json is changed as well; this better reflects the relationship between tasks and pipelines (subtasks). 8. Other corresponding changes to support the features above Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10041463	2022-09-27 23:08:33 +08:00
Yingda Chen	e0ef60ca9b	[to #42322933 ] skip demo test by default	2022-09-09 14:56:33 +08:00
lingcai.wl	7a49fa1cc6	[to #44657982 ] add unittest for demo and demotest utils unittest for demo service Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10006180	2022-09-08 14:08:51 +08:00
yingda.chen	45620dbc7f	[to #42322933 ]clean up test level Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9662182 * clean up test level	2022-08-06 12:22:17 +08:00
wenmeng.zwm	d55525bfb6	[to #43112771 ] requirements check and lazy import support	2022-07-27 17:29:16 +08:00
wenmeng.zwm	4814b198f0	[to #43112534 ] taskdataset refine and auto placement for data and model * refine taskdataset interface * add device placement for trainer * add device placement for pipeline * add config checker and fix model placement bug * fix cycling import * refactor model init for translation_pipeline * cv pipelines support kwargs Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9463076	2022-07-23 11:08:43 +08:00
feiwu.yfw	2c3875c0e1	[to #43299989 ] Fix msdataset * fix msdataset Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9436292 * fix msdataset	2022-07-20 16:38:15 +08:00
feiwu.yfw	5da470fd5d	[to #42791465 , #42779255 , #42777959 , #42757844 , #42756050 , #42746916 , #42743595 , #42791863 ] fix: fix msdataset Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9174075 * fix msdataset	2022-06-28 20:40:57 +08:00
yingda.chen	6702b29e21	[to #42794773 ]rename pydataset to msdataset Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9165402	2022-06-27 11:09:38 +08:00
Yingda Chen	b6e3fd80b0	Revert "[to #42794773 ] rename pydataset to msdataset" This reverts commit `c8e2e6de0e`.	2022-06-25 08:50:28 +08:00
Yingda Chen	c8e2e6de0e	[to #42794773 ] rename pydataset to msdataset	2022-06-25 08:36:48 +08:00
yingda.chen	e7571a566f	[to #42322933 ] skip dataset test for now Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9155405	2022-06-24 11:47:28 +08:00
yingda.chen	1a0d4af55a	[to #42322933 ] test level check Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9143809	2022-06-23 16:55:48 +08:00
wenmeng.zwm	e288cf076e	[to #42362853 ] refactor pipeline and standardize module_name * using get_model to validate hub path * support reading pipeline info from configuration file * add metainfo const * update model type and pipeline type and fix UT * relax requirement for protobuf * skip two dataset tests due to temporal failure Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9118154	2022-06-22 14:15:32 +08:00
mulin.lyh	76c6ff6329	[to #42675838 ]merge model hub code Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9096493	2022-06-21 20:04:25 +08:00
feiwu.yfw	c7238a470b	[to #42670107 ]pydataset fetch data from datahub * pydataset fetch data from datahub Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9060856	2022-06-21 11:10:28 +08:00
wenmeng.zwm	c59833c7ee	[to #42461396 ] feat: test_level support * add test level support * update develop doc Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9021354	2022-06-15 14:53:49 +08:00
yingda.chen	b31c86aa0e	[to #42409340 ] add hub specifier Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9005038	2022-06-13 14:15:54 +08:00
wenmeng.zwm	1f6b376599	[to #42373878 ] refactor maaslib to modelscope 1. refactor maaslib to modelscope 2. fix UT error 3. support pipeline which does not register default model Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8988388	2022-06-09 20:16:26 +08:00
wenmeng.zwm	dd00195814	[to #42362853 ] add default model support and fix circular import 1. add default model support 2. fix circular import 3. temporarily skip ofa and palm test which costs too much time Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8981076	2022-06-09 16:57:33 +08:00
yingda.chen	0d840d519c	[to #42339763 ] move pydataset into maas_lib Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8974892	2022-06-09 10:14:48 +08:00
yingda.chen	e3b8ec3bf1	[to #42339559 ] support multiple models Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8972440 * [to #42339559] support multiple models	2022-06-08 21:27:14 +08:00
feiwu.yfw	235880f300	[to #42339763 ] merge pydataset into maas-lib * merge pydataset to the repo Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8955999	2022-06-08 18:29:39 +08:00
yingda.chen	d6868ddffe	[to #42323743 ] retain local cached model files by default Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8963687	2022-06-08 14:22:23 +08:00
yingda.chen	e075ad2245	[to #42322515 ]support plain pipeline for bert Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8945177 * support plain pipeline for bert	2022-06-08 11:29:25 +08:00
yingda.chen	f8eb699f7f	refine tests and examples Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8898823	2022-06-01 10:20:53 +08:00
wenmeng.zwm	1d01a78c2b	fix: UT error Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8899458 * fix: UT error	2022-06-01 09:16:39 +08:00
yingda.chen	5995cc4607	add PyDataset support Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8868644	2022-05-31 18:27:19 +08:00
wenmeng.zwm	25a2028b54	[to #41401401 ] modelhub and Trainer support * add trainer interface * add trainer script * add model init support for pipeline * add pipeline tutorial and fix bugs * add text classification evaluation to maas lib * add quickstart and prepare env doc * relax requirements for torch and sentencepiece * merge release/0.1 and fix conflict * modelhub support for model and pipeline Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8868339	2022-05-30 11:53:53 +08:00
wenmeng.zwm	cb416edc2a	[to #41669377 ] add pipeline tutorial and fix bugs 1. add pipeline tutorial 2. fix bugs when using pipeline with certain model and preprocessor Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8810524	2022-05-24 17:14:58 +08:00
wenmeng.zwm	5e469008fd	[to #41401401 ] add preprocessor, model and pipeline * add preprocessor module * add model base and builder * update task constant * add load image preprocessor and its dependency * add pipeline interface and UT covered * support default pipeline for task * add image matting pipeline * refine nlp tokenize interface * add nlp pipeline * fix UT failed * add test for Compose Link: https://code.aone.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8769235 * add preprocessor module * add test for Compose * fix citest error * fix abs class error * add model base and builder * update task constant * add load image preprocessor and its dependency * add pipeline interface and UT covered * support default pipeline for task * refine models and pipeline interface * add pipeline folder structure * add image matting pipeline * refine nlp tokenize interface * add nlp pipeline 1. add preprocessor model pipeline for nlp text classification 2. add corresponding test Link: https://code.aone.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/8757371 * new nlp pipeline * format pre-commit code * update easynlp pipeline * update model_name for easynlp pipeline; add test for maas_lib/utils/typeassert.py * update test_typeassert.py * refactor code 1. rename typeassert to type_assert 2. use lazy import to make easynlp dependency optional 3. refine image matting UT * fix linter test failed * update requirements.txt * fix UT failed * fix citest script to update requirements	2022-05-19 22:18:35 +08:00

35 Commits