5 Commits

Author SHA1 Message Date
yuze.zyz
e6320f29d3 Small features:
1. Exporting: support text-classification BERT and TensorFlow 2.0 models; test cases have been added.
2. Preprocessor.from_pretrained now skips downloading large files that are not needed, based on their file extensions.
3. Make sentence-piece-preprocessor a subclass of text-generation-preprocessor, keeping the original name for compatibility.
4. Remove unused code from nlp-trainer and trainer.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11206922
2023-01-09 21:22:07 +08:00
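Item 2 of the commit above skips unneeded large files by extension when downloading a preprocessor. A minimal sketch of that kind of filter follows; the helper name `select_files` and the ignored-extension list are hypothetical illustrations, not the library's actual API or defaults.

```python
import os
from typing import Iterable, List, Sequence

# Hypothetical ignore list for illustration only; the library's real list may differ.
DEFAULT_IGNORED_EXTENSIONS = ('.bin', '.pt', '.pth', '.ckpt', '.h5', '.onnx')


def select_files(file_names: Iterable[str],
                 ignored_extensions: Sequence[str] = DEFAULT_IGNORED_EXTENSIONS
                 ) -> List[str]:
    """Return only the files a preprocessor plausibly needs.

    A file is skipped when its extension (compared case-insensitively) is in
    ``ignored_extensions``, e.g. large weight files a tokenizer never reads.
    """
    ignored = {ext.lower() for ext in ignored_extensions}
    selected = []
    for name in file_names:
        _, ext = os.path.splitext(name)
        if ext.lower() not in ignored:
            selected.append(name)
    return selected


if __name__ == '__main__':
    files = ['configuration.json', 'vocab.txt', 'pytorch_model.bin', 'tf_model.h5']
    print(select_files(files))  # ['configuration.json', 'vocab.txt']
```

Filtering by extension is cheap and needs no file metadata, at the cost of relying on naming conventions for weight files.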
yuze.zyz
4b7e8e89aa [to #42322933] Fix bugs that occur when some dependencies are downgraded
1. Fix a bug in model exporting
2. Skip some long-running training tests at test level 2
3. Refine some comments
4. Fix a bug where the mode was incorrect when saving checkpoints
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10564716
2022-10-28 21:44:33 +08:00
yuze.zyz
605cd7f44a [to #42322933] NLP 1030 Refactor
Features:
1. Refactor the directory structure of NLP models. All model files are now placed in either the model folder or the task_model folder.
2. Convert all comments to Google style.
3. Add detailed comments to important tasks and NLP models, describing each model along with its preprocessor and trainer.
4. Model exporting now supports calling TorchModelExporter directly (no need to subclass it).
5. Refactor the model save_pretrained method so it can run standalone (independent of the trainer).
6. Remove the Model type check from the pipeline base class so that externally registered models can run in our pipelines.
7. The NLP trainer now has an NLPTrainingArguments dataclass. Users can pass arguments into it and use it as a normal cfg_modify_fn, which simplifies modifying the cfg.
8. Merge BACKBONES and MODELS, so users can get a backbone with a Model.from_pretrained call.
9. Model.from_pretrained now supports a task argument, so users can load a backbone with a specific task class.
10. Support the Preprocessor.from_pretrained method.
11. Add standard return classes to important NLP tasks, which decouples some pipelines from their models: models always return tensors, and pipelines handle the conversion to numpy and any subsequent processing.
12. Split the NLP preprocessors file into several files to make the directory structure clearer.

Bug Fixes:
1. Fix a bug where lr_scheduler could be stepped before the optimizer's step
2. Fix a bug where calling Pipelines directly (not via pipeline(xxx)) threw an error
3. Fix a bug where the trainer did not use the correct TaskDataset class
4. Fix a bug where the trainer's internal dataset loading threw an error
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10490585
2022-10-25 12:26:25 +08:00
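Feature 7 of the commit above describes a training-arguments dataclass that doubles as a cfg_modify_fn. The sketch below shows one way that pattern can work; the class name `TrainingArgsSketch`, its fields, and the dotted config key paths are hypothetical and are not the actual NLPTrainingArguments API.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Dict, Optional


@dataclass
class TrainingArgsSketch:
    """Hypothetical stand-in for a training-arguments dataclass.

    Fields left as None leave the config untouched.
    """
    max_epochs: Optional[int] = None
    lr: Optional[float] = None
    batch_size: Optional[int] = None

    # Hypothetical mapping from field name to a dotted path in the config dict.
    KEY_PATHS: ClassVar[Dict[str, str]] = {
        'max_epochs': 'train.max_epochs',
        'lr': 'train.optimizer.lr',
        'batch_size': 'train.dataloader.batch_size',
    }

    def __call__(self, cfg: Dict[str, Any]) -> Dict[str, Any]:
        """Act as a cfg_modify_fn: write each non-None field into cfg in place."""
        for f in fields(self):  # ClassVar attributes are not dataclass fields
            value = getattr(self, f.name)
            if value is None:
                continue
            *parents, leaf = self.KEY_PATHS[f.name].split('.')
            node = cfg
            for key in parents:
                node = node.setdefault(key, {})  # create missing levels
            node[leaf] = value
        return cfg


if __name__ == '__main__':
    cfg = {'train': {'max_epochs': 3, 'optimizer': {'type': 'AdamW', 'lr': 5e-5}}}
    new_cfg = TrainingArgsSketch(max_epochs=10, lr=2e-5)(cfg)
    print(new_cfg['train']['max_epochs'], new_cfg['train']['optimizer']['lr'])  # 10 2e-05
```

Because the instance is callable with the config as its only argument, it can be passed anywhere a plain cfg-modifying function is expected, while the dataclass keeps the argument names typed and discoverable.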
yuze.zyz
707cbef013 [to #42322933] Fix a bug in the daily UT
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10491891
2022-10-22 23:25:18 +08:00
yuze.zyz
c2b1ff8389 [to #42322933] Add an exporter module for ONNX, TorchScript, and other formats.
1. Add the exporter module.
2. Move collate_fn out of the base pipeline class so it can be reused.
3. Add a dummy-inputs method to the NLP tokenization preprocessor base class.
4. Support Mapping inputs when numpifying and detaching tensors.
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10037704
2022-09-21 14:25:06 +08:00
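Item 4 of the commit above extends tensor numpify/detach to Mapping inputs. A minimal sketch of the recursive traversal this implies is below; the helper `apply_to_leaves` and the `FakeTensor` class are hypothetical, and `FakeTensor` stands in for a framework tensor so the example runs without PyTorch.

```python
from collections.abc import Mapping
from typing import Any, Callable


def apply_to_leaves(obj: Any, convert: Callable[[Any], Any],
                    is_leaf: Callable[[Any], bool]) -> Any:
    """Recursively apply ``convert`` to every leaf in nested containers.

    Mappings are rebuilt as plain dicts with the same keys, lists and tuples
    keep their type, and anything else (e.g. strings) passes through unchanged.
    """
    if is_leaf(obj):
        return convert(obj)
    if isinstance(obj, Mapping):
        return {k: apply_to_leaves(v, convert, is_leaf) for k, v in obj.items()}
    if isinstance(obj, list):
        return [apply_to_leaves(v, convert, is_leaf) for v in obj]
    if isinstance(obj, tuple):
        return tuple(apply_to_leaves(v, convert, is_leaf) for v in obj)
    return obj


class FakeTensor:
    """Hypothetical stand-in for a framework tensor."""

    def __init__(self, values):
        self.values = values

    def to_list(self):
        return list(self.values)


if __name__ == '__main__':
    outputs = {'logits': FakeTensor([0.1, 0.9]), 'labels': [FakeTensor([1])], 'task': 'cls'}
    print(apply_to_leaves(outputs, FakeTensor.to_list,
                          lambda x: isinstance(x, FakeTensor)))
    # {'logits': [0.1, 0.9], 'labels': [[1]], 'task': 'cls'}
```

With PyTorch installed, the same traversal could detach and numpify real tensors by passing `lambda t: t.detach().cpu().numpy()` as `convert` and `lambda x: isinstance(x, torch.Tensor)` as `is_leaf`.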