[to #43726282] fix bugs and refine docs

1. remove pai-easynlp temporarily due to its hard dependency on scipy==1.5.4
2. fix sentiment classification output
3. update quickstart and trainer doc

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9646399
wenmeng.zwm
2022-08-04 22:38:31 +08:00
parent 845cc869ca
commit 49192f94be
5 changed files with 12 additions and 22 deletions


@@ -1,7 +1,7 @@
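# Quick Start
ModelScope Library currently supports model training and inference with the TensorFlow and PyTorch deep learning frameworks. It has been tested on Python 3.7+, PyTorch 1.8+, TensorFlow 1.13-1.15, and TensorFlow 2.x.
ModelScope Library currently supports model training and inference with the TensorFlow and PyTorch deep learning frameworks. It has been tested on Python 3.7+, PyTorch 1.8+, TensorFlow 1.15, and TensorFlow 2.x.
Note: in the current 630 release, `speech-related` features are supported only on `Linux` with Python 3.7 and TensorFlow 1.13-1.15. Other features can be installed and used on Windows and macOS.
Note: `speech-related` features are supported only on `Linux` with Python 3.7 and TensorFlow 1.15. Other features can be installed and used on Windows and macOS.
## Python environment setup
First, install and configure an Anaconda environment by following the [documentation](https://docs.anaconda.com/anaconda/install/).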
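
The Python 3.7+ floor stated in the quickstart can be checked before installing anything. The following is a minimal sketch, not part of ModelScope: the helper name `meets_minimum` and the constant `MIN_PYTHON` are hypothetical, and the check covers only the interpreter version, not the PyTorch or TensorFlow versions.

```python
import sys

# Minimum interpreter version taken from the quickstart text (Python 3.7+).
MIN_PYTHON = (3, 7)


def meets_minimum(version, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    ok = meets_minimum(sys.version_info)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {ok}")
```

Comparing only the first two components keeps the check independent of patch releases.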


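@@ -8,22 +8,10 @@ ModelScope provides many pretrained models; you can use any one of them
Before starting fine-tuning, you need to prepare a dataset for training and evaluation. For details, see the dataset usage tutorial.
`Temporary approach`: we create a dummy dataset through the dataset interface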
```python
from datasets import Dataset
dataset_dict = {
'sentence1': [
'This is test sentence1-1', 'This is test sentence2-1',
'This is test sentence3-1'
],
'sentence2': [
'This is test sentence1-2', 'This is test sentence2-2',
'This is test sentence3-2'
],
'label': [0, 1, 1]
}
train_dataset = MsDataset.from_hf_dataset(Dataset.from_dict(dataset_dict))
eval_dataset = MsDataset.from_hf_dataset(Dataset.from_dict(dataset_dict))
train_dataset = MsDataset.load('afqmc_small', namespace='modelscope', split='train')
eval_dataset = MsDataset.load('afqmc_small', namespace='modelscope', split='validation')
```
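### Training
ModelScope keeps all training-related configuration in the `configuration.json` file under the model repository, so we only need to create a Trainer, load the configuration file, and pass in the datasets to complete training.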
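
The config-driven pattern described here can be illustrated with a small standard-library sketch. It is not the ModelScope Trainer: `SketchTrainer`, `demo`, and the `train.max_epochs` key layout are all assumptions made for illustration, and the toy rows only mimic the sentence-pair shape of the data shown earlier in this hunk.

```python
import json
import tempfile
from pathlib import Path


class SketchTrainer:
    """Hypothetical stand-in for a config-driven trainer (not a ModelScope class)."""

    def __init__(self, model_dir, train_dataset, eval_dataset):
        # All training settings come from configuration.json in the model directory.
        config_path = Path(model_dir) / "configuration.json"
        with open(config_path, encoding="utf-8") as f:
            self.cfg = json.load(f)
        self.train_dataset = train_dataset
        self.eval_dataset = eval_dataset

    def train(self):
        # Placeholder loop: a real trainer would update model weights here.
        epochs = self.cfg["train"]["max_epochs"]  # assumed key layout
        steps = 0
        for _ in range(epochs):
            for _sample in self.train_dataset:
                steps += 1
        return steps


def demo():
    # Toy rows shaped like the sentence-pair data in this hunk.
    rows = [
        {"sentence1": "This is test sentence1-1", "sentence2": "This is test sentence1-2", "label": 0},
        {"sentence1": "This is test sentence2-1", "sentence2": "This is test sentence2-2", "label": 1},
        {"sentence1": "This is test sentence3-1", "sentence2": "This is test sentence3-2", "label": 1},
    ]
    with tempfile.TemporaryDirectory() as model_dir:
        # Write a hypothetical configuration.json; the keys are assumptions.
        cfg = {"train": {"max_epochs": 2}}
        (Path(model_dir) / "configuration.json").write_text(json.dumps(cfg), encoding="utf-8")
        trainer = SketchTrainer(model_dir, train_dataset=rows, eval_dataset=rows)
        return trainer.train()
```

Because the caller passes only a model directory and datasets, changing the number of epochs means editing the configuration file rather than the calling code.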


@@ -141,7 +141,7 @@ class Trainers(object):
Holds the standard trainer name to use for identifying different trainer.
This should be used to register trainers.
For a general Trainer, you can use easynlp-trainer/ofa-trainer.
For a general Trainer, you can use EpochBasedTrainer.
For a model specific Trainer, you can use ${ModelName}-${Task}-trainer.
"""


@@ -214,10 +214,10 @@ TASK_OUTPUTS = {
Tasks.nli: [OutputKeys.SCORES, OutputKeys.LABELS],
# sentiment classification result for single sample
# {
# "labels": ["happy", "sad", "calm", "angry"],
# "scores": [0.9, 0.1, 0.05, 0.05]
# }
# {
# 'scores': [0.07183828949928284, 0.9281617403030396],
# 'labels': ['1', '0']
# }
Tasks.sentiment_classification: [OutputKeys.SCORES, OutputKeys.LABELS],
# zero-shot classification result for single sample
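
A consumer of the sentiment classification output would typically want the single best label. The sketch below assumes that `scores[i]` belongs to `labels[i]` (index-aligned lists), which the sample in the comment suggests but the source does not state; `top_label` is a hypothetical helper, not a ModelScope function.

```python
def top_label(result):
    """Return the (label, score) pair with the highest score."""
    scores = result["scores"]
    labels = result["labels"]
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Index of the largest score; assumes scores[i] pairs with labels[i].
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Sample output copied from the updated comment in the diff.
sample = {
    "scores": [0.07183828949928284, 0.9281617403030396],
    "labels": ["1", "0"],
}
label, score = top_label(sample)
```

Selecting by index rather than relying on list order means the helper works whether or not the pipeline sorts its scores.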


@@ -1,6 +1,8 @@
en_core_web_sm>=2.3.5
fairseq>=0.10.2
pai-easynlp
# temporarily remove pai-easynlp due to its hard dependency on scipy==1.5.4
# will be added back
# pai-easynlp
# rouge_score was just recently updated from 0.0.4 to 0.0.7
# which introduced compatibility issues that are being investigated
rouge_score<=0.0.4