Mirror of https://github.com/modelscope/modelscope.git (synced 2025-12-16 08:17:45 +01:00)
[to #42322933] add/refactor nlp models source code and finetune
1. add sbert, veco, palm, space source code
2. support sbert sequence classification and token classification finetune
3. support veco sequence classification finetune
4. support palm nlg finetune (evaluation results: https://sheet.alibaba-inc.com/#/sheet/f7fdcc7f22bd5105, sheet: Maas)
5. add ut for finetunes
6. add veco's taskdataset processor
7. add a common trainer for nlp, and a specific trainer for veco
8. merge some duplicate code of models, preprocessors, pipelines

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9574105

Squashed history:
* add basic class of hook & metrics
* pre-commit passed
* change some comments
* pre-commit passed
* 1. remove accuracy's groups 2. remove useless hooks 3. simplify priorities
* pre-commit passed
* fix a comment
* Merge branch 'master' into finetune_hooks_metrics (conflicts: modelscope/metainfo.py)
* pre-commit passed
* add basic class of hook & metrics
* pre-commit passed
* change some comments
* pre-commit passed
* 1. remove accuracy's groups 2. remove useless hooks 3. simplify priorities
* pre-commit passed
* fix a comment
* Merge branch 'feat/finetune' of gitlab.alibaba-inc.com:Ali-MaaS/MaaS-lib into feat/finetune
* mv hooks related to modelscope/trainers/hooks
* mv priority back
* add torch model base and test
* update hooks, trainer, import_util
* add torch epoch based trainer and dist utils
* add hooks
* fix warmup
* format code style, fix warmup and add warmup unittest
* fix impls
* pre-commit check passed
* update hook and add EpochBasedTrainer
* add trainer unittest
* Merge branch 'feat/add_hooks' into feat/add_task (conflicts: modelscope/models/base_torch.py, modelscope/trainers/hooks/hook.py, modelscope/trainers/trainer.py)
* update unittest name
* rewrite taskdataset to trainer
* fix trainer and add unittest
* add unittest
* code: run to forward
* run through... but ugly code
* arrange some cls
* fix some errs
* revert some mistakes
* init check in
* Merge branch 'feat/add_hooks' into feat/add_task (conflicts: modelscope/trainers/trainer.py)
* test with bigger epoch and size
* add the default metrics class
* move build metrics code to a method
* merge add_task
* merge origin add_task
* add device initialization
* remove preprocessor arg for bool
* add task models
* move metric collect logic to metrics class
* pre-commit passed
* fix cr comments
* pre-commit passed
* add task models
* Merge remote-tracking branch 'origin/feat/add_task' into feat/backbone_head
* add comment
* change comment formats.
* fix comments
* fix ut bug
* fix comments
* add wrapper check
* fix comments
* pre-commit passed
* fix cr comments
* solve a loop import problem
* fix ut bug
* fix ut errors
* change dummydataset to msdataset
* pre-commit passed
* merge add task
* backbone-head is built, model is not correctly loaded
* model load states matched
* result matched
* lint
* add veco/palm_v2 code
* merge master
* merge master success running
* add repr model name level
* Merge branch 'feat/veco_palm' into feat/finetune_sbert_veco
* model test for training
* add token-classification metric, add formal ut
* fix running bug
* finetune and pipeline are working with backbone-head
* add nli
* add missing code
* finetune and pipeline are working with backbone-head
* Merge branch 'feat/backbone_head' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib into feat/backbone_head
* add a test repo for pr
* remove merge conflicted file
* remove merge conflicted file 1
* lint check
* import error
* none type bug fix
* forward input unpacking or dict bug
* move head into models, add build_backbone with registry, no base method
* merge master
* feat: 1. add interleave dataset method 2. support multiple datasets in trainer.build_dataset 3. support 3 sub tasks in sequence_classification task
* unfinished
* update the task model structure in NLP field
* merge master
* update by comments
* keep the default model id as current on production
* unfinished
* unfinished
* veco can run
* Merge remote-tracking branch 'origin/master' into feat/backbone_head
* add taskmodel for module management
* remove forward_input_is_dict
* unfinished
* token classification started
* update base model structure
* move space to backbone
* remove 'type' in build_from_cfg method
* test update
* bug fix
* on testing, mess code
* Merge branch 'feat/backbone_head' into feat/refactor_nlp_730 (conflicts: modelscope/metrics/builder.py, modelscope/models/__init__.py, modelscope/models/nlp/__init__.py, modelscope/preprocessors/nlp.py, modelscope/trainers/trainer.py, requirements/multi-modal.txt)
* add missing merge
* add sofa source code
* refactor
* add veco task dataset
* add veco task dataset
* pre-commit passed
* fix bug of log
* add some features
* merge master
* bug fix
* refine nlp models
* fix the training error
* unfinished
* refactor pipeline
* Merge branch 'feat/backbone_head' into feat/refactor_nlp_730 (conflicts: modelscope/metrics/builder.py, modelscope/models/nlp/__init__.py, modelscope/models/nlp/backbones/structbert/modeling_sbert.py, modelscope/models/nlp/palm_v2/palm_for_text_generation.py, modelscope/preprocessors/base.py, modelscope/preprocessors/nlp.py, modelscope/trainers/trainer.py)
* Merge commit 'ab04ceafc5453ce7daa9aa09e37a55f703072a10' into feat/refactor_nlp_730 (conflicts: modelscope/metainfo.py, modelscope/metrics/builder.py, modelscope/models/__init__.py, modelscope/models/base/base_torch_model.py, modelscope/models/nlp/__init__.py, modelscope/models/nlp/backbones/space/model/intent_unified_transformer.py, modelscope/models/nlp/backbones/space/model/model_base.py, modelscope/models/nlp/palm_v2/palm_for_text_generation.py, modelscope/models/nlp/sbert_for_sequence_classification.py, modelscope/models/nlp/sequence_classification.py, modelscope/models/nlp/space/__init__.py, modelscope/models/nlp/space_for_dialog_intent_prediction.py, modelscope/models/nlp/space_for_dialog_modeling.py, modelscope/models/nlp/space_for_dialog_state_tracking.py, modelscope/models/nlp/task_model.py, modelscope/pipelines/nlp/sentiment_classification_pipeline.py, modelscope/preprocessors/base.py, modelscope/preprocessors/nlp.py, modelscope/trainers/trainer.py)
* revert changes
* unify sentence classification postprocess
* revert some changes, move some model files
* pipeline first case run through
* ws pipeline passed
* Merge branch 'feat/refactor_nlp_730' into feat/finetune_sbert_veco
* finetune
* revert code
* revert some code
* ws finetune started, only the accuracy is weird
* Merge branch 'feat/veco_taskdataset' into feat/finetune_sbert_veco (conflicts: modelscope/task_datasets/veco_dataset.py, tests/taskdataset/test_veco_dataset.py)
* veco+nli finetune started
* Merge branch 'master' into feat/finetune_sbert_veco (conflicts: modelscope/models/nlp/sbert_for_sequence_classification.py, modelscope/models/nlp/sbert_for_token_classification.py, modelscope/models/nlp/sbert_for_zero_shot_classification.py, modelscope/models/nlp/space/space_for_dialog_intent_prediction.py, modelscope/models/nlp/space/space_for_dialog_modeling.py, modelscope/trainers/trainer.py)
* add trainer for nlp
* trainer: dataset params passed into preprocessor
* test passed by nlptrainer
* fix some bugs
* fix some bugs
* add backbone/head subclass
* fix regression bugs
* fix bug in token-cls finetune
* support cfg modification
* fix bug
* fix bug
* update requirements
* add some comments and fix some t
* add some comments and revert an argument
* split to two test files
* revert code
* fix bug in preprocessor (cherry picked from commit 7a648d096ef8500c694d3255dabe29e6f4bfc3e5)
* fix ut bug
* support sbert models
* unfinished
* Merge branch 'feat/finetune_sbert_veco' into sly_tmp_veco_finetune (conflicts: tests/trainers/test_finetune_sequence_classification.py)
* fix bug in veco
* fix bug
* fix bug
* correct running params
* remove useless files
* add palm finetuning with cnn_dailymail dataset
* copy space model from sofa
* Merge branch 'feat/finetune_sbert_veco' of gitlab.alibaba-inc.com:Ali-MaaS/MaaS-lib into feat/finetune_sbert_veco
* Merge branch 'master' into feat/finetune_sbert_veco (conflicts: modelscope/metrics/__init__.py, modelscope/models/__init__.py, modelscope/models/nlp/__init__.py, modelscope/models/nlp/backbones/__init__.py, modelscope/models/nlp/backbones/structbert/modeling_sbert.py, modelscope/models/nlp/heads/__init__.py, modelscope/models/nlp/masked_language.py, modelscope/models/nlp/palm_v2/palm_for_text_generation.py, modelscope/models/nlp/sbert_for_nli.py, modelscope/models/nlp/sbert_for_sentence_similarity.py, modelscope/models/nlp/sbert_for_sentiment_classification.py, modelscope/models/nlp/sbert_for_sequence_classification.py, modelscope/models/nlp/sbert_for_token_classification.py, modelscope/models/nlp/sbert_for_zero_shot_classification.py, modelscope/models/nlp/sequence_classification.py, modelscope/models/nlp/space/space_for_dialog_intent_prediction.py, modelscope/models/nlp/space/space_for_dialog_modeling.py, modelscope/models/nlp/space/space_for_dialog_state_tracking.py, modelscope/models/nlp/structbert/adv_utils.py, modelscope/models/nlp/structbert/configuration_sbert.py, modelscope/models/nlp/task_models/task_model.py, modelscope/pipelines/__init__.py, modelscope/pipelines/nlp/__init__.py, modelscope/pipelines/nlp/fill_mask_pipeline.py, modelscope/pipelines/nlp/named_entity_recognition_pipeline.py, modelscope/pipelines/nlp/nli_pipeline.py, modelscope/pipelines/nlp/sentence_similarity_pipeline.py, modelscope/pipelines/nlp/sentiment_classification_pipeline.py, modelscope/pipelines/nlp/text_generation_pipeline.py, modelscope/pipelines/nlp/word_segmentation_pipeline.py, modelscope/pipelines/nlp/zero_shot_classification_pipeline.py, modelscope/preprocessors/nlp.py, modelscope/task_datasets/__init__.py, modelscope/trainers/trainer.py, modelscope/trainers/utils/inference.py, modelscope/utils/file_utils.py, requirements/nlp.txt, tests/pipelines/test_nli.py, tests/pipelines/test_sentence_similarity.py, tests/pipelines/test_sentiment_classification.py)
* fix imports
* mark backbone in their own modeling
* pre-commit check passed
* pre-commit passed, remove roberta model
* fix a bug in ast import
* skip all finetune uts
* fix bugs
* pre-commit passed
* bug fixed
* bug fixed
* bug fixed
* bug fixed
* fix ut bug
* fix bug
* fix ut bug
* fix bug
* fix bug
* fix bugs
* fix bug
* revert veco
* revert veco because of core dump
* fix palm bug
* revert veco
* revert mistaken code
* add a test print
* pre-commit check
* test exception
* add test code
* for test
* fix bug and test
* remove test code
* remove useless file
* 1. fix some bugs 2. add backbone ut
* Merge branch 'master' into feat/finetune_refactor_730 (conflicts: modelscope/metainfo.py, modelscope/metrics/sequence_classification_metric.py, modelscope/models/nlp/__init__.py, modelscope/models/nlp/task_models/task_model.py, modelscope/preprocessors/__init__.py, modelscope/preprocessors/nlp.py, modelscope/trainers/trainer.py, modelscope/trainers/utils/inference.py, modelscope/utils/file_utils.py, tests/trainers/test_trainer_with_nlp.py)
* pre-commit passed
* revert files
* increase test level
* unregister models
* fix bugs
* fix cr comments
* fix bug in backbone-head
* add sbert backbone
* fix bug
* add test for token-cls-metric
* pre-commit passed
* fix ut comments
* revert normal tokenizer to fast tokenizer
* Merge branch 'master' into feat/finetune_refactor_730 (conflicts: modelscope/models/nlp/__init__.py, modelscope/models/nlp/backbones/__init__.py, modelscope/models/nlp/backbones/structbert/__init__.py, modelscope/models/nlp/masked_language.py, modelscope/models/nlp/palm_v2/palm_for_text_generation.py, modelscope/models/nlp/sbert_for_sequence_classification.py, modelscope/models/nlp/sbert_for_token_classification.py, modelscope/models/nlp/sbert_for_zero_shot_classification.py, modelscope/pipelines/nlp/text_generation_pipeline.py, modelscope/preprocessors/nlp.py, modelscope/trainers/trainer.py, modelscope/trainers/utils/inference.py)
* fix merge bugs
* pre-commit passed
* fix bug
* fix bug
* fix bug
* fix bug from master
* add print
* fix ut bug
* fix bug
* Merge branch 'master' into feat/finetune_refactor_730
* skip task model test
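Editor's note: the common NLP trainer added here is driven through the trainer registry. A minimal sketch of a finetune run, assuming the trainer is registered as 'nlp-base-trainer' and using an illustrative StructBERT model id; only build_trainer and the Trainers registry are visible in this commit, and the MsDataset.load arguments are likewise assumptions:

from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer

# assumed dataset/model ids for illustration only
train_dataset = MsDataset.load('clue', subset_name='afqmc', split='train')
eval_dataset = MsDataset.load('clue', subset_name='afqmc', split='validation')

kwargs = dict(
    model='damo/nlp_structbert_sentence-similarity_chinese-base',  # hypothetical id
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    work_dir='/tmp/nlp_finetune')
trainer = build_trainer(name='nlp-base-trainer', default_args=kwargs)
trainer.train()
print(trainer.evaluate())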
@@ -2,7 +2,7 @@
     "framework": "pytorch",
     "task": "sentence-similarity",
     "preprocessor": {
-        "type": "bert-seq-cls-tokenizer-finetune",
+        "type": "sen-sim-tokenizer",
         "first_sequence": "sentence1",
         "second_sequence": "sentence2"
     },
@@ -4,7 +4,7 @@ from modelscope.hub.constants import (DEFAULT_MODELSCOPE_DOMAIN,
                                       DEFAULT_MODELSCOPE_GROUP,
                                       MODEL_ID_SEPARATOR,
                                       MODELSCOPE_URL_SCHEME)
-from modelscope.utils.utils import get_default_cache_dir
+from modelscope.utils.file_utils import get_default_cache_dir


def model_id_to_group_owner_name(model_id):
@@ -53,6 +53,10 @@ class TaskModels(object):
class Heads(object):
    # nlp heads
    text_classification = 'text-classification'
+   # mlm
+   bert_mlm = 'bert-mlm'
+   # roberta mlm
+   roberta_mlm = 'roberta-mlm'


class Pipelines(object):
@@ -137,7 +141,7 @@ class Trainers(object):
    Holds the standard trainer name to use for identifying different trainer.
    This should be used to register trainers.

-   For a general Trainer, you can use easynlp-trainer/ofa-trainer/sofa-trainer.
+   For a general Trainer, you can use easynlp-trainer/ofa-trainer.
    For a model specific Trainer, you can use ${ModelName}-${Task}-trainer.
    """
@@ -179,6 +183,8 @@ class Preprocessors(object):
    sbert_token_cls_tokenizer = 'sbert-token-cls-tokenizer'
    zero_shot_cls_tokenizer = 'zero-shot-cls-tokenizer'
    text_error_correction = 'text-error-correction'
+   word_segment_text_to_label_preprocessor = 'word-segment-text-to-label-preprocessor'
+   fill_mask = 'fill-mask'

    # audio preprocessor
    linear_aec_fbank = 'linear-aec-fbank'
@@ -204,7 +210,7 @@ class Metrics(object):
    # metric for image instance segmentation task
    image_ins_seg_coco_metric = 'image-ins-seg-coco-metric'
    # metrics for sequence classification task
-   seq_cls_metric = 'seq_cls_metric'
+   seq_cls_metric = 'seq-cls-metric'
    # metrics for token-classification task
    token_cls_metric = 'token-cls-metric'
    # metrics for text-generation task
@@ -13,6 +13,7 @@ if TYPE_CHECKING:
    from .image_portrait_enhancement_metric import ImagePortraitEnhancementMetric
    from .sequence_classification_metric import SequenceClassificationMetric
    from .text_generation_metric import TextGenerationMetric
+   from .token_classification_metric import TokenClassificationMetric

else:
    _import_structure = {
@@ -26,6 +27,7 @@ else:
        ['ImagePortraitEnhancementMetric'],
        'sequence_classification_metric': ['SequenceClassificationMetric'],
        'text_generation_metric': ['TextGenerationMetric'],
+       'token_classification_metric': ['TokenClassificationMetric'],
    }

    import sys
@@ -10,6 +10,9 @@ class Metric(ABC):
|
||||
complex metrics for a specific task with or without other Metric subclasses.
|
||||
"""
|
||||
|
||||
def __init__(self, trainer=None, *args, **kwargs):
|
||||
self.trainer = trainer
|
||||
|
||||
@abstractmethod
|
||||
def add(self, outputs: Dict, inputs: Dict):
|
||||
""" Append logits and labels within an eval loop.
|
||||
|
||||
@@ -20,7 +20,9 @@ class MetricKeys(object):
task_default_metrics = {
    Tasks.image_segmentation: [Metrics.image_ins_seg_coco_metric],
    Tasks.sentence_similarity: [Metrics.seq_cls_metric],
    Tasks.nli: [Metrics.seq_cls_metric],
    Tasks.sentiment_classification: [Metrics.seq_cls_metric],
    Tasks.token_classification: [Metrics.token_cls_metric],
    Tasks.text_generation: [Metrics.text_gen_metric],
    Tasks.image_denoising: [Metrics.image_denoise_metric],
    Tasks.image_color_enhancement: [Metrics.image_color_enhance_metric],
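Editor's note: a minimal sketch of how this default-metric table can be consumed when a configuration does not name metrics explicitly; the helper name default_metrics_for is illustrative, the table itself is from the diff above.

from modelscope.metrics.builder import task_default_metrics
from modelscope.utils.constant import Tasks

def default_metrics_for(task: str):
    # trainers can fall back to this lookup when cfg.evaluation.metrics is absent
    return task_default_metrics.get(task, [])

print(default_metrics_for(Tasks.sentence_similarity))  # ['seq-cls-metric']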
@@ -17,14 +17,14 @@ class SequenceClassificationMetric(Metric):
    """The metric computation class for sequence classification classes.
    """

-   label_name = 'labels'
-
-   def __init__(self):
+   def __init__(self, *args, **kwargs):
+       super().__init__(*args, **kwargs)
        self.preds = []
        self.labels = []

    def add(self, outputs: Dict, inputs: Dict):
-       ground_truths = inputs[self.label_name]
+       label_name = OutputKeys.LABEL if OutputKeys.LABEL in inputs else OutputKeys.LABELS
+       ground_truths = inputs[label_name]
        eval_results = outputs[OutputKeys.LOGITS]
        self.preds.append(
            torch_nested_numpify(torch_nested_detach(eval_results)))
modelscope/metrics/token_classification_metric.py (new file, 123 lines)
@@ -0,0 +1,123 @@
import importlib
from typing import Dict, List, Optional, Union

import numpy as np

from modelscope.outputs import OutputKeys
from ..metainfo import Metrics
from ..utils.registry import default_group
from ..utils.tensor_utils import torch_nested_detach, torch_nested_numpify
from .base import Metric
from .builder import METRICS, MetricKeys


@METRICS.register_module(
    group_key=default_group, module_name=Metrics.token_cls_metric)
class TokenClassificationMetric(Metric):
    """
    The metric computation class for token-classification task.

    Args:
        return_entity_level_metrics (bool, *optional*):
            Whether to return every label's detail metrics, default False.
    """

    def add(self, outputs: Dict, inputs: Dict):
        label_name = OutputKeys.LABEL if OutputKeys.LABEL in inputs else OutputKeys.LABELS
        ground_truths = inputs[label_name]
        eval_results = outputs[OutputKeys.LOGITS]
        self.preds.append(
            torch_nested_numpify(torch_nested_detach(eval_results)))
        self.labels.append(
            torch_nested_numpify(torch_nested_detach(ground_truths)))

    def __init__(self, return_entity_level_metrics=False, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.return_entity_level_metrics = return_entity_level_metrics
        self.preds = []
        self.labels = []

    def evaluate(self):
        self.id2label = {
            id: label
            for label, id in self.trainer.label2id.items()
        }
        self.preds = np.concatenate(self.preds, axis=0)
        self.labels = np.concatenate(self.labels, axis=0)
        predictions = np.argmax(self.preds, axis=-1)

        true_predictions = [[
            self.id2label[p] for (p, lb) in zip(prediction, label)
            if lb != -100
        ] for prediction, label in zip(predictions, self.labels)]
        true_labels = [[
            self.id2label[lb] for (p, lb) in zip(prediction, label)
            if lb != -100
        ] for prediction, label in zip(predictions, self.labels)]

        results = self._compute(
            predictions=true_predictions, references=true_labels)
        if self.return_entity_level_metrics:
            final_results = {}
            for key, value in results.items():
                if isinstance(value, dict):
                    for n, v in value.items():
                        final_results[f'{key}_{n}'] = v
                else:
                    final_results[key] = value
            return final_results
        else:
            return {
                MetricKeys.PRECISION: results[MetricKeys.PRECISION],
                MetricKeys.RECALL: results[MetricKeys.RECALL],
                MetricKeys.F1: results[MetricKeys.F1],
                MetricKeys.ACCURACY: results[MetricKeys.ACCURACY],
            }

    @staticmethod
    def _compute(
        predictions,
        references,
        suffix: bool = False,
        scheme: Optional[str] = None,
        mode: Optional[str] = None,
        sample_weight: Optional[List[int]] = None,
        zero_division: Union[str, int] = 'warn',
    ):
        from seqeval.metrics import accuracy_score, classification_report
        if scheme is not None:
            try:
                scheme_module = importlib.import_module('seqeval.scheme')
                scheme = getattr(scheme_module, scheme)
            except AttributeError:
                raise ValueError(
                    f'Scheme should be one of [IOB1, IOB2, IOE1, IOE2, IOBES, BILOU], got {scheme}'
                )
        report = classification_report(
            y_true=references,
            y_pred=predictions,
            suffix=suffix,
            output_dict=True,
            scheme=scheme,
            mode=mode,
            sample_weight=sample_weight,
            zero_division=zero_division,
        )
        report.pop('macro avg')
        report.pop('weighted avg')
        overall_score = report.pop('micro avg')

        scores = {
            type_name: {
                MetricKeys.PRECISION: score['precision'],
                MetricKeys.RECALL: score['recall'],
                MetricKeys.F1: score['f1-score'],
                'number': score['support'],
            }
            for type_name, score in report.items()
        }
        scores[MetricKeys.PRECISION] = overall_score['precision']
        scores[MetricKeys.RECALL] = overall_score['recall']
        scores[MetricKeys.F1] = overall_score['f1-score']
        scores[MetricKeys.ACCURACY] = accuracy_score(
            y_true=references, y_pred=predictions)
        return scores
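Editor's note: since the new metric pulls label2id from its bound trainer, here is a hedged sketch of exercising it in isolation. The SimpleNamespace stub and the toy label map stand in for a real trainer and are assumptions, not project API:

from types import SimpleNamespace

import torch

from modelscope.metrics.token_classification_metric import TokenClassificationMetric
from modelscope.outputs import OutputKeys

stub_trainer = SimpleNamespace(label2id={'O': 0, 'B-LOC': 1, 'I-LOC': 2})
metric = TokenClassificationMetric(trainer=stub_trainer)

# one eval batch: 2 sequences of length 4; -100 marks sub-word/padding
# positions that evaluate() filters out before calling seqeval
logits = torch.randn(2, 4, 3)
labels = torch.tensor([[0, 1, 2, -100], [0, 0, -100, -100]])
metric.add({OutputKeys.LOGITS: logits}, {OutputKeys.LABELS: labels})

print(metric.evaluate())  # {'precision': ..., 'recall': ..., 'f1': ..., 'accuracy': ...}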
@@ -10,6 +10,8 @@ from modelscope.hub.snapshot_download import snapshot_download
from modelscope.models.builder import build_model
from modelscope.utils.config import Config
from modelscope.utils.constant import DEFAULT_MODEL_REVISION, ModelFile
+from modelscope.utils.file_utils import func_receive_dict_inputs
+from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.logger import get_logger

logger = get_logger()
@@ -69,6 +71,7 @@ class Model(ABC):
    def from_pretrained(cls,
                        model_name_or_path: str,
                        revision: Optional[str] = DEFAULT_MODEL_REVISION,
+                       cfg_dict: Config = None,
                        *model_args,
                        **kwargs):
        """ Instantiate a model from local directory or remote model repo. Note
@@ -87,25 +90,25 @@
            )
        local_model_dir = snapshot_download(model_name_or_path, revision)
        logger.info(f'initialize model from {local_model_dir}')
-       cfg = Config.from_file(
-           osp.join(local_model_dir, ModelFile.CONFIGURATION))
+       if cfg_dict is not None:
+           cfg = cfg_dict
+       else:
+           cfg = Config.from_file(
+               osp.join(local_model_dir, ModelFile.CONFIGURATION))
        task_name = cfg.task
        model_cfg = cfg.model
-       assert hasattr(
-           cfg, 'pipeline'), 'pipeline config is missing from config file.'
-       pipeline_cfg = cfg.pipeline
-       # TODO @wenmeng.zwm may should manually initialize model after model building

        if hasattr(model_cfg, 'model_type') and not hasattr(model_cfg, 'type'):
            model_cfg.type = model_cfg.model_type

        model_cfg.model_dir = local_model_dir

        for k, v in kwargs.items():
            model_cfg[k] = v
        model = build_model(
            model_cfg, task_name=task_name, default_args=kwargs)

        # dynamically add pipeline info to model for pipeline inference
-       model.pipeline = pipeline_cfg
+       if hasattr(cfg, 'pipeline'):
+           model.pipeline = cfg.pipeline
        return model
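Editor's note: a hedged sketch of the new cfg_dict override, which lets callers skip reading configuration.json from the downloaded snapshot. The model id and the local override file are illustrative assumptions:

from modelscope.models import Model
from modelscope.utils.config import Config

cfg = Config.from_file('my_configuration.json')  # assumed local override file
model = Model.from_pretrained(
    'damo/nlp_structbert_sentence-similarity_chinese-base',  # hypothetical id
    cfg_dict=cfg)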
@@ -5,6 +5,7 @@ from typing import Any, Dict, Optional, Union
import torch
from torch import nn

+from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.logger import get_logger
from .base_model import Model
@@ -20,6 +21,13 @@ class TorchModel(Model, torch.nn.Module):
        super().__init__(model_dir, *args, **kwargs)
        torch.nn.Module.__init__(self)

+   def __call__(self, input: Dict[str,
+                                  torch.Tensor]) -> Dict[str, torch.Tensor]:
+       if func_receive_dict_inputs(self.forward):
+           return self.postprocess(self.forward(input))
+       else:
+           return self.postprocess(self.forward(**input))
+
    def forward(self, inputs: Dict[str,
                                   torch.Tensor]) -> Dict[str, torch.Tensor]:
        raise NotImplementedError
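Editor's note: the added __call__ dispatches on whether forward wants the whole input dict or unpacked keyword arguments. A self-contained sketch of that idea; receives_dict below is an illustrative stand-in for func_receive_dict_inputs, not the modelscope implementation:

import inspect
from typing import Dict

def receives_dict(fn) -> bool:
    params = [p for p in inspect.signature(fn).parameters.values()
              if p.kind in (p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY)]
    # heuristic: exactly one non-self parameter -> treat the input as one dict
    return len(params) == 1

class DictForward:
    def forward(self, inputs: Dict):                   # takes the whole dict
        return {'echo': inputs}

class KwargsForward:
    def forward(self, input_ids, attention_mask):      # takes unpacked kwargs
        return {'sum': input_ids + attention_mask}

def call(model, data: Dict):
    return model.forward(data) if receives_dict(model.forward) else model.forward(**data)

print(call(DictForward(), {'input_ids': 1, 'attention_mask': 2}))
print(call(KwargsForward(), {'input_ids': 1, 'attention_mask': 2}))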
@@ -50,6 +58,3 @@ class TorchModel(Model, torch.nn.Module):
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)
-
-   def compute_loss(self, outputs: Dict[str, Any], labels):
-       raise NotImplementedError()
@@ -4,32 +4,26 @@ from typing import TYPE_CHECKING
from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
-   from .backbones import (SbertModel, SpaceGenerator, SpaceModelBase,
-                           GPT3Model)
+   from .backbones import SbertModel
    from .heads import SequenceClassificationHead
    from .bert_for_sequence_classification import BertForSequenceClassification
    from .csanmt_for_translation import CsanmtForTranslation
    from .masked_language import (StructBertForMaskedLM, VecoForMaskedLM,
                                  BertForMaskedLM)
    from .nncrf_for_named_entity_recognition import TransformerCRFForNamedEntityRecognition
-   from .palm_for_text_generation import PalmForTextGeneration
-   from .sbert_for_nli import SbertForNLI
-   from .sbert_for_sentence_similarity import SbertForSentenceSimilarity
-   from .sbert_for_sentiment_classification import SbertForSentimentClassification
-   from .sbert_for_token_classification import SbertForTokenClassification
-   from .sbert_for_zero_shot_classification import SbertForZeroShotClassification
-   from .sequence_classification import SequenceClassificationModel
-   from .space_for_dialog_intent_prediction import SpaceForDialogIntent
-   from .space_for_dialog_modeling import SpaceForDialogModeling
-   from .space_for_dialog_state_tracking import SpaceForDialogStateTracking
-   from .task_model import SingleBackboneTaskModelBase
+   from .palm_v2 import PalmForTextGeneration
+   from .token_classification import SbertForTokenClassification
+   from .sequence_classification import VecoForSequenceClassification, SbertForSequenceClassification
+   from .space import SpaceForDialogIntent
+   from .space import SpaceForDialogModeling
+   from .space import SpaceForDialogStateTracking
+   from .task_models.task_model import SingleBackboneTaskModelBase
    from .bart_for_text_error_correction import BartForTextErrorCorrection
-   from .gpt3_for_text_generation import GPT3ForTextGeneration
+   from .gpt3 import GPT3ForTextGeneration

else:
    _import_structure = {
-       'backbones':
-           ['SbertModel', 'SpaceGenerator', 'SpaceModelBase', 'GPT3Model'],
+       'backbones': ['SbertModel'],
        'heads': ['SequenceClassificationHead'],
        'csanmt_for_translation': ['CsanmtForTranslation'],
        'bert_for_sequence_classification': ['BertForSequenceClassification'],
@@ -37,21 +31,17 @@ else:
            ['StructBertForMaskedLM', 'VecoForMaskedLM', 'BertForMaskedLM'],
        'nncrf_for_named_entity_recognition':
            ['TransformerCRFForNamedEntityRecognition'],
-       'palm_for_text_generation': ['PalmForTextGeneration'],
-       'sbert_for_nli': ['SbertForNLI'],
-       'sbert_for_sentence_similarity': ['SbertForSentenceSimilarity'],
-       'sbert_for_sentiment_classification':
-           ['SbertForSentimentClassification'],
-       'sbert_for_token_classification': ['SbertForTokenClassification'],
-       'sbert_for_zero_shot_classification':
-           ['SbertForZeroShotClassification'],
-       'sequence_classification': ['SequenceClassificationModel'],
-       'space_for_dialog_intent_prediction': ['SpaceForDialogIntent'],
-       'space_for_dialog_modeling': ['SpaceForDialogModeling'],
-       'space_for_dialog_state_tracking': ['SpaceForDialogStateTracking'],
+       'palm_v2': ['PalmForTextGeneration'],
+       'token_classification': ['SbertForTokenClassification'],
+       'sequence_classification':
+           ['VecoForSequenceClassification', 'SbertForSequenceClassification'],
+       'space': [
+           'SpaceForDialogIntent', 'SpaceForDialogModeling',
+           'SpaceForDialogStateTracking'
+       ],
        'task_model': ['SingleBackboneTaskModelBase'],
        'bart_for_text_error_correction': ['BartForTextErrorCorrection'],
-       'gpt3_for_text_generation': ['GPT3ForTextGeneration'],
+       'gpt3': ['GPT3ForTextGeneration'],
    }

    import sys
@@ -4,14 +4,10 @@ from typing import TYPE_CHECKING
from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
-   from .space import SpaceGenerator, SpaceModelBase
    from .structbert import SbertModel
-   from .gpt3 import GPT3Model
else:
    _import_structure = {
-       'space': ['SpaceGenerator', 'SpaceModelBase'],
        'structbert': ['SbertModel'],
-       'gpt3': ['GPT3Model']
    }

    import sys
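Editor's note: these __init__ files all follow the LazyImportModule pattern, where names listed in _import_structure are only imported on first attribute access. A toy stand-in for the idea (LazyModule below mimics, and is not, modelscope's class):

import importlib
import types

class LazyModule(types.ModuleType):
    def __init__(self, name, import_structure):
        super().__init__(name)
        # map exported attribute -> submodule that defines it
        self._attr_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr):
        # import the owning submodule lazily, only when the attribute is touched
        module = importlib.import_module(
            f'{self.__name__}.{self._attr_to_module[attr]}')
        return getattr(module, attr)

# demonstrate with a real package: collections.abc is imported on access
lazy = LazyModule('collections', {'abc': ['Mapping']})
print(lazy.Mapping)  # <class 'collections.abc.Mapping'>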
@@ -1,2 +0,0 @@
-from .model.generator import Generator as SpaceGenerator
-from .model.model_base import SpaceModelBase
@@ -1,3 +0,0 @@
-from .gen_unified_transformer import GenUnifiedTransformer
-from .intent_unified_transformer import IntentUnifiedTransformer
-from .unified_transformer import UnifiedTransformer
modelscope/models/nlp/backbones/structbert.py (new file, 54 lines)
@@ -0,0 +1,54 @@
from transformers import PreTrainedModel

from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import BACKBONES
from modelscope.models.nlp.structbert import SbertConfig
from modelscope.models.nlp.structbert import SbertModel as SbertModelTransform
from modelscope.utils.constant import Fields
from modelscope.utils.logger import get_logger

logger = get_logger(__name__)


@BACKBONES.register_module(Fields.nlp, module_name=Models.structbert)
class SbertModel(TorchModel, SbertModelTransform):

    def __init__(self, model_dir=None, add_pooling_layer=True, **config):
        """
        Args:
            model_dir (str, optional): The model checkpoint directory. Defaults to None.
            add_pooling_layer (bool, optional): to decide if pool the output from hidden layer. Defaults to True.
        """
        config = SbertConfig(**config)
        super().__init__(model_dir)
        self.config = config
        SbertModelTransform.__init__(self, config, add_pooling_layer)

    def extract_sequence_outputs(self, outputs):
        return outputs['last_hidden_state']

    def extract_pooled_outputs(self, outputs):
        return outputs['pooler_output']

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_values=None,
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        return SbertModelTransform.forward(
            self, input_ids, attention_mask, token_type_ids, position_ids,
            head_mask, inputs_embeds, encoder_hidden_states,
            encoder_attention_mask, past_key_values, use_cache,
            output_attentions, output_hidden_states, return_dict)
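Editor's note: a hedged sketch of instantiating this registered backbone directly and pulling pooled features. The tiny config values are illustrative, not a released checkpoint; real usage would typically go through the BACKBONES registry or a task model:

import torch

from modelscope.models.nlp.backbones import SbertModel

# hypothetical toy dimensions, forwarded into SbertConfig(**config)
backbone = SbertModel(model_dir=None, vocab_size=100, hidden_size=32,
                      num_hidden_layers=2, num_attention_heads=4,
                      intermediate_size=64, max_position_embeddings=64)
outputs = backbone.forward(input_ids=torch.randint(0, 100, (1, 8)))
pooled = backbone.extract_pooled_outputs(outputs)  # shape (1, hidden_size)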
@@ -1,19 +0,0 @@
-# Copyright (c) Alibaba, Inc. and its affiliates.
-from typing import TYPE_CHECKING
-
-from modelscope.utils.import_utils import LazyImportModule
-
-if TYPE_CHECKING:
-    from .modeling_sbert import SbertModel
-else:
-    _import_structure = {'modeling_sbert': ['SbertModel']}
-
-    import sys
-
-    sys.modules[__name__] = LazyImportModule(
-        __name__,
-        globals()['__file__'],
-        _import_structure,
-        module_spec=__spec__,
-        extra_objects={},
-    )
@@ -1,815 +0,0 @@ (modeling_sbert.py deleted; full contents of the removed file follow)
import math
from dataclasses import dataclass
from typing import Optional, Tuple, Union

import torch
import torch.utils.checkpoint
from packaging import version
from torch import nn
from transformers import PreTrainedModel
from transformers.activations import ACT2FN
from transformers.modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    BaseModelOutputWithPoolingAndCrossAttentions, ModelOutput)
from transformers.modeling_utils import (apply_chunking_to_forward,
                                         find_pruneable_heads_and_indices,
                                         prune_linear_layer)

from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import BACKBONES
from modelscope.utils.constant import Fields
from modelscope.utils.logger import get_logger
from .configuration_sbert import SbertConfig

logger = get_logger(__name__)


@BACKBONES.register_module(Fields.nlp, module_name=Models.structbert)
class SbertModel(TorchModel, PreTrainedModel):
    """

    The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
    cross-attention is added between the self-attention layers, following the architecture described in `Attention is
    all you need <https://arxiv.org/abs/1706.03762>`__ by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
    Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

    To behave as an decoder the model needs to be initialized with the :obj:`is_decoder` argument of the configuration
    set to :obj:`True`. To be used in a Seq2Seq model, the model needs to initialized with both :obj:`is_decoder`
    argument and :obj:`add_cross_attention` set to :obj:`True`; an :obj:`encoder_hidden_states` is then expected as an
    input to the forward pass.
    """

    def __init__(self, model_dir=None, add_pooling_layer=True, **config):
        """
        Args:
            model_dir (str, optional): The model checkpoint directory. Defaults to None.
            add_pooling_layer (bool, optional): to decide if pool the output from hidden layer. Defaults to True.
        """
        config = SbertConfig(**config)
        super().__init__(model_dir)
        self.config = config

        self.embeddings = SbertEmbeddings(config)
        self.encoder = SbertEncoder(config)

        self.pooler = SbertPooler(config) if add_pooling_layer else None
        self.init_weights()

    def get_input_embeddings(self):
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layer[layer].attention.prune_heads(heads)

    def forward(self,
                input_ids=None,
                attention_mask=None,
                token_type_ids=None,
                position_ids=None,
                head_mask=None,
                inputs_embeds=None,
                encoder_hidden_states=None,
                encoder_attention_mask=None,
                past_key_values=None,
                use_cache=None,
                output_attentions=None,
                output_hidden_states=None,
                return_dict=None,
                **kwargs):
        r"""
        encoder_hidden_states (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`
        , `optional`):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
            the model is configured as a decoder.
        encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
            the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
        past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers`
        with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads,
        sequence_length - 1, embed_size_per_head)`):
            Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.

            If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`
            (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`
            instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.
        use_cache (:obj:`bool`, `optional`):
            If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up
            decoding (see :obj:`past_key_values`).
        """

        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else
            self.config.output_hidden_states)
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if self.config.is_decoder:
            use_cache = use_cache if use_cache is not None else self.config.use_cache
        else:
            use_cache = False

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError(
                'You cannot specify both input_ids and inputs_embeds at the same time'
            )
        elif input_ids is not None:
            input_shape = input_ids.size()
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.size()[:-1]
        else:
            raise ValueError(
                'You have to specify either input_ids or inputs_embeds')

        batch_size, seq_length = input_shape
        device = input_ids.device if input_ids is not None else inputs_embeds.device

        # past_key_values_length
        past_key_values_length = past_key_values[0][0].shape[
            2] if past_key_values is not None else 0

        if attention_mask is None:
            attention_mask = torch.ones(
                ((batch_size, seq_length + past_key_values_length)),
                device=device)

        if token_type_ids is None:
            if hasattr(self.embeddings, 'token_type_ids'):
                buffered_token_type_ids = self.embeddings.token_type_ids[:, :
                                                                         seq_length]
                buffered_token_type_ids_expanded = buffered_token_type_ids.expand(
                    batch_size, seq_length)
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = torch.zeros(
                    input_shape, dtype=torch.long, device=device)

        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
        # ourselves in which case we just need to make it broadcastable to all heads.
        extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
            attention_mask, input_shape, device)

        # If a 2D or 3D attention mask is provided for the cross-attention
        # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
        if self.config.is_decoder and encoder_hidden_states is not None:
            encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size(
            )
            encoder_hidden_shape = (encoder_batch_size,
                                    encoder_sequence_length)
            if encoder_attention_mask is None:
                encoder_attention_mask = torch.ones(
                    encoder_hidden_shape, device=device)
            encoder_extended_attention_mask = self.invert_attention_mask(
                encoder_attention_mask)
        else:
            encoder_extended_attention_mask = None

        # Prepare head mask if needed
        # 1.0 in head_mask indicate we keep the head
        # attention_probs has shape bsz x n_heads x N x N
        # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
        # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
        head_mask = self.get_head_mask(head_mask,
                                       self.config.num_hidden_layers)

        embedding_output, orignal_embeds = self.embeddings(
            input_ids=input_ids,
            position_ids=position_ids,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            past_key_values_length=past_key_values_length,
            return_inputs_embeds=True,
        )
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_extended_attention_mask,
            past_key_values=past_key_values,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler(
            sequence_output) if self.pooler is not None else None

        if not return_dict:
            return (sequence_output,
                    pooled_output) + encoder_outputs[1:] + (orignal_embeds, )

        return BaseModelOutputWithPoolingAndCrossAttentionsWithEmbedding(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            past_key_values=encoder_outputs.past_key_values,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
            cross_attentions=encoder_outputs.cross_attentions,
            embedding_output=orignal_embeds)

    def extract_sequence_outputs(self, outputs):
        return outputs['last_hidden_state']

    def extract_pooled_outputs(self, outputs):
        return outputs['pooler_output']

class SbertEmbeddings(nn.Module):
    """Construct the embeddings from word, position and token_type embeddings."""

    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(
            config.vocab_size,
            config.hidden_size,
            padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings,
                                                config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size,
                                                  config.hidden_size)

        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = nn.LayerNorm(
            config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
        self.position_embedding_type = getattr(config,
                                               'position_embedding_type',
                                               'absolute')
        self.register_buffer(
            'position_ids',
            torch.arange(config.max_position_embeddings).expand((1, -1)))
        if version.parse(torch.__version__) > version.parse('1.6.0'):
            self.register_buffer(
                'token_type_ids',
                torch.zeros(
                    self.position_ids.size(),
                    dtype=torch.long,
                    device=self.position_ids.device),
                persistent=False,
            )

    def forward(self,
                input_ids=None,
                token_type_ids=None,
                position_ids=None,
                inputs_embeds=None,
                past_key_values_length=0,
                return_inputs_embeds=False):
        if input_ids is not None:
            input_shape = input_ids.size()
        else:
            input_shape = inputs_embeds.size()[:-1]

        seq_length = input_shape[1]

        if position_ids is None:
            position_ids = self.position_ids[:,
                                             past_key_values_length:seq_length
                                             + past_key_values_length]

        # Setting the token_type_ids to the registered buffer in constructor where it is all zeros, which usually occurs
        # when its auto-generated, registered buffer helps users when tracing the model without passing token_type_ids
        # issue #5664
        if token_type_ids is None:
            if hasattr(self, 'token_type_ids'):
                buffered_token_type_ids = self.token_type_ids[:, :seq_length]
                buffered_token_type_ids_expanded = buffered_token_type_ids.expand(
                    input_shape[0], seq_length)
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = torch.zeros(
                    input_shape,
                    dtype=torch.long,
                    device=self.position_ids.device)

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)

        embeddings = inputs_embeds + token_type_embeddings
        if self.position_embedding_type == 'absolute':
            position_embeddings = self.position_embeddings(position_ids)
            embeddings += position_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        if not return_inputs_embeds:
            return embeddings
        else:
            return embeddings, inputs_embeds

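Editor's note: a toy recomputation of the embedding sum performed by SbertEmbeddings.forward above (absolute position embeddings, no dropout/LayerNorm), to make the data flow concrete; dimensions are illustrative:

import torch
from torch import nn

vocab, hidden, max_pos, types = 100, 16, 32, 2
word = nn.Embedding(vocab, hidden)
pos = nn.Embedding(max_pos, hidden)
tok_type = nn.Embedding(types, hidden)

input_ids = torch.randint(0, vocab, (1, 6))
position_ids = torch.arange(6).unsqueeze(0)            # the registered buffer slice
token_type_ids = torch.zeros(1, 6, dtype=torch.long)   # the buffered all-zeros default

embeddings = word(input_ids) + tok_type(token_type_ids) + pos(position_ids)
print(embeddings.shape)  # torch.Size([1, 6, 16])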
class SbertSelfAttention(nn.Module):

    def __init__(self, config):
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(
                config, 'embedding_size'):
            raise ValueError(
                f'The hidden size ({config.hidden_size}) is not a multiple of the number of attention '
                f'heads ({config.num_attention_heads})')

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size
                                       / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config.hidden_size, self.all_head_size)
        self.key = nn.Linear(config.hidden_size, self.all_head_size)
        self.value = nn.Linear(config.hidden_size, self.all_head_size)

        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
        self.position_embedding_type = getattr(config,
                                               'position_embedding_type',
                                               'absolute')
        if self.position_embedding_type == 'relative_key' or self.position_embedding_type == 'relative_key_query':
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(
                2 * config.max_position_embeddings - 1,
                self.attention_head_size)

        self.is_decoder = config.is_decoder

    def transpose_for_scores(self, x):
        new_x_shape = x.size()[:-1] + (self.num_attention_heads,
                                       self.attention_head_size)
        x = x.view(*new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_value=None,
        output_attentions=False,
    ):
        mixed_query_layer = self.query(hidden_states)

        # If this is instantiated as a cross-attention module, the keys
        # and values come from an encoder; the attention mask needs to be
        # such that the encoder's padding tokens are not attended to.
        is_cross_attention = encoder_hidden_states is not None

        if is_cross_attention and past_key_value is not None:
            # reuse k,v, cross_attentions
            key_layer = past_key_value[0]
            value_layer = past_key_value[1]
            attention_mask = encoder_attention_mask
        elif is_cross_attention:
            key_layer = self.transpose_for_scores(
                self.key(encoder_hidden_states))
            value_layer = self.transpose_for_scores(
                self.value(encoder_hidden_states))
            attention_mask = encoder_attention_mask
        elif past_key_value is not None:
            key_layer = self.transpose_for_scores(self.key(hidden_states))
            value_layer = self.transpose_for_scores(self.value(hidden_states))
            key_layer = torch.cat([past_key_value[0], key_layer], dim=2)
            value_layer = torch.cat([past_key_value[1], value_layer], dim=2)
        else:
            key_layer = self.transpose_for_scores(self.key(hidden_states))
            value_layer = self.transpose_for_scores(self.value(hidden_states))

        query_layer = self.transpose_for_scores(mixed_query_layer)

        if self.is_decoder:
            # if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states.
            # Further calls to cross_attention layer can then reuse all cross-attention
            # key/value_states (first "if" case)
            # if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of
            # all previous decoder key/value_states. Further calls to uni-directional self-attention
            # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
            # if encoder bi-directional self-attention `past_key_value` is always `None`
            past_key_value = (key_layer, value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = torch.matmul(query_layer,
                                        key_layer.transpose(-1, -2))

        if self.position_embedding_type == 'relative_key' or self.position_embedding_type == 'relative_key_query':
            seq_length = hidden_states.size()[1]
            position_ids_l = torch.arange(
                seq_length, dtype=torch.long,
                device=hidden_states.device).view(-1, 1)
            position_ids_r = torch.arange(
                seq_length, dtype=torch.long,
                device=hidden_states.device).view(1, -1)
            distance = position_ids_l - position_ids_r
            positional_embedding = self.distance_embedding(
                distance + self.max_position_embeddings - 1)
            positional_embedding = positional_embedding.to(
                dtype=query_layer.dtype)  # fp16 compatibility

            if self.position_embedding_type == 'relative_key':
                relative_position_scores = torch.einsum(
                    'bhld,lrd->bhlr', query_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores
            elif self.position_embedding_type == 'relative_key_query':
                relative_position_scores_query = torch.einsum(
                    'bhld,lrd->bhlr', query_layer, positional_embedding)
                relative_position_scores_key = torch.einsum(
                    'bhrd,lrd->bhlr', key_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

        attention_scores = attention_scores / math.sqrt(
            self.attention_head_size)
        if attention_mask is not None:
            # Apply the attention mask is (precomputed for all layers in SbertModel forward() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = nn.Softmax(dim=-1)(attention_scores)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = torch.matmul(attention_probs, value_layer)

        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
        new_context_layer_shape = context_layer.size()[:-2] + (
            self.all_head_size, )
        context_layer = context_layer.view(*new_context_layer_shape)

        outputs = (context_layer,
                   attention_probs) if output_attentions else (context_layer, )

        if self.is_decoder:
            outputs = outputs + (past_key_value, )
        return outputs

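Editor's note: a worked illustration of the relative-position lookup above; distances l - r are shifted by max_position_embeddings - 1 so they become valid indices into the (2 * max_position_embeddings - 1)-row distance_embedding table:

import torch

seq_length, max_position_embeddings = 4, 8
position_ids_l = torch.arange(seq_length).view(-1, 1)
position_ids_r = torch.arange(seq_length).view(1, -1)
distance = position_ids_l - position_ids_r         # values in [-3, 3]
indices = distance + max_position_embeddings - 1   # values in [4, 10], all valid
print(indices)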
class SbertSelfOutput(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(
            config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class SbertAttention(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.self = SbertSelfAttention(config)
        self.output = SbertSelfOutput(config)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self.num_attention_heads,
            self.self.attention_head_size, self.pruned_heads)

        # Prune linear layers
        self.self.query = prune_linear_layer(self.self.query, index)
        self.self.key = prune_linear_layer(self.self.key, index)
        self.self.value = prune_linear_layer(self.self.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.self.num_attention_heads = self.self.num_attention_heads - len(
            heads)
        self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_value=None,
        output_attentions=False,
    ):
        self_outputs = self.self(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            past_key_value,
            output_attentions,
        )
        attention_output = self.output(self_outputs[0], hidden_states)
        outputs = (attention_output,
                   ) + self_outputs[1:]  # add attentions if we output them
        return outputs


class SbertIntermediate(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states


class SbertOutput(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(
            config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states

class SbertLayer(nn.Module):
|
||||
|
||||
def __init__(self, config):
|
||||
super().__init__()
|
||||
self.chunk_size_feed_forward = config.chunk_size_feed_forward
|
||||
self.seq_len_dim = 1
|
||||
self.attention = SbertAttention(config)
|
||||
self.is_decoder = config.is_decoder
|
||||
self.add_cross_attention = config.add_cross_attention
|
||||
if self.add_cross_attention:
|
||||
if not self.is_decoder:
|
||||
raise ValueError(
|
||||
f'{self} should be used as a decoder model if cross attention is added'
|
||||
)
|
||||
self.crossattention = SbertAttention(config)
|
||||
self.intermediate = SbertIntermediate(config)
|
||||
self.output = SbertOutput(config)
|
||||
|
||||
def forward(
|
||||
self,
|
||||
hidden_states,
|
||||
attention_mask=None,
|
||||
head_mask=None,
|
||||
encoder_hidden_states=None,
|
||||
encoder_attention_mask=None,
|
||||
past_key_value=None,
|
||||
output_attentions=False,
|
||||
):
|
||||
# decoder uni-directional self-attention cached key/values tuple is at positions 1,2
|
||||
self_attn_past_key_value = past_key_value[:
|
||||
2] if past_key_value is not None else None
|
||||
self_attention_outputs = self.attention(
|
||||
hidden_states,
|
||||
attention_mask,
|
||||
head_mask,
|
||||
output_attentions=output_attentions,
|
||||
past_key_value=self_attn_past_key_value,
|
||||
)
|
||||
attention_output = self_attention_outputs[0]
|
||||
|
||||
# if decoder, the last output is tuple of self-attn cache
|
||||
if self.is_decoder:
|
||||
outputs = self_attention_outputs[1:-1]
|
||||
present_key_value = self_attention_outputs[-1]
|
||||
else:
|
||||
outputs = self_attention_outputs[
|
||||
1:] # add self attentions if we output attention weights
|
||||
|
||||
cross_attn_present_key_value = None
|
||||
if self.is_decoder and encoder_hidden_states is not None:
|
||||
if not hasattr(self, 'crossattention'):
|
||||
raise ValueError(
|
||||
f'If `encoder_hidden_states` are passed, {self} has to be instantiated'
|
||||
f'with cross-attention layers by setting `config.add_cross_attention=True`'
|
||||
)
|
||||
|
||||
# cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple
|
||||
cross_attn_past_key_value = past_key_value[
|
||||
-2:] if past_key_value is not None else None
|
||||
cross_attention_outputs = self.crossattention(
|
||||
attention_output,
|
||||
attention_mask,
|
||||
head_mask,
|
||||
encoder_hidden_states,
|
||||
encoder_attention_mask,
|
||||
cross_attn_past_key_value,
|
||||
output_attentions,
|
||||
)
|
||||
attention_output = cross_attention_outputs[0]
|
||||
outputs = outputs + cross_attention_outputs[
|
||||
1:-1] # add cross attentions if we output attention weights
|
||||
|
||||
# add cross-attn cache to positions 3,4 of present_key_value tuple
|
||||
cross_attn_present_key_value = cross_attention_outputs[-1]
|
||||
present_key_value = present_key_value + cross_attn_present_key_value
|
||||
|
||||
layer_output = apply_chunking_to_forward(self.feed_forward_chunk,
|
||||
self.chunk_size_feed_forward,
|
||||
self.seq_len_dim,
|
||||
attention_output)
|
||||
outputs = (layer_output, ) + outputs
|
||||
|
||||
# if decoder, return the attn key/values as the last output
|
||||
if self.is_decoder:
|
||||
outputs = outputs + (present_key_value, )
|
||||
|
||||
return outputs
|
||||
|
||||
def feed_forward_chunk(self, attention_output):
|
||||
intermediate_output = self.intermediate(attention_output)
|
||||
layer_output = self.output(intermediate_output, attention_output)
|
||||
return layer_output
|
||||
|
||||
|
||||
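# A quick illustration of the head-pruning path in SbertAttention above.
# Sketch only: it assumes `config` is a BERT-base-like SbertConfig with
# 12 attention heads of size 64 (hidden_size = 768).
attention = SbertAttention(config)
attention.prune_heads({0, 3})  # drop heads 0 and 3

# The query/key/value projections shrink accordingly.
assert attention.self.num_attention_heads == 10
assert attention.self.all_head_size == 10 * 64
assert attention.pruned_heads == {0, 3}
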
class SbertEncoder(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.layer = nn.ModuleList(
            [SbertLayer(config) for _ in range(config.num_hidden_layers)])
        self.gradient_checkpointing = False

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_values=None,
        use_cache=None,
        output_attentions=False,
        output_hidden_states=False,
        return_dict=True,
    ):
        all_hidden_states = () if output_hidden_states else None
        all_self_attentions = () if output_attentions else None
        all_cross_attentions = (
        ) if output_attentions and self.config.add_cross_attention else None

        next_decoder_cache = () if use_cache else None
        for i, layer_module in enumerate(self.layer):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states, )

            layer_head_mask = head_mask[i] if head_mask is not None else None
            past_key_value = past_key_values[
                i] if past_key_values is not None else None

            if self.gradient_checkpointing and self.training:

                if use_cache:
                    logger.warning(
                        '`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...'
                    )
                    use_cache = False

                def create_custom_forward(module):

                    def custom_forward(*inputs):
                        return module(*inputs, past_key_value,
                                      output_attentions)

                    return custom_forward

                layer_outputs = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(layer_module),
                    hidden_states,
                    attention_mask,
                    layer_head_mask,
                    encoder_hidden_states,
                    encoder_attention_mask,
                )
            else:
                layer_outputs = layer_module(
                    hidden_states,
                    attention_mask,
                    layer_head_mask,
                    encoder_hidden_states,
                    encoder_attention_mask,
                    past_key_value,
                    output_attentions,
                )

            hidden_states = layer_outputs[0]
            if use_cache:
                next_decoder_cache += (layer_outputs[-1], )
            if output_attentions:
                all_self_attentions = all_self_attentions + (
                    layer_outputs[1], )
                if self.config.add_cross_attention:
                    all_cross_attentions = all_cross_attentions + (
                        layer_outputs[2], )

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states, )

        if not return_dict:
            return tuple(v for v in [
                hidden_states,
                next_decoder_cache,
                all_hidden_states,
                all_self_attentions,
                all_cross_attentions,
            ] if v is not None)
        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=hidden_states,
            past_key_values=next_decoder_cache,
            hidden_states=all_hidden_states,
            attentions=all_self_attentions,
            cross_attentions=all_cross_attentions,
        )

class SbertPooler(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output

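# A shape-level sketch of the pooler above. Batch/sequence sizes are
# hypothetical; `config` is assumed to be an SbertConfig with hidden_size=768.
import torch

# (batch, seq_len, hidden) -> (batch, hidden): only the first token's
# hidden state ([CLS]) feeds the dense + tanh pooling head.
hidden_states = torch.randn(2, 16, 768)
pooler = SbertPooler(config)
pooled = pooler(hidden_states)
assert pooled.shape == (2, 768)
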
@dataclass
class SbertForPreTrainingOutput(ModelOutput):
    """
    Output type of :class:`~structbert.utils.BertForPreTraining`.

    Args:
        loss (`optional`, returned when ``labels`` is provided, ``torch.FloatTensor`` of shape :obj:`(1,)`):
            Total loss as the sum of the masked language modeling loss and the next sequence prediction
            (classification) loss.
        prediction_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
            Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
        seq_relationship_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, 2)`):
            Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation
            before SoftMax).
        hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when
            ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
            of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when
            ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads,
            sequence_length, sequence_length)`.

            Attention weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.
    """

    loss: Optional[torch.FloatTensor] = None
    prediction_logits: torch.FloatTensor = None
    seq_relationship_logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None


@dataclass
class BaseModelOutputWithPoolingAndCrossAttentionsWithEmbedding(
        BaseModelOutputWithPoolingAndCrossAttentions):
    embedding_output: torch.FloatTensor = None
    logits: Optional[Union[tuple, torch.FloatTensor]] = None
    kwargs: dict = None

@@ -6,10 +6,12 @@ from modelscope.utils.import_utils import LazyImportModule
 if TYPE_CHECKING:
     from .configuration_gpt3 import GPT3Config
     from .modeling_gpt3 import GPT3Model
+    from .gpt3_for_text_generation import GPT3ForTextGeneration
 else:
     _import_structure = {
         'configuration_gpt3': ['GPT3Config'],
-        'modeling_gpt3': ['GPT3Model']
+        'modeling_gpt3': ['GPT3Model'],
+        'gpt3_for_text_generation': ['GPT3ForTextGeneration'],
     }
 
     import sys

@@ -20,7 +20,7 @@ class GPT3ForTextGeneration(TorchModel):
         """
         super().__init__(model_dir, *args, **kwargs)
 
-        from modelscope.models.nlp import GPT3Model
+        from modelscope.models.nlp.gpt3 import GPT3Model
         from transformers import BertTokenizer
 
         self.model = GPT3Model.from_pretrained(model_dir)

@@ -5,9 +5,11 @@ from modelscope.utils.import_utils import LazyImportModule
 
 if TYPE_CHECKING:
     from .sequence_classification_head import SequenceClassificationHead
+    from .torch_pretrain_head import BertMLMHead, RobertaMLMHead
 else:
     _import_structure = {
-        'sequence_classification_head': ['SequenceClassificationHead']
+        'sequence_classification_head': ['SequenceClassificationHead'],
+        'torch_pretrain_head': ['BertMLMHead', 'RobertaMLMHead'],
     }
 
     import sys

@@ -1,5 +1,4 @@
-import importlib
-from typing import Dict, List, Optional, Union
+from typing import Dict
 
 import torch
 import torch.nn.functional as F

modelscope/models/nlp/heads/torch_pretrain_head.py (new file, 26 lines)
from typing import Dict

import torch
from transformers.models.bert.modeling_bert import BertOnlyMLMHead
from transformers.models.roberta.modeling_roberta import RobertaLMHead

from modelscope.metainfo import Heads
from modelscope.models.base import TorchHead
from modelscope.models.builder import HEADS
from modelscope.utils.constant import Tasks


@HEADS.register_module(Tasks.fill_mask, module_name=Heads.bert_mlm)
class BertMLMHead(BertOnlyMLMHead, TorchHead):

    def compute_loss(self, outputs: Dict[str, torch.Tensor],
                     labels) -> Dict[str, torch.Tensor]:
        raise NotImplementedError()


@HEADS.register_module(Tasks.fill_mask, module_name=Heads.roberta_mlm)
class RobertaMLMHead(RobertaLMHead, TorchHead):

    def compute_loss(self, outputs: Dict[str, torch.Tensor],
                     labels) -> Dict[str, torch.Tensor]:
        raise NotImplementedError()

@@ -1,72 +1,115 @@
-from typing import Dict
+from typing import Any, Dict, Optional, Union
 
 import numpy as np
+from transformers import BertForMaskedLM as BertForMaskedLMTransformer
 
 from modelscope.metainfo import Models
-from modelscope.models import TorchModel
-from modelscope.models.base import Tensor
+from modelscope.models.base import TorchModel
 from modelscope.models.builder import MODELS
+from modelscope.models.nlp.structbert import SbertForMaskedLM
+from modelscope.models.nlp.veco import \
+    VecoForMaskedLM as VecoForMaskedLMTransformer
+from modelscope.outputs import OutputKeys
 from modelscope.utils.constant import Tasks
 
 __all__ = ['BertForMaskedLM', 'StructBertForMaskedLM', 'VecoForMaskedLM']
 
 
-class MaskedLanguageModelBase(TorchModel):
-
-    def __init__(self, model_dir: str, *args, **kwargs):
-        super().__init__(model_dir, *args, **kwargs)
-        self.model = self.build_model()
-
-    def build_model(self):
-        raise NotImplementedError()
-
-    def train(self):
-        return self.model.train()
-
-    def eval(self):
-        return self.model.eval()
-
-    @property
-    def config(self):
-        if hasattr(self.model, 'config'):
-            return self.model.config
-        return None
-
-    def forward(self, input: Dict[str, Tensor]) -> Dict[str, np.ndarray]:
-        """return the result by the model
-
-        Args:
-            input (Dict[str, Any]): the preprocessed data
-
-        Returns:
-            Dict[str, np.ndarray]: results
-        """
-        rst = self.model(
-            input_ids=input['input_ids'],
-            attention_mask=input['attention_mask'],
-            token_type_ids=input['token_type_ids'])
-        return {'logits': rst['logits'], 'input_ids': input['input_ids']}
-
-
 @MODELS.register_module(Tasks.fill_mask, module_name=Models.structbert)
-class StructBertForMaskedLM(MaskedLanguageModelBase):
+class StructBertForMaskedLM(TorchModel, SbertForMaskedLM):
 
-    def build_model(self):
-        from sofa import SbertForMaskedLM
-        return SbertForMaskedLM.from_pretrained(self.model_dir)
+    def __init__(self, config, model_dir):
+        super(TorchModel, self).__init__(model_dir)
+        SbertForMaskedLM.__init__(self, config)
 
-
-@MODELS.register_module(Tasks.fill_mask, module_name=Models.veco)
-class VecoForMaskedLM(MaskedLanguageModelBase):
-
-    def build_model(self):
-        from sofa import VecoForMaskedLM
-        return VecoForMaskedLM.from_pretrained(self.model_dir)
+    def forward(self,
+                input_ids=None,
+                attention_mask=None,
+                token_type_ids=None,
+                position_ids=None,
+                head_mask=None,
+                labels=None):
+        output = SbertForMaskedLM.forward(
+            self,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            labels=labels)
+        output[OutputKeys.INPUT_IDS] = input_ids
+        return output
+
+    @classmethod
+    def _instantiate(cls, **kwargs):
+        model_dir = kwargs.get('model_dir')
+        return super(SbertForMaskedLM, StructBertForMaskedLM).from_pretrained(
+            pretrained_model_name_or_path=model_dir, model_dir=model_dir)
 
 
 @MODELS.register_module(Tasks.fill_mask, module_name=Models.bert)
-class BertForMaskedLM(MaskedLanguageModelBase):
+class BertForMaskedLM(TorchModel, BertForMaskedLMTransformer):
 
-    def build_model(self):
-        from transformers import BertForMaskedLM
-        return BertForMaskedLM.from_pretrained(self.model_dir)
+    def __init__(self, config, model_dir):
+        super(TorchModel, self).__init__(model_dir)
+        BertForMaskedLMTransformer.__init__(self, config)
+
+    def forward(self,
+                input_ids=None,
+                attention_mask=None,
+                token_type_ids=None,
+                position_ids=None,
+                head_mask=None,
+                labels=None):
+        output = BertForMaskedLMTransformer.forward(
+            self,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            labels=labels)
+        output[OutputKeys.INPUT_IDS] = input_ids
+        return output
+
+    @classmethod
+    def _instantiate(cls, **kwargs):
+        model_dir = kwargs.get('model_dir')
+        return super(BertForMaskedLMTransformer,
+                     BertForMaskedLM).from_pretrained(
+                         pretrained_model_name_or_path=model_dir,
+                         model_dir=model_dir)
+
+
+@MODELS.register_module(Tasks.fill_mask, module_name=Models.veco)
+class VecoForMaskedLM(TorchModel, VecoForMaskedLMTransformer):
+
+    def __init__(self, config, model_dir):
+        super(TorchModel, self).__init__(model_dir)
+        VecoForMaskedLMTransformer.__init__(self, config)
+
+    def forward(self,
+                input_ids=None,
+                attention_mask=None,
+                token_type_ids=None,
+                position_ids=None,
+                head_mask=None,
+                labels=None):
+        output = VecoForMaskedLMTransformer.forward(
+            self,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            labels=labels)
+        output[OutputKeys.INPUT_IDS] = input_ids
+        return output
+
+    @classmethod
+    def _instantiate(cls, **kwargs):
+        model_dir = kwargs.get('model_dir')
+        return super(VecoForMaskedLMTransformer,
+                     VecoForMaskedLM).from_pretrained(
+                         pretrained_model_name_or_path=model_dir,
+                         model_dir=model_dir)

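# With the refactor above, the masked-LM classes are registered models that
# can be built straight from a model directory. Sketch only: the local path
# and toy ids are placeholders; _instantiate routes to from_pretrained as
# defined in the diff above.
import torch

model = StructBertForMaskedLM._instantiate(model_dir='/path/to/structbert')
model.eval()

input_ids = torch.tensor([[101, 103, 102]])  # toy ids; 103 is [MASK] in BERT vocabs
output = model(input_ids=input_ids, attention_mask=torch.ones_like(input_ids))
print(output['logits'].shape, output[OutputKeys.INPUT_IDS].shape)
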
modelscope/models/nlp/palm_v2/__init__.py (new file, 43 lines)
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
    from .configuration_palm import PalmConfig
    from .modeling_palm import (
        AbsSummarizer,
        PalmForConditionalGeneration,
        Translator,
    )
    from .palm_for_text_generation import PalmForTextGeneration
else:
    _import_structure = {
        'configuration_palm': ['PalmConfig'],
        'modeling_palm':
        ['AbsSummarizer', 'PalmForConditionalGeneration', 'Translator'],
        'palm_for_text_generation': ['PalmForTextGeneration'],
    }

    import sys

    sys.modules[__name__] = LazyImportModule(
        __name__,
        globals()['__file__'],
        _import_structure,
        module_spec=__spec__,
        extra_objects={},
    )

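# The LazyImportModule registration above defers the heavy submodule imports
# until first attribute access; from the caller's side nothing changes.
# Sketch only: configuration_palm is imported the moment PalmConfig is touched.
from modelscope.models.nlp.palm_v2 import PalmConfig

config = PalmConfig(beam_size=3, max_length=100)
print(config.encoder, config.beam_size)
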
modelscope/models/nlp/palm_v2/configuration_palm.py (new file, 116 lines)
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" PALM model configuration """

from transformers.configuration_utils import PretrainedConfig

from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class PalmConfig(PretrainedConfig):
    r"""
    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.


    Args:
        vocab_size (:obj:`int`, `optional`, defaults to 30522):
            Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the
            :obj:`inputs_ids` passed when calling :class:`~transformers.BertModel` or
            :class:`~transformers.TFBertModel`.
        hidden_size (:obj:`int`, `optional`, defaults to 768):
            Dimensionality of the encoder layers and the pooler layer.
        num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (:obj:`int`, `optional`, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (:obj:`int`, `optional`, defaults to 3072):
            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
        hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string,
            :obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
        hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
            The dropout ratio for the attention probabilities.
        max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        type_vocab_size (:obj:`int`, `optional`, defaults to 2):
            The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.BertModel` or
            :class:`~transformers.TFBertModel`.
        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layernorm_epsilon (:obj:`float`, `optional`, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        dec_hidden_layers (:obj:`int`, `optional`, defaults to 12):
            Number of hidden layers in the Transformer decoder.
        attn_separate (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not to separate the q, k, v of attention.

    Examples::

        >>> from modelscope.models.nlp.palm_v2 import PalmForConditionalGeneration, PalmConfig
        >>> configuration = PalmConfig()

        >>> # Initializing a model from the configuration
        >>> model = PalmForConditionalGeneration(configuration)

        >>> # Accessing the model configuration
        >>> configuration = model.config
    """
    model_type = 'palm'

    def __init__(self,
                 encoder='roberta',
                 encoder_pth='roberta-base',
                 max_pos=512,
                 share_emb=False,
                 dec_layers=12,
                 dec_hidden_size=768,
                 dec_heads=8,
                 dec_ff_size=3072,
                 dec_dropout=0.2,
                 use_bert_emb=True,
                 label_smoothing=0.1,
                 alpha=0.95,
                 beam_size=5,
                 min_length=40,
                 max_length=130,
                 sample_topk=False,
                 block_trigram=False,
                 **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.encoder_pth = encoder_pth
        self.max_pos = max_pos
        self.share_emb = share_emb
        self.dec_layers = dec_layers
        self.dec_hidden_size = dec_hidden_size
        self.dec_heads = dec_heads
        self.dec_ff_size = dec_ff_size
        self.dec_dropout = dec_dropout
        self.use_bert_emb = use_bert_emb
        self.label_smoothing = label_smoothing
        # Translator
        self.alpha = alpha
        self.beam_size = beam_size
        self.min_length = min_length
        self.max_length = max_length
        self.sample_topk = sample_topk
        self.block_trigram = block_trigram

modelscope/models/nlp/palm_v2/dureader_eval.py (new file, 872 lines)
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This module computes evaluation metrics for the DuReader dataset.
"""

import argparse
import copy
import math
import re
import sys
import zipfile
from collections import Counter, defaultdict

import json
import numpy as np
from rouge import Rouge

EMPTY = ''
YESNO_LABELS = set(['Yes', 'No', 'Depends'])


def my_lcs(string, sub):
    """
    Calculates the longest common subsequence for a pair of tokenized strings.

    :param string: list of str: tokens from a string split using whitespace
    :param sub: list of str: shorter string, also split using whitespace
    :returns: length (int): length of the longest common subsequence between the two strings

    Note: my_lcs only gives the length of the longest common subsequence, not the actual LCS.
    """
    if (len(string) < len(sub)):
        sub, string = string, sub

    lengths = [[0 for i in range(0,
                                 len(sub) + 1)]
               for j in range(0,
                              len(string) + 1)]

    for j in range(1, len(sub) + 1):
        for i in range(1, len(string) + 1):
            if (string[i - 1] == sub[j - 1]):
                lengths[i][j] = lengths[i - 1][j - 1] + 1
            else:
                lengths[i][j] = max(lengths[i - 1][j], lengths[i][j - 1])

    return lengths[len(string)][len(sub)]

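# A quick worked example of my_lcs on whitespace-tokenized input:
# the LCS of [the, cat, sat, on, the, mat] and [the, cat, on, mat]
# is [the, cat, on, mat]; only its length (4) is returned.
assert my_lcs('the cat sat on the mat'.split(), 'the cat on mat'.split()) == 4
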
class Bleu:

    def __init__(self, n=4):
        # default: compute BLEU score up to 4-grams
        self._n = n
        self._hypo_for_image = {}
        self.ref_for_image = {}

    def compute_score(self, gts, res):
        assert (list(gts.keys()) == list(res.keys()))
        imgIds = list(gts.keys())

        bleu_scorer = BleuScorer(n=self._n)
        for id in imgIds:
            hypo = res[id]
            ref = gts[id]

            # Sanity check.
            assert (type(hypo) is list)
            assert (len(hypo) == 1)
            assert (type(ref) is list)
            assert (len(ref) >= 1)

            bleu_scorer += (hypo[0], ref)

        score, scores = bleu_scorer.compute_score(option='closest', verbose=1)
        return score, scores

    def method(self):
        return 'Bleu'

def precook(s, n=4, out=False):
    """Takes a string as input and returns an object that can be given to
    either cook_refs or cook_test. This is optional: cook_refs and cook_test
    can take string arguments as well."""
    words = s.split()
    counts = defaultdict(int)
    for k in range(1, n + 1):
        for i in range(len(words) - k + 1):
            ngram = tuple(words[i:i + k])
            counts[ngram] += 1
    return (len(words), counts)


def cook_refs(refs, eff=None, n=4):  # lhuang: oracle will call with "average"
    '''Takes a list of reference sentences for a single segment
    and returns an object that encapsulates everything that BLEU
    needs to know about them.'''

    reflen = []
    maxcounts = {}
    for ref in refs:
        rl, counts = precook(ref, n)
        reflen.append(rl)
        for (ngram, count) in counts.items():
            maxcounts[ngram] = max(maxcounts.get(ngram, 0), count)

    # Calculate effective reference sentence length.
    if eff == 'shortest':
        reflen = min(reflen)
    elif eff == 'average':
        reflen = float(sum(reflen)) / len(reflen)

    # lhuang: N.B.: leave reflen computation to the very end!!

    # lhuang: N.B.: in case of "closest", keep a list of reflens!! (bad design)

    return reflen, maxcounts


def cook_test(test, xxx_todo_changeme, eff=None, n=4):
    '''Takes a test sentence and returns an object that
    encapsulates everything that BLEU needs to know about it.'''
    (reflen, refmaxcounts) = xxx_todo_changeme
    testlen, counts = precook(test, n, True)

    result = {}

    # Calculate effective reference sentence length.

    if eff == 'closest':
        result['reflen'] = min((abs(ref - testlen), ref) for ref in reflen)[1]
    else:  # i.e., "average" or "shortest" or None
        result['reflen'] = reflen

    result['testlen'] = testlen

    result['guess'] = [max(0, testlen - k + 1) for k in range(1, n + 1)]

    result['correct'] = [0] * n
    for (ngram, count) in counts.items():
        result['correct'][len(ngram) - 1] += min(
            refmaxcounts.get(ngram, 0), count)

    return result


class BleuScorer(object):
    """Bleu scorer.
    """

    __slots__ = 'n', 'crefs', 'ctest', '_score', '_ratio', '_testlen', '_reflen', 'special_reflen'

    # special_reflen is used in oracle (proportional effective ref len for a node).

    def copy(self):
        ''' copy the refs.'''
        new = BleuScorer(n=self.n)
        new.ctest = copy.copy(self.ctest)
        new.crefs = copy.copy(self.crefs)
        new._score = None
        return new

    def __init__(self, test=None, refs=None, n=4, special_reflen=None):
        ''' singular instance '''

        self.n = n
        self.crefs = []
        self.ctest = []
        self.cook_append(test, refs)
        self.special_reflen = special_reflen

    def cook_append(self, test, refs):
        '''called by constructor and __iadd__ to avoid creating new instances.'''

        if refs is not None:
            self.crefs.append(cook_refs(refs))
            if test is not None:
                cooked_test = cook_test(test, self.crefs[-1])
                self.ctest.append(cooked_test)  # N.B.: -1
            else:
                self.ctest.append(
                    None)  # lens of crefs and ctest have to match

        self._score = None  # need to recompute

    def ratio(self, option=None):
        self.compute_score(option=option)
        return self._ratio

    def score_ratio(self, option=None):
        '''return (bleu, len_ratio) pair'''
        return (self.fscore(option=option), self.ratio(option=option))

    def score_ratio_str(self, option=None):
        return '%.4f (%.2f)' % self.score_ratio(option)

    def reflen(self, option=None):
        self.compute_score(option=option)
        return self._reflen

    def testlen(self, option=None):
        self.compute_score(option=option)
        return self._testlen

    def retest(self, new_test):
        if type(new_test) is str:
            new_test = [new_test]
        assert len(new_test) == len(self.crefs), new_test
        self.ctest = []
        for t, rs in zip(new_test, self.crefs):
            self.ctest.append(cook_test(t, rs))
        self._score = None

        return self

    def rescore(self, new_test):
        ''' replace test(s) with new test(s), and returns the new score.'''

        return self.retest(new_test).compute_score()

    def size(self):
        assert len(self.crefs) == len(
            self.ctest), 'refs/test mismatch! %d<>%d' % (len(
                self.crefs), len(self.ctest))
        return len(self.crefs)

    def __iadd__(self, other):
        '''add an instance (e.g., from another sentence).'''

        if type(other) is tuple:
            # avoid creating new BleuScorer instances
            self.cook_append(other[0], other[1])
        else:
            assert self.compatible(other), 'incompatible BLEUs.'
            self.ctest.extend(other.ctest)
            self.crefs.extend(other.crefs)
            self._score = None  # need to recompute

        return self

    def compatible(self, other):
        return isinstance(other, BleuScorer) and self.n == other.n

    def single_reflen(self, option='average'):
        return self._single_reflen(self.crefs[0][0], option)

    def _single_reflen(self, reflens, option=None, testlen=None):

        if option == 'shortest':
            reflen = min(reflens)
        elif option == 'average':
            reflen = float(sum(reflens)) / len(reflens)
        elif option == 'closest':
            reflen = min((abs(ref - testlen), ref) for ref in reflens)[1]
        else:
            assert False, 'unsupported reflen option %s' % option

        return reflen

    def recompute_score(self, option=None, verbose=0):
        self._score = None
        return self.compute_score(option, verbose)

    def compute_score(self, option=None, verbose=0):
        n = self.n
        small = 1e-9
        tiny = 1e-15  # so that if guess is 0 still return 0
        bleu_list = [[] for _ in range(n)]

        if self._score is not None:
            return self._score

        if option is None:
            option = 'average' if len(self.crefs) == 1 else 'closest'

        self._testlen = 0
        self._reflen = 0
        totalcomps = {
            'testlen': 0,
            'reflen': 0,
            'guess': [0] * n,
            'correct': [0] * n
        }

        # for each sentence
        for comps in self.ctest:
            testlen = comps['testlen']
            self._testlen += testlen

            if self.special_reflen is None:  # need computation
                reflen = self._single_reflen(comps['reflen'], option, testlen)
            else:
                reflen = self.special_reflen

            self._reflen += reflen

            for key in ['guess', 'correct']:
                for k in range(n):
                    totalcomps[key][k] += comps[key][k]

            # append per image bleu score
            bleu = 1.
            for k in range(n):
                bleu *= (float(comps['correct'][k]) + tiny) / (
                    float(comps['guess'][k]) + small)
                bleu_list[k].append(bleu**(1. / (k + 1)))
            ratio = (testlen + tiny) / (reflen + small
                                        )  # N.B.: avoid zero division
            if ratio < 1:
                for k in range(n):
                    bleu_list[k][-1] *= math.exp(1 - 1 / ratio)

            if verbose > 1:
                print(comps, reflen)

        totalcomps['reflen'] = self._reflen
        totalcomps['testlen'] = self._testlen

        bleus = []
        bleu = 1.
        for k in range(n):
            bleu *= float(totalcomps['correct'][k] + tiny) / (
                totalcomps['guess'][k] + small)
            bleus.append(bleu**(1. / (k + 1)))
        ratio = (self._testlen + tiny) / (self._reflen + small
                                          )  # N.B.: avoid zero division
        if ratio < 1:
            for k in range(n):
                bleus[k] *= math.exp(1 - 1 / ratio)

        if verbose > 0:
            print(totalcomps)
            print('ratio:', ratio)

        self._score = bleus
        return self._score, bleu_list

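# A minimal sketch of driving BleuScorer directly, mirroring what
# Bleu.compute_score does per example. Sentences are toy placeholders.
scorer = BleuScorer(n=4)
# __iadd__ with a (hypothesis, references) tuple calls cook_append.
scorer += ('the cat sat on the mat', ['the cat is on the mat'])
scorer += ('hello there world', ['hello world'])
score, per_sentence = scorer.compute_score(option='closest')
print(score)  # [Bleu-1, Bleu-2, Bleu-3, Bleu-4] over the corpus
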
def normalize(s):
    """
    Normalize strings to space joined chars.

    Args:
        s: a list of strings.

    Returns:
        A list of normalized strings.
    """
    if not s:
        return s
    normalized = []
    for ss in s:
        tokens = [c for c in list(ss) if len(c.strip()) != 0]
        normalized.append(' '.join(tokens))
    return normalized

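# normalize rewrites each string as space-joined characters (existing
# whitespace dropped), which effectively makes the BLEU/Rouge computation
# character-level for Chinese answers:
assert normalize(['机器 阅读', 'ab c']) == ['机 器 阅 读', 'a b c']
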
def data_check(obj, task):
    """
    Check data.

    Raises:
        Raises AssertionError when data is not legal.
    """
    assert 'question_id' in obj, "Missing 'question_id' field."
    assert 'question_type' in obj, \
        "Missing 'question_type' field. question_id: {}".format(obj['question_id'])

    assert 'yesno_answers' in obj, \
        "Missing 'yesno_answers' field. question_id: {}".format(obj['question_id'])
    assert isinstance(obj['yesno_answers'], list), \
        r"""'yesno_answers' field must be a list, if the 'question_type' is not
        'YES_NO', then this field should be an empty list.
        question_id: {}""".format(obj['question_id'])

    assert 'entity_answers' in obj, \
        "Missing 'entity_answers' field. question_id: {}".format(obj['question_id'])
    assert isinstance(
        obj['entity_answers'],
        list) and len(obj['entity_answers']) > 0, r"""'entity_answers' field
        must be a list with at least one element, which can be an empty list.
        question_id: {}""".format(obj['question_id'])

def read_file(file_name, task, is_ref=False):
    """
    Read predicted answers or reference answers from file.

    Args:
        file_name: the name of the file containing the prediction result or
            reference result.

    Returns:
        A dictionary mapping question_id to the result information. The result
        information itself is also a dictionary which has four keys:
        - question_type: type of the query.
        - yesno_answers: A list of yesno answers corresponding to 'answers'.
        - answers: A list of predicted answers.
        - entity_answers: A list, each element is also a list containing the entities
          tagged out from the corresponding answer string.
    """

    def _open(file_name, mode, zip_obj=None):
        if zip_obj is not None:
            return zip_obj.open(file_name, mode)
        return open(file_name, mode)

    results = {}
    keys = ['answers', 'yesno_answers', 'entity_answers', 'question_type']
    if is_ref:
        keys += ['source']

    zf = zipfile.ZipFile(file_name,
                         'r') if file_name.endswith('.zip') else None
    file_list = [file_name] if zf is None else zf.namelist()

    for fn in file_list:
        for line in _open(fn, 'r', zip_obj=zf):
            try:
                obj = json.loads(line.strip())
            except ValueError:
                raise ValueError('Every line of data should be legal json')
            data_check(obj, task)
            qid = obj['question_id']
            assert qid not in results, 'Duplicate question_id: {}'.format(qid)
            results[qid] = {}
            for k in keys:
                results[qid][k] = obj[k]
    return results

def compute_bleu_rouge(pred_dict, ref_dict, bleu_order=4):
    """
    Compute bleu and rouge scores.
    """
    assert set(pred_dict.keys()) == set(ref_dict.keys()), \
        'missing keys: {}'.format(set(ref_dict.keys()) - set(pred_dict.keys()))
    scores = {}
    bleu_scores, _ = Bleu(bleu_order).compute_score(ref_dict, pred_dict)
    for i, bleu_score in enumerate(bleu_scores):
        scores['Bleu-%d' % (i + 1)] = bleu_score
    # rouge_score, _ = Rouge().compute_score(ref_dict, pred_dict)
    rouge_score = Rouge().get_scores(
        list(map(lambda x: x[0], pred_dict.values())),
        list(map(lambda x: x[0], ref_dict.values())))
    rouge_score = sum([d['rouge-l']['f']
                       for d in rouge_score]) / len(rouge_score)
    scores['Rouge-L'] = rouge_score
    return scores

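# Both arguments map a question_id to a single-element list of already
# normalized answer strings; a toy call (values are illustrative):
pred = {'q1': ['机 器 阅 读'], 'q2': ['深 度 学 习']}
ref = {'q1': ['机 器 阅 读'], 'q2': ['机 器 学 习']}
scores = compute_bleu_rouge(pred, ref)
print(scores)  # {'Bleu-1': ..., 'Bleu-4': ..., 'Rouge-L': ...}
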
def local_prf(pred_list, ref_list):
    """
    Compute local precision, recall and f1-score,
    given only one prediction list and one reference list.
    """
    common = Counter(pred_list) & Counter(ref_list)
    num_same = sum(common.values())
    if num_same == 0:
        return 0, 0, 0
    p = 1.0 * num_same / len(pred_list)
    r = 1.0 * num_same / len(ref_list)
    f1 = (2 * p * r) / (p + r)
    return p, r, f1

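# A worked example of local_prf: two of the three predicted tokens appear
# in the reference, so precision, recall and F1 all come out to 2/3.
p, r, f1 = local_prf(['a', 'b', 'c'], ['a', 'b', 'd'])
assert round(p, 6) == round(r, 6) == round(f1, 6) == round(2 / 3, 6)
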
def compute_prf(pred_dict, ref_dict):
    """
    Compute precision, recall and f1-score.
    """
    # pred_question_ids = set(pred_dict.keys())
    ref_question_ids = set(ref_dict.keys())
    correct_preds, total_correct, total_preds = 0, 0, 0
    for question_id in ref_question_ids:
        pred_entity_list = pred_dict.get(question_id, [[]])
        assert len(pred_entity_list) == 1, \
            'the number of entity list for question_id {} is not 1.'.format(question_id)
        pred_entity_list = pred_entity_list[0]
        all_ref_entity_lists = ref_dict[question_id]
        best_local_f1 = 0
        best_ref_entity_list = None
        for ref_entity_list in all_ref_entity_lists:
            local_f1 = local_prf(pred_entity_list, ref_entity_list)[2]
            if local_f1 > best_local_f1:
                best_ref_entity_list = ref_entity_list
                best_local_f1 = local_f1
        if best_ref_entity_list is None:
            if len(all_ref_entity_lists) > 0:
                best_ref_entity_list = sorted(
                    all_ref_entity_lists, key=lambda x: len(x))[0]
            else:
                best_ref_entity_list = []
        gold_entities = set(best_ref_entity_list)
        pred_entities = set(pred_entity_list)
        correct_preds += len(gold_entities & pred_entities)
        total_preds += len(pred_entities)
        total_correct += len(gold_entities)
    p = float(correct_preds) / total_preds if correct_preds > 0 else 0
    r = float(correct_preds) / total_correct if correct_preds > 0 else 0
    f1 = 2 * p * r / (p + r) if correct_preds > 0 else 0
    return {'Precision': p, 'Recall': r, 'F1': f1}

def prepare_prf(pred_dict, ref_dict):
    """
    Prepares data for calculation of prf scores.
    """
    preds = {k: v['entity_answers'] for k, v in pred_dict.items()}
    refs = {k: v['entity_answers'] for k, v in ref_dict.items()}
    return preds, refs


def filter_dict(result_dict, key_tag):
    """
    Filter a subset of the result_dict, keeping only the keys that end
    with 'key_tag'.
    """
    filtered = {}
    for k, v in result_dict.items():
        if k.endswith(key_tag):
            filtered[k] = v
    return filtered

def get_metrics(pred_result, ref_result, task, source):
    """
    Computes metrics.
    """
    metrics = {}

    ref_result_filtered = {}
    pred_result_filtered = {}
    if source == 'both':
        ref_result_filtered = ref_result
        pred_result_filtered = pred_result
    else:
        for question_id, info in ref_result.items():
            if info['source'] == source:
                ref_result_filtered[question_id] = info
                if question_id in pred_result:
                    pred_result_filtered[question_id] = pred_result[
                        question_id]

    if task == 'main' or task == 'all' \
            or task == 'description':
        pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
                                           ref_result_filtered, task)
        metrics = compute_bleu_rouge(pred_dict, ref_dict)
    elif task == 'yesno':
        pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
                                           ref_result_filtered, task)
        keys = ['Yes', 'No', 'Depends']
        preds = [filter_dict(pred_dict, k) for k in keys]
        refs = [filter_dict(ref_dict, k) for k in keys]

        metrics = compute_bleu_rouge(pred_dict, ref_dict)

        for k, pred, ref in zip(keys, preds, refs):
            m = compute_bleu_rouge(pred, ref)
            k_metric = [(k + '|' + key, v) for key, v in m.items()]
            metrics.update(k_metric)

    elif task == 'entity':
        pred_dict, ref_dict = prepare_prf(pred_result_filtered,
                                          ref_result_filtered)
        pred_dict_bleu, ref_dict_bleu = prepare_bleu(pred_result_filtered,
                                                     ref_result_filtered, task)
        metrics = compute_prf(pred_dict, ref_dict)
        metrics.update(compute_bleu_rouge(pred_dict_bleu, ref_dict_bleu))
    else:
        raise ValueError('Illegal task name: {}'.format(task))

    return metrics

def prepare_bleu(pred_result, ref_result, task):
    """
    Prepares data for calculation of bleu and rouge scores.
    """
    pred_list, ref_list = [], []
    qids = ref_result.keys()
    for qid in qids:
        if task == 'main':
            pred, ref = get_main_result(qid, pred_result, ref_result)
        elif task == 'yesno':
            pred, ref = get_yesno_result(qid, pred_result, ref_result)
        elif task == 'all':
            pred, ref = get_all_result(qid, pred_result, ref_result)
        elif task == 'entity':
            pred, ref = get_entity_result(qid, pred_result, ref_result)
        elif task == 'description':
            pred, ref = get_desc_result(qid, pred_result, ref_result)
        else:
            raise ValueError('Illegal task name: {}'.format(task))
        if pred and ref:
            pred_list += pred
            ref_list += ref
    pred_dict = dict(pred_list)
    ref_dict = dict(ref_list)
    # iterate over a snapshot, since empty-answer entries are deleted below
    for qid, ans in list(ref_dict.items()):
        ref_dict[qid] = normalize(ref_dict[qid])
        pred_dict[qid] = normalize(pred_dict.get(qid, [EMPTY]))
        if not ans or ans == [EMPTY]:
            del ref_dict[qid]
            del pred_dict[qid]

    for k, v in pred_dict.items():
        assert len(v) == 1, \
            'There should be only one predicted answer. question_id: {}'.format(k)
    return pred_dict, ref_dict

def get_main_result(qid, pred_result, ref_result):
    """
    Prepare answers for task 'main'.

    Args:
        qid: question_id.
        pred_result: A dict containing all question_id's result information
            read from args.pred_file.
        ref_result: A dict containing all question_id's result information
            read from args.ref_file.
    Returns:
        Two lists, the first one contains the predicted result, the second
        one contains the reference result of the same question_id. Each list has
        elements of tuple (question_id, answers), 'answers' is a list of strings.
    """
    ref_ans = ref_result[qid]['answers']
    if not ref_ans:
        ref_ans = [EMPTY]
    pred_ans = pred_result.get(qid, {}).get('answers', [])[:1]
    if not pred_ans:
        pred_ans = [EMPTY]

    return [(qid, pred_ans)], [(qid, ref_ans)]


def get_entity_result(qid, pred_result, ref_result):
    """
    Prepare answers for task 'entity'.

    Args:
        qid: question_id.
        pred_result: A dict containing all question_id's result information
            read from args.pred_file.
        ref_result: A dict containing all question_id's result information
            read from args.ref_file.
    Returns:
        Two lists, the first one contains the predicted result, the second
        one contains the reference result of the same question_id. Each list has
        elements of tuple (question_id, answers), 'answers' is a list of strings.
    """
    if ref_result[qid]['question_type'] != 'ENTITY':
        return None, None
    return get_main_result(qid, pred_result, ref_result)


def get_desc_result(qid, pred_result, ref_result):
    """
    Prepare answers for task 'description'.

    Args:
        qid: question_id.
        pred_result: A dict containing all question_id's result information
            read from args.pred_file.
        ref_result: A dict containing all question_id's result information
            read from args.ref_file.
    Returns:
        Two lists, the first one contains the predicted result, the second
        one contains the reference result of the same question_id. Each list has
        elements of tuple (question_id, answers), 'answers' is a list of strings.
    """
    if ref_result[qid]['question_type'] != 'DESCRIPTION':
        return None, None
    return get_main_result(qid, pred_result, ref_result)


def get_yesno_result(qid, pred_result, ref_result):
    """
    Prepare answers for task 'yesno'.

    Args:
        qid: question_id.
        pred_result: A dict containing all question_id's result information
            read from args.pred_file.
        ref_result: A dict containing all question_id's result information
            read from args.ref_file.
    Returns:
        Two lists, the first one contains the predicted result, the second
        one contains the reference result of the same question_id. Each list has
        elements of tuple (question_id, answers), 'answers' is a list of strings.
    """

    def _uniq(li, is_ref):
        uniq_li = []
        left = []
        keys = set()
        for k, v in li:
            if k not in keys:
                uniq_li.append((k, v))
                keys.add(k)
            else:
                left.append((k, v))

        if is_ref:
            dict_li = dict(uniq_li)
            for k, v in left:
                dict_li[k] += v
            uniq_li = [(k, v) for k, v in dict_li.items()]
        return uniq_li

    def _expand_result(uniq_li):
        expanded = uniq_li[:]
        keys = set([x[0] for x in uniq_li])
        for k in YESNO_LABELS - keys:
            expanded.append((k, [EMPTY]))
        return expanded

    def _get_yesno_ans(qid, result_dict, is_ref=False):
        if qid not in result_dict:
            return [(str(qid) + '_' + k, v) for k, v in _expand_result([])]
        yesno_answers = result_dict[qid]['yesno_answers']
        answers = result_dict[qid]['answers']
        lbl_ans = _uniq([(k, [v]) for k, v in zip(yesno_answers, answers)],
                        is_ref)
        ret = [(str(qid) + '_' + k, v) for k, v in _expand_result(lbl_ans)]
        return ret

    if ref_result[qid]['question_type'] != 'YES_NO':
        return None, None

    ref_ans = _get_yesno_ans(qid, ref_result, is_ref=True)
    pred_ans = _get_yesno_ans(qid, pred_result)
    return pred_ans, ref_ans


def get_all_result(qid, pred_result, ref_result):
    """
    Prepare answers for task 'all'.

    Args:
        qid: question_id.
        pred_result: A dict containing all question_id's result information
            read from args.pred_file.
        ref_result: A dict containing all question_id's result information
            read from args.ref_file.
    Returns:
        Two lists, the first one contains the predicted result, the second
        one contains the reference result of the same question_id. Each list has
        elements of tuple (question_id, answers), 'answers' is a list of strings.
    """
    if ref_result[qid]['question_type'] == 'YES_NO':
        return get_yesno_result(qid, pred_result, ref_result)
    return get_main_result(qid, pred_result, ref_result)

def format_metrics(metrics, task, err_msg):
    """
    Format metrics. The 'err' field carries any error that occurred during
    evaluation.

    Args:
        metrics: A dict object containing metrics for different tasks.
        task: Task name.
        err_msg: Exception raised during evaluation.
    Returns:
        Formatted result.
    """
    result = {}
    sources = ['both', 'search', 'zhidao']
    if err_msg is not None:
        return {'errorMsg': str(err_msg), 'errorCode': 1, 'data': []}
    data = []
    if task != 'all' and task != 'main':
        sources = ['both']

    if task == 'entity':
        metric_names = ['Bleu-4', 'Rouge-L']
        metric_names_prf = ['F1', 'Precision', 'Recall']
        for name in metric_names + metric_names_prf:
            for src in sources:
                obj = {
                    'name': name,
                    'value': round(metrics[src].get(name, 0) * 100, 2),
                    'type': src,
                }
                data.append(obj)
    elif task == 'yesno':
        metric_names = ['Bleu-4', 'Rouge-L']
        details = ['Yes', 'No', 'Depends']
        src = sources[0]
        for name in metric_names:
            obj = {
                'name': name,
                'value': round(metrics[src].get(name, 0) * 100, 2),
                'type': 'All',
            }
            data.append(obj)
            for d in details:
                obj = {
                    'name': name,
                    'value': round(metrics[src].get(d + '|' + name, 0) * 100,
                                   2),
                    'type': d
                }
                data.append(obj)
    else:
        metric_names = ['Bleu-4', 'Rouge-L']
        for name in metric_names:
            for src in sources:
                obj = {
                    'name': name,
                    'value': round(metrics[src].get(name, 0) * 100, 2),
                    'type': src
                }
                data.append(obj)

    result['data'] = data
    result['errorCode'] = 0
    result['errorMsg'] = 'success'

    return result

def main(args):
    """
    Do evaluation.
    """
    err = None
    metrics = {}
    try:
        pred_result = read_file(args.pred_file, args.task)
        ref_result = read_file(args.ref_file, args.task, is_ref=True)
        sources = ['both', 'search', 'zhidao']
        if args.task not in set(['main', 'all']):
            sources = sources[:1]
        for source in sources:
            metrics[source] = get_metrics(pred_result, ref_result, args.task,
                                          source)
    except ValueError as ve:
        err = ve
    except AssertionError as ae:
        err = ae

    print(
        json.dumps(
            format_metrics(metrics, args.task, err),
            ensure_ascii=False).encode('utf8'))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('pred_file', help='predict file')
    parser.add_argument('ref_file', help='reference file')
    parser.add_argument(
        'task', help='task name: Main|Yes_No|All|Entity|Description')

    args = parser.parse_args()
    args.task = args.task.lower().replace('_', '')
    main(args)

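# The evaluator also runs standalone; a hypothetical invocation (file names
# are placeholders), equivalent to: python dureader_eval.py pred.json ref.json Main
import argparse

# task is lower-cased with '_' stripped, as in the __main__ block above.
args = argparse.Namespace(
    pred_file='pred.json', ref_file='ref.json', task='main')
main(args)
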
modelscope/models/nlp/palm_v2/modeling_palm.py (new file, 1332 lines; diff too large to display)

@@ -22,8 +22,8 @@ class PalmForTextGeneration(TorchModel):
         """
         super().__init__(model_dir, *args, **kwargs)
 
-        from sofa.models.palm_v2 import (PalmForConditionalGeneration,
-                                         Translator)
+        from modelscope.models.nlp.palm_v2 import (
+            PalmForConditionalGeneration, Translator)
         self.model = PalmForConditionalGeneration.from_pretrained(model_dir)
         self.tokenizer = self.model.tokenizer
         self.generator = Translator(self.model)

(deleted file)
@@ -1,23 +0,0 @@
from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks
from .sbert_for_sequence_classification import \
    SbertForSequenceClassificationBase

__all__ = ['SbertForNLI']


@MODELS.register_module(Tasks.nli, module_name=Models.structbert)
class SbertForNLI(SbertForSequenceClassificationBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        """initialize the text generation model from the `model_dir` path.

        Args:
            model_dir (str): the model path.
            model_cls (Optional[Any], optional): model loader, if None, use the
                default loader to load model weights, by default None.
        """
        super().__init__(
            model_dir, *args, model_args={'num_labels': 3}, **kwargs)
        assert self.model.config.num_labels == 3

(deleted file)
@@ -1,25 +0,0 @@
from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks
from .sbert_for_sequence_classification import \
    SbertForSequenceClassificationBase

__all__ = ['SbertForSentenceSimilarity']


@MODELS.register_module(
    Tasks.sentence_similarity, module_name=Models.structbert)
class SbertForSentenceSimilarity(SbertForSequenceClassificationBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        """initialize the sentence similarity model from the `model_dir` path.

        Args:
            model_dir (str): the model path.
            model_cls (Optional[Any], optional): model loader, if None, use the
                default loader to load model weights, by default None.
        """
        super().__init__(
            model_dir, *args, model_args={'num_labels': 2}, **kwargs)
        self.model_dir = model_dir
        assert self.model.config.num_labels == 2

@@ -1,22 +0,0 @@
from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks
from .sbert_for_sequence_classification import \
    SbertForSequenceClassificationBase

__all__ = ['SbertForSentimentClassification']


@MODELS.register_module(
    Tasks.sentiment_classification, module_name=Models.structbert)
class SbertForSentimentClassification(SbertForSequenceClassificationBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        """initialize the text generation model from the `model_dir` path.

        Args:
            model_dir (str): the model path.
        """
        super().__init__(
            model_dir, *args, model_args={'num_labels': 2}, **kwargs)
        assert self.model.config.num_labels == 2
@@ -1,82 +0,0 @@
import os
from typing import Any, Dict

import json
import numpy as np
import torch
from sofa.models.sbert.modeling_sbert import SbertModel, SbertPreTrainedModel
from torch import nn

from modelscope.models import TorchModel


class SbertTextClassfier(SbertPreTrainedModel):

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config
        self.encoder = SbertModel(config, add_pooling_layer=True)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self,
                input_ids=None,
                token_type_ids=None,
                labels=None,
                **kwargs):
        outputs = self.encoder(
            input_ids,
            token_type_ids=token_type_ids,
            return_dict=None,
        )
        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            return {'logits': logits, 'loss': loss}
        return {'logits': logits}

    def build(**kwags):
        return SbertTextClassfier.from_pretrained(model_dir, **model_args)


class SbertForSequenceClassificationBase(TorchModel):

    def __init__(self, model_dir: str, model_args=None, *args, **kwargs):
        super().__init__(model_dir, *args, **kwargs)
        if model_args is None:
            model_args = {}
        self.model = SbertTextClassfier.from_pretrained(
            model_dir, **model_args)
        self.id2label = {}
        self.label_path = os.path.join(self.model_dir, 'label_mapping.json')
        if os.path.exists(self.label_path):
            with open(self.label_path) as f:
                self.label_mapping = json.load(f)
            self.id2label = {
                idx: name
                for name, idx in self.label_mapping.items()
            }

    def train(self):
        return self.model.train()

    def eval(self):
        return self.model.eval()

    def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
        input_ids = torch.tensor(input['input_ids'], dtype=torch.long)
        token_type_ids = torch.tensor(
            input['token_type_ids'], dtype=torch.long)
        return self.model.forward(input_ids, token_type_ids)

    def postprocess(self, input, **kwargs):
        logits = input['logits']
        probs = logits.softmax(-1).cpu().numpy()
        pred = logits.argmax(-1).cpu().numpy()
        logits = logits.cpu().numpy()
        res = {'predictions': pred, 'probabilities': probs, 'logits': logits}
        return res
@@ -1,64 +0,0 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.base import Tensor
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks

__all__ = ['SbertForTokenClassification']


@MODELS.register_module(Tasks.word_segmentation, module_name=Models.structbert)
class SbertForTokenClassification(TorchModel):

    def __init__(self, model_dir: str, *args, **kwargs):
        """initialize the word segmentation model from the `model_dir` path.

        Args:
            model_dir (str): the model path.
            model_cls (Optional[Any], optional): model loader, if None, use the
                default loader to load model weights, by default None.
        """
        super().__init__(model_dir, *args, **kwargs)
        self.model_dir = model_dir
        import sofa
        self.model = sofa.SbertForTokenClassification.from_pretrained(
            self.model_dir)
        self.config = sofa.SbertConfig.from_pretrained(self.model_dir)

    def train(self):
        return self.model.train()

    def eval(self):
        return self.model.eval()

    def forward(self, input: Dict[str,
                                  Any]) -> Dict[str, Union[str, np.ndarray]]:
        """return the result by the model

        Args:
            input (Dict[str, Any]): the preprocessed data

        Returns:
            Dict[str, Union[str, np.ndarray]]: results
                Example:
                    {
                        'predictions': array([1,4]), # label 0-negative 1-positive
                        'logits': array([[-0.53860897, 1.5029076 ]], dtype=float32) # true value
                        'text': str(今天),
                    }
        """
        input_ids = torch.tensor(input['input_ids']).unsqueeze(0)
        return {**self.model(input_ids), 'text': input['text']}

    def postprocess(self, input: Dict[str, Tensor],
                    **kwargs) -> Dict[str, Tensor]:
        logits = input['logits']
        pred = torch.argmax(logits[0], dim=-1)
        pred = pred.cpu().numpy()
        rst = {'predictions': pred, 'logits': logits, 'text': input['text']}
        return rst
@@ -1,50 +0,0 @@
from typing import Any, Dict

import numpy as np

from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks

__all__ = ['SbertForZeroShotClassification']


@MODELS.register_module(
    Tasks.zero_shot_classification, module_name=Models.structbert)
class SbertForZeroShotClassification(TorchModel):

    def __init__(self, model_dir: str, *args, **kwargs):
        """initialize the zero shot classification model from the `model_dir` path.

        Args:
            model_dir (str): the model path.
        """

        super().__init__(model_dir, *args, **kwargs)
        from sofa import SbertForSequenceClassification
        self.model = SbertForSequenceClassification.from_pretrained(model_dir)

    def train(self):
        return self.model.train()

    def eval(self):
        return self.model.eval()

    def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
        """return the result by the model

        Args:
            input (Dict[str, Any]): the preprocessed data

        Returns:
            Dict[str, np.ndarray]: results
                Example:
                    {
                        'logits': array([[-0.53860897, 1.5029076 ]], dtype=float32) # true value
                    }
        """
        outputs = self.model(**input)
        logits = outputs['logits'].cpu().numpy()
        res = {'logits': logits}
        return res
@@ -1,85 +1,174 @@
 import os
-from typing import Any, Dict
+from abc import abstractmethod

 import json
 import numpy as np
+from torch import nn

-from modelscope.metainfo import TaskModels
+from modelscope.metainfo import Models
+from modelscope.models.base import TorchModel
 from modelscope.models.builder import MODELS
+from modelscope.models.nlp.structbert import SbertPreTrainedModel
+from modelscope.models.nlp.veco import \
+    VecoForSequenceClassification as VecoForSequenceClassificationTransform
 from modelscope.outputs import OutputKeys
 from modelscope.utils.constant import Tasks
-from .task_model import SingleBackboneTaskModelBase
+from modelscope.utils.hub import parse_label_mapping
+from modelscope.utils.tensor_utils import (torch_nested_detach,
+                                           torch_nested_numpify)

-__all__ = ['SequenceClassificationModel']
+__all__ = ['SbertForSequenceClassification', 'VecoForSequenceClassification']


-@MODELS.register_module(
-    Tasks.sentiment_classification, module_name=TaskModels.text_classification)
-@MODELS.register_module(
-    Tasks.text_classification, module_name=TaskModels.text_classification)
-class SequenceClassificationModel(SingleBackboneTaskModelBase):
+class SequenceClassificationBase(TorchModel):
+    base_model_prefix: str = 'bert'

-    def __init__(self, model_dir: str, *args, **kwargs):
-        """initialize the sequence classification model from the `model_dir` path.
+    def __init__(self, config, model_dir):
+        super().__init__(model_dir)
+        self.num_labels = config.num_labels
+        self.config = config
+        setattr(self, self.base_model_prefix, self.build_base_model())
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

-        Args:
-            model_dir (str): the model path.
+    @abstractmethod
+    def build_base_model(self):
+        """Build the backbone model.
+
+        Returns: the backbone instance.
         """
-        super().__init__(model_dir, *args, **kwargs)
-        if 'base_model_prefix' in kwargs:
-            self._base_model_prefix = kwargs['base_model_prefix']
+        pass

-        backbone_cfg = self.cfg.backbone
-        head_cfg = self.cfg.head
+    @property
+    def base_model(self):
+        return getattr(self, self.base_model_prefix)

-        # get the num_labels from label_mapping.json
-        self.id2label = {}
-        self.label_path = os.path.join(model_dir, 'label_mapping.json')
-        if os.path.exists(self.label_path):
-            with open(self.label_path) as f:
-                self.label_mapping = json.load(f)
-            self.id2label = {
-                idx: name
-                for name, idx in self.label_mapping.items()
-            }
-            head_cfg['num_labels'] = len(self.label_mapping)
+    def forward(self, **kwargs):
+        labels = None
+        if OutputKeys.LABEL in kwargs:
+            labels = kwargs.pop(OutputKeys.LABEL)
+        elif OutputKeys.LABELS in kwargs:
+            labels = kwargs.pop(OutputKeys.LABELS)

-        self.build_backbone(backbone_cfg)
-        self.build_head(head_cfg)
+        outputs = self.base_model.forward(**kwargs)

-    def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
-        outputs = super().forward(input)
-        sequence_output, pooled_output = self.extract_backbone_outputs(outputs)
-        outputs = self.head.forward(pooled_output)
-        if 'labels' in input:
-            loss = self.compute_loss(outputs, input['labels'])
-            outputs.update(loss)
-        return outputs
-
-    def extract_logits(self, outputs):
-        return outputs[OutputKeys.LOGITS].cpu().detach()
-
-    def extract_backbone_outputs(self, outputs):
-        sequence_output = None
-        pooled_output = None
-        if hasattr(self.backbone, 'extract_sequence_outputs'):
-            sequence_output = self.backbone.extract_sequence_outputs(outputs)
-        if hasattr(self.backbone, 'extract_pooled_outputs'):
-            pooled_output = self.backbone.extract_pooled_outputs(outputs)
-        return sequence_output, pooled_output
-
-    def compute_loss(self, outputs, labels):
-        loss = self.head.compute_loss(outputs, labels)
-        return loss
+        # backbone model should return pooled_output as its second output
+        pooled_output = outputs[1]
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+            return {OutputKeys.LOGITS: logits, OutputKeys.LOSS: loss}
+        return {OutputKeys.LOGITS: logits}

     def postprocess(self, input, **kwargs):
-        logits = self.extract_logits(input)
-        probs = logits.softmax(-1).numpy()
-        pred = logits.argmax(-1).numpy()
-        logits = logits.numpy()
+        logits = input[OutputKeys.LOGITS]
+        probs = torch_nested_numpify(torch_nested_detach(logits.softmax(-1)))
+        pred = torch_nested_numpify(torch_nested_detach(logits.argmax(-1)))
+        logits = torch_nested_numpify(torch_nested_detach(logits))
         res = {
             OutputKeys.PREDICTIONS: pred,
             OutputKeys.PROBABILITIES: probs,
             OutputKeys.LOGITS: logits
         }
         return res
+
+
+@MODELS.register_module(
+    Tasks.sentence_similarity, module_name=Models.structbert)
+@MODELS.register_module(
+    Tasks.sentiment_classification, module_name=Models.structbert)
+@MODELS.register_module(Tasks.nli, module_name=Models.structbert)
+@MODELS.register_module(
+    Tasks.zero_shot_classification, module_name=Models.structbert)
+class SbertForSequenceClassification(SequenceClassificationBase,
+                                     SbertPreTrainedModel):
+    base_model_prefix: str = 'bert'
+    supports_gradient_checkpointing = True
+    _keys_to_ignore_on_load_missing = [r'position_ids']
+
+    def __init__(self, config, model_dir):
+        if hasattr(config, 'base_model_prefix'):
+            SbertForSequenceClassification.base_model_prefix = config.base_model_prefix
+        super().__init__(config, model_dir)
+
+    def build_base_model(self):
+        from .structbert import SbertModel
+        return SbertModel(self.config, add_pooling_layer=True)
+
+    def forward(self,
+                input_ids=None,
+                attention_mask=None,
+                token_type_ids=None,
+                labels=None,
+                **kwargs):
+        return super().forward(
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            labels=labels)
+
+    @classmethod
+    def _instantiate(cls, **kwargs):
+        model_dir = kwargs.get('model_dir')
+        num_labels = kwargs.get('num_labels')
+        if num_labels is None:
+            label2id = parse_label_mapping(model_dir)
+            if label2id is not None and len(label2id) > 0:
+                num_labels = len(label2id)
+
+        model_args = {} if num_labels is None else {'num_labels': num_labels}
+        return super(SbertPreTrainedModel,
+                     SbertForSequenceClassification).from_pretrained(
+                         pretrained_model_name_or_path=kwargs.get('model_dir'),
+                         model_dir=kwargs.get('model_dir'),
+                         **model_args)
+
+
+@MODELS.register_module(Tasks.sentence_similarity, module_name=Models.veco)
+@MODELS.register_module(
+    Tasks.sentiment_classification, module_name=Models.veco)
+@MODELS.register_module(Tasks.nli, module_name=Models.veco)
+class VecoForSequenceClassification(TorchModel,
+                                    VecoForSequenceClassificationTransform):
+
+    def __init__(self, config, model_dir):
+        super().__init__(model_dir)
+        VecoForSequenceClassificationTransform.__init__(self, config)
+
+    def forward(self,
+                input_ids=None,
+                attention_mask=None,
+                token_type_ids=None,
+                position_ids=None,
+                head_mask=None,
+                inputs_embeds=None,
+                labels=None,
+                output_attentions=None,
+                output_hidden_states=None,
+                **kwargs):
+        return VecoForSequenceClassificationTransform.forward(
+            self,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            labels=labels)
+
+    @classmethod
+    def _instantiate(cls, **kwargs):
+        model_dir = kwargs.get('model_dir')
+        num_labels = kwargs.get('num_labels')
+        if num_labels is None:
+            label2id = parse_label_mapping(model_dir)
+            if label2id is not None and len(label2id) > 0:
+                num_labels = len(label2id)
+
+        model_args = {} if num_labels is None else {'num_labels': num_labels}
+        return super(VecoForSequenceClassificationTransform,
+                     VecoForSequenceClassification).from_pretrained(
+                         pretrained_model_name_or_path=kwargs.get('model_dir'),
+                         model_dir=kwargs.get('model_dir'),
+                         **model_args)
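A minimal usage sketch for the `_instantiate` hook added above (the checkpoint directory and tensors are hypothetical; when `num_labels` is omitted it is resolved from `label_mapping.json` via `parse_label_mapping`):

import torch

# Hypothetical checkpoint directory; _instantiate resolves num_labels itself.
model = SbertForSequenceClassification._instantiate(
    model_dir='/path/to/structbert_checkpoint')
input_ids = torch.tensor([[101, 102]])  # toy token ids
out = model(input_ids=input_ids, attention_mask=torch.ones_like(input_ids))
res = model.postprocess(out)  # keys: OutputKeys.PREDICTIONS / PROBABILITIES / LOGITS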
28  modelscope/models/nlp/space/__init__.py  Normal file
@@ -0,0 +1,28 @@
from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
    from .model import SpaceGenerator
    from .model import SpaceModelBase, SpaceTokenizer, SpaceConfig
    from .space_for_dialog_intent_prediction import SpaceForDialogIntent
    from .space_for_dialog_modeling import SpaceForDialogModeling
    from .space_for_dialog_state_tracking import SpaceForDialogStateTracking
else:
    _import_structure = {
        'model':
        ['SpaceGenerator', 'SpaceModelBase', 'SpaceTokenizer', 'SpaceConfig'],
        'space_for_dialog_intent_prediction': ['SpaceForDialogIntent'],
        'space_for_dialog_modeling': ['SpaceForDialogModeling'],
        'space_for_dialog_state_tracking': ['SpaceForDialogStateTracking'],
    }

    import sys

    sys.modules[__name__] = LazyImportModule(
        __name__,
        globals()['__file__'],
        _import_structure,
        module_spec=__spec__,
        extra_objects={},
    )
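With `LazyImportModule`, the listed submodules are only imported on first attribute access; a one-line sketch of the effect:

# The import below triggers the real import of '.model' lazily, on first use.
from modelscope.models.nlp.space import SpaceTokenizer, SpaceConfig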
10  modelscope/models/nlp/space/model/__init__.py  Normal file
@@ -0,0 +1,10 @@
from .configuration_space import SpaceConfig
from .gen_unified_transformer import GenUnifiedTransformer
from .generator import Generator as SpaceGenerator
from .intent_unified_transformer import IntentUnifiedTransformer
from .model_base import SpaceModelBase
from .modeling_space import (SpaceForDST, SpaceForMaskedLM,
                             SpaceForPreTraining, SpaceModel)
from .tokenization_space import (BasicTokenizer, SpaceTokenizer,
                                 WordpieceTokenizer)
from .unified_transformer import UnifiedTransformer
32  modelscope/models/nlp/space/model/configuration_space.py  Normal file
@@ -0,0 +1,32 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors.
# Copyright 2020 The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Space configuration, mainly copied from :class:`~transformers.configuration_xlm_roberta` """

from modelscope.models.nlp.structbert import SbertConfig
from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class SpaceConfig(SbertConfig):
    """
    This class overrides [`SbertConfig`]. Please check the superclass for the appropriate
    documentation alongside usage examples.
    """

    model_type = 'space'
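A small sketch of the subclassing effect (constructor arguments are inherited from `SbertConfig`):

# SpaceConfig accepts every SbertConfig argument; only model_type differs.
cfg = SpaceConfig(vocab_size=30522, hidden_size=768)
assert cfg.model_type == 'space'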
268  modelscope/models/nlp/space/model/modeling_space.py  Normal file
@@ -0,0 +1,268 @@
# Copyright 2019 Facebook AI Research and the HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch Space model. mainly copied from :module:`~transformers.modeling_xlm_roberta`"""

import torch
from torch import nn
from torch.nn import CrossEntropyLoss
from transformers.file_utils import add_start_docstrings

from modelscope.models.nlp.structbert.modeling_sbert import (
    SbertForMaskedLM, SbertModel, SbertPreTrainedModel)
from .configuration_space import SpaceConfig

SPACE_START_DOCSTRING = r"""

    This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic
    methods the library implements for all its model (such as downloading or saving, resizing the input embeddings,
    pruning heads etc.)

    This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)
    subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to
    general usage and behavior.

    Parameters:
        config ([`SpaceConfig`]): Model configuration class with all the parameters of the
            model. Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model
            weights.
"""


@add_start_docstrings(
    'The bare Space Model transformer outputting raw hidden-states without any specific head on top. '
    'It is identical with the Bert Model from Transformers',
    SPACE_START_DOCSTRING,
)
class SpaceModel(SbertModel):
    """
    This class overrides [`SbertModel`]. Please check the superclass for the appropriate
    documentation alongside usage examples.
    """

    config_class = SpaceConfig


@add_start_docstrings(
    """
    Space Model transformer with dialog state tracking heads on top (an inform projection
    layer with a dialog state layer and a set of slots including history information from
    previous dialogs), e.g. for MultiWOZ 2.2 tasks.
    """,
    SPACE_START_DOCSTRING,
)
class SpaceForDST(SbertPreTrainedModel):

    def __init__(self, config):
        super(SpaceForDST, self).__init__(config)
        self.slot_list = config.dst_slot_list
        self.class_types = config.dst_class_types
        self.class_labels = config.dst_class_labels
        self.token_loss_for_nonpointable = config.dst_token_loss_for_nonpointable
        self.refer_loss_for_nonpointable = config.dst_refer_loss_for_nonpointable
        self.class_aux_feats_inform = config.dst_class_aux_feats_inform
        self.class_aux_feats_ds = config.dst_class_aux_feats_ds
        self.class_loss_ratio = config.dst_class_loss_ratio

        # Only use refer loss if refer class is present in dataset.
        if 'refer' in self.class_types:
            self.refer_index = self.class_types.index('refer')
        else:
            self.refer_index = -1

        self.bert = SpaceModel(config)
        self.dropout = nn.Dropout(config.dst_dropout_rate)
        self.dropout_heads = nn.Dropout(config.dst_heads_dropout_rate)

        if self.class_aux_feats_inform:
            self.add_module(
                'inform_projection',
                nn.Linear(len(self.slot_list), len(self.slot_list)))
        if self.class_aux_feats_ds:
            self.add_module(
                'ds_projection',
                nn.Linear(len(self.slot_list), len(self.slot_list)))

        aux_dims = len(self.slot_list) * (
            self.class_aux_feats_inform + self.class_aux_feats_ds
        )  # second term is 0, 1 or 2

        for slot in self.slot_list:
            self.add_module(
                'class_' + slot,
                nn.Linear(config.hidden_size + aux_dims, self.class_labels))
            self.add_module('token_' + slot, nn.Linear(config.hidden_size, 2))
            self.add_module(
                'refer_' + slot,
                nn.Linear(config.hidden_size + aux_dims,
                          len(self.slot_list) + 1))

        self.init_weights()

    def forward(self,
                input_ids,
                input_mask=None,
                segment_ids=None,
                position_ids=None,
                head_mask=None,
                start_pos=None,
                end_pos=None,
                inform_slot_id=None,
                refer_id=None,
                class_label_id=None,
                diag_state=None):
        outputs = self.bert(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=segment_ids,
            position_ids=position_ids,
            head_mask=head_mask)

        sequence_output = outputs[0]
        pooled_output = outputs[1]

        sequence_output = self.dropout(sequence_output)
        pooled_output = self.dropout(pooled_output)

        # TODO: establish proper format in labels already?
        if inform_slot_id is not None:
            inform_labels = torch.stack(list(inform_slot_id.values()),
                                        1).float()
        if diag_state is not None:
            diag_state_labels = torch.clamp(
                torch.stack(list(diag_state.values()), 1).float(), 0.0, 1.0)

        total_loss = 0
        per_slot_per_example_loss = {}
        per_slot_class_logits = {}
        per_slot_start_logits = {}
        per_slot_end_logits = {}
        per_slot_refer_logits = {}
        for slot in self.slot_list:
            if self.class_aux_feats_inform and self.class_aux_feats_ds:
                pooled_output_aux = torch.cat(
                    (pooled_output, self.inform_projection(inform_labels),
                     self.ds_projection(diag_state_labels)), 1)
            elif self.class_aux_feats_inform:
                pooled_output_aux = torch.cat(
                    (pooled_output, self.inform_projection(inform_labels)), 1)
            elif self.class_aux_feats_ds:
                pooled_output_aux = torch.cat(
                    (pooled_output, self.ds_projection(diag_state_labels)), 1)
            else:
                pooled_output_aux = pooled_output
            class_logits = self.dropout_heads(
                getattr(self, 'class_' + slot)(pooled_output_aux))

            token_logits = self.dropout_heads(
                getattr(self, 'token_' + slot)(sequence_output))
            start_logits, end_logits = token_logits.split(1, dim=-1)
            start_logits = start_logits.squeeze(-1)
            end_logits = end_logits.squeeze(-1)

            refer_logits = self.dropout_heads(
                getattr(self, 'refer_' + slot)(pooled_output_aux))

            per_slot_class_logits[slot] = class_logits
            per_slot_start_logits[slot] = start_logits
            per_slot_end_logits[slot] = end_logits
            per_slot_refer_logits[slot] = refer_logits

            # If there are no labels, don't compute loss
            if class_label_id is not None and start_pos is not None and end_pos is not None and refer_id is not None:
                # If we are on multi-GPU, split add a dimension
                if len(start_pos[slot].size()) > 1:
                    start_pos[slot] = start_pos[slot].squeeze(-1)
                if len(end_pos[slot].size()) > 1:
                    end_pos[slot] = end_pos[slot].squeeze(-1)
                # sometimes the start/end positions are outside our model inputs, we ignore these terms
                ignored_index = start_logits.size(1)  # This is a single index
                start_pos[slot].clamp_(0, ignored_index)
                end_pos[slot].clamp_(0, ignored_index)

                class_loss_fct = CrossEntropyLoss(reduction='none')
                token_loss_fct = CrossEntropyLoss(
                    reduction='none', ignore_index=ignored_index)
                refer_loss_fct = CrossEntropyLoss(reduction='none')

                start_loss = token_loss_fct(start_logits, start_pos[slot])
                end_loss = token_loss_fct(end_logits, end_pos[slot])
                token_loss = (start_loss + end_loss) / 2.0

                token_is_pointable = (start_pos[slot] > 0).float()
                if not self.token_loss_for_nonpointable:
                    token_loss *= token_is_pointable

                refer_loss = refer_loss_fct(refer_logits, refer_id[slot])
                token_is_referrable = torch.eq(class_label_id[slot],
                                               self.refer_index).float()
                if not self.refer_loss_for_nonpointable:
                    refer_loss *= token_is_referrable

                class_loss = class_loss_fct(class_logits, class_label_id[slot])

                if self.refer_index > -1:
                    per_example_loss = (self.class_loss_ratio) * class_loss + (
                        (1 - self.class_loss_ratio) / 2) * token_loss + (
                            (1 - self.class_loss_ratio) / 2) * refer_loss
                else:
                    per_example_loss = self.class_loss_ratio * class_loss + (
                        1 - self.class_loss_ratio) * token_loss

                total_loss += per_example_loss.sum()
                per_slot_per_example_loss[slot] = per_example_loss

        # add hidden states and attention if they are here
        outputs = (total_loss, ) + (
            per_slot_per_example_loss,
            per_slot_class_logits,
            per_slot_start_logits,
            per_slot_end_logits,
            per_slot_refer_logits,
        ) + outputs[2:]

        return outputs


@add_start_docstrings(
    'The Space Model with a `language modeling` head on top',
    SPACE_START_DOCSTRING,
)
class SpaceForMaskedLM(SbertForMaskedLM):
    """
    This class overrides [`SbertForMaskedLM`]. Please check the superclass for the
    appropriate documentation alongside usage examples.
    """

    config_class = SpaceConfig


@add_start_docstrings(
    """
    Space Model with only one head on top as done during the pretraining: a `masked language modeling` head.
    """,
    SPACE_START_DOCSTRING,
)
class SpaceForPreTraining(SbertPreTrainedModel):

    def __init__(self, model_name_or_path: str):
        super(SpaceForPreTraining, self).__init__()
        self.bert_model = SpaceForMaskedLM.from_pretrained(model_name_or_path)

    def forward(self, input_ids: torch.tensor, mlm_labels: torch.tensor):
        outputs = self.bert_model(input_ids, masked_lm_labels=mlm_labels)
        return outputs[0]
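For orientation, a sketch of how the tuple returned by `SpaceForDST.forward` above unpacks (the model instance and input tensors here are hypothetical):

outputs = model(input_ids, input_mask=mask, segment_ids=segments)  # hypothetical tensors
(total_loss, per_slot_loss, class_logits, start_logits, end_logits,
 refer_logits) = outputs[:6]  # remaining entries are the encoder extras, if any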
29  modelscope/models/nlp/space/model/tokenization_space.py  Normal file
@@ -0,0 +1,29 @@
# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. team.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
"""Tokenization classes for Space. mainly copied from :module:`~transformers.tokenization_xlm_roberta`"""

from modelscope.models.nlp.structbert import (BasicTokenizer, SbertTokenizer,
                                              WordpieceTokenizer)
from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class SpaceTokenizer(SbertTokenizer):
    """
    This class overrides [`SbertTokenizer`]. Please check the superclass for the appropriate
    documentation alongside usage examples.
    """
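Since `SpaceTokenizer` only subclasses `SbertTokenizer`, standard WordPiece usage applies; a sketch (the model directory is hypothetical):

tokenizer = SpaceTokenizer.from_pretrained('/path/to/space_checkpoint')
ids = tokenizer('user: book a table for two')['input_ids']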
@@ -5,10 +5,9 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F

-from modelscope.models.nlp.backbones.space.model.model_base import \
-    SpaceModelBase
-from modelscope.models.nlp.backbones.space.modules.embedder import Embedder
-from modelscope.models.nlp.backbones.space.modules.transformer_block import \
+from modelscope.models.nlp.space.model.model_base import SpaceModelBase
+from modelscope.models.nlp.space.modules.embedder import Embedder
+from modelscope.models.nlp.space.modules.transformer_block import \
     TransformerBlock

@@ -7,7 +7,7 @@ from modelscope.metainfo import Models
 from modelscope.models import TorchModel
 from modelscope.models.base import Tensor
 from modelscope.models.builder import MODELS
-from modelscope.models.nlp.backbones import SpaceGenerator, SpaceModelBase
+from modelscope.models.nlp.space import SpaceGenerator, SpaceModelBase
 from modelscope.preprocessors.space import IntentBPETextField
 from modelscope.utils.config import Config
 from modelscope.utils.constant import ModelFile, Tasks
@@ -7,7 +7,7 @@ from modelscope.metainfo import Models
 from modelscope.models import TorchModel
 from modelscope.models.base import Tensor
 from modelscope.models.builder import MODELS
-from modelscope.models.nlp.backbones import SpaceGenerator, SpaceModelBase
+from modelscope.models.nlp.space import SpaceGenerator, SpaceModelBase
 from modelscope.preprocessors.space import MultiWOZBPETextField
 from modelscope.utils.config import Config
 from modelscope.utils.constant import ModelFile, Tasks
@@ -21,7 +21,7 @@ class SpaceForDialogStateTracking(TorchModel):

         super().__init__(model_dir, *args, **kwargs)

-        from sofa.models.space import SpaceConfig, SpaceForDST
+        from modelscope.models.nlp.space.model import SpaceForDST, SpaceConfig
         self.model_dir = model_dir

         self.config = SpaceConfig.from_pretrained(self.model_dir)
45  modelscope/models/nlp/structbert/__init__.py  Normal file
@@ -0,0 +1,45 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
    from .configuration_sbert import SbertConfig
    from .modeling_sbert import (SbertForMaskedLM, SbertModel,
                                 SbertPreTrainedModel)
    from .tokenization_sbert import (BasicTokenizer, SbertTokenizer,
                                     WordpieceTokenizer)
    from .tokenization_sbert_fast import SbertTokenizerFast
else:
    _import_structure = {
        'configuration_sbert': ['SbertConfig'],
        'modeling_sbert':
        ['SbertForMaskedLM', 'SbertModel', 'SbertPreTrainedModel'],
        'tokenization_sbert':
        ['BasicTokenizer', 'SbertTokenizer', 'WordpieceTokenizer'],
        'tokenization_sbert_fast': ['SbertTokenizerFast'],
    }

    import sys

    sys.modules[__name__] = LazyImportModule(
        __name__,
        globals()['__file__'],
        _import_structure,
        module_spec=__spec__,
        extra_objects={},
    )
@@ -59,7 +59,8 @@ def compute_adv_loss(embedding,
     """
     Calculate the adv loss of the model.
     :param embedding: Original sentence embedding
-    :param model: The model or the forward function(including decoder/classifier), accept kwargs as input, output logits
+    :param model: The model, or the forward function(including decoder/classifier),
+        accept kwargs as input, output logits
     :param ori_logits: The original logits output from the model function
     :param ori_loss: The original loss
     :param adv_grad_factor: This factor will be multiplied by the KL loss grad and then the result will be added to
@@ -119,7 +120,8 @@ def compute_adv_loss_pair(embedding,
     """
     Calculate the adv loss of the model. This function is used in the pair logits scenario.
     :param embedding: Original sentence embedding
-    :param model: The model or the forward function(including decoder/classifier), accept kwargs as input, output logits
+    :param model: The model, or the forward function(including decoder/classifier),
+        accept kwargs as input, output logits
     :param start_logits: The original start logits output from the model function
     :param end_logits: The original end logits output from the model function
     :param ori_loss: The original loss
@@ -24,11 +24,12 @@ logger = logging.get_logger(__name__)

 class SbertConfig(PretrainedConfig):
     r"""
-    This is the configuration class to store the configuration of a :class:`~sofa.models.SbertModel`.
+    This is the configuration class to store the configuration
+    of a :class:`~modelscope.models.nlp.structbert.SbertModel`.
     It is used to instantiate a SBERT model according to the specified arguments.

-    Configuration objects inherit from :class:`~sofa.utils.PretrainedConfig` and can be used to control the model
-    outputs. Read the documentation from :class:`~sofa.utils.PretrainedConfig` for more information.
+    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
+    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.


     Args:
@@ -99,11 +100,13 @@ class SbertConfig(PretrainedConfig):
                  type_vocab_size=2,
                  initializer_range=0.02,
                  layer_norm_eps=1e-12,
+                 pad_token_id=0,
                  position_embedding_type='absolute',
                  use_cache=True,
                  classifier_dropout=None,
                  **kwargs):
-        super().__init__(**kwargs)
+        super().__init__(pad_token_id=pad_token_id, **kwargs)

         self.vocab_size = vocab_size
         self.hidden_size = hidden_size
         self.num_hidden_layers = num_hidden_layers
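The change above forwards `pad_token_id` to `PretrainedConfig` instead of dropping it; a sketch of the effect:

cfg = SbertConfig()           # pad_token_id now defaults to 0 on the config
assert cfg.pad_token_id == 0  # previously it fell back to PretrainedConfig's default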
1964  modelscope/models/nlp/structbert/modeling_sbert.py  Executable file
File diff suppressed because it is too large
516  modelscope/models/nlp/structbert/tokenization_sbert.py  Normal file
@@ -0,0 +1,516 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for Sbert. mainly copied from :module:`~transformers.tokenization_bert`"""

import collections
import os
import unicodedata
from typing import List, Optional, Tuple

from transformers.tokenization_utils import (PreTrainedTokenizer, _is_control,
                                             _is_punctuation, _is_whitespace)

from modelscope.utils.logger import get_logger

logger = get_logger(__name__)

VOCAB_FILES_NAMES = {'vocab_file': 'vocab.txt'}

PRETRAINED_VOCAB_FILES_MAP = {'vocab_file': {}}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    'chinese_sbert-large-std-512': 512,
    'english_sbert-large-std-512': 512,
}

PRETRAINED_INIT_CONFIGURATION = {
    'english_sbert-large-std-512': {
        'do_lower_case': True
    },
}


def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    with open(vocab_file, 'r', encoding='utf-8') as reader:
        tokens = reader.readlines()
    for index, token in enumerate(tokens):
        token = token.rstrip('\n')
        vocab[token] = index
    return vocab


def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a piece of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens


class SbertTokenizer(PreTrainedTokenizer):
    r"""
    Construct a SBERT tokenizer. Based on WordPiece.

    This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
    Users should refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (:obj:`str`):
            File containing the vocabulary.
        do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to lowercase the input when tokenizing.
        do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to do basic tokenization before WordPiece.
        never_split (:obj:`Iterable`, `optional`):
            Collection of tokens which will never be split during tokenization. Only has an effect when
            :obj:`do_basic_tokenize=True`
        unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to tokenize Chinese characters.

            This should likely be deactivated for Japanese (see this `issue
            <https://github.com/huggingface/transformers/issues/328>`__).
        strip_accents: (:obj:`bool`, `optional`):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for :obj:`lowercase` (as in the original BERT).
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

    def __init__(self,
                 vocab_file,
                 do_lower_case=True,
                 do_basic_tokenize=True,
                 never_split=None,
                 unk_token='[UNK]',
                 sep_token='[SEP]',
                 pad_token='[PAD]',
                 cls_token='[CLS]',
                 mask_token='[MASK]',
                 tokenize_chinese_chars=True,
                 strip_accents=None,
                 **kwargs):
        super().__init__(
            do_lower_case=do_lower_case,
            do_basic_tokenize=do_basic_tokenize,
            never_split=never_split,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            tokenize_chinese_chars=tokenize_chinese_chars,
            strip_accents=strip_accents,
            **kwargs,
        )

        if not os.path.isfile(vocab_file):
            raise ValueError(
                f"Can't find a vocabulary file at path '{vocab_file}'. To load the vocabulary from a Google pretrained "
                'model use `tokenizer = SbertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`'
            )
        self.vocab = load_vocab(vocab_file)
        self.ids_to_tokens = collections.OrderedDict([
            (ids, tok) for tok, ids in self.vocab.items()
        ])
        self.do_basic_tokenize = do_basic_tokenize
        if do_basic_tokenize:
            self.basic_tokenizer = BasicTokenizer(
                do_lower_case=do_lower_case,
                never_split=never_split,
                tokenize_chinese_chars=tokenize_chinese_chars,
                strip_accents=strip_accents,
            )
        self.wordpiece_tokenizer = WordpieceTokenizer(
            vocab=self.vocab, unk_token=self.unk_token)

    @property
    def do_lower_case(self):
        return self.basic_tokenizer.do_lower_case

    @property
    def vocab_size(self):
        return len(self.vocab)

    def get_vocab(self):
        return dict(self.vocab, **self.added_tokens_encoder)

    def _tokenize(self, text):
        split_tokens = []
        if self.do_basic_tokenize:
            for token in self.basic_tokenizer.tokenize(
                    text, never_split=self.all_special_tokens):

                # If the token is part of the never_split set
                if token in self.basic_tokenizer.never_split:
                    split_tokens.append(token)
                else:
                    split_tokens += self.wordpiece_tokenizer.tokenize(token)
        else:
            split_tokens = self.wordpiece_tokenizer.tokenize(text)
        return split_tokens

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.vocab.get(token, self.vocab.get(self.unk_token))

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.ids_to_tokens.get(index, self.unk_token)

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        out_string = ' '.join(tokens).replace(' ##', '').strip()
        return out_string

    def build_inputs_with_special_tokens(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
        adding special tokens. A SBERT sequence has the following format:

        - single sequence: ``[CLS] X [SEP]``
        - pair of sequences: ``[CLS] A [SEP] B [SEP]``

        Args:
            token_ids_0 (:obj:`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (:obj:`List[int]`, `optional`):
                Optional second list of IDs for sequence pairs.

        Returns:
            :obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
        """
        if token_ids_1 is None:
            return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
        cls = [self.cls_token_id]
        sep = [self.sep_token_id]
        return cls + token_ids_0 + sep + token_ids_1 + sep
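    # Illustration with hypothetical ids: if cls_token_id == 101 and sep_token_id == 102,
    #   build_inputs_with_special_tokens([7, 8])      -> [101, 7, 8, 102]
    #   build_inputs_with_special_tokens([7, 8], [9]) -> [101, 7, 8, 102, 9, 102]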
    def get_special_tokens_mask(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None,
            already_has_special_tokens: bool = False) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer ``prepare_for_model`` method.

        Args:
            token_ids_0 (:obj:`List[int]`):
                List of IDs.
            token_ids_1 (:obj:`List[int]`, `optional`):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            :obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """

        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0,
                token_ids_1=token_ids_1,
                already_has_special_tokens=True)

        if token_ids_1 is not None:
            return [1] + ([0] * len(token_ids_0)) + [1] + (
                [0] * len(token_ids_1)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1]

    def create_token_type_ids_from_sequences(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. A SBERT sequence
        pair mask has the following format:

        ::

            0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
            | first sequence    | second sequence |

        If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (:obj:`List[int]`):
                List of IDs.
            token_ids_1 (:obj:`List[int]`, `optional`):
                Optional second list of IDs for sequence pairs.

        Returns:
            :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
            sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1
                                                        + sep) * [1]

    def save_vocabulary(self,
                        save_directory: str,
                        filename_prefix: Optional[str] = None) -> Tuple[str]:
        index = 0
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory,
                (filename_prefix + '-' if filename_prefix else '')
                + VOCAB_FILES_NAMES['vocab_file'])
        else:
            vocab_file = (filename_prefix
                          + '-' if filename_prefix else '') + save_directory
        with open(vocab_file, 'w', encoding='utf-8') as writer:
            for token, token_index in sorted(
                    self.vocab.items(), key=lambda kv: kv[1]):
                if index != token_index:
                    logger.warning(
                        f'Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive.'
                        ' Please check that the vocabulary is not corrupted!')
                    index = token_index
                writer.write(token + '\n')
                index += 1
        return (vocab_file, )


class BasicTokenizer(object):
    """
    Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).

    Args:
        do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to lowercase the input when tokenizing.
        never_split (:obj:`Iterable`, `optional`):
            Collection of tokens which will never be split during tokenization. Only has an effect when
            :obj:`do_basic_tokenize=True`
        tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to tokenize Chinese characters.

            This should likely be deactivated for Japanese (see this `issue
            <https://github.com/huggingface/transformers/issues/328>`__).
        strip_accents: (:obj:`bool`, `optional`):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for :obj:`lowercase` (as in the original BERT).
    """

    def __init__(self,
                 do_lower_case=True,
                 never_split=None,
                 tokenize_chinese_chars=True,
                 strip_accents=None):
        if never_split is None:
            never_split = []
        self.do_lower_case = do_lower_case
        self.never_split = set(never_split)
        self.tokenize_chinese_chars = tokenize_chinese_chars
        self.strip_accents = strip_accents

    def tokenize(self, text, never_split=None):
        """
        Basic Tokenization of a piece of text. Split on "white spaces" only, for sub-word tokenization, see
        WordPieceTokenizer.

        Args:
            **never_split**: (`optional`) list of str
                Kept for backward compatibility purposes. Now implemented directly at the base class level (see
                :func:`PreTrainedTokenizer.tokenize`) List of token not to split.
        """
        # union() returns a new set by concatenating the two sets.
        never_split = self.never_split.union(
            set(never_split)) if never_split else self.never_split
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        if self.tokenize_chinese_chars:
            text = self._tokenize_chinese_chars(text)
        orig_tokens = whitespace_tokenize(text)
        split_tokens = []
        for token in orig_tokens:
            if token not in never_split:
                if self.do_lower_case:
                    token = token.lower()
                    if self.strip_accents is not False:
                        token = self._run_strip_accents(token)
                elif self.strip_accents:
                    token = self._run_strip_accents(token)
            split_tokens.extend(self._run_split_on_punc(token, never_split))

        output_tokens = whitespace_tokenize(' '.join(split_tokens))
        return output_tokens

    def _run_strip_accents(self, text):
        """Strips accents from a piece of text."""
        text = unicodedata.normalize('NFD', text)
        output = []
        for char in text:
            cat = unicodedata.category(char)
            if cat == 'Mn':
                continue
            output.append(char)
        return ''.join(output)

    def _run_split_on_punc(self, text, never_split=None):
        """Splits punctuation on a piece of text."""
        if never_split is not None and text in never_split:
            return [text]
        chars = list(text)
        i = 0
        start_new_word = True
        output = []
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
                output.append([char])
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(char)
            i += 1

        return [''.join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(' ')
                output.append(char)
                output.append(' ')
            else:
                output.append(char)
        return ''.join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like all of the other languages.
        if ((0x4E00 <= cp <= 0x9FFF) or (0x3400 <= cp <= 0x4DBF)
                or (0x20000 <= cp <= 0x2A6DF) or (0x2A700 <= cp <= 0x2B73F)
                or (0x2B740 <= cp <= 0x2B81F) or (0x2B820 <= cp <= 0x2CEAF)
                or (0xF900 <= cp <= 0xFAFF) or (0x2F800 <= cp <= 0x2FA1F)):
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xFFFD or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(' ')
            else:
                output.append(char)
        return ''.join(output)


class WordpieceTokenizer(object):
    """Runs WordPiece tokenization."""

    def __init__(self, vocab, unk_token, max_input_chars_per_word=100):
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, text):
        """
        Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform
        tokenization using the given vocabulary.

        For example, :obj:`input = "unaffable"` will return as output :obj:`["un", "##aff", "##able"]`.

        Args:
            text: A single token or whitespace separated tokens. This should have
                already been passed through `BasicTokenizer`.

        Returns:
            A list of wordpiece tokens.
        """

        output_tokens = []
        for token in whitespace_tokenize(text):
            chars = list(token)
            if len(chars) > self.max_input_chars_per_word:
                output_tokens.append(self.unk_token)
                continue

            is_bad = False
            start = 0
            sub_tokens = []
            while start < len(chars):
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = ''.join(chars[start:end])
                    if start > 0:
                        substr = '##' + substr
                    if substr in self.vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True
                    break
                sub_tokens.append(cur_substr)
                start = end
if is_bad:
|
||||
output_tokens.append(self.unk_token)
|
||||
else:
|
||||
output_tokens.extend(sub_tokens)
|
||||
return output_tokens
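A quick illustration of the greedy longest-match-first behavior above (not part of the diff; the vocab is a toy example and this assumes the module context, which provides whitespace_tokenize):

# Illustration only: toy vocab showing the longest-match-first split.
vocab = {'un', '##aff', '##able', '[UNK]'}
tokenizer = WordpieceTokenizer(vocab=vocab, unk_token='[UNK]')
print(tokenizer.tokenize('unaffable'))  # ['un', '##aff', '##able']
print(tokenizer.tokenize('xyz'))        # ['[UNK]'] -- no prefix matches the vocab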
200  modelscope/models/nlp/structbert/tokenization_sbert_fast.py  Normal file
@@ -0,0 +1,200 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fast tokenization classes for Sbert, mainly copied from :module:`~transformers.tokenization_bert_fast`."""

from typing import List, Optional, Tuple

import json
import transformers
from tokenizers import normalizers
from transformers.tokenization_utils_fast import PreTrainedTokenizerFast

from modelscope.utils.logger import get_logger
from .tokenization_sbert import SbertTokenizer

logger = get_logger(__name__)

VOCAB_FILES_NAMES = {
    'vocab_file': 'vocab.txt',
    'tokenizer_file': 'tokenizer.json'
}

PRETRAINED_VOCAB_FILES_MAP = {
    'vocab_file': {},
    'tokenizer_file': {},
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    'chinese_sbert-large-std-512': 512,
    'english_sbert-large-std-512': 512,
}

PRETRAINED_INIT_CONFIGURATION = {
    'english_sbert-large-std-512': {
        'do_lower_case': True
    },
}

transformers.SLOW_TO_FAST_CONVERTERS[
    'SbertTokenizer'] = transformers.SLOW_TO_FAST_CONVERTERS['BertTokenizer']


class SbertTokenizerFast(PreTrainedTokenizerFast):
    r"""
    Construct a "fast" SBERT tokenizer (backed by HuggingFace's `tokenizers` library). Based on WordPiece.

    This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main
    methods. Users should refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (:obj:`str`):
            File containing the vocabulary.
        do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to lowercase the input when tokenizing.
        unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        clean_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to clean the text before tokenization by removing any control characters and replacing all
            whitespaces by the classic one.
        tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see `this
            issue <https://github.com/huggingface/transformers/issues/328>`__).
        strip_accents: (:obj:`bool`, `optional`):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for :obj:`lowercase` (as in the original BERT).
        wordpieces_prefix: (:obj:`str`, `optional`, defaults to :obj:`"##"`):
            The prefix for subwords.
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    slow_tokenizer_class = SbertTokenizer

    def __init__(self,
                 vocab_file=None,
                 tokenizer_file=None,
                 do_lower_case=True,
                 unk_token='[UNK]',
                 sep_token='[SEP]',
                 pad_token='[PAD]',
                 cls_token='[CLS]',
                 mask_token='[MASK]',
                 tokenize_chinese_chars=True,
                 strip_accents=None,
                 **kwargs):
        super().__init__(
            vocab_file,
            tokenizer_file=tokenizer_file,
            do_lower_case=do_lower_case,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            tokenize_chinese_chars=tokenize_chinese_chars,
            strip_accents=strip_accents,
            **kwargs,
        )

        pre_tok_state = json.loads(
            self.backend_tokenizer.normalizer.__getstate__())
        if (pre_tok_state.get('lowercase', do_lower_case) != do_lower_case
                or pre_tok_state.get('strip_accents',
                                     strip_accents) != strip_accents):
            pre_tok_class = getattr(normalizers, pre_tok_state.pop('type'))
            pre_tok_state['lowercase'] = do_lower_case
            pre_tok_state['strip_accents'] = strip_accents
            self.backend_tokenizer.normalizer = pre_tok_class(**pre_tok_state)

        self.do_lower_case = do_lower_case

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
        and adding special tokens. A SBERT sequence has the following format:

        - single sequence: ``[CLS] X [SEP]``
        - pair of sequences: ``[CLS] A [SEP] B [SEP]``

        Args:
            token_ids_0 (:obj:`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (:obj:`List[int]`, `optional`):
                Optional second list of IDs for sequence pairs.

        Returns:
            :obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
        """
        output = [self.cls_token_id] + token_ids_0 + [self.sep_token_id]

        if token_ids_1:
            output += token_ids_1 + [self.sep_token_id]

        return output

    def create_token_type_ids_from_sequences(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. A SBERT
        sequence pair mask has the following format:

        ::

            0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
            | first sequence      | second sequence |

        If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (:obj:`List[int]`):
                List of IDs.
            token_ids_1 (:obj:`List[int]`, `optional`):
                Optional second list of IDs for sequence pairs.

        Returns:
            :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
            sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1
                                                        + sep) * [1]

    def save_vocabulary(self,
                        save_directory: str,
                        filename_prefix: Optional[str] = None) -> Tuple[str]:
        files = self._tokenizer.model.save(
            save_directory, name=filename_prefix)
        return tuple(files)
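A worked example of the segment mask produced by create_token_type_ids_from_sequences above (token IDs are made up for illustration; 101/102 stand in for [CLS]/[SEP]):

# Illustration only: segment ids for a pair of sequences.
token_ids_0 = [7, 8]   # sequence A
token_ids_1 = [9]      # sequence B
# [CLS] A [SEP] -> zeros, B [SEP] -> ones
mask = (len([101] + token_ids_0 + [102]) * [0]
        + len(token_ids_1 + [102]) * [1])
assert mask == [0, 0, 0, 0, 1, 1]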
0  modelscope/models/nlp/task_models/__init__.py  Normal file

86  modelscope/models/nlp/task_models/sequence_classification.py  Normal file
@@ -0,0 +1,86 @@
import os
from typing import Any, Dict

import json
import numpy as np

from modelscope.metainfo import TaskModels
from modelscope.models.builder import MODELS
from modelscope.models.nlp.task_models.task_model import \
    SingleBackboneTaskModelBase
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Tasks

__all__ = ['SequenceClassificationModel']


@MODELS.register_module(
    Tasks.sentiment_classification, module_name=TaskModels.text_classification)
@MODELS.register_module(
    Tasks.text_classification, module_name=TaskModels.text_classification)
class SequenceClassificationModel(SingleBackboneTaskModelBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        """initialize the sequence classification model from the `model_dir` path.

        Args:
            model_dir (str): the model path.
        """
        super().__init__(model_dir, *args, **kwargs)
        if 'base_model_prefix' in kwargs:
            self._base_model_prefix = kwargs['base_model_prefix']

        backbone_cfg = self.cfg.backbone
        head_cfg = self.cfg.head

        # get the num_labels from label_mapping.json
        self.id2label = {}
        self.label_path = os.path.join(model_dir, 'label_mapping.json')
        if os.path.exists(self.label_path):
            with open(self.label_path) as f:
                self.label_mapping = json.load(f)
            self.id2label = {
                idx: name
                for name, idx in self.label_mapping.items()
            }
            head_cfg['num_labels'] = len(self.label_mapping)

        self.build_backbone(backbone_cfg)
        self.build_head(head_cfg)

    def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
        outputs = super().forward(input)
        sequence_output, pooled_output = self.extract_backbone_outputs(outputs)
        outputs = self.head.forward(pooled_output)
        if 'labels' in input:
            loss = self.compute_loss(outputs, input['labels'])
            outputs.update(loss)
        return outputs

    def extract_logits(self, outputs):
        return outputs[OutputKeys.LOGITS].cpu().detach()

    def extract_backbone_outputs(self, outputs):
        sequence_output = None
        pooled_output = None
        if hasattr(self.backbone, 'extract_sequence_outputs'):
            sequence_output = self.backbone.extract_sequence_outputs(outputs)
        if hasattr(self.backbone, 'extract_pooled_outputs'):
            pooled_output = self.backbone.extract_pooled_outputs(outputs)
        return sequence_output, pooled_output

    def compute_loss(self, outputs, labels):
        loss = self.head.compute_loss(outputs, labels)
        return loss

    def postprocess(self, input, **kwargs):
        logits = self.extract_logits(input)
        probs = logits.softmax(-1).numpy()
        pred = logits.argmax(-1).numpy()
        logits = logits.numpy()
        res = {
            OutputKeys.PREDICTIONS: pred,
            OutputKeys.PROBABILITIES: probs,
            OutputKeys.LOGITS: logits
        }
        return res
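The loading code above expects label_mapping.json to map label names to integer ids; a minimal sketch of a compatible file and the derived id2label:

# Illustration only: a label_mapping.json compatible with the code above.
# File contents: {"negative": 0, "positive": 1}
label_mapping = {'negative': 0, 'positive': 1}
id2label = {idx: name for name, idx in label_mapping.items()}
assert id2label == {0: 'negative', 1: 'positive'}  # head num_labels == 2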
@@ -11,8 +11,8 @@ from modelscope.models.base import TorchModel
 from modelscope.models.builder import build_backbone, build_head
 from modelscope.utils.config import ConfigDict
 from modelscope.utils.constant import Fields, Tasks
+from modelscope.utils.file_utils import func_receive_dict_inputs
 from modelscope.utils.logger import get_logger
-from modelscope.utils.utils import if_func_receive_dict_inputs

 logger = get_logger(__name__)

@@ -424,12 +424,15 @@ class SingleBackboneTaskModelBase(BaseTaskModel):

     def forward(self, input: Dict[str, Any]) -> Dict[str, Any]:
         """default forward method is the backbone-only forward"""
-        if if_func_receive_dict_inputs(self.backbone.forward):
+        if func_receive_dict_inputs(self.backbone.forward):
             outputs = self.backbone.forward(input)
         else:
             outputs = self.backbone.forward(**input)
         return outputs

     def compute_loss(self, outputs: Dict[str, Any], labels):
         raise NotImplementedError()


 class EncoderDecoderTaskModelBase(BaseTaskModel):
     """
@@ -472,13 +475,13 @@ class EncoderDecoderTaskModelBase(BaseTaskModel):
         return getattr(self, self._decoder_prefix)

     def forward(self, input: Dict[str, Any]) -> Dict[str, Any]:
-        if if_func_receive_dict_inputs(self.encoder_.forward):
+        if func_receive_dict_inputs(self.encoder_.forward):
             encoder_outputs = self.encoder_.forward(input)
         else:
             encoder_outputs = self.encoder_.forward(**input)
         decoder_inputs = self.project_decoder_inputs_and_mediate(
             input, encoder_outputs)
-        if if_func_receive_dict_inputs(self.decoder_.forward):
+        if func_receive_dict_inputs(self.decoder_.forward):
             outputs = self.decoder_.forward(decoder_inputs)
         else:
             outputs = self.decoder_.forward(**decoder_inputs)
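The renamed helper decides whether a forward accepts a single dict or unpacked keyword arguments. A minimal sketch of such a check, assuming it inspects the signature (the real func_receive_dict_inputs lives in modelscope.utils.file_utils and may differ in detail):

import inspect

def func_receive_dict_inputs_sketch(func) -> bool:
    # Assumption for illustration: a forward taking exactly one positional
    # parameter besides self (e.g. forward(self, input)) is dict-style.
    params = [
        p for p in inspect.signature(func).parameters.values()
        if p.name != 'self'
    ]
    return len(params) == 1 and params[0].kind in (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD)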
147  modelscope/models/nlp/token_classification.py  Normal file
@@ -0,0 +1,147 @@
from abc import abstractmethod
from typing import Dict

import numpy as np
import torch
from torch import nn

from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import MODELS
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Tasks
from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.tensor_utils import (torch_nested_detach,
                                           torch_nested_numpify)
from .structbert import SbertPreTrainedModel

__all__ = ['SbertForTokenClassification']


class TokenClassification(TorchModel):

    base_model_prefix: str = 'bert'

    def __init__(self, config, model_dir):
        super().__init__(model_dir)
        self.num_labels = config.num_labels
        self.config = config
        setattr(self, self.base_model_prefix, self.build_base_model())
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None
            else config.hidden_dropout_prob)
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    @abstractmethod
    def build_base_model(self):
        """Build the backbone model.

        Returns: the backbone instance.
        """
        pass

    @property
    def base_model(self):
        return getattr(self, self.base_model_prefix)

    def compute_loss(self, logits, labels, **kwargs):
        """Compute the loss.

        For example, if the backbone is a pretrained model, an 'attention_mask' parameter may be passed in
        through **kwargs to skip padding tokens.

        Args:
            logits: The logits from the classifier.
            labels: The labels.
            **kwargs: Other input params.

        Returns: The loss.

        """
        pass

    def forward(self, **kwargs):
        labels = None
        if OutputKeys.LABEL in kwargs:
            labels = kwargs.pop(OutputKeys.LABEL)
        elif OutputKeys.LABELS in kwargs:
            labels = kwargs.pop(OutputKeys.LABELS)

        outputs = self.base_model(**kwargs)
        # base model should return the sequence_output as its first output
        sequence_output = outputs[0]
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        if labels is not None:
            loss = self.compute_loss(logits, labels, **kwargs)
            return {OutputKeys.LOGITS: logits, OutputKeys.LOSS: loss}
        return {OutputKeys.LOGITS: logits}

    def postprocess(self, input: Dict[str, np.ndarray],
                    **kwargs) -> Dict[str, np.ndarray]:
        logits = input[OutputKeys.LOGITS]
        pred = torch.argmax(logits[0], dim=-1)
        pred = torch_nested_numpify(torch_nested_detach(pred))
        logits = torch_nested_numpify(torch_nested_detach(logits))
        rst = {OutputKeys.PREDICTIONS: pred, OutputKeys.LOGITS: logits}
        return rst


@MODELS.register_module(Tasks.word_segmentation, module_name=Models.structbert)
@MODELS.register_module(
    Tasks.token_classification, module_name=Models.structbert)
class SbertForTokenClassification(TokenClassification, SbertPreTrainedModel):

    supports_gradient_checkpointing = True
    _keys_to_ignore_on_load_unexpected = [r'pooler']

    def __init__(self, config, model_dir):
        if hasattr(config, 'base_model_prefix'):
            SbertForTokenClassification.base_model_prefix = config.base_model_prefix
        super().__init__(config, model_dir)

    def build_base_model(self):
        from .structbert import SbertModel
        return SbertModel(self.config, add_pooling_layer=False)

    def forward(self,
                input_ids=None,
                attention_mask=None,
                token_type_ids=None,
                labels=None,
                **kwargs):
        return super().forward(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            labels=labels)

    def compute_loss(self, logits, labels, attention_mask=None, **kwargs):
        loss_fct = nn.CrossEntropyLoss()
        # Only keep active parts of the loss
        if attention_mask is not None:
            active_loss = attention_mask.view(-1) == 1
            active_logits = logits.view(-1, self.num_labels)
            active_labels = torch.where(
                active_loss, labels.view(-1),
                torch.tensor(loss_fct.ignore_index).type_as(labels))
            return loss_fct(active_logits, active_labels)
        else:
            return loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

    @classmethod
    def _instantiate(cls, **kwargs):
        model_dir = kwargs.get('model_dir')
        num_labels = kwargs.get('num_labels')
        if num_labels is None:
            label2id = parse_label_mapping(model_dir)
            if label2id is not None and len(label2id) > 0:
                num_labels = len(label2id)

        model_args = {} if num_labels is None else {'num_labels': num_labels}
        return super(SbertPreTrainedModel,
                     SbertForTokenClassification).from_pretrained(
                         pretrained_model_name_or_path=kwargs.get('model_dir'),
                         model_dir=kwargs.get('model_dir'),
                         **model_args)
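The masked-loss trick in compute_loss only scores positions where attention_mask == 1; padded positions are remapped to CrossEntropyLoss.ignore_index and contribute nothing. A small self-contained check (shapes and values are illustrative):

import torch
from torch import nn

loss_fct = nn.CrossEntropyLoss()                # ignore_index defaults to -100
logits = torch.randn(1, 4, 3)                   # (batch, seq_len, num_labels)
labels = torch.tensor([[2, 0, 1, 1]])
attention_mask = torch.tensor([[1, 1, 0, 0]])   # last two tokens are padding

active_loss = attention_mask.view(-1) == 1
active_labels = torch.where(
    active_loss, labels.view(-1),
    torch.tensor(loss_fct.ignore_index).type_as(labels))
# Padded positions become -100 and are skipped by the loss.
loss = loss_fct(logits.view(-1, 3), active_labels)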
43  modelscope/models/nlp/veco/__init__.py  Normal file
@@ -0,0 +1,43 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
    from .configuration_veco import VecoConfig
    from .modeling_veco import (VecoForMaskedLM, VecoForSequenceClassification,
                                VecoModel)
    from .tokenization_veco import VecoTokenizer
    from .tokenization_veco_fast import VecoTokenizerFast
else:
    _import_structure = {
        'configuration_veco': ['VecoConfig'],
        'modeling_veco':
        ['VecoForMaskedLM', 'VecoForSequenceClassification', 'VecoModel'],
        'tokenization_veco': ['VecoTokenizer'],
        'tokenization_veco_fast': ['VecoTokenizerFast'],
    }

    import sys

    sys.modules[__name__] = LazyImportModule(
        __name__,
        globals()['__file__'],
        _import_structure,
        module_spec=__spec__,
        extra_objects={},
    )
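With the lazy module installed into sys.modules, a plain import only materializes the submodule on first attribute access; a usage sketch (assuming LazyImportModule resolves names from _import_structure, as in transformers' _LazyModule):

# Illustration only: modeling_veco is loaded lazily on first access.
from modelscope.models.nlp.veco import VecoForSequenceClassification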
33  modelscope/models/nlp/veco/configuration_veco.py  Normal file
@@ -0,0 +1,33 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors.
# Copyright 2020 The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Veco configuration, mainly copied from :class:`~transformers.configuration_xlm_roberta`."""

from transformers import RobertaConfig

from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class VecoConfig(RobertaConfig):
    """
    This class overrides [`RobertaConfig`]. Please check the superclass for the appropriate
    documentation alongside usage examples.
    """

    model_type = 'veco'
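Since VecoConfig only overrides model_type, it accepts the usual RobertaConfig arguments; a quick sketch (values are illustrative):

config = VecoConfig(vocab_size=250002, num_labels=3)  # plain RobertaConfig kwargs
assert config.model_type == 'veco'  # the type string used for dispatch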
143  modelscope/models/nlp/veco/modeling_veco.py  Normal file
@@ -0,0 +1,143 @@
# Copyright 2019 Facebook AI Research and the HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch Veco model, mainly copied from :module:`~transformers.modeling_xlm_roberta`."""

from transformers import (RobertaForMaskedLM, RobertaForMultipleChoice,
                          RobertaForQuestionAnswering,
                          RobertaForSequenceClassification,
                          RobertaForTokenClassification, RobertaModel)
from transformers.file_utils import add_start_docstrings

from modelscope.metainfo import Models
from modelscope.models.builder import BACKBONES
from modelscope.utils import logger as logging
from modelscope.utils.constant import Fields
from .configuration_veco import VecoConfig

logger = logging.get_logger(__name__)

VECO_PRETRAINED_MODEL_ARCHIVE_LIST = []

VECO_START_DOCSTRING = r"""

    This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic
    methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
    pruning heads etc.)

    This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)
    subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to
    general usage and behavior.

    Parameters:
        config ([`VecoConfig`]): Model configuration class with all the parameters of the
            model. Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model
            weights.
"""


@add_start_docstrings(
    'The bare Veco Model transformer outputting raw hidden-states without any specific head on top.',
    VECO_START_DOCSTRING,
)
class VecoModel(RobertaModel):
    """
    This class overrides [`RobertaModel`]. Please check the superclass for the appropriate
    documentation alongside usage examples.
    """

    config_class = VecoConfig


@add_start_docstrings(
    """
    Veco Model transformer with a sequence classification/regression head on top (a linear layer on top of the
    pooled output) e.g. for GLUE tasks.
    """,
    VECO_START_DOCSTRING,
)
class VecoForSequenceClassification(RobertaForSequenceClassification):
    """
    This class overrides [`RobertaForSequenceClassification`]. Please check the superclass for the
    appropriate documentation alongside usage examples.
    """

    config_class = VecoConfig


@add_start_docstrings(
    """
    Veco Model transformer with a masked language modeling head on top (a linear layer on top of the
    hidden-states output).
    """,
    VECO_START_DOCSTRING,
)
class VecoForMaskedLM(RobertaForMaskedLM):
    """
    This class overrides [`RobertaForMaskedLM`]. Please check the superclass for the
    appropriate documentation alongside usage examples.
    """

    config_class = VecoConfig


@add_start_docstrings(
    """
    Veco Model with a multiple choice classification head on top (a linear layer on top of the pooled output and
    a softmax) e.g. for RocStories/SWAG tasks.
    """,
    VECO_START_DOCSTRING,
)
class VecoForMultipleChoice(RobertaForMultipleChoice):
    """
    This class overrides [`RobertaForMultipleChoice`]. Please check the superclass for the
    appropriate documentation alongside usage examples.
    """

    config_class = VecoConfig


@add_start_docstrings(
    """
    Veco Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
    for Named-Entity-Recognition (NER) tasks.
    """,
    VECO_START_DOCSTRING,
)
class VecoForTokenClassification(RobertaForTokenClassification):
    """
    This class overrides [`RobertaForTokenClassification`]. Please check the superclass for the
    appropriate documentation alongside usage examples.
    """

    config_class = VecoConfig


@add_start_docstrings(
    """
    Veco Model with a span classification head on top for extractive question-answering tasks like SQuAD (a
    linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`).
    """,
    VECO_START_DOCSTRING,
)
class VecoForQuestionAnswering(RobertaForQuestionAnswering):
    """
    This class overrides [`RobertaForQuestionAnswering`]. Please check the superclass for the
    appropriate documentation alongside usage examples.
    """

    config_class = VecoConfig
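Each head class above is the stock Roberta implementation with only config_class swapped, so the usual HF loading path applies; a sketch (the checkpoint directory is hypothetical):

# Illustration only: weights load through the inherited Roberta head.
model = VecoForSequenceClassification.from_pretrained('/path/to/veco_ckpt')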
321  modelscope/models/nlp/veco/tokenization_veco.py  Normal file
@@ -0,0 +1,321 @@
# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. team.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
"""Tokenization classes for Veco, mainly copied from :module:`~transformers.tokenization_xlm_roberta`."""

import os
from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple

import sentencepiece as spm
from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer

from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)

SPIECE_UNDERLINE = '▁'

VOCAB_FILES_NAMES = {'vocab_file': 'sentencepiece.bpe.model'}

PRETRAINED_VOCAB_FILES_MAP = {'vocab_file': {}}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}


class VecoTokenizer(PreTrainedTokenizer):
    """
    Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
    [SentencePiece](https://github.com/google/sentencepiece).

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
    Users should refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier
            token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the beginning of
            sequence. The token used is the `cls_token`.

            </Tip>

        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the end of
            sequence. The token used is the `sep_token`.

            </Tip>

        sep_token (`str`, *optional*, defaults to `"</s>"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        cls_token (`str`, *optional*, defaults to `"<s>"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        mask_token (`str`, *optional*, defaults to `"<mask>"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
            Additional special tokens used by the tokenizer.
        sp_model_kwargs (`dict`, *optional*):
            Will be passed to the `SentencePieceProcessor.__init__()` method.
            The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python)
            can be used, among other things, to set:

            - `enable_sampling`: Enable subword regularization.
            - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.

              - `nbest_size = {0,1}`: No sampling is performed.
              - `nbest_size > 1`: samples from the nbest_size results.
              - `nbest_size < 0`: assuming that nbest_size is infinite and samples from all hypotheses (lattice)
                using the forward-filtering-and-backward-sampling algorithm.

            - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
              BPE-dropout.

    Attributes:
        sp_model (`SentencePieceProcessor`):
            The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ['input_ids', 'attention_mask']

    def __init__(self,
                 vocab_file,
                 bos_token='<s>',
                 eos_token='</s>',
                 sep_token='</s>',
                 cls_token='<s>',
                 unk_token='<unk>',
                 pad_token='<pad>',
                 mask_token='<mask>',
                 sp_model_kwargs: Optional[Dict[str, Any]] = None,
                 **kwargs) -> None:
        # Mask token behaves like a normal word, i.e. include the space before it
        mask_token = AddedToken(
            mask_token, lstrip=True, rstrip=False) if isinstance(
                mask_token, str) else mask_token

        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            unk_token=unk_token,
            sep_token=sep_token,
            cls_token=cls_token,
            pad_token=pad_token,
            mask_token=mask_token,
            sp_model_kwargs=self.sp_model_kwargs,
            **kwargs,
        )

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(str(vocab_file))
        self.vocab_file = vocab_file

        # Original fairseq vocab and spm vocab must be "aligned":
        # Vocab    |    0    |    1    |   2    |    3    |  4  |  5  |  6  |   7   |   8   |  9
        # -------- | ------- | ------- | ------ | ------- | --- | --- | --- | ----- | ----- | ----
        # fairseq  | '<s>'   | '<pad>' | '</s>' | '<unk>' | ',' | '.' | '▁' | 's'   | '▁de' | '-'
        # spm      | '<unk>' | '<s>'   | '</s>' | ','     | '.' | '▁' | 's' | '▁de' | '-'   | '▁a'

        # Mimic fairseq token-to-id alignment for the first 4 tokens
        self.fairseq_tokens_to_ids = {
            '<s>': 0,
            '<pad>': 1,
            '</s>': 2,
            '<unk>': 3
        }

        # The first "real" token "," has position 4 in the original fairseq vocab and position 3 in the spm vocab
        self.fairseq_offset = 1

        self.fairseq_tokens_to_ids['<mask>'] = len(
            self.sp_model) + self.fairseq_offset
        self.fairseq_ids_to_tokens = {
            v: k
            for k, v in self.fairseq_tokens_to_ids.items()
        }

    def __getstate__(self):
        state = self.__dict__.copy()
        state['sp_model'] = None
        state['sp_model_proto'] = self.sp_model.serialized_model_proto()
        return state

    def __setstate__(self, d):
        self.__dict__ = d

        # for backward compatibility
        if not hasattr(self, 'sp_model_kwargs'):
            self.sp_model_kwargs = {}

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.LoadFromSerializedProto(self.sp_model_proto)

    def build_inputs_with_special_tokens(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
        and adding special tokens. A Veco sequence has the following format:

        - single sequence: `<s> X </s>`
        - pair of sequences: `<s> A </s></s> B </s>`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """

        if token_ids_1 is None:
            return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
        cls = [self.cls_token_id]
        sep = [self.sep_token_id]
        return cls + token_ids_0 + sep + sep + token_ids_1 + sep

    def get_special_tokens_mask(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None,
            already_has_special_tokens: bool = False) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """

        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0,
                token_ids_1=token_ids_1,
                already_has_special_tokens=True)

        if token_ids_1 is None:
            return [1] + ([0] * len(token_ids_0)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1, 1] + (
            [0] * len(token_ids_1)) + [1]

    def create_token_type_ids_from_sequences(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. Veco does
        not make use of token type ids, therefore a list of zeros is returned.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of zeros.

        """

        sep = [self.sep_token_id]
        cls = [self.cls_token_id]

        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]

    @property
    def vocab_size(self):
        return len(
            self.sp_model) + self.fairseq_offset + 1  # Add the <mask> token

    def get_vocab(self):
        vocab = {
            self.convert_ids_to_tokens(i): i
            for i in range(self.vocab_size)
        }
        vocab.update(self.added_tokens_encoder)
        return vocab

    def _tokenize(self, text: str) -> List[str]:
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token):
        """Converts a token (str) to an id using the vocab."""
        if token in self.fairseq_tokens_to_ids:
            return self.fairseq_tokens_to_ids[token]
        spm_id = self.sp_model.PieceToId(token)

        # Need to return unknown token if the SP model returned 0
        return spm_id + self.fairseq_offset if spm_id else self.unk_token_id

    def _convert_id_to_token(self, index):
        """Converts an index (integer) to a token (str) using the vocab."""
        if index in self.fairseq_ids_to_tokens:
            return self.fairseq_ids_to_tokens[index]
        return self.sp_model.IdToPiece(index - self.fairseq_offset)

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (strings for sub-words) into a single string."""
        out_string = ''.join(tokens).replace(SPIECE_UNDERLINE, ' ').strip()
        return out_string

    def save_vocabulary(self,
                        save_directory: str,
                        filename_prefix: Optional[str] = None) -> Tuple[str]:
        if not os.path.isdir(save_directory):
            logger.error(
                f'Vocabulary path ({save_directory}) should be a directory')
            return
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + '-' if filename_prefix else '')
            + VOCAB_FILES_NAMES['vocab_file'])

        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
            copyfile(self.vocab_file, out_vocab_file)

        return (out_vocab_file, )
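Given the fairseq/spm alignment table above, _convert_token_to_id resolves in three steps; a commented walk-through (the spm piece ids are illustrative, and the vocab path is hypothetical):

# tok = VecoTokenizer('sentencepiece.bpe.model')  # path is illustrative
# tok._convert_token_to_id('<s>')   # -> 0, straight from fairseq_tokens_to_ids
# tok._convert_token_to_id('▁de')   # -> sp_model.PieceToId('▁de') + 1 (fairseq_offset)
# tok._convert_token_to_id('junk')  # spm returns 0 -> falls back to tok.unk_token_id (3)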
213  modelscope/models/nlp/veco/tokenization_veco_fast.py  Normal file
@@ -0,0 +1,213 @@
# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. team.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
"""Fast tokenization classes for Veco, mainly copied from :module:`~transformers.tokenization_xlm_roberta_fast`."""

import os
from shutil import copyfile
from typing import List, Optional, Tuple

import transformers
from transformers.file_utils import is_sentencepiece_available
from transformers.tokenization_utils import AddedToken
from transformers.tokenization_utils_fast import PreTrainedTokenizerFast

from modelscope.utils import logger as logging

if is_sentencepiece_available():
    from .tokenization_veco import VecoTokenizer
else:
    VecoTokenizer = None

logger = logging.get_logger(__name__)

VOCAB_FILES_NAMES = {
    'vocab_file': 'sentencepiece.bpe.model',
    'tokenizer_file': 'tokenizer.json'
}

PRETRAINED_VOCAB_FILES_MAP = {
    'vocab_file': {},
    'tokenizer_file': {},
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}

transformers.SLOW_TO_FAST_CONVERTERS[
    'VecoTokenizer'] = transformers.SLOW_TO_FAST_CONVERTERS[
        'XLMRobertaTokenizer']


class VecoTokenizerFast(PreTrainedTokenizerFast):
    """
    Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`].
    Based on [BPE](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=BPE#models).

    This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main
    methods. Users should refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier
            token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the beginning of
            sequence. The token used is the `cls_token`.

            </Tip>

        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the end of
            sequence. The token used is the `sep_token`.

            </Tip>

        sep_token (`str`, *optional*, defaults to `"</s>"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        cls_token (`str`, *optional*, defaults to `"<s>"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        mask_token (`str`, *optional*, defaults to `"<mask>"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
            Additional special tokens used by the tokenizer.
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ['input_ids', 'attention_mask']
    slow_tokenizer_class = VecoTokenizer

    def __init__(self,
                 vocab_file=None,
                 tokenizer_file=None,
                 bos_token='<s>',
                 eos_token='</s>',
                 sep_token='</s>',
                 cls_token='<s>',
                 unk_token='<unk>',
                 pad_token='<pad>',
                 mask_token='<mask>',
                 **kwargs):
        # Mask token behaves like a normal word, i.e. include the space before it
        mask_token = AddedToken(
            mask_token, lstrip=True, rstrip=False) if isinstance(
                mask_token, str) else mask_token

        super().__init__(
            vocab_file,
            tokenizer_file=tokenizer_file,
            bos_token=bos_token,
            eos_token=eos_token,
            sep_token=sep_token,
            cls_token=cls_token,
            unk_token=unk_token,
            pad_token=pad_token,
            mask_token=mask_token,
            **kwargs,
        )

        self.vocab_file = vocab_file
        self.can_save_slow_tokenizer = False if not self.vocab_file else True

    def build_inputs_with_special_tokens(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
        and adding special tokens. A Veco sequence has the following format:

        - single sequence: `<s> X </s>`
        - pair of sequences: `<s> A </s></s> B </s>`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """

        if token_ids_1 is None:
            return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
        cls = [self.cls_token_id]
        sep = [self.sep_token_id]
        return cls + token_ids_0 + sep + sep + token_ids_1 + sep

    def create_token_type_ids_from_sequences(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. Veco does
        not make use of token type ids, therefore a list of zeros is returned.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of zeros.

        """

        sep = [self.sep_token_id]
        cls = [self.cls_token_id]

        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]

    def save_vocabulary(self,
                        save_directory: str,
                        filename_prefix: Optional[str] = None) -> Tuple[str]:
        if not self.can_save_slow_tokenizer:
            raise ValueError(
                'Your fast tokenizer does not have the necessary information to save the vocabulary for a slow '
                'tokenizer.')

        if not os.path.isdir(save_directory):
            logger.error(
                f'Vocabulary path ({save_directory}) should be a directory.')
            return
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + '-' if filename_prefix else '')
            + VOCAB_FILES_NAMES['vocab_file'])

        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
            copyfile(self.vocab_file, out_vocab_file)

        return (out_vocab_file, )
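For a pair, build_inputs_with_special_tokens yields the <s> A </s></s> B </s> layout while the token-type ids stay all zero; a worked check with made-up ids (cls=0, sep=2):

token_ids_0, token_ids_1 = [7, 8], [9]
cls, sep = [0], [2]
ids = cls + token_ids_0 + sep + sep + token_ids_1 + sep
assert ids == [0, 7, 8, 2, 2, 9, 2]
segments = len(ids) * [0]  # Veco never uses token type ids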
@@ -517,3 +517,10 @@ class MsDataset:
     def to_hf_dataset(self) -> Dataset:
         self._hf_ds.reset_format()
         return self._hf_ds
+
+    @staticmethod
+    def interleave_datasets(datasets: List[Any],
+                            probabilities: Optional[List[float]] = None,
+                            seed: Optional[int] = None):
+        from datasets import interleave_datasets
+        return interleave_datasets(datasets, probabilities, seed)
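The new static method is a thin wrapper over datasets.interleave_datasets; a usage sketch (the datasets and mixing probabilities are made up, and the MsDataset import path may differ by version):

from datasets import Dataset
from modelscope.msdatasets import MsDataset  # assumed import path

ds_en = Dataset.from_dict({'text': ['a', 'b']})
ds_zh = Dataset.from_dict({'text': ['甲', '乙']})
# Sample 70/30 between the two sources, reproducibly.
mixed = MsDataset.interleave_datasets([ds_en, ds_zh],
                                      probabilities=[0.7, 0.3], seed=42)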
@@ -9,6 +9,7 @@ class OutputKeys(object):
     SCORES = 'scores'
     LABEL = 'label'
     LABELS = 'labels'
+    INPUT_IDS = 'input_ids'
     LABEL_POS = 'label_pos'
     POSES = 'poses'
     CAPTION = 'caption'
@@ -9,9 +9,8 @@ if TYPE_CHECKING:
     from .dialog_state_tracking_pipeline import DialogStateTrackingPipeline
     from .fill_mask_pipeline import FillMaskPipeline
     from .named_entity_recognition_pipeline import NamedEntityRecognitionPipeline
-    from .nli_pipeline import NLIPipeline
-    from .sentence_similarity_pipeline import SentenceSimilarityPipeline
-    from .sentiment_classification_pipeline import SentimentClassificationPipeline
+    from .pair_sentence_classification_pipeline import PairSentenceClassificationPipeline
+    from .single_sentence_classification_pipeline import SingleSentenceClassificationPipeline
     from .sequence_classification_pipeline import SequenceClassificationPipeline
     from .text_generation_pipeline import TextGenerationPipeline
     from .translation_pipeline import TranslationPipeline
@@ -28,10 +27,10 @@ else:
         'dialog_modeling_pipeline': ['DialogModelingPipeline'],
         'dialog_state_tracking_pipeline': ['DialogStateTrackingPipeline'],
         'fill_mask_pipeline': ['FillMaskPipeline'],
-        'nli_pipeline': ['NLIPipeline'],
-        'sentence_similarity_pipeline': ['SentenceSimilarityPipeline'],
-        'sentiment_classification_pipeline':
-        ['SentimentClassificationPipeline'],
+        'single_sentence_classification_pipeline':
+        ['SingleSentenceClassificationPipeline'],
+        'pair_sentence_classification_pipeline':
+        ['PairSentenceClassificationPipeline'],
         'sequence_classification_pipeline': ['SequenceClassificationPipeline'],
         'text_generation_pipeline': ['TextGenerationPipeline'],
         'word_segmentation_pipeline': ['WordSegmentationPipeline'],
@@ -5,11 +5,10 @@ import torch
|
||||
|
||||
from modelscope.metainfo import Pipelines
|
||||
from modelscope.models import Model
|
||||
from modelscope.models.nlp.masked_language import MaskedLanguageModelBase
|
||||
from modelscope.outputs import OutputKeys
|
||||
from modelscope.pipelines.base import Pipeline, Tensor
|
||||
from modelscope.pipelines.builder import PIPELINES
|
||||
from modelscope.preprocessors import FillMaskPreprocessor
|
||||
from modelscope.preprocessors import FillMaskPreprocessor, Preprocessor
|
||||
from modelscope.utils.config import Config
|
||||
from modelscope.utils.constant import ModelFile, Tasks
|
||||
|
||||
@@ -21,18 +20,18 @@ _type_map = {'veco': 'roberta', 'sbert': 'bert'}
|
||||
class FillMaskPipeline(Pipeline):
|
||||
|
||||
def __init__(self,
|
||||
model: Union[MaskedLanguageModelBase, str],
|
||||
preprocessor: Optional[FillMaskPreprocessor] = None,
|
||||
first_sequence='sentense',
|
||||
model: Union[Model, str],
|
||||
preprocessor: Optional[Preprocessor] = None,
|
||||
first_sequence='sentence',
|
||||
**kwargs):
|
||||
"""use `model` and `preprocessor` to create a nlp fill mask pipeline for prediction
|
||||
|
||||
Args:
|
||||
model (MaskedLanguageModelBase): a model instance
|
||||
preprocessor (FillMaskPreprocessor): a preprocessor instance
|
||||
model (Model): a model instance
|
||||
preprocessor (Preprocessor): a preprocessor instance
|
||||
"""
|
||||
fill_mask_model = model if isinstance(
|
||||
model, MaskedLanguageModelBase) else Model.from_pretrained(model)
|
||||
model, Model) else Model.from_pretrained(model)
|
||||
|
||||
if preprocessor is None:
|
||||
preprocessor = FillMaskPreprocessor(
|
||||
@@ -73,7 +72,7 @@ class FillMaskPipeline(Pipeline):
|
||||
def forward(self, inputs: Dict[str, Any],
|
||||
**forward_params) -> Dict[str, Any]:
|
||||
with torch.no_grad():
|
||||
return super().forward(inputs, **forward_params)
|
||||
return self.model(inputs, **forward_params)
|
||||
|
||||
def postprocess(self, inputs: Dict[str, Tensor]) -> Dict[str, Tensor]:
|
||||
"""process the prediction results
|
||||
@@ -85,8 +84,8 @@ class FillMaskPipeline(Pipeline):
|
||||
Dict[str, str]: the prediction results
|
||||
"""
|
||||
import numpy as np
|
||||
logits = inputs['logits'].detach().cpu().numpy()
|
||||
input_ids = inputs['input_ids'].detach().cpu().numpy()
|
||||
logits = inputs[OutputKeys.LOGITS].detach().cpu().numpy()
|
||||
input_ids = inputs[OutputKeys.INPUT_IDS].detach().cpu().numpy()
|
||||
pred_ids = np.argmax(logits, axis=-1)
|
||||
model_type = self.model.config.model_type
|
||||
process_type = model_type if model_type in self.mask_id else _type_map[
|
||||
|
||||
@@ -4,11 +4,10 @@ import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import TransformerCRFForNamedEntityRecognition
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import NERPreprocessor
from modelscope.preprocessors import NERPreprocessor, Preprocessor
from modelscope.utils.constant import Tasks

__all__ = ['NamedEntityRecognitionPipeline']
@@ -20,13 +19,12 @@ __all__ = ['NamedEntityRecognitionPipeline']
class NamedEntityRecognitionPipeline(Pipeline):

    def __init__(self,
                 model: Union[TransformerCRFForNamedEntityRecognition, str],
                 preprocessor: Optional[NERPreprocessor] = None,
                 model: Union[Model, str],
                 preprocessor: Optional[Preprocessor] = None,
                 **kwargs):

        model = model if isinstance(model,
                                    TransformerCRFForNamedEntityRecognition
                                    ) else Model.from_pretrained(model)
                                    Model) else Model.from_pretrained(model)
        if preprocessor is None:
            preprocessor = NERPreprocessor(model.model_dir)
        model.eval()

@@ -1,73 +0,0 @@
import uuid
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForNLI
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import NLIPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['NLIPipeline']


@PIPELINES.register_module(Tasks.nli, module_name=Pipelines.nli)
class NLIPipeline(Pipeline):

    def __init__(self,
                 model: Union[SbertForNLI, str],
                 preprocessor: NLIPreprocessor = None,
                 first_sequence='first_sequence',
                 second_sequence='second_sequence',
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp text classification pipeline for prediction

        Args:
            model (SbertForNLI): a model instance
            preprocessor (NLIPreprocessor): a preprocessor instance
        """
        assert isinstance(model, str) or isinstance(model, SbertForNLI), \
            'model must be a single str or SbertForNLI'
        model = model if isinstance(
            model, SbertForNLI) else Model.from_pretrained(model)
        if preprocessor is None:
            preprocessor = NLIPreprocessor(
                model.model_dir,
                first_sequence=first_sequence,
                second_sequence=second_sequence)
        model.eval()
        super().__init__(model=model, preprocessor=preprocessor, **kwargs)
        assert len(model.id2label) > 0

    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        with torch.no_grad():
            return super().forward(inputs, **forward_params)

    def postprocess(self,
                    inputs: Dict[str, Any],
                    topk: int = 5) -> Dict[str, str]:
        """process the prediction results

        Args:
            inputs (Dict[str, Any]): _description_

        Returns:
            Dict[str, str]: the prediction results
        """

        probs = inputs['probabilities'][0]
        num_classes = probs.shape[0]
        topk = min(topk, num_classes)
        top_indices = np.argpartition(probs, -topk)[-topk:]
        cls_ids = top_indices[np.argsort(probs[top_indices])]
        probs = probs[cls_ids].tolist()

        cls_names = [self.model.id2label[cid] for cid in cls_ids]

        return {OutputKeys.SCORES: probs, OutputKeys.LABELS: cls_names}
@@ -0,0 +1,37 @@
from typing import Union

from modelscope.models.base import Model
from ...metainfo import Pipelines
from ...preprocessors import (PairSentenceClassificationPreprocessor,
                              Preprocessor)
from ...utils.constant import Tasks
from ..builder import PIPELINES
from .sequence_classification_pipeline_base import \
    SequenceClassificationPipelineBase

__all__ = ['PairSentenceClassificationPipeline']


@PIPELINES.register_module(Tasks.nli, module_name=Pipelines.nli)
@PIPELINES.register_module(
    Tasks.sentence_similarity, module_name=Pipelines.sentence_similarity)
class PairSentenceClassificationPipeline(SequenceClassificationPipelineBase):

    def __init__(self,
                 model: Union[Model, str],
                 preprocessor: Preprocessor = None,
                 first_sequence='first_sequence',
                 second_sequence='second_sequence',
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp pair sentence classification pipeline for prediction

        Args:
            model (Model): a model instance
            preprocessor (Preprocessor): a preprocessor instance
        """
        if preprocessor is None:
            preprocessor = PairSentenceClassificationPreprocessor(
                model.model_dir if isinstance(model, Model) else model,
                first_sequence=first_sequence,
                second_sequence=second_sequence)
        super().__init__(model=model, preprocessor=preprocessor, **kwargs)
@@ -1,73 +0,0 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForSentenceSimilarity
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Input, Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import SentenceSimilarityPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['SentenceSimilarityPipeline']


@PIPELINES.register_module(
    Tasks.sentence_similarity, module_name=Pipelines.sentence_similarity)
class SentenceSimilarityPipeline(Pipeline):

    def __init__(self,
                 model: Union[Model, str],
                 preprocessor: SentenceSimilarityPreprocessor = None,
                 first_sequence='first_sequence',
                 second_sequence='second_sequence',
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp sentence similarity pipeline for prediction

        Args:
            model (SbertForSentenceSimilarity): a model instance
            preprocessor (SentenceSimilarityPreprocessor): a preprocessor instance
        """
        assert isinstance(model, str) or isinstance(model, SbertForSentenceSimilarity), \
            'model must be a single str or SbertForSentenceSimilarity'
        sc_model = model if isinstance(
            model,
            SbertForSentenceSimilarity) else Model.from_pretrained(model)
        if preprocessor is None:
            preprocessor = SentenceSimilarityPreprocessor(
                sc_model.model_dir,
                first_sequence=first_sequence,
                second_sequence=second_sequence)
        sc_model.eval()
        super().__init__(model=sc_model, preprocessor=preprocessor, **kwargs)

        assert hasattr(self.model, 'id2label'), \
            'id2label map should be initalizaed in init function.'

    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        with torch.no_grad():
            return super().forward(inputs, **forward_params)

    def postprocess(self, inputs: Dict[str, Any],
                    **postprocess_params) -> Dict[str, str]:
        """process the prediction results

        Args:
            inputs (Dict[str, Any]): _description_

        Returns:
            Dict[str, str]: the prediction results
        """

        probs = inputs['probabilities'][0]
        num_classes = probs.shape[0]
        top_indices = np.argpartition(probs, -num_classes)[-num_classes:]
        cls_ids = top_indices[np.argsort(-probs[top_indices], axis=-1)]
        probs = probs[cls_ids].tolist()
        cls_names = [self.model.id2label[cid] for cid in cls_ids]
        b = 0
        return {OutputKeys.SCORES: probs[b], OutputKeys.LABELS: cls_names[b]}
@@ -1,74 +0,0 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SequenceClassificationModel
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import SentimentClassificationPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['SentimentClassificationPipeline']


@PIPELINES.register_module(
    Tasks.sentiment_classification,
    module_name=Pipelines.sentiment_classification)
class SentimentClassificationPipeline(Pipeline):

    def __init__(self,
                 model: Union[SequenceClassificationModel, str],
                 preprocessor: SentimentClassificationPreprocessor = None,
                 first_sequence='first_sequence',
                 second_sequence='second_sequence',
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp text classification pipeline for prediction

        Args:
            model (SequenceClassificationModel): a model instance
            preprocessor (SentimentClassificationPreprocessor): a preprocessor instance
        """
        assert isinstance(model, str) or isinstance(model, SequenceClassificationModel), \
            'model must be a single str or SentimentClassification'
        model = model if isinstance(
            model,
            SequenceClassificationModel) else Model.from_pretrained(model)
        if preprocessor is None:
            preprocessor = SentimentClassificationPreprocessor(
                model.model_dir,
                first_sequence=first_sequence,
                second_sequence=second_sequence)
        model.eval()
        super().__init__(model=model, preprocessor=preprocessor, **kwargs)
        assert len(model.id2label) > 0

    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        with torch.no_grad():
            return super().forward(inputs, **forward_params)

    def postprocess(self,
                    inputs: Dict[str, Any],
                    topk: int = 5) -> Dict[str, str]:
        """process the prediction results

        Args:
            inputs (Dict[str, Any]): _description_

        Returns:
            Dict[str, str]: the prediction results
        """

        probs = inputs['probabilities'][0]
        num_classes = probs.shape[0]
        topk = min(topk, num_classes)
        top_indices = np.argpartition(probs, -topk)[-topk:]
        cls_ids = top_indices[np.argsort(probs[top_indices])]
        probs = probs[cls_ids].tolist()

        cls_names = [self.model.id2label[cid] for cid in cls_ids]
        return {OutputKeys.SCORES: probs, OutputKeys.LABELS: cls_names}
@@ -0,0 +1,60 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.models.base import Model
from modelscope.outputs import OutputKeys
from ...preprocessors import Preprocessor
from ..base import Pipeline


class SequenceClassificationPipelineBase(Pipeline):

    def __init__(self, model: Union[Model, str], preprocessor: Preprocessor,
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp text classification pipeline for prediction

        Args:
            model (str or Model): a model instance
            preprocessor (Preprocessor): a preprocessor instance
        """
        assert isinstance(model, str) or isinstance(model, Model), \
            'model must be a single str or Model'
        model = model if isinstance(model,
                                    Model) else Model.from_pretrained(model)
        assert preprocessor is not None
        model.eval()
        super().__init__(model=model, preprocessor=preprocessor, **kwargs)
        self.id2label = kwargs.get('id2label')
        if self.id2label is None and hasattr(self.preprocessor, 'id2label'):
            self.id2label = self.preprocessor.id2label
        assert self.id2label is not None, 'Cannot convert id to the original label, please pass in the mapping ' \
                                          'as a parameter or make sure the preprocessor has the attribute.'

    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        with torch.no_grad():
            return self.model(inputs, **forward_params)

    def postprocess(self,
                    inputs: Dict[str, Any],
                    topk: int = 5) -> Dict[str, str]:
        """process the prediction results

        Args:
            inputs (Dict[str, Any]): _description_
            topk (int): The topk probs to take
        Returns:
            Dict[str, str]: the prediction results
        """

        probs = inputs[OutputKeys.PROBABILITIES][0]
        num_classes = probs.shape[0]
        topk = min(topk, num_classes)
        top_indices = np.argpartition(probs, -topk)[-topk:]
        cls_ids = top_indices[np.argsort(probs[top_indices])]
        probs = probs[cls_ids].tolist()

        cls_names = [self.id2label[cid] for cid in cls_ids]
        return {OutputKeys.SCORES: probs, OutputKeys.LABELS: cls_names}
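
The top-k selection above is worth a note: `np.argpartition` finds the k largest entries without a full sort, and the follow-up `np.argsort` orders just those k, so scores and labels come back sorted by ascending probability. A small self-contained check:

import numpy as np

probs = np.array([0.1, 0.5, 0.15, 0.25])
topk = 2
top_indices = np.argpartition(probs, -topk)[-topk:]     # indices {1, 3}, unordered
cls_ids = top_indices[np.argsort(probs[top_indices])]   # ascending: [3, 1]
assert list(cls_ids) == [3, 1]                          # 0.25 first, then 0.5
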
@@ -0,0 +1,35 @@
from typing import Union

from ...metainfo import Pipelines
from ...models import Model
from ...preprocessors import (Preprocessor,
                              SingleSentenceClassificationPreprocessor)
from ...utils.constant import Tasks
from ..builder import PIPELINES
from .sequence_classification_pipeline_base import \
    SequenceClassificationPipelineBase

__all__ = ['SingleSentenceClassificationPipeline']


@PIPELINES.register_module(
    Tasks.sentiment_classification,
    module_name=Pipelines.sentiment_classification)
class SingleSentenceClassificationPipeline(SequenceClassificationPipelineBase):

    def __init__(self,
                 model: Union[Model, str],
                 preprocessor: Preprocessor = None,
                 first_sequence='first_sequence',
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp single sentence classification pipeline for prediction

        Args:
            model (Model): a model instance
            preprocessor (Preprocessor): a preprocessor instance
        """
        if preprocessor is None:
            preprocessor = SingleSentenceClassificationPreprocessor(
                model.model_dir if isinstance(model, Model) else model,
                first_sequence=first_sequence)
        super().__init__(model=model, preprocessor=preprocessor, **kwargs)
@@ -3,7 +3,7 @@ from typing import Any, Dict, Optional, Union
import torch

from modelscope.metainfo import Pipelines
from modelscope.models.base import TorchModel
from modelscope.models.base import Model
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import TextGenerationPreprocessor
@@ -17,7 +17,7 @@ __all__ = ['TextGenerationPipeline']
class TextGenerationPipeline(Pipeline):

    def __init__(self,
                 model: Union[TorchModel, str],
                 model: Union[Model, str],
                 preprocessor: Optional[TextGenerationPreprocessor] = None,
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp text generation pipeline for prediction
@@ -26,8 +26,8 @@ class TextGenerationPipeline(Pipeline):
            model (PalmForTextGeneration): a model instance
            preprocessor (TextGenerationPreprocessor): a preprocessor instance
        """
        model = model if isinstance(
            model, TorchModel) else TorchModel.from_pretrained(model)
        model = model if isinstance(model,
                                    Model) else Model.from_pretrained(model)
        if preprocessor is None:
            preprocessor = TextGenerationPreprocessor(
                model.model_dir,

@@ -4,11 +4,9 @@ from typing import Any, Dict
import numpy as np
import tensorflow as tf

from modelscope.hub.snapshot_download import snapshot_download
from modelscope.metainfo import Pipelines
from modelscope.models.nlp import CsanmtForTranslation
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.utils.constant import ModelFile, Tasks
from modelscope.utils.logger import get_logger

@@ -4,11 +4,11 @@ import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForTokenClassification
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import TokenClassificationPreprocessor
from modelscope.preprocessors import (Preprocessor,
                                      TokenClassificationPreprocessor)
from modelscope.utils.constant import Tasks

__all__ = ['WordSegmentationPipeline']
@@ -18,33 +18,35 @@ __all__ = ['WordSegmentationPipeline']
    Tasks.word_segmentation, module_name=Pipelines.word_segmentation)
class WordSegmentationPipeline(Pipeline):

    def __init__(
            self,
            model: Union[SbertForTokenClassification, str],
            preprocessor: Optional[TokenClassificationPreprocessor] = None,
            **kwargs):
    def __init__(self,
                 model: Union[Model, str],
                 preprocessor: Optional[Preprocessor] = None,
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp word segmentation pipeline for prediction

        Args:
            model (StructBertForTokenClassification): a model instance
            preprocessor (TokenClassificationPreprocessor): a preprocessor instance
            model (Model): a model instance
            preprocessor (Preprocessor): a preprocessor instance
        """
        model = model if isinstance(
            model,
            SbertForTokenClassification) else Model.from_pretrained(model)
        model = model if isinstance(model,
                                    Model) else Model.from_pretrained(model)
        if preprocessor is None:
            preprocessor = TokenClassificationPreprocessor(model.model_dir)
        model.eval()
        super().__init__(model=model, preprocessor=preprocessor, **kwargs)
        self.tokenizer = preprocessor.tokenizer
        self.config = model.config
        assert len(self.config.id2label) > 0
        self.id2label = self.config.id2label
        self.id2label = kwargs.get('id2label')
        if self.id2label is None and hasattr(self.preprocessor, 'id2label'):
            self.id2label = self.preprocessor.id2label
        assert self.id2label is not None, 'Cannot convert id to the original label, please pass in the mapping ' \
                                          'as a parameter or make sure the preprocessor has the attribute.'

    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        text = inputs.pop(OutputKeys.TEXT)
        with torch.no_grad():
            return super().forward(inputs, **forward_params)
            return {
                **self.model(inputs, **forward_params), OutputKeys.TEXT: text
            }

    def postprocess(self, inputs: Dict[str, Any],
                    **postprocess_params) -> Dict[str, str]:

@@ -5,11 +5,11 @@ from scipy.special import softmax

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForZeroShotClassification
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import ZeroShotClassificationPreprocessor
from modelscope.preprocessors import (Preprocessor,
                                      ZeroShotClassificationPreprocessor)
from modelscope.utils.constant import Tasks

__all__ = ['ZeroShotClassificationPipeline']
@@ -21,19 +21,18 @@ __all__ = ['ZeroShotClassificationPipeline']
class ZeroShotClassificationPipeline(Pipeline):

    def __init__(self,
                 model: Union[SbertForZeroShotClassification, str],
                 preprocessor: ZeroShotClassificationPreprocessor = None,
                 model: Union[Model, str],
                 preprocessor: Preprocessor = None,
                 **kwargs):
        """use `model` and `preprocessor` to create a nlp text classification pipeline for prediction
        """use `model` and `preprocessor` to create a nlp zero-shot text classification pipeline for prediction
        Args:
            model (SbertForZeroShotClassification): a model instance
            preprocessor (SentimentClassificationPreprocessor): a preprocessor instance
            model (Model): a model instance
            preprocessor (Preprocessor): a preprocessor instance
        """
        assert isinstance(model, str) or isinstance(model, SbertForZeroShotClassification), \
            'model must be a single str or SbertForZeroShotClassification'
        model = model if isinstance(
            model,
            SbertForZeroShotClassification) else Model.from_pretrained(model)
        assert isinstance(model, str) or isinstance(model, Model), \
            'model must be a single str or Model'
        model = model if isinstance(model,
                                    Model) else Model.from_pretrained(model)
        self.entailment_id = 0
        self.contradiction_id = 2
        if preprocessor is None:
@@ -58,7 +57,7 @@ class ZeroShotClassificationPipeline(Pipeline):
    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        with torch.no_grad():
            return super().forward(inputs, **forward_params)
            return self.model(inputs, **forward_params)

    def postprocess(self,
                    inputs: Dict[str, Any],
@@ -70,7 +69,7 @@ class ZeroShotClassificationPipeline(Pipeline):
        Returns:
            Dict[str, Any]: the prediction results
        """
        logits = inputs['logits']
        logits = inputs[OutputKeys.LOGITS]
        if multi_label or len(candidate_labels) == 1:
            logits = logits[..., [self.contradiction_id, self.entailment_id]]
        scores = softmax(logits, axis=-1)[..., 1]
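
To make the entailment trick above concrete: in the multi-label branch each candidate hypothesis is scored independently by a softmax over its [contradiction, entailment] logits, taking column 1 (entailment) as the score. A sketch with made-up NLI logits, columns ordered [entailment, neutral, contradiction] to match entailment_id=0 and contradiction_id=2:

import numpy as np
from scipy.special import softmax

logits = np.array([[2.0, 0.1, -1.0],     # candidate label 1
                   [-0.5, 0.2, 1.5]])    # candidate label 2
pair = logits[..., [2, 0]]               # [contradiction, entailment] per label
scores = softmax(pair, axis=-1)[..., 1]  # ~[0.95, 0.12]: label 1 fits, label 2 does not
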
@@ -18,11 +18,11 @@ if TYPE_CHECKING:
                            MPlugVisualQuestionAnsweringPreprocessor)
    from .nlp import (Tokenize, SequenceClassificationPreprocessor,
                      TextGenerationPreprocessor,
                      TokenClassificationPreprocessor, NLIPreprocessor,
                      SentimentClassificationPreprocessor,
                      SentenceSimilarityPreprocessor, FillMaskPreprocessor,
                      ZeroShotClassificationPreprocessor, NERPreprocessor,
                      TextErrorCorrectionPreprocessor)
                      TokenClassificationPreprocessor,
                      SingleSentenceClassificationPreprocessor,
                      PairSentenceClassificationPreprocessor,
                      FillMaskPreprocessor, ZeroShotClassificationPreprocessor,
                      NERPreprocessor, TextErrorCorrectionPreprocessor)
    from .space import (DialogIntentPredictionPreprocessor,
                        DialogModelingPreprocessor,
                        DialogStateTrackingPreprocessor)
@@ -46,8 +46,8 @@ else:
        'nlp': [
            'Tokenize', 'SequenceClassificationPreprocessor',
            'TextGenerationPreprocessor', 'TokenClassificationPreprocessor',
            'NLIPreprocessor', 'SentimentClassificationPreprocessor',
            'SentenceSimilarityPreprocessor', 'FillMaskPreprocessor',
            'SingleSentenceClassificationPreprocessor',
            'PairSentenceClassificationPreprocessor', 'FillMaskPreprocessor',
            'ZeroShotClassificationPreprocessor', 'NERPreprocessor',
            'TextErrorCorrectionPreprocessor'
        ],

@@ -1,5 +1,5 @@
# Copyright (c) Alibaba, Inc. and its affiliates.

import os
from abc import ABC, abstractmethod
from typing import Any, Dict

@@ -10,6 +10,8 @@ class Preprocessor(ABC):

    def __init__(self, *args, **kwargs):
        self._mode = ModeKeys.INFERENCE
        self.device = int(
            os.environ['LOCAL_RANK']) if 'LOCAL_RANK' in os.environ else None
        pass

    @abstractmethod
@@ -2,14 +2,14 @@

import os.path as osp
import uuid
from typing import Any, Dict, Optional, Union
from typing import Any, Dict, Iterable, Optional, Tuple, Union

from transformers import AutoTokenizer

from modelscope.metainfo import Preprocessors
from modelscope.models import Model
from modelscope.metainfo import Models, Preprocessors
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Fields, InputFields, ModeKeys
from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.hub import get_model_type, parse_label_mapping
from modelscope.utils.type_assert import type_assert
from .base import Preprocessor
from .builder import PREPROCESSORS
@@ -17,8 +17,8 @@ from .builder import PREPROCESSORS
__all__ = [
    'Tokenize', 'SequenceClassificationPreprocessor',
    'TextGenerationPreprocessor', 'TokenClassificationPreprocessor',
    'NLIPreprocessor', 'SentimentClassificationPreprocessor',
    'FillMaskPreprocessor', 'SentenceSimilarityPreprocessor',
    'PairSentenceClassificationPreprocessor',
    'SingleSentenceClassificationPreprocessor', 'FillMaskPreprocessor',
    'ZeroShotClassificationPreprocessor', 'NERPreprocessor',
    'TextErrorCorrectionPreprocessor'
]
@@ -38,99 +38,6 @@ class Tokenize(Preprocessor):
        return data


class NLPPreprocessorBase(Preprocessor):

    def __init__(self, model_dir: str, *args, **kwargs):
        """preprocess the data via the vocab.txt from the `model_dir` path

        Args:
            model_dir (str): model path
        """

        super().__init__(*args, **kwargs)
        self.model_dir: str = model_dir
        self.first_sequence: str = kwargs.pop('first_sequence',
                                              'first_sequence')
        self.second_sequence = kwargs.pop('second_sequence', 'second_sequence')
        self.tokenize_kwargs = kwargs
        self.tokenizer = self.build_tokenizer(model_dir)
        self.label2id = parse_label_mapping(self.model_dir)

    def build_tokenizer(self, model_dir):
        from sofa import SbertTokenizer
        return SbertTokenizer.from_pretrained(model_dir)

    @type_assert(object, object)
    def __call__(self, data: Union[str, tuple, Dict]) -> Dict[str, Any]:
        """process the raw input data

        Args:
            data (tuple): [sentence1, sentence2]
                sentence1 (str): a sentence
                    Example:
                        'you are so handsome.'
                sentence2 (str): a sentence
                    Example:
                        'you are so beautiful.'
        Returns:
            Dict[str, Any]: the preprocessed data
        """

        text_a, text_b = None, None
        if isinstance(data, str):
            text_a = data
        elif isinstance(data, tuple):
            assert len(data) == 2
            text_a, text_b = data
        elif isinstance(data, dict):
            text_a = data.get(self.first_sequence)
            text_b = data.get(self.second_sequence, None)

        rst = self.tokenizer(text_a, text_b, **self.tokenize_kwargs)
        if self._mode == ModeKeys.TRAIN:
            rst = {k: v.squeeze() for k, v in rst.items()}
        if self.label2id is not None and 'label' in data:
            rst['label'] = self.label2id[str(data['label'])]
        return rst


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.nli_tokenizer)
class NLIPreprocessor(NLPPreprocessorBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        kwargs['truncation'] = True
        kwargs['padding'] = False
        kwargs['return_tensors'] = 'pt'
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, *args, **kwargs)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.sen_cls_tokenizer)
class SentimentClassificationPreprocessor(NLPPreprocessorBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        kwargs['truncation'] = True
        kwargs['padding'] = 'max_length'
        kwargs['return_tensors'] = 'pt'
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, *args, **kwargs)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.sen_sim_tokenizer)
class SentenceSimilarityPreprocessor(NLPPreprocessorBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        kwargs['truncation'] = True
        kwargs['padding'] = False if 'padding' not in kwargs else kwargs[
            'padding']
        kwargs['return_tensors'] = 'pt'
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, *args, **kwargs)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.bert_seq_cls_tokenizer)
class SequenceClassificationPreprocessor(Preprocessor):
@@ -197,155 +104,154 @@ class SequenceClassificationPreprocessor(Preprocessor):
        return rst


@PREPROCESSORS.register_module(
    Fields.nlp, module_name='bert-seq-cls-tokenizer-finetune')
class SentenceSimilarityFinetunePreprocessor(SentenceSimilarityPreprocessor):
    """Sentence similarity preprocessor in the finetune scenario
class NLPTokenizerPreprocessorBase(Preprocessor):

    Mainly added the label mapping procedure.
    """
    def __init__(self, model_dir: str, pair: bool, mode: str, **kwargs):
        """preprocess the data via the vocab.txt from the `model_dir` path

    def __init__(self, model_dir: str, *args, **kwargs):
        kwargs['padding'] = 'max_length'
        super().__init__(model_dir, *args, **kwargs)
        Args:
            model_dir (str): model path
        """

        super().__init__(**kwargs)
        self.model_dir: str = model_dir
        self.first_sequence: str = kwargs.pop('first_sequence',
                                              'first_sequence')
        self.second_sequence = kwargs.pop('second_sequence', 'second_sequence')
        self.pair = pair
        self._mode = mode
        self.label = kwargs.pop('label', OutputKeys.LABEL)
        self.label2id = None
        if 'label2id' in kwargs:
            self.label2id = kwargs.pop('label2id')
        if self.label2id is None:
            self.label2id = parse_label_mapping(self.model_dir)

@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.text_gen_tokenizer)
class TextGenerationPreprocessor(NLPPreprocessorBase):
        self.tokenize_kwargs = kwargs
        self.tokenizer = self.build_tokenizer(model_dir)

    def __init__(self, model_dir: str, tokenizer=None, *args, **kwargs):
        self.tokenizer = self.build_tokenizer(
            model_dir) if tokenizer is None else tokenizer
        kwargs['truncation'] = True
        kwargs['padding'] = True
        kwargs['return_tensors'] = 'pt'
        kwargs['return_token_type_ids'] = False
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, *args, **kwargs)

    @staticmethod
    def get_roberta_tokenizer_dir(model_dir: str) -> Optional[str]:
        import os
        for name in os.listdir(model_dir):
            full_name = os.path.join(model_dir, name)
            if 'roberta' in name and os.path.isdir(full_name):
                return full_name

    def build_tokenizer(self, model_dir: str):
        roberta_tokenizer_dir = self.get_roberta_tokenizer_dir(model_dir)
        if roberta_tokenizer_dir:
            from transformers import RobertaTokenizer
            return RobertaTokenizer.from_pretrained(
                roberta_tokenizer_dir, do_lower_case=False)
        return super().build_tokenizer(model_dir)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name='palm-text-gen-tokenizer-finetune')
class TextGenerationFinetunePreprocessor(TextGenerationPreprocessor):

    @type_assert(object, dict)
    def __call__(self, data: dict) -> Dict[str, Any]:
        src_txt = data['src_txt']
        tgt_txt = data['tgt_txt']
        src_rst = super().__call__(src_txt)
        tgt_rst = super().__call__(tgt_txt)
        src_rst = {k: v.squeeze() for k, v in src_rst.items()}
        tgt_rst = {k: v.squeeze() for k, v in tgt_rst.items()}

        return {
            'src': src_rst['input_ids'],
            'tgt': tgt_rst['input_ids'],
            'mask_src': src_rst['attention_mask']
        }


@PREPROCESSORS.register_module(Fields.nlp)
class FillMaskPreprocessor(NLPPreprocessorBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        kwargs['truncation'] = True
        kwargs['padding'] = 'max_length'
        kwargs['return_tensors'] = 'pt'
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        kwargs['return_token_type_ids'] = True
        super().__init__(model_dir, *args, **kwargs)
    @property
    def id2label(self):
        if self.label2id is not None:
            return {id: label for label, id in self.label2id.items()}
        return None

    def build_tokenizer(self, model_dir):
        from modelscope.utils.hub import get_model_type
        model_type = get_model_type(model_dir)
        if model_type in ['sbert', 'structbert', 'bert']:
            from sofa import SbertTokenizer
            return SbertTokenizer.from_pretrained(model_dir, use_fast=False)
        elif model_type == 'veco':
            from sofa import VecoTokenizer
            return VecoTokenizer.from_pretrained(model_dir, use_fast=False)
        if model_type in (Models.structbert, Models.gpt3, Models.palm):
            from modelscope.models.nlp.structbert import SbertTokenizerFast
            return SbertTokenizerFast.from_pretrained(model_dir)
        elif model_type == Models.veco:
            from modelscope.models.nlp.veco import VecoTokenizerFast
            return VecoTokenizerFast.from_pretrained(model_dir)
        else:
            # TODO Only support veco & sbert
            raise RuntimeError(f'Unsupported model type: {model_type}')
        return AutoTokenizer.from_pretrained(model_dir)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.token_cls_tokenizer)
class TokenClassificationPreprocessor(NLPPreprocessorBase):

    def __init__(self, model_dir: str, *args, **kwargs):
        super().__init__(model_dir, *args, **kwargs)

    @type_assert(object, str)
    def __call__(self, data: Union[str, Dict]) -> Dict[str, Any]:
    def __call__(self, data: Union[str, Tuple, Dict]) -> Dict[str, Any]:
        """process the raw input data

        Args:
            data (str): a sentence
                Example:
                    'you are so handsome.'

            data (tuple): [sentence1, sentence2]
                sentence1 (str): a sentence
                    Example:
                        'you are so handsome.'
                sentence2 (str): a sentence
                    Example:
                        'you are so beautiful.'
        Returns:
            Dict[str, Any]: the preprocessed data
        """

        # preprocess the data for the model input
        if isinstance(data, dict):
            data = data[self.first_sequence]
        text = data.replace(' ', '').strip()
        tokens = []
        for token in text:
            token = self.tokenizer.tokenize(token)
            tokens.extend(token)
        input_ids = self.tokenizer.convert_tokens_to_ids(tokens)
        input_ids = self.tokenizer.build_inputs_with_special_tokens(input_ids)
        attention_mask = [1] * len(input_ids)
        token_type_ids = [0] * len(input_ids)
        return {
            'text': text,
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'token_type_ids': token_type_ids
        }
        text_a, text_b, labels = self.parse_text_and_label(data)
        output = self.tokenizer(
            text_a,
            text_b,
            return_tensors='pt' if self._mode == ModeKeys.INFERENCE else None,
            **self.tokenize_kwargs)
        self.labels_to_id(labels, output)
        return output

    def parse_text_and_label(self, data):
        text_a, text_b, labels = None, None, None
        if isinstance(data, str):
            text_a = data
        elif isinstance(data, tuple) or isinstance(data, list):
            if len(data) == 3:
                text_a, text_b, labels = data
            elif len(data) == 2:
                if self.pair:
                    text_a, text_b = data
                else:
                    text_a, labels = data
        elif isinstance(data, dict):
            text_a = data.get(self.first_sequence)
            text_b = data.get(self.second_sequence)
            labels = data.get(self.label)

        return text_a, text_b, labels

    def labels_to_id(self, labels, output):

        def label_can_be_mapped(label):
            return isinstance(label, str) or isinstance(label, int)

        if labels is not None:
            if isinstance(labels, Iterable) and all([label_can_be_mapped(label) for label in labels]) \
                    and self.label2id is not None:
                output[OutputKeys.LABEL] = [
                    self.label2id[str(label)] for label in labels
                ]
            elif label_can_be_mapped(labels) and self.label2id is not None:
                output[OutputKeys.LABEL] = self.label2id[str(labels)]
            else:
                output[OutputKeys.LABEL] = labels


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.nli_tokenizer)
@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.sen_sim_tokenizer)
class PairSentenceClassificationPreprocessor(NLPTokenizerPreprocessorBase):

    def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
        kwargs['truncation'] = kwargs.get('truncation', True)
        kwargs['padding'] = kwargs.get(
            'padding', False if mode == 'inference' else 'max_length')
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, pair=True, mode=mode, **kwargs)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.sen_cls_tokenizer)
class SingleSentenceClassificationPreprocessor(NLPTokenizerPreprocessorBase):

    def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
        kwargs['truncation'] = kwargs.get('truncation', True)
        kwargs['padding'] = kwargs.get(
            'padding', False if mode == 'inference' else 'max_length')
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, pair=False, mode=mode, **kwargs)


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.zero_shot_cls_tokenizer)
class ZeroShotClassificationPreprocessor(NLPPreprocessorBase):
class ZeroShotClassificationPreprocessor(NLPTokenizerPreprocessorBase):

    def __init__(self, model_dir: str, *args, **kwargs):
    def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
        """preprocess the data via the vocab.txt from the `model_dir` path

        Args:
            model_dir (str): model path
        """
        self.sequence_length = kwargs.pop('sequence_length', 512)
        super().__init__(model_dir, *args, **kwargs)
        super().__init__(model_dir, pair=False, mode=mode, **kwargs)

    @type_assert(object, str)
    def __call__(self, data, hypothesis_template: str,
    def __call__(self, data: Union[str, Dict], hypothesis_template: str,
                 candidate_labels: list) -> Dict[str, Any]:
        """process the raw input data

        Args:
            data (str): a sentence
            data (str or dict): a sentence
                Example:
                    'you are so handsome.'

@@ -363,11 +269,203 @@ class ZeroShotClassificationPreprocessor(NLPPreprocessorBase):
            padding=True,
            truncation=True,
            max_length=self.sequence_length,
            return_tensors='pt',
            truncation_strategy='only_first')
            truncation_strategy='only_first',
            return_tensors='pt' if self._mode == ModeKeys.INFERENCE else None)
        return features


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.text_gen_tokenizer)
class TextGenerationPreprocessor(NLPTokenizerPreprocessorBase):

    def __init__(self,
                 model_dir: str,
                 tokenizer=None,
                 mode=ModeKeys.INFERENCE,
                 **kwargs):
        self.tokenizer = self.build_tokenizer(
            model_dir) if tokenizer is None else tokenizer
        kwargs['truncation'] = kwargs.get('truncation', True)
        kwargs['padding'] = kwargs.get('padding', True)
        kwargs['return_token_type_ids'] = kwargs.get('return_token_type_ids',
                                                     False)
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        super().__init__(model_dir, pair=False, mode=mode, **kwargs)

    @staticmethod
    def get_roberta_tokenizer_dir(model_dir: str) -> Optional[str]:
        import os
        for name in os.listdir(model_dir):
            full_name = os.path.join(model_dir, name)
            if 'roberta' in name and os.path.isdir(full_name):
                return full_name

    def build_tokenizer(self, model_dir: str):
        roberta_tokenizer_dir = self.get_roberta_tokenizer_dir(model_dir)
        if roberta_tokenizer_dir:
            from transformers import RobertaTokenizer
            return RobertaTokenizer.from_pretrained(
                roberta_tokenizer_dir, do_lower_case=False)
        return super().build_tokenizer(model_dir)

    def __call__(self, data: Union[Dict, str]) -> Dict[str, Any]:
        if self._mode == 'inference':
            return super().__call__(data)
        src_txt = data['src_txt']
        tgt_txt = data['tgt_txt']
        src_rst = super().__call__(src_txt)
        tgt_rst = super().__call__(tgt_txt)

        return {
            'src': src_rst['input_ids'],
            'tgt': tgt_rst['input_ids'],
            'mask_src': src_rst['attention_mask']
        }


@PREPROCESSORS.register_module(Fields.nlp, module_name=Preprocessors.fill_mask)
class FillMaskPreprocessor(NLPTokenizerPreprocessorBase):

    def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
        kwargs['truncation'] = kwargs.get('truncation', True)
        kwargs['padding'] = kwargs.get('padding', 'max_length')
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        kwargs['return_token_type_ids'] = kwargs.get('return_token_type_ids',
                                                     True)
        super().__init__(model_dir, pair=False, mode=mode, **kwargs)


@PREPROCESSORS.register_module(
    Fields.nlp,
    module_name=Preprocessors.word_segment_text_to_label_preprocessor)
class WordSegmentationBlankSetToLabelPreprocessor(Preprocessor):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.first_sequence: str = kwargs.pop('first_sequence',
                                              'first_sequence')
        self.label = kwargs.pop('label', OutputKeys.LABELS)

    def __call__(self, data: str) -> Union[Dict[str, Any], Tuple]:
        data = data.split(' ')
        data = list(filter(lambda x: len(x) > 0, data))

        def produce_train_sample(words):
            chars = []
            labels = []
            for word in words:
                chars.extend(list(word))
                if len(word) == 1:
                    labels.append('S-CWS')
                else:
                    labels.extend(['B-CWS'] + ['I-CWS'] * (len(word) - 2)
                                  + ['E-CWS'])
            assert len(chars) == len(labels)
            return chars, labels

        chars, labels = produce_train_sample(data)
        return {
            self.first_sequence: chars,
            self.label: labels,
        }


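
A quick sketch of what this blank-separated-text-to-label preprocessor produces, assuming default kwargs:

p = WordSegmentationBlankSetToLabelPreprocessor()
out = p('今天 天气 不错')
# out['first_sequence'] == ['今', '天', '天', '气', '不', '错']
# out['labels']         == ['B-CWS', 'E-CWS', 'B-CWS', 'E-CWS', 'B-CWS', 'E-CWS']
# single-character words would get 'S-CWS' instead
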
@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.token_cls_tokenizer)
class TokenClassificationPreprocessor(NLPTokenizerPreprocessorBase):

    def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
        kwargs['truncation'] = kwargs.get('truncation', True)
        kwargs['padding'] = kwargs.get(
            'padding', False if mode == ModeKeys.INFERENCE else 'max_length')
        kwargs['max_length'] = kwargs.pop('sequence_length', 128)
        kwargs['is_split_into_words'] = kwargs.pop(
            'is_split_into_words',
            False if mode == ModeKeys.INFERENCE else True)
        self.label_all_tokens = kwargs.pop('label_all_tokens', False)
        super().__init__(model_dir, pair=False, mode=mode, **kwargs)

    def __call__(self, data: Union[str, Dict]) -> Dict[str, Any]:
        """process the raw input data

        Args:
            data (str): a sentence
                Example:
                    'you are so handsome.'

        Returns:
            Dict[str, Any]: the preprocessed data
        """

        # preprocess the data for the model input
        # if isinstance(data, dict):
        #     data = data[self.first_sequence]
        # text = data.replace(' ', '').strip()
        # tokens = []
        # for token in text:
        #     token = self.tokenizer.tokenize(token)
        #     tokens.extend(token)
        # input_ids = self.tokenizer.convert_tokens_to_ids(tokens)
        # input_ids = self.tokenizer.build_inputs_with_special_tokens(input_ids)
        # attention_mask = [1] * len(input_ids)
        # token_type_ids = [0] * len(input_ids)

        # new code to deal with labels
        # tokenized_inputs = self.tokenizer(data, truncation=True, is_split_into_words=True)

        text_a = None
        labels_list = None
        if isinstance(data, str):
            text_a = data
        elif isinstance(data, dict):
            text_a = data.get(self.first_sequence)
            labels_list = data.get(self.label)
        tokenized_inputs = self.tokenizer(
            text_a,
            return_tensors='pt' if self._mode == ModeKeys.INFERENCE else None,
            **self.tokenize_kwargs)

        if labels_list is not None:
            assert self.label2id is not None
            # Map that sends B-Xxx label to its I-Xxx counterpart
            b_to_i_label = []
            label_enumerate_values = [
                k for k, v in sorted(
                    self.label2id.items(), key=lambda item: item[1])
            ]
            for idx, label in enumerate(label_enumerate_values):
                if label.startswith('B-') and label.replace(
                        'B-', 'I-') in label_enumerate_values:
                    b_to_i_label.append(
                        label_enumerate_values.index(
                            label.replace('B-', 'I-')))
                else:
                    b_to_i_label.append(idx)

            label_row = [self.label2id[lb] for lb in labels_list]
            word_ids = tokenized_inputs.word_ids()
            previous_word_idx = None
            label_ids = []
            for word_idx in word_ids:
                if word_idx is None:
                    label_ids.append(-100)
                elif word_idx != previous_word_idx:
                    label_ids.append(label_row[word_idx])
                else:
                    if self.label_all_tokens:
                        label_ids.append(b_to_i_label[label_row[word_idx]])
                    else:
                        label_ids.append(-100)
                previous_word_idx = word_idx
            labels = label_ids
            tokenized_inputs['labels'] = labels
        # new code end

        if self._mode == ModeKeys.INFERENCE:
            tokenized_inputs[OutputKeys.TEXT] = text_a
        return tokenized_inputs


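
A worked example of the label alignment above may help, since word pieces and words diverge. Suppose the split-into-words input is ['New', 'York'] with labels ['B-LOC', 'I-LOC'] mapping to ids [1, 2], and the tokenizer splits 'York' into two pieces (values below are illustrative):

# wordpieces:  [CLS]  'New'  'Yo'  '##rk'  [SEP]
# word_ids():  [None,   0,     1,     1,   None]
# label_ids:   [-100,   1,     2,  -100,   -100]   # -100 is ignored by the loss
# with label_all_tokens=True, '##rk' would get the I-LOC id (2) via b_to_i_label
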
@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.ner_tokenizer)
class NERPreprocessor(Preprocessor):

@@ -24,7 +24,7 @@ class DialogStateTrackingPreprocessor(Preprocessor):
        """
        super().__init__(*args, **kwargs)

        from sofa.models.space import SpaceConfig, SpaceTokenizer
        from modelscope.models.nlp.space import SpaceConfig, SpaceTokenizer
        self.model_dir: str = model_dir
        self.config = SpaceConfig.from_pretrained(self.model_dir)
        self.tokenizer = SpaceTokenizer.from_pretrained(self.model_dir)

@@ -7,12 +7,14 @@ if TYPE_CHECKING:
    from .base import TaskDataset
    from .builder import TASK_DATASETS, build_task_dataset
    from .torch_base_dataset import TorchTaskDataset
    from .veco_dataset import VecoDataset

else:
    _import_structure = {
        'base': ['TaskDataset'],
        'builder': ['TASK_DATASETS', 'build_task_dataset'],
        'torch_base_dataset': ['TorchTaskDataset'],
        'veco_dataset': ['VecoDataset'],
    }
    import sys


@@ -1,6 +1,6 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from abc import ABC, abstractmethod
from typing import Any, List, Tuple
from typing import Any, List, Tuple, Union


class TaskDataset(ABC):
@@ -8,7 +8,7 @@ class TaskDataset(ABC):
    """

    def __init__(self,
                 datasets: Tuple[Any, List[Any]],
                 datasets: Union[Any, List[Any]],
                 mode,
                 preprocessor=None,
                 **kwargs):
@@ -18,7 +18,7 @@ class TaskDataset(ABC):
        self._inner_dataset = self.prepare_dataset(datasets)

    @abstractmethod
    def prepare_dataset(self, datasets: Tuple[Any, List[Any]]) -> Any:
    def prepare_dataset(self, datasets: Union[Any, List[Any]]) -> Any:
        """Prepare a dataset.

        User can process the input datasets in a whole dataset perspective.

@@ -1,5 +1,5 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from typing import Any, List, Tuple
from typing import Any, List, Tuple, Union

from torch.utils.data import ConcatDataset, Dataset

@@ -14,7 +14,7 @@ class TorchTaskDataset(TaskDataset, Dataset):
    """

    def __init__(self,
                 datasets: Tuple[Any, List[Any]],
                 datasets: Union[Any, List[Any]],
                 mode,
                 preprocessor=None,
                 **kwargs):
@@ -26,7 +26,7 @@ class TorchTaskDataset(TaskDataset, Dataset):
    def __len__(self):
        return len(self._inner_dataset)

    def prepare_dataset(self, datasets: Tuple[Any, List[Any]]) -> Any:
    def prepare_dataset(self, datasets: Union[Any, List[Any]]) -> Any:
        """Prepare a dataset.

        User can process the input datasets in a whole dataset perspective.

modelscope/task_datasets/veco_dataset.py (new file, 76 lines)
@@ -0,0 +1,76 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from typing import Any, List, Union

import numpy as np
from datasets import Dataset, IterableDataset, concatenate_datasets

from modelscope.metainfo import Models
from modelscope.utils.constant import Tasks
from .builder import TASK_DATASETS
from .torch_base_dataset import TorchTaskDataset


@TASK_DATASETS.register_module(module_name=Models.veco, group_key=Tasks.nli)
class VecoDataset(TorchTaskDataset):

    def __init__(self,
                 datasets: Union[Any, List[Any]],
                 mode,
                 preprocessor=None,
                 **kwargs):
        self.seed = kwargs.get('seed', 42)
        self.permutation = None
        self.datasets = None
        super().__init__(datasets, mode, preprocessor, **kwargs)

    def switch_dataset(self, idx):
        """Switch dataset in evaluation.

        Veco evaluates dataset one by one.

        Args:
            idx: The index of the dataset
        """
        if self.mode == 'train':
            raise ValueError(
                'Only support switch dataset in the evaluation loop')
        if idx >= len(self.datasets):
            raise ValueError(
                'Index is bigger than the number of the datasets.')
        self._inner_dataset = self.datasets[idx]

    def __getitem__(self, item):
        if self.permutation is not None:
            item = self.permutation[item]
        return super().__getitem__(item)

    def prepare_dataset(self, datasets: Union[Any, List[Any]]) -> Any:
        """Compose all the datasets.

        If the mode is 'train', all datasets will be mixed together; if the mode is 'eval',
        the datasets will be kept as-is and the first one is returned.

        Args:
            datasets: The datasets to be composed.

        Returns: The final dataset.
        """
        if not isinstance(datasets, (list, tuple)):
            datasets = [datasets]
        if self.mode == 'train':
            if len(datasets) == 1:
                return datasets[0]
            elif all([
                    isinstance(dataset, (Dataset, IterableDataset))
                    for dataset in datasets
            ]):
                dataset = concatenate_datasets(list(datasets))
                return dataset.shuffle(seed=self.seed)
            else:
                generator = np.random.default_rng(self.seed)
                _len = sum([len(dataset) for dataset in datasets])
                self.permutation = generator.permutation(_len)
                return super().prepare_dataset(datasets)
        else:
            self.datasets = datasets
            return self.datasets[0]
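
A usage sketch, with two hypothetical hf datasets standing in for per-language NLI splits:

train_ds = VecoDataset([en_ds, fr_ds], mode='train')   # concatenated + shuffled
eval_ds = VecoDataset([en_ds, fr_ds], mode='eval')     # starts on en_ds
eval_ds.switch_dataset(1)                              # evaluate fr_ds next
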
@@ -4,4 +4,5 @@ from .cv import (ImageInstanceSegmentationTrainer,
|
||||
ImagePortraitEnhancementTrainer)
|
||||
from .multi_modal import CLIPTrainer
|
||||
from .nlp import SequenceClassificationTrainer
|
||||
from .nlp_trainer import NlpEpochBasedTrainer, VecoTrainer
|
||||
from .trainer import EpochBasedTrainer
|
||||
|
||||
@@ -32,6 +32,7 @@ class EvaluationHook(Hook):
    def do_evaluate(self, trainer):
        """Evaluate the results."""
        eval_res = trainer.evaluate()
+       trainer.data_loader = trainer.train_dataloader
        for name, val in eval_res.items():
            trainer.log_buffer.output[name] = val
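Presumably the added line restores trainer.data_loader to the training
dataloader after trainer.evaluate() has pointed it at the evaluation
dataloader, so that iteration counting and logging in the remaining epochs
operate on the correct loader.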
@@ -21,9 +21,6 @@ class LrSchedulerHook(Hook):
    def __init__(self, by_epoch=True, warmup=None) -> None:
        super().__init__()
        self.by_epoch = by_epoch
-       if not self.by_epoch:
-           raise ValueError('We only support ``by_epoch=True`` now!')
-
        self.warmup = warmup
        self.warmup_lr_scheduler = None
@@ -49,6 +46,11 @@ class LrSchedulerHook(Hook):
        return lr

    def before_train_iter(self, trainer):
+       if not self.by_epoch:
+           if self.warmup_lr_scheduler is not None:
+               self.warmup_lr_scheduler.step()
+           else:
+               trainer.lr_scheduler.step()
        trainer.log_buffer.output[LogKeys.LR] = self._get_log_lr(trainer)

    def before_train_epoch(self, trainer):
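The added branch steps a dedicated warmup scheduler, when one exists, instead
of the base scheduler. A minimal sketch of what such a wrapper might look
like; this LinearWarmup class is illustrative, not the implementation shipped
in the repository:

    class LinearWarmup:
        """Illustrative linear-warmup wrapper around a base lr scheduler."""

        def __init__(self, base_scheduler, warmup_iters=100):
            self.base_scheduler = base_scheduler
            self.warmup_iters = warmup_iters
            self.cur_iter = 0
            # Remember the target lr of every param group.
            self.base_lrs = [
                group['lr']
                for group in base_scheduler.optimizer.param_groups
            ]

        def step(self):
            # Scale the lr linearly up to its target, then delegate.
            self.cur_iter += 1
            if self.cur_iter <= self.warmup_iters:
                scale = self.cur_iter / self.warmup_iters
                for group, lr in zip(
                        self.base_scheduler.optimizer.param_groups,
                        self.base_lrs):
                    group['lr'] = lr * scale
            else:
                self.base_scheduler.step()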
192 modelscope/trainers/nlp_trainer.py Normal file
@@ -0,0 +1,192 @@
import os
from typing import Callable, Dict, Optional, Tuple, Union

import torch
from torch import nn
from torch.utils.data import Dataset

from modelscope.hub.snapshot_download import snapshot_download
from modelscope.metrics.builder import build_metric
from modelscope.models.base import Model, TorchModel
from modelscope.msdatasets import MsDataset
from modelscope.preprocessors import Preprocessor, build_preprocessor
from modelscope.utils.config import Config, ConfigDict
from modelscope.utils.constant import (DEFAULT_MODEL_REVISION, ModeKeys,
                                       ModelFile, Tasks)
from .base import TRAINERS
from .trainer import EpochBasedTrainer


@TRAINERS.register_module(module_name='NlpEpochBasedTrainer')
class NlpEpochBasedTrainer(EpochBasedTrainer):

    def __init__(
            self,
            model: Optional[Union[TorchModel, nn.Module, str]] = None,
            cfg_file: Optional[str] = None,
            cfg_modify_fn: Optional[Callable] = None,
            arg_parse_fn: Optional[Callable] = None,
            data_collator: Optional[Callable] = None,
            train_dataset: Optional[Union[MsDataset, Dataset]] = None,
            eval_dataset: Optional[Union[MsDataset, Dataset]] = None,
            preprocessor: Optional[Preprocessor] = None,
            optimizers: Tuple[torch.optim.Optimizer,
                              torch.optim.lr_scheduler._LRScheduler] = (None,
                                                                        None),
            model_revision: Optional[str] = DEFAULT_MODEL_REVISION,
            **kwargs):
        """A trainer with extra code to adapt to NLP models.

        Args:
            cfg_modify_fn: A function used to modify the config read from the configuration file.
        """

        if isinstance(model, str):
            if os.path.exists(model):
                model_dir = model if os.path.isdir(model) else os.path.dirname(
                    model)
            else:
                model_dir = snapshot_download(model, revision=model_revision)
            cfg_file = os.path.join(model_dir, ModelFile.CONFIGURATION)
        else:
            assert cfg_file is not None, 'Config file should not be None if model is an nn.Module class'
            model_dir = os.path.dirname(cfg_file)

        self.cfg_modify_fn = cfg_modify_fn
        self.cfg = self.rebuild_config(Config.from_file(cfg_file))
        try:
            labels = self.cfg.dataset.train.labels
        except AttributeError:
            labels = None

        # Build the label-id mappings shared by the model and the preprocessor.
        self.label2id = None
        self.num_labels = None
        if labels is not None and len(labels) > 0:
            self.label2id = {label: idx for idx, label in enumerate(labels)}
            self.id2label = {idx: label for idx, label in enumerate(labels)}
            self.num_labels = len(labels)

        def build_dataset_keys(cfg):
            # Collect the dataset column names the preprocessor should read.
            if cfg is not None:
                input_keys = {
                    'first_sequence': getattr(cfg, 'first_sequence', None),
                    'second_sequence': getattr(cfg, 'second_sequence', None),
                    'label': getattr(cfg, 'label', None),
                }
            else:
                input_keys = {}

            return {k: v for k, v in input_keys.items() if v is not None}

        self.train_keys = build_dataset_keys(
            self.cfg.dataset.train if hasattr(self.cfg, 'dataset')
            and hasattr(self.cfg.dataset, 'train') else None)
        # TODO: The evaluation dataset may have special keys, which is not supported yet,
        # because the trainer holds only one preprocessor and it supports only one group of keys.
        self.eval_keys = self.train_keys

        super().__init__(
            model=model_dir,
            cfg_file=cfg_file,
            arg_parse_fn=arg_parse_fn,
            data_collator=data_collator,
            preprocessor=preprocessor,
            optimizers=optimizers,
            model_revision=model_revision,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            **kwargs)

    def rebuild_config(self, cfg: Config):
        if self.cfg_modify_fn is not None:
            return self.cfg_modify_fn(cfg)
        return cfg

    def build_model(self) -> Union[nn.Module, TorchModel]:
        """Instantiate a pytorch model and return it.

        By default, we will create a model using the config from the configuration file.
        You can override this method in a subclass.

        """
        model_args = {} if self.num_labels is None else {
            'num_labels': self.num_labels
        }
        model = Model.from_pretrained(
            self.model_dir, cfg_dict=self.cfg, **model_args)
        # Unwrap task models that hold the actual pytorch module in `model.model`.
        if not isinstance(model, nn.Module) and hasattr(model, 'model'):
            return model.model
        elif isinstance(model, nn.Module):
            return model

    def build_preprocessor(self) -> Preprocessor:
        """Build the preprocessor.

        Users can override this method to implement custom logic.

        Returns: The preprocessor instance.

        """
        model_args = {} if self.label2id is None else {
            'label2id': self.label2id
        }
        cfg = ConfigDict({
            **getattr(self.cfg, 'preprocessor'),
            'model_dir':
            self.model_dir,
            **model_args,
            'mode':
            ModeKeys.TRAIN,
            **self.train_keys,
        })
        return build_preprocessor(cfg, Tasks.find_field_by_task(self.cfg.task))
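A sketch of how this trainer is meant to be constructed. The model id, the
config field being modified, and the dataset variables are placeholders, not
values taken from this diff:

    def cfg_modify_fn(cfg):
        # Adjust the config read from the configuration file before training.
        cfg.train.max_epochs = 3  # assumed config layout
        return cfg

    trainer = NlpEpochBasedTrainer(
        model='damo/some_nlp_model',  # hypothetical model id
        cfg_modify_fn=cfg_modify_fn,
        train_dataset=my_train_dataset,  # an MsDataset or torch Dataset
        eval_dataset=my_eval_dataset)
    trainer.train()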
@TRAINERS.register_module(module_name='VecoTrainer')
class VecoTrainer(NlpEpochBasedTrainer):

    def evaluate(self, checkpoint_path=None):
        """Veco evaluates the datasets one by one.

        """
        from modelscope.task_datasets import VecoDataset
        self.model.eval()
        self._mode = ModeKeys.EVAL
        metric_values = {}

        if self.eval_dataset is None:
            val_data = self.cfg.dataset.val
            self.eval_dataset = self.build_dataset(
                val_data, mode=ModeKeys.EVAL)

        idx = 0
        dataset_cnt = 1
        if isinstance(self.eval_dataset, VecoDataset):
            self.eval_dataset.switch_dataset(idx)
            dataset_cnt = len(self.eval_dataset.datasets)

        while True:
            self.eval_dataloader = self._build_dataloader_with_dataset(
                self.eval_dataset, **self.cfg.evaluation.get('dataloader', {}))
            self.data_loader = self.eval_dataloader

            metric_classes = [
                build_metric(metric, default_args={'trainer': self})
                for metric in self.metrics
            ]
            self.evaluation_loop(self.eval_dataloader, checkpoint_path,
                                 metric_classes)

            # Collect the metric values of the current dataset under a per-dataset key.
            for m_idx, metric_cls in enumerate(metric_classes):
                if f'eval_dataset[{idx}]' not in metric_values:
                    metric_values[f'eval_dataset[{idx}]'] = {}
                metric_values[f'eval_dataset[{idx}]'][
                    self.metrics[m_idx]] = metric_cls.evaluate()

            idx += 1
            if idx < dataset_cnt:
                self.eval_dataset.switch_dataset(idx)
            else:
                break

        return metric_values
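The return value groups metric results per evaluated dataset; for two eval
datasets and a single metric, the structure looks roughly like this (the
metric name is a placeholder):

    # {
    #     'eval_dataset[0]': {'my-metric': {...}},
    #     'eval_dataset[1]': {'my-metric': {...}},
    # }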
@@ -22,7 +22,8 @@ from modelscope.models.base import Model, TorchModel
from modelscope.msdatasets.ms_dataset import MsDataset
from modelscope.preprocessors import build_preprocessor
from modelscope.preprocessors.base import Preprocessor
-from modelscope.task_datasets import TorchTaskDataset, build_task_dataset
+from modelscope.task_datasets.builder import build_task_dataset
+from modelscope.task_datasets.torch_base_dataset import TorchTaskDataset
from modelscope.trainers.hooks.builder import HOOKS
from modelscope.trainers.hooks.priority import Priority, get_priority
from modelscope.trainers.lrscheduler.builder import build_lr_scheduler
@@ -30,12 +31,12 @@ from modelscope.trainers.optimizer.builder import build_optimizer
from modelscope.utils.config import Config, ConfigDict
from modelscope.utils.constant import (DEFAULT_MODEL_REVISION, Hubs, ModeKeys,
                                       ModelFile, Tasks, TrainerStages)
+from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.logger import get_logger
from modelscope.utils.registry import build_from_cfg
from modelscope.utils.tensor_utils import torch_default_data_collator
from modelscope.utils.torch_utils import (broadcast, create_device,
                                          get_dist_info, init_dist)
-from modelscope.utils.utils import if_func_receive_dict_inputs
from .base import BaseTrainer
from .builder import TRAINERS
from .default_config import DEFAULT_CONFIG
@@ -87,6 +88,7 @@ class EpochBasedTrainer(BaseTrainer):
                 None),
             model_revision: Optional[str] = DEFAULT_MODEL_REVISION,
             **kwargs):

        if isinstance(model, str):
            if os.path.exists(model):
                self.model_dir = model if os.path.isdir(
@@ -108,9 +110,9 @@ class EpochBasedTrainer(BaseTrainer):
            self.model = model

        super().__init__(cfg_file, arg_parse_fn)

        # add default config
        self.cfg.merge_from_dict(self._get_default_config(), force=False)
+       self.cfg = self.rebuild_config(self.cfg)

        if 'work_dir' in kwargs:
            self.work_dir = kwargs['work_dir']
@@ -130,9 +132,9 @@ class EpochBasedTrainer(BaseTrainer):
        self.device = create_device(device_name == 'cpu')

        self.train_dataset = self.to_task_dataset(
-           train_dataset, mode='train', preprocessor=self.preprocessor)
+           train_dataset, mode=ModeKeys.TRAIN, preprocessor=self.preprocessor)
        self.eval_dataset = self.to_task_dataset(
-           eval_dataset, mode='eval', preprocessor=self.preprocessor)
+           eval_dataset, mode=ModeKeys.EVAL, preprocessor=self.preprocessor)

        self.data_collator = data_collator if data_collator is not None else torch_default_data_collator
        self.metrics = self.get_metrics()
@@ -168,6 +170,14 @@ class EpochBasedTrainer(BaseTrainer):
        if not is_parallel(self.model) and self._dist:
            self.model = self.to_parallel(self.model)

+   def rebuild_config(self, cfg: Config):
+       """A method used to rebuild the config; any subclass can override it.
+
+       Returns: The rebuilt config
+       """
+       return cfg
+
    @property
    def mode(self):
        return self._mode
@@ -203,7 +213,7 @@ class EpochBasedTrainer(BaseTrainer):
        return self._max_epochs * len(self.data_loader)

    def to_task_dataset(self,
-                       datasets: Tuple[Dataset, List[Dataset]],
+                       datasets: Union[Dataset, List[Dataset]],
                        mode: str,
                        preprocessor: Optional[Preprocessor] = None):
        """Build the task specific dataset processor for this trainer.
@@ -229,17 +239,13 @@ class EpochBasedTrainer(BaseTrainer):
-               cfg = ConfigDict(
-                   type=self.cfg.task, mode=mode, datasets=datasets)
-               return build_task_dataset(cfg, self.cfg.task)
-           elif isinstance(datasets,
-                           Dataset) or (isinstance(datasets, List)
-                                        and isinstance(datasets[0], Dataset)):
-               cfg = ConfigDict(
-                   type=self.cfg.model.type, mode=mode, datasets=datasets)
-               return build_task_dataset(cfg, self.cfg.task)
-           else:
-               raise ValueError(
-                   f'invalid datasets type: {type(datasets)}, '
-                   f'expected `MsDataset`, `torch.utils.data.Dataset` or list of them.'
-               )
+           cfg = ConfigDict(
+               type=self.cfg.model.type,
+               mode=mode,
+               datasets=datasets,
+               preprocessor=preprocessor)
+           return build_task_dataset(cfg, self.cfg.task)
        except Exception:
            if isinstance(datasets, (List, Tuple)) or preprocessor is not None:
                return TorchTaskDataset(
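After this change the task dataset is always built from the model type. For
illustration, the resulting call is roughly equivalent to the following,
where the type and group key values are placeholders:

    cfg = ConfigDict(
        type='veco',  # self.cfg.model.type
        mode=ModeKeys.EVAL,
        datasets=datasets,
        preprocessor=preprocessor)
    task_dataset = build_task_dataset(cfg, 'nli')  # group key: self.cfg.task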
@@ -262,8 +268,11 @@ class EpochBasedTrainer(BaseTrainer):
        # TODO @wenmeng.zwm @jiangnana.jnn add support for different preprocessors
        # when training and evaluation use different ones
        cfg = ConfigDict({
-           **getattr(self.cfg, 'preprocessor'), 'model_dir':
-           self.model_dir
+           **getattr(self.cfg, 'preprocessor'),
+           'model_dir':
+           self.model_dir,
+           'mode':
+           ModeKeys.TRAIN,
        })
        return build_preprocessor(cfg, Tasks.find_field_by_task(self.cfg.task))
@@ -324,6 +333,8 @@ class EpochBasedTrainer(BaseTrainer):
            **self.cfg.evaluation.get('dataloader', {}))
        self.data_loader = self.eval_dataloader
        metric_classes = [build_metric(metric) for metric in self.metrics]
+       for m in metric_classes:
+           m.trainer = self
        metric_values = self.evaluation_loop(self.eval_dataloader,
                                             checkpoint_path, metric_classes)
@@ -338,10 +349,9 @@ class EpochBasedTrainer(BaseTrainer):
        """ Instantiate a pytorch model and return.

        By default, we will create a model using config from configuration file. You can
-       subclass and override this method in a subclass.
+       override this method in a subclass.

        """
-       # TODO temp implementation, waiting for @zhangzhicheng
        model = Model.from_pretrained(self.model_dir)
        if not isinstance(model, nn.Module) and hasattr(model, 'model'):
            return model.model
@@ -412,9 +422,8 @@ class EpochBasedTrainer(BaseTrainer):
        self._mode = ModeKeys.TRAIN
        inputs = self.collate_fn(inputs)
        # call model forward but not __call__ to skip postprocess
-       if isinstance(
-               inputs,
-               Mapping) and not if_func_receive_dict_inputs(model.forward):
+       if isinstance(inputs,
+                     Mapping) and not func_receive_dict_inputs(model.forward):
            train_outputs = model.forward(**inputs)
        else:
            train_outputs = model.forward(inputs)
@@ -495,7 +504,7 @@ class EpochBasedTrainer(BaseTrainer):
        if self.eval_dataset is None:
            val_data = self.cfg.dataset.val
            self.eval_dataset = self.build_dataset(
-               val_data, mode=ModeKeys.TRAIN)
+               val_data, mode=ModeKeys.EVAL)

        batch_size = self.cfg.evaluation.batch_size
        workers = self.cfg.evaluation.workers
@@ -523,7 +532,8 @@ class EpochBasedTrainer(BaseTrainer):
            )
            torch_dataset = dataset.to_torch_dataset(
                preprocessors=self.preprocessor, )
-           return torch_dataset
+           dataset = self.to_task_dataset(torch_dataset, mode)
+           return dataset

    def create_optimizer_and_scheduler(self):
        """ Create optimizer and lr scheduler
@@ -10,9 +10,9 @@ import torch
from torch import distributed as dist
from tqdm import tqdm

+from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.torch_utils import (broadcast, get_dist_info, is_master,
                                          make_tmp_dir)
-from modelscope.utils.utils import if_func_receive_dict_inputs


def single_gpu_test(model,
@@ -37,18 +37,19 @@ def single_gpu_test(model,
        if data_collate_fn is not None:
            data = data_collate_fn(data)
        with torch.no_grad():
-           if isinstance(data,
-                         Mapping) and not if_func_receive_dict_inputs(
-                             model.forward):
-               result = model(**data)
+           if isinstance(data, Mapping) and not func_receive_dict_inputs(
+                   model.forward):
+               result = model.forward(**data)
            else:
-               result = model(data)
+               result = model.forward(data)
        if metric_classes is not None:
            for metric_cls in metric_classes:
                metric_cls.add(result, data)

-       batch_size = len(result)
+       if isinstance(data, dict):
+           batch_size = len(next(iter(data.values())))
+       else:
+           batch_size = len(data)
        for _ in range(batch_size):
            pbar.update()
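The reworked progress-bar accounting infers the batch size from the inputs
rather than from the model output, presumably because len(result) on a dict
of outputs counts keys, not samples. In isolation:

    import torch

    data = {'input_ids': torch.zeros(8, 128),
            'attention_mask': torch.ones(8, 128)}
    batch_size = len(next(iter(data.values())))  # 8, the number of samples
    # len(data) would be 2 here: the number of keys, not the batch size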
@@ -101,16 +102,18 @@ def multi_gpu_test(model,
|
||||
data = data_collate_fn(data)
|
||||
data_list.append(data)
|
||||
with torch.no_grad():
|
||||
if isinstance(data,
|
||||
Mapping) and not if_func_receive_dict_inputs(
|
||||
model.forward):
|
||||
result = model(**data)
|
||||
if isinstance(data, Mapping) and not func_receive_dict_inputs(
|
||||
model.forward):
|
||||
result = model.forward(**data)
|
||||
else:
|
||||
result = model(data)
|
||||
result = model.forward(data)
|
||||
results.append(result)
|
||||
|
||||
if rank == 0:
|
||||
batch_size = len(result)
|
||||
if isinstance(data, dict):
|
||||
batch_size = len(next(iter(data.values())))
|
||||
else:
|
||||
batch_size = len(data)
|
||||
batch_size_all = batch_size * world_size
|
||||
count += batch_size_all
|
||||
if count > len(dataset):
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.