mirror of https://github.com/modelscope/modelscope.git
synced 2025-12-16 08:17:45 +01:00

merge master

README.md (161 lines changed)
@@ -30,11 +30,151 @@

# Introduction

[ModelScope](https://www.modelscope.cn) is built upon the notion of "Model-as-a-Service" (MaaS). It seeks to bring together the most advanced machine learning models from the AI community, and streamlines the process of leveraging AI models in real-world applications. The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training and evaluation.

In particular, with rich layers of API abstraction, the ModelScope library offers a unified experience for exploring state-of-the-art models spanning domains such as CV, NLP, Speech, Multi-Modality, and Scientific computation. Model contributors in different areas can integrate their models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access to them. Once integrated, model inference, fine-tuning, and evaluation can be done with only a few lines of code. Flexibility is also provided so that different components in a model application can be customized wherever necessary.

Apart from harboring implementations of a wide range of models, the ModelScope library also enables the necessary interactions with ModelScope backend services, particularly the Model-Hub and Dataset-Hub. Such interactions allow the management of various entities (models and datasets) to be performed seamlessly under the hood, including entity lookup, version control, and cache management.
# Models and Online Accessibility

Hundreds of models are made publicly available on [ModelScope](https://www.modelscope.cn) (600+ and counting), covering the latest developments in areas such as NLP, CV, Audio, Multi-Modality, and AI for Science. Many of these models represent the SOTA in their specific fields, and made their open-source debut on ModelScope. Users can visit [modelscope.cn](https://www.modelscope.cn) and experience first-hand how these models perform, with just a few clicks. An immediate developer experience is also possible through the ModelScope Notebook, which is backed by a ready-to-use CPU/GPU development environment in the cloud, only one click away on [ModelScope](https://www.modelscope.cn).

Some representative examples include:
NLP:

* [nlp_gpt3_text-generation_2.7B](https://modelscope.cn/models/damo/nlp_gpt3_text-generation_2.7B)
* [ChatYuan-large](https://modelscope.cn/models/ClueAI/ChatYuan-large)
* [mengzi-t5-base](https://modelscope.cn/models/langboat/mengzi-t5-base)
* [nlp_csanmt_translation_en2zh](https://modelscope.cn/models/damo/nlp_csanmt_translation_en2zh)
* [nlp_raner_named-entity-recognition_chinese-base-news](https://modelscope.cn/models/damo/nlp_raner_named-entity-recognition_chinese-base-news)
* [nlp_structbert_word-segmentation_chinese-base](https://modelscope.cn/models/damo/nlp_structbert_word-segmentation_chinese-base)
* [Erlangshen-RoBERTa-330M-Sentiment](https://modelscope.cn/models/fengshenbang/Erlangshen-RoBERTa-330M-Sentiment)
* [nlp_convai_text2sql_pretrain_cn](https://modelscope.cn/models/damo/nlp_convai_text2sql_pretrain_cn)
Audio:

* [speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch)
* [speech_sambert-hifigan_tts_zh-cn_16k](https://modelscope.cn/models/damo/speech_sambert-hifigan_tts_zh-cn_16k)
* [speech_charctc_kws_phone-xiaoyun](https://modelscope.cn/models/damo/speech_charctc_kws_phone-xiaoyun)
* [u2pp_conformer-asr-cn-16k-online](https://modelscope.cn/models/wenet/u2pp_conformer-asr-cn-16k-online)
* [speech_frcrn_ans_cirm_16k](https://modelscope.cn/models/damo/speech_frcrn_ans_cirm_16k)
* [speech_dfsmn_aec_psm_16k](https://modelscope.cn/models/damo/speech_dfsmn_aec_psm_16k)
CV:

* [cv_tinynas_object-detection_damoyolo](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo)
* [cv_unet_person-image-cartoon_compound-models](https://modelscope.cn/models/damo/cv_unet_person-image-cartoon_compound-models)
* [cv_convnextTiny_ocr-recognition-general_damo](https://modelscope.cn/models/damo/cv_convnextTiny_ocr-recognition-general_damo)
* [cv_resnet18_human-detection](https://modelscope.cn/models/damo/cv_resnet18_human-detection)
* [cv_resnet50_face-detection_retinaface](https://modelscope.cn/models/damo/cv_resnet50_face-detection_retinaface)
* [cv_unet_image-matting](https://modelscope.cn/models/damo/cv_unet_image-matting)
* [cv_F3Net_product-segmentation](https://modelscope.cn/models/damo/cv_F3Net_product-segmentation)
* [cv_resnest101_general_recognition](https://modelscope.cn/models/damo/cv_resnest101_general_recognition)
Multi-Modal:

* [multi-modal_clip-vit-base-patch16_zh](https://modelscope.cn/models/damo/multi-modal_clip-vit-base-patch16_zh)
* [ofa_pretrain_base_zh](https://modelscope.cn/models/damo/ofa_pretrain_base_zh)
* [Taiyi-Stable-Diffusion-1B-Chinese-v0.1](https://modelscope.cn/models/fengshenbang/Taiyi-Stable-Diffusion-1B-Chinese-v0.1)
* [mplug_visual-question-answering_coco_large_en](https://modelscope.cn/models/damo/mplug_visual-question-answering_coco_large_en)
AI for Science:

* [uni-fold-monomer](https://modelscope.cn/models/DPTech/uni-fold-monomer/summary)
* [uni-fold-multimer](https://modelscope.cn/models/DPTech/uni-fold-multimer/summary)
# QuickTour

We provide a unified interface for inference using `pipeline`, and for fine-tuning and evaluation using `Trainer`, across different tasks.

For any given task with any type of input (image, text, audio, video...), an inference pipeline can be implemented with only a few lines of code, which will automatically load the underlying model and return the inference result, as exemplified below:

```python
>>> from modelscope.pipelines import pipeline
>>> word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')
>>> word_segmentation('今天天气不错,适合出去游玩')
{'output': '今天 天气 不错 , 适合 出去 游玩'}
```
Given an image, portrait matting (aka. background removal) can be accomplished with the following code snippet:



```python
>>> import cv2
>>> from modelscope.pipelines import pipeline

>>> portrait_matting = pipeline('portrait-matting')
>>> result = portrait_matting('https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_matting.png')
>>> cv2.imwrite('result.png', result['output_img'])
```

The output image with the background removed is:


Fine-tuning and evaluation can also be done with a few more lines of code to set up the training dataset and trainer, with the heavy lifting of training and evaluating a model encapsulated in the implementation of the `trainer.train()` and `trainer.evaluate()` interfaces.

For example, the GPT-3 base model (1.3B) can be fine-tuned with the chinese-poetry dataset, resulting in a model that can be used for Chinese poetry generation.
```python
>>> from modelscope.metainfo import Trainers
>>> from modelscope.msdatasets import MsDataset
>>> from modelscope.trainers import build_trainer

>>> train_dataset = MsDataset.load('chinese-poetry-collection', split='train').remap_columns({'text1': 'src_txt'})
>>> eval_dataset = MsDataset.load('chinese-poetry-collection', split='test').remap_columns({'text1': 'src_txt'})
>>> max_epochs = 10
>>> tmp_dir = './gpt3_poetry'

>>> kwargs = dict(
        model='damo/nlp_gpt3_text-generation_1.3B',
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        max_epochs=max_epochs,
        work_dir=tmp_dir)

>>> trainer = build_trainer(name=Trainers.gpt3_trainer, default_args=kwargs)
>>> trainer.train()
```
# Why should I use the ModelScope library

1. A unified and concise user interface is abstracted for different tasks and different models. Model inference and training can be implemented in as few as 3 and 10 lines of code, respectively. It is convenient for users to explore models across different fields in the ModelScope community. All models integrated into ModelScope are ready to use, which makes it easy to get started with AI, in both educational and industrial settings.

2. ModelScope offers a model-centric development and application experience. It streamlines the support for model training, inference, export and deployment, and facilitates users in building their own MLOps on top of the ModelScope ecosystem.

3. For the model inference and training process, a modular design is in place, and a wealth of functional-module implementations is provided, making it convenient for users to customize their own model inference, training and other processes.

4. For distributed model training, especially of large models, rich training-strategy support is provided, including data parallelism, model parallelism, hybrid parallelism and so on.
@@ -183,9 +323,10 @@ For example, we use the gpt3 1.3B model to load the chinese poetry dataset and f

# Installation

## Docker

The ModelScope library currently supports popular deep learning frameworks for model training and inference, including PyTorch, TensorFlow and ONNX. All releases are tested and run on Python 3.7+, PyTorch 1.8+, TensorFlow 1.15 or TensorFlow 2.0+.

To allow out-of-the-box usage of all the models on ModelScope, official docker images are provided for all releases. Based on the docker image, developers can skip all environment installation and configuration and use the library directly. Currently, the latest CPU image and GPU image can be obtained from:

CPU docker image
```shell
```

@@ -199,20 +340,20 @@ registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.
## Setup Local Python Environment

One can also set up a local ModelScope environment using pip and conda. We suggest [anaconda](https://docs.anaconda.com/anaconda/install/) for creating the local python environment:

```shell
conda create -n modelscope python=3.7
conda activate modelscope
```
PyTorch or TensorFlow can then be installed separately, according to each model's requirements.

* Install PyTorch: [doc](https://pytorch.org/get-started/locally/)
* Install TensorFlow: [doc](https://www.tensorflow.org/install/pip)

After installing the necessary machine-learning framework, you can install the modelscope library as follows:

If you only want to play around with the modelscope framework, or try out model/dataset download, you can install the core modelscope components:

```shell
pip install modelscope
```
@@ -243,7 +384,7 @@ pip install modelscope[science] -f https://modelscope.oss-cn-beijing.aliyuncs.co

`Notes`:

1. Currently, some audio-task models only support the python3.7, tensorflow1.15.4 Linux environment. Most other models can be installed and used on Windows and Mac (x86).

2. Some models in the audio field use the third-party library SoundFile for wav file processing. On Linux, users need to manually install libsndfile, the backend of SoundFile ([doc link](https://github.com/bastibe/python-soundfile#installation)). On Windows and macOS, it will be installed automatically without user intervention. For example, on Ubuntu, you can use the following commands:

```shell
# install the libsndfile backend required by SoundFile
sudo apt-get update
sudo apt-get install libsndfile1
```
@@ -405,6 +405,7 @@ class Pipelines(object):

```python
    dialog_state_tracking = 'dialog-state-tracking'
    zero_shot_classification = 'zero-shot-classification'
    text_error_correction = 'text-error-correction'
    word_alignment = 'word-alignment'
    plug_generation = 'plug-generation'
    gpt3_generation = 'gpt3-generation'
    gpt_moe_generation = 'gpt-moe-generation'
```
@@ -925,6 +926,7 @@ class Preprocessors(object):

```python
    sbert_token_cls_tokenizer = 'sbert-token-cls-tokenizer'
    zero_shot_cls_tokenizer = 'zero-shot-cls-tokenizer'
    text_error_correction = 'text-error-correction'
    word_alignment = 'word-alignment'
    sentence_embedding = 'sentence-embedding'
    text_ranking = 'text-ranking'
    sequence_labeling_tokenizer = 'sequence-labeling-tokenizer'
```
@@ -17,6 +17,7 @@ if TYPE_CHECKING:

```python
    from .token_classification import BertForTokenClassification
    from .document_segmentation import BertForDocumentSegmentation
    from .siamese_uie import SiameseUieModel
    from .word_alignment import MBertForWordAlignment
else:
    _import_structure = {
        'backbone': [
```
modelscope/models/nlp/bert/word_alignment.py (new file, 173 lines)

@@ -0,0 +1,173 @@
```python
# Copyright (c) Alibaba, Inc. and its affiliates.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
import torch.nn as nn
import torch.utils.checkpoint

from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.outputs import WordAlignmentOutput
from modelscope.utils import logger as logging
from modelscope.utils.constant import Tasks
from .backbone import BertModel, BertPreTrainedModel

logger = logging.get_logger()


@MODELS.register_module(Tasks.word_alignment, module_name=Models.bert)
class MBertForWordAlignment(BertPreTrainedModel):
    r"""MBert model for the word alignment task.

    Code for the EMNLP Findings 2022 paper "Third-Party Aligner for Neural Word Alignments",
    https://arxiv.org/abs/2211.04198

    Parameters:
        config (:class:`~modelscope.models.nlp.structbert.SbertConfig`): Model configuration class with
            all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to
            load the model weights.
    """
    _keys_to_ignore_on_load_unexpected = [r'pooler']
    _keys_to_ignore_on_load_missing = [
        r'position_ids', r'predictions.decoder.bias'
    ]

    def __init__(self, config, **kwargs):
        super().__init__(config)

        if config.is_decoder:
            logger.warning(
                'If you want to use `BertForMaskedLM` make sure `config.is_decoder=False` for '
                'bi-directional self-attention.')
        config.num_hidden_layers = kwargs.get('encoder_layers', 8)

        self.bert = BertModel(config, add_pooling_layer=False)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        src_input_ids=None,
        src_attention_mask=None,
        src_b2w_map=None,
        tgt_input_ids=None,
        tgt_attention_mask=None,
        tgt_b2w_map=None,
        threshold=0.001,
        bpe_level=False,
    ):
        """
        Args:
            src_input_ids:
                Indices of source input sequence tokens in the vocabulary.
            src_attention_mask:
                Source mask to avoid performing attention on padding token indices.
            src_b2w_map:
                Word index of each subword in the source sequence.
            tgt_input_ids:
                Indices of target input sequence tokens in the vocabulary.
            tgt_attention_mask:
                Target mask to avoid performing attention on padding token indices.
            tgt_b2w_map:
                Word index of each subword in the target sequence.
            threshold:
                The threshold used to extract alignments.
            bpe_level:
                Whether to return subword-level alignments.
        Example:
            {
                'src_input_ids': LongTensor([[2478,242,24,4]]),
                'src_attention_mask': BoolTensor([[1,1,1,1]]),
                'src_b2w_map': LongTensor([[0,1,2,3]]),
                'tgt_input_ids': LongTensor([[1056,356,934,263,7]]),
                'tgt_attention_mask': BoolTensor([[1,1,1,1,1]]),
                'tgt_b2w_map': LongTensor([[0,1,1,2,3]]),
                'threshold': 0.001,
                'bpe_level': False,
            }
        Returns `modelscope.outputs.WordAlignmentOutput`
        """
        with torch.no_grad():
            src_encoder_out = self.bert(
                input_ids=src_input_ids,
                attention_mask=src_attention_mask.float(),
                head_mask=None,
                inputs_embeds=None,
                output_hidden_states=True,
            )
            tgt_encoder_out = self.bert(
                input_ids=tgt_input_ids,
                attention_mask=tgt_attention_mask.float(),
                head_mask=None,
                inputs_embeds=None,
                output_hidden_states=True,
            )

            # Mask out [CLS] (101), [SEP] (102) and padding positions.
            atten_mask_src = (1 - (
                (src_input_ids != 101) & (src_input_ids != 102)
                & src_attention_mask)[:, None, None, :].float()) * -10000
            atten_mask_tgt = (1 - (
                (tgt_input_ids != 101) & (tgt_input_ids != 102)
                & tgt_attention_mask)[:, None, None, :].float()) * -10000

            src_align_out = src_encoder_out[0]
            tgt_align_out = tgt_encoder_out[0]

            bpe_sim = torch.bmm(src_align_out, tgt_align_out.transpose(1, 2))

            attention_scores_src = bpe_sim.unsqueeze(1) + atten_mask_tgt
            attention_scores_tgt = bpe_sim.unsqueeze(1) + atten_mask_src.transpose(
                -1, -2)

            attention_probs_src = nn.Softmax(dim=-1)(attention_scores_src)
            attention_probs_tgt = nn.Softmax(dim=-2)(attention_scores_tgt)

            # Keep a subword pair only when both directions exceed the threshold.
            align_matrix = (attention_probs_src > threshold) * (
                attention_probs_tgt > threshold)
            align_matrix = align_matrix.squeeze(1)

            len_src = (atten_mask_src == 0).sum(dim=-1).unsqueeze(-1)
            len_tgt = (atten_mask_tgt == 0).sum(dim=-1).unsqueeze(-1)

            attention_probs_src = nn.Softmax(dim=-1)(
                attention_scores_src / torch.sqrt(len_src.float()))
            attention_probs_tgt = nn.Softmax(dim=-2)(
                attention_scores_tgt / torch.sqrt(len_tgt.float()))

            word_aligns = []

            for idx, (line_align, b2w_src, b2w_tgt) in enumerate(
                    zip(align_matrix, src_b2w_map, tgt_b2w_map)):
                aligns = dict()
                non_specials = torch.where(line_align)
                for i, j in zip(*non_specials):
                    if not bpe_level:
                        # Map subword indices back to word indices, keeping the
                        # highest similarity seen for each word pair.
                        word_pair = (src_b2w_map[idx][i - 1].item(),
                                     tgt_b2w_map[idx][j - 1].item())
                        if word_pair not in aligns:
                            aligns[word_pair] = bpe_sim[idx][i, j].item()
                        else:
                            aligns[word_pair] = max(aligns[word_pair],
                                                    bpe_sim[idx][i, j].item())
                    else:
                        aligns[(i.item() - 1,
                                j.item() - 1)] = bpe_sim[idx][i, j].item()
                word_aligns.append(aligns)

        return WordAlignmentOutput(predictions=word_aligns)
```
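The alignment extraction above keeps a source-target pair only when its softmax probability exceeds the threshold in both directions (source-to-target rows and target-to-source columns). The idea can be illustrated with a small, framework-free sketch; the similarity matrix here is made up for illustration and stands in for `bpe_sim`:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a sequence of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def extract_alignments(sim, threshold):
    """Keep (i, j) only if softmax over row i and softmax over column j
    both exceed the threshold, mirroring the bidirectional check."""
    rows = [softmax(row) for row in sim]        # src -> tgt direction
    cols = [softmax(col) for col in zip(*sim)]  # tgt -> src direction
    return sorted(
        (i, j)
        for i, row in enumerate(rows)
        for j, p in enumerate(row)
        if p > threshold and cols[j][i] > threshold)

# A toy 3x3 similarity matrix with a clear diagonal alignment.
sim = [[9.0, 0.0, 0.0],
       [0.0, 9.0, 0.0],
       [0.0, 0.0, 9.0]]
print(extract_alignments(sim, threshold=0.5))  # [(0, 0), (1, 1), (2, 2)]
```

The intersection of the two directional softmaxes is what makes the extraction symmetric: a pair survives only if it is strong relative to both its row and its column.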
@@ -1,5 +1,5 @@

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

import numpy as np
```
@@ -328,6 +328,14 @@ class TextErrorCorrectionOutput(ModelOutputBase):

```python
    predictions: np.ndarray = None


@dataclass
class WordAlignmentOutput(ModelOutputBase):
    """The output class for word alignment models."""

    predictions: List = None


@dataclass
class TextGenerationModelOutput(ModelOutputBase):
    """The output class for text generation models.
```
@@ -709,6 +709,11 @@ TASK_OUTPUTS = {

```python
    # {
    #     "output": "我想吃苹果"
    # }
    Tasks.text_error_correction: [OutputKeys.OUTPUT],
    # word_alignment result for a single sample
    # {
    #     "output": "0-0 1-3 2-4 3-1 4-2 5-5"
    # }
    Tasks.word_alignment: [OutputKeys.OUTPUT],
    Tasks.sentence_embedding: [OutputKeys.TEXT_EMBEDDING, OutputKeys.SCORES],
    Tasks.text_ranking: [OutputKeys.SCORES],
```
@@ -25,6 +25,7 @@ if TYPE_CHECKING:

```python
    from .summarization_pipeline import SummarizationPipeline
    from .translation_quality_estimation_pipeline import TranslationQualityEstimationPipeline
    from .text_error_correction_pipeline import TextErrorCorrectionPipeline
    from .word_alignment_pipeline import WordAlignmentPipeline
    from .text_generation_pipeline import TextGenerationPipeline, TextGenerationT5Pipeline
    from .fid_dialogue_pipeline import FidDialoguePipeline
    from .token_classification_pipeline import TokenClassificationPipeline
```

@@ -70,6 +71,7 @@ else:

```python
        ['TableQuestionAnsweringPipeline'],
        'text_classification_pipeline': ['TextClassificationPipeline'],
        'text_error_correction_pipeline': ['TextErrorCorrectionPipeline'],
        'word_alignment_pipeline': ['WordAlignmentPipeline'],
        'text_generation_pipeline':
        ['TextGenerationPipeline', 'TextGenerationT5Pipeline'],
        'fid_dialogue_pipeline': ['FidDialoguePipeline'],
```
modelscope/pipelines/nlp/word_alignment_pipeline.py (new file, 68 lines)

@@ -0,0 +1,68 @@
```python
# Copyright (c) Alibaba, Inc. and its affiliates.

from typing import Any, Dict, Optional, Union

import numpy as np

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import WordAlignmentPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['WordAlignmentPipeline']


@PIPELINES.register_module(
    Tasks.word_alignment, module_name=Pipelines.word_alignment)
class WordAlignmentPipeline(Pipeline):

    def __init__(self,
                 model: Union[Model, str],
                 preprocessor: WordAlignmentPreprocessor = None,
                 config_file: str = None,
                 device: str = 'gpu',
                 auto_collate=True,
                 sequence_length=128,
                 **kwargs):
        """Use `model` and `preprocessor` to create a word alignment pipeline for prediction.

        Args:
            model (str or Model): Either a local model dir that supports the word alignment task,
                a model id from the model hub, or a torch model instance.
            preprocessor (Preprocessor): A WordAlignmentPreprocessor.
            kwargs (dict, `optional`):
                Extra kwargs passed into the preprocessor's constructor.

        Example:
            >>> from modelscope.pipelines import pipeline
            >>> from modelscope.utils.constant import Tasks
            >>> model_id = 'damo/Third-Party-Supervised-Word-Aligner-mBERT-base-zhen'
            >>> input = {"sentence_pair": '贝利 在 墨西哥 推出 自传 。||| pele promotes autobiography in mexico .'}
            >>> pipeline_ins = pipeline(Tasks.word_alignment, model=model_id)
            >>> print(pipeline_ins(input)['output'])
        """
        super().__init__(
            model=model,
            preprocessor=preprocessor,
            config_file=config_file,
            device=device,
            auto_collate=auto_collate)
        if preprocessor is None:
            self.preprocessor = WordAlignmentPreprocessor.from_pretrained(
                self.model.model_dir,
                sequence_length=sequence_length,
                **kwargs)

    def forward(self, inputs: Dict[str, Any],
                **forward_params) -> Dict[str, Any]:
        return self.model(**inputs, **forward_params)

    def postprocess(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Flatten the predicted word-pair dict into a "src-tgt src-tgt ..." string.
        align = []
        for k in inputs[0][0].keys():
            align.append(f'{k[0]}-{k[1]}')
        align = ' '.join(align)

        return {OutputKeys.OUTPUT: align}
```
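The `postprocess` step above only reformats the model's prediction dict into the `"0-0 1-3 ..."` string listed under `TASK_OUTPUTS`. The conversion itself is plain Python and can be sketched standalone; the alignment dict and its scores here are made up for illustration:

```python
def format_alignments(word_aligns):
    """Flatten {(src_idx, tgt_idx): score, ...} into the
    "src-tgt src-tgt ..." string emitted under the 'output' key."""
    return ' '.join(f'{i}-{j}' for i, j in word_aligns.keys())

# Hypothetical prediction for a pair of 3-word sentences.
aligns = {(0, 0): 0.91, (1, 2): 0.78, (2, 1): 0.83}
print(format_alignments(aligns))  # 0-0 1-2 2-1
```

Since Python dicts preserve insertion order, the pairs appear in the order the model emitted them.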
@@ -30,9 +30,9 @@ if TYPE_CHECKING:

```python
    TextGenerationSentencePiecePreprocessor,
    TokenClassificationTransformersPreprocessor,
    TextErrorCorrectionPreprocessor, TextGenerationT5Preprocessor,
    WordAlignmentPreprocessor, TextGenerationTransformersPreprocessor,
    Tokenize, WordSegmentationBlankSetToLabelPreprocessor,
    CodeGeeXPreprocessor, MGLMSummarizationPreprocessor,
    ZeroShotClassificationTransformersPreprocessor,
    TextGenerationJiebaPreprocessor, SentencePiecePreprocessor,
    DialogIntentPredictionPreprocessor, DialogModelingPreprocessor,
```
@@ -4,6 +4,7 @@ from typing import TYPE_CHECKING

```python
from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
    from .word_alignment_preprocessor import WordAlignmentPreprocessor
    from .text_error_correction import TextErrorCorrectionPreprocessor
    from .text_generation_preprocessor import TextGenerationJiebaPreprocessor
    from .bert_seq_cls_tokenizer import Tokenize
```

@@ -66,6 +67,9 @@ else:

```python
        'text_error_correction': [
            'TextErrorCorrectionPreprocessor',
        ],
        'word_alignment_preprocessor': [
            'WordAlignmentPreprocessor',
        ],
        'mglm_summarization_preprocessor': ['MGLMSummarizationPreprocessor'],
        'token_classification_thai_preprocessor': [
            'NERPreprocessorThai',
```
modelscope/preprocessors/nlp/word_alignment_preprocessor.py (new file, 131 lines)

@@ -0,0 +1,131 @@
```python
# Copyright (c) Alibaba, Inc. and its affiliates.

import itertools
import os
import os.path as osp
from typing import Any, Dict, Optional, Union

import numpy as np
import torch

from modelscope.metainfo import Preprocessors
from modelscope.preprocessors.base import Preprocessor
from modelscope.preprocessors.builder import PREPROCESSORS
from modelscope.utils.constant import Fields, ModeKeys
from modelscope.utils.hub import get_model_type
from modelscope.utils.logger import get_logger
from .transformers_tokenizer import NLPTokenizer


@PREPROCESSORS.register_module(
    Fields.nlp, module_name=Preprocessors.word_alignment)
class WordAlignmentPreprocessor(Preprocessor):
    """The tokenizer preprocessor used in word alignment."""

    def __init__(self,
                 model_dir: str,
                 sequence_pair='sentence_pair',
                 mode=ModeKeys.INFERENCE,
                 use_fast: bool = False,
                 sequence_length: int = None,
                 **kwargs):
        """The preprocessor for the word alignment task.

        Args:
            model_dir: The model dir used to initialize the tokenizer.
            sequence_pair: The key of the sentence pair in the input dict.
            mode: The mode for the preprocessor.
            use_fast: Whether to use the fast tokenizer.
            sequence_length: The max sequence length the model supports,
                passed into the tokenizer as the 'max_length' param.
            **kwargs: Extra args input.
        """
        self.sequence_pair = sequence_pair

        kwargs['sequence_length'] = sequence_length if sequence_length is not None else kwargs.get(
            'max_length', 128)
        self.max_length = kwargs['sequence_length']
        kwargs.pop('max_length', None)
        model_type = None

        if model_dir is not None:
            model_type = get_model_type(model_dir)
        self.nlp_tokenizer = NLPTokenizer(
            model_dir, model_type, use_fast=use_fast, tokenize_kwargs=kwargs)
        super().__init__(mode=mode)

    def __call__(self, data: Dict, **kwargs) -> Dict[str, Any]:
        """Process the raw input data.

        Args:
            data (Dict):
                Example:
                    {"sentence_pair": "贝利 在 墨西哥 推出 自传 。||| pele promotes autobiography in mexico ."}

        Returns:
            Dict[str, Any]: the preprocessed data
        """
        sentence_pair = data[self.sequence_pair]
        source_sentences, target_sentences = sentence_pair.split('|||')
        # src_lang = data.get("src_lang", 'en_XX')
        # tgt_lang = data.get("tgt_lang", 'en_XX')
        if 'return_tensors' not in kwargs:
            kwargs['return_tensors'] = 'pt' if self.mode == ModeKeys.INFERENCE else None

        sent_src, sent_tgt = source_sentences.strip().split(
        ), target_sentences.strip().split()

        # Tokenize each whitespace-separated word into subwords.
        token_src = [
            self.nlp_tokenizer.tokenizer.tokenize(word) for word in sent_src
        ]
        token_tgt = [
            self.nlp_tokenizer.tokenizer.tokenize(word) for word in sent_tgt
        ]
        wid_src = [
            self.nlp_tokenizer.tokenizer.convert_tokens_to_ids(x)
            for x in token_src
        ]
        wid_tgt = [
            self.nlp_tokenizer.tokenizer.convert_tokens_to_ids(x)
            for x in token_tgt
        ]

        ids_tgt = self.nlp_tokenizer.tokenizer.prepare_for_model(
            list(itertools.chain(*wid_tgt)),
            return_tensors='pt',
            max_length=self.max_length,
            prepend_batch_axis=True)['input_ids']
        ids_src = self.nlp_tokenizer.tokenizer.prepare_for_model(
            list(itertools.chain(*wid_src)),
            return_tensors='pt',
            max_length=self.max_length,
            prepend_batch_axis=True)['input_ids']

        # Record, for each subword, the index of the word it came from.
        bpe2word_map_src = []
        for i, word_list in enumerate(token_src):
            bpe2word_map_src += [i for x in word_list]
        bpe2word_map_src = torch.Tensor(bpe2word_map_src).type_as(
```
|
||||
ids_src).view(1, -1)
|
||||
bpe2word_map_tgt = []
|
||||
for i, word_list in enumerate(token_tgt):
|
||||
bpe2word_map_tgt += [i for x in word_list]
|
||||
bpe2word_map_tgt = torch.Tensor(bpe2word_map_tgt).type_as(
|
||||
ids_tgt).view(1, -1)
|
||||
attention_mask_src = (
|
||||
ids_src != self.nlp_tokenizer.tokenizer.pad_token_id)
|
||||
attention_mask_tgt = (
|
||||
ids_tgt != self.nlp_tokenizer.tokenizer.pad_token_id)
|
||||
|
||||
return {
|
||||
'src_input_ids': ids_src,
|
||||
'src_attention_mask': attention_mask_src,
|
||||
'src_b2w_map': bpe2word_map_src,
|
||||
'tgt_input_ids': ids_tgt,
|
||||
'tgt_attention_mask': attention_mask_tgt,
|
||||
'tgt_b2w_map': bpe2word_map_tgt,
|
||||
'threshold': 0.001,
|
||||
'bpe_level': False
|
||||
}
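
The `bpe2word_map_*` construction above can be sketched standalone: for word `i` that was split into `n` sub-tokens, the map repeats `i` exactly `n` times. A minimal sketch, assuming a hypothetical sub-token split (the helper name and the example split are illustrative, not part of the library):

```python
from typing import List


def bpe_to_word_map(token_lists: List[List[str]]) -> List[int]:
    """Mirror the preprocessor's bpe2word_map construction:
    emit word index i once per sub-token of word i."""
    mapping: List[int] = []
    for i, word_tokens in enumerate(token_lists):
        mapping += [i] * len(word_tokens)
    return mapping


# Hypothetical WordPiece-style split of "pele promotes autobiography"
tokens = [['pe', '##le'], ['promotes'], ['auto', '##bio', '##graphy']]
print(bpe_to_word_map(tokens))  # [0, 0, 1, 2, 2, 2]
```

The resulting flat list lines up position-for-position with the flattened sub-token id sequence fed to `prepare_for_model`, which is what lets alignment scores computed at sub-token level be collapsed back onto whole words.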
@@ -186,6 +186,7 @@ class NLPTasks(object):
     zero_shot_classification = 'zero-shot-classification'
     backbone = 'backbone'
     text_error_correction = 'text-error-correction'
+    word_alignment = 'word-alignment'
     faq_question_answering = 'faq-question-answering'
     information_extraction = 'information-extraction'
     document_segmentation = 'document-segmentation'

@@ -600,7 +600,7 @@ if __name__ == '__main__':
     parser.add_argument(
         '--level', default=0, type=int, help='2 -- all, 1 -- p1, 0 -- p0')
     parser.add_argument(
-        '--disable_profile', action='store_true', help='disable profiling')
+        '--profile', action='store_true', help='enable profiling')
     parser.add_argument(
         '--run_config',
         default=None,
@@ -634,7 +634,7 @@ if __name__ == '__main__':
     set_test_level(args.level)
     os.environ['REGRESSION_BASELINE'] = '1'
     logger.info(f'TEST LEVEL: {test_level()}')
-    if not args.disable_profile:
+    if args.profile:
         from utils import profiler
         logger.info('enable profile ...')
         profiler.enable()