Merge branch 'master-github' into master-merge-github0406

yuze.zyz
2023-04-06 15:15:37 +08:00
29 changed files with 2050 additions and 44 deletions

34
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@@ -0,0 +1,34 @@
---
name: Bug report
about: Create a bug report to help us improve
title: ''
labels: ''
assignees: Firmament-cyou, tastelikefeet, wangxingjun778, wenmengzhou, zzclynn
---
Thanks for your error report; we appreciate it a lot.
**Checklist**
* I have searched the tutorials on the ModelScope [doc-site](https://modelscope.cn/docs)
* I have searched related issues but could not get the expected help.
* The bug has not been fixed in the latest version.
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
* What command or script did you run?
> A placeholder for the command.
* Did you make any modifications to the code or config? Do you understand what you have modified?
* What dataset did you use?
**Your Environments (__required__)**
* OS: `uname -a`
* CPU: `lscpu`
* Commit id (e.g. `a3ffc7d8`)
* You may add any additional information that may help locate the problem, such as
* How you installed PyTorch [e.g., pip, conda, source]
* Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
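For convenience when filling in the environment section, here is a small optional Python sketch (not part of this template) that prints the requested details; the PyTorch and git lookups are wrapped so it still runs when either is unavailable:
```python
# Optional helper (not part of the template) for gathering the details requested above.
import platform
import subprocess
import sys


def collect_environment() -> None:
    print('OS        :', platform.platform())          # rough equivalent of `uname -a`
    print('Python    :', sys.version.split()[0])
    try:
        import torch
        print('PyTorch   :', torch.__version__)
    except ImportError:
        print('PyTorch   : not installed')
    try:
        commit = subprocess.check_output(
            ['git', 'rev-parse', '--short', 'HEAD'], text=True).strip()
        print('Commit id :', commit)                    # e.g. a3ffc7d8
    except Exception:
        print('Commit id : unknown (not run inside a git checkout?)')


if __name__ == '__main__':
    collect_environment()
```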


@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: tastelikefeet, wangxingjun778, wenmengzhou, yingdachen, zzclynn
---
**Describe the feature**
A clear and concise description of the feature.
**Motivation**
A clear and concise description of the motivation of the feature. Ex1. It is inconvenient when [....]. Ex2. There is a recent paper [....], which is very helpful for [....].
**Related resources**
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
**Additional context**
Add any other context or screenshots about the feature request here. If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.

17
.github/ISSUE_TEMPLATE/question.md vendored Normal file

@@ -0,0 +1,17 @@
---
name: Question
about: Ask a general question about ModelScope.
title: ''
labels: ''
assignees: zzclynn
---
**General Question**
Before asking a question, make sure you have:
* Searched the tutorials on the ModelScope [doc-site](https://modelscope.cn/docs)
* Googled your question.
* Searched related issues but could not get the expected help.
* Checked that the issue has not been fixed in the latest version.

133
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,133 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
feedback@huggingface.co.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations


@@ -117,25 +117,35 @@ sudo apt-get install git-lfs
git lfs install
```
2. We use a public-read model repository on ModelScope to store test data. It has been added as a git submodule at the path data/test by default. To clone it together with the main repository, use:
```shell
git clone git@github.com:modelscope/modelscope.git --recursive
```
3. Each time you add new data, go to the data/test directory (note that you are now in the submodule's git directory), check that you are on the master branch, and pull the latest master branch:
```shell
git branch
git checkout master
git pull origin master
```
4. Track your new test data type, then add and commit the new files on the master branch:
```shell
cd data/test/
git lfs track "*.png"
git add test.png
git commit -m "add test.png"
git push origin master
```
5. Return to the modelscope directory and commit the submodule update:
```shell
cd ../../
git add data/test
git commit -m "update test data"
```
Note: By default, all members of the ModelScope organization are granted write permission. If you run into permission issues, please send an email to ModelScope's official address ([contact@modelscope.cn](mailto:contact@modelscope.cn)) and a dedicated person will follow up with you by email.

152
docs/source/develop_cn.md Normal file

@@ -0,0 +1,152 @@
# Development
## 1. Code Style
We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style.
We use the following tools for linting and formatting:
- [flake8](http://flake8.pycqa.org/en/latest/): linter
- [yapf](https://github.com/google/yapf): formatter
- [isort](https://github.com/timothycrosley/isort): import sorting
The style configurations for yapf and isort can be found in setup.cfg. We use the [pre-commit hook](https://pre-commit.com/) to automatically check and format **flake8**, **yapf**, **seed-isort-config**, **isort** and **trailing whitespaces**, fix **end-of-files**, and sort **requirements.txt** on every commit. The pre-commit hook configuration is stored in .pre-commit-config.yaml. After cloning the repository, you need to install and initialize the pre-commit hook.
```bash
pip install -r requirements/tests.txt
```
From the repository folder, run
```bash
pre-commit install
```
After that, the linters and formatter will run on every commit.
If you want to run the pre-commit hook on all files, run
```bash
pre-commit run --all-files
```
If you only want to format and lint your code, run
```bash
make linter
```
## 2. Testing
### 2.1 Test levels
There are three main test levels:
- Level 0: tests of the framework's basic interfaces and functionality, e.g. **tests/trainers/test_trainer_base.py**
- Level 1: important feature tests covering end-to-end workflows, e.g. **tests/pipelines/test_image_matting.py**
- Level 2: scenario tests for all implemented modules (such as models and pipelines) across the different algorithm fields.
The default test level is 0, which runs only level-0 test cases; you can set the test level via the environment variable **TEST_LEVEL**.
```bash
# run all tests
TEST_LEVEL=2 make test
# run important feature tests
TEST_LEVEL=1 make test
# run core unit tests and basic feature tests
make test
```
When writing test cases, you should assign a test level to each case, as shown below. If you keep the default, the test level will be 0 and the case will run in every test stage.
test_module.py
```python
import unittest

from modelscope.utils.test_utils import test_level


class ImageCartoonTest(unittest.TestCase):

    @unittest.skipUnless(test_level() >= 1, 'skip test in current test level')
    def test_run_by_direct_model_download(self):
        pass
```
### 2.2 Running tests
1. Run your own single test case to verify the functionality you implemented. You can run the test file directly; if it does not run, check whether the environment variable **TEST_LEVEL** is set, and unset it if so.
```bash
python tests/path/to/your_test.py
```
2. Before requesting code review, remember to run the core tests in your local environment; by default only level-0 test cases will run.
```bash
make tests
```
3. After your code review starts, continuous-integration tests are triggered, which run the level-1 test cases.
4. At midnight every day, a daily regression test runs against the master branch, covering all test cases.
### 2.3 Test data storage
Since we need a large amount of test data, including images, videos, and models, we use git lfs to store these large files.
1. Install git-lfs (version >= 2.5.0). For Mac:
```bash
brew install git-lfs
git lfs install
```
For CentOS, download the rpm file from the git-lfs GitHub releases [page](https://github.com/git-lfs/git-lfs/releases/tag/v3.2.0), then run
```bash
sudo rpm -ivh your_rpm_file_name.rpm
git lfs install
```
For Ubuntu:
```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```
2. We use a public-read model repository on ModelScope to store test data. It has been added as a git submodule at the path data/test by default. To clone it, use the following command:
```shell
git clone git@github.com:modelscope/modelscope.git --recursive
```
3. Each time you add new data, go to the data/test directory (note that you are now in the submodule's git directory), check that you are on the master branch, and pull the latest master branch:
```shell
git branch
git checkout master
git pull origin master
```
4. Track your new test data type, then add and commit the new files on the master branch:
```shell
cd data/test/
git lfs track "*.png"
git add test.png
git commit -m "add test.png"
git push origin master
```
5. Return to the modelscope directory and commit the submodule update:
```shell
cd ../../
git add data/test
git commit -m "update test data"
```
Note: By default, all members of the ModelScope organization are granted write permission. If you run into permission issues, please send an email to ModelScope's official address ([contact@modelscope.cn](mailto:contact@modelscope.cn)) and a dedicated person will follow up with you by email.
## Development and Code Review
1. Fetch the latest master code and check out a new branch for local development.
```shell
git pull origin master --rebase
git checkout -b dev/my-dev-branch
```
Note: replace "dev/my-dev-branch" with a meaningful branch name. We recommend using a new dev branch for every change.
2. Make your local changes.
3. Commit your local changes.
```shell
git add .
git commit -m "[to #42322933] my commit message"
```
4. Push the changes:
```shell
git push --set-upstream origin dev/my-dev-branch
```
Afterwards, you can push to the same branch multiple times with 'git push'.
5. Create a pull request on GitHub to merge your code into the master branch.
## Build the pip package
```bash
make whl
```


@@ -160,7 +160,7 @@ def model_file_download(
def get_file_download_url(model_id: str, file_path: str, revision: str):
"""Format file download url according to `model_id`, `revision` and `file_path`.
e.g., Given `model_id=john/bert`, `revision=master`, `file_path=README.md`,
    the resulting download url is: https://modelscope.cn/api/v1/models/john/bert/repo?Revision=master&FilePath=README.md
Args:
model_id (str): The model_id.


@@ -154,6 +154,7 @@ class Models(object):
T5 = 'T5'
mglm = 'mglm'
codegeex = 'codegeex'
glm130b = 'glm130b'
bloom = 'bloom'
unite = 'unite'
megatron_bert = 'megatron-bert'
@@ -445,6 +446,7 @@ class Pipelines(object):
mglm_text_summarization = 'mglm-text-summarization'
codegeex_code_translation = 'codegeex-code-translation'
codegeex_code_generation = 'codegeex-code-generation'
glm130b_text_generation = 'glm130b-text-generation'
translation_en_to_de = 'translation_en_to_de' # keep it underscore
translation_en_to_ro = 'translation_en_to_ro' # keep it underscore
translation_en_to_fr = 'translation_en_to_fr' # keep it underscore


@@ -18,6 +18,7 @@ if TYPE_CHECKING:
)
from .bloom import BloomModel
from .codegeex import CodeGeeXForCodeTranslation, CodeGeeXForCodeGeneration
from .glm_130b import GLM130bForTextGeneration
from .csanmt import CsanmtForTranslation
from .deberta_v2 import DebertaV2ForMaskedLM, DebertaV2Model
from .gpt_neo import GPTNeoModel
@@ -89,6 +90,7 @@ else:
'csanmt': ['CsanmtForTranslation'],
'codegeex':
['CodeGeeXForCodeTranslation', 'CodeGeeXForCodeGeneration'],
'glm_130b': ['GLM130bForTextGeneration'],
'deberta_v2': ['DebertaV2ForMaskedLM', 'DebertaV2Model'],
'heads': ['TextClassificationHead'],
'hf_transformers': ['TransformersModel'],


@@ -5,7 +5,7 @@ from modelscope.utils.import_utils import LazyImportModule
if TYPE_CHECKING:
from .document_grounded_dialog_generate import DocumentGroundedDialogGenerateModel
    from .document_grounded_dialog_rerank import DocumentGroundedDialogRerankModel
from .document_grounded_dialog_retrieval import DocumentGroundedDialogRetrievalModel
else:
_import_structure = {


@@ -13,7 +13,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch BERT model."""
from __future__ import absolute_import, division, print_function
import os.path


@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright Aohan Zeng
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,33 @@
The GLM-130B License
1. Definitions
“Licensor” means the GLM-130B Model Team that distributes its Software.
“Software” means the GLM-130B model parameters made available under this license.
2. License Grant
Subject to the terms and conditions of this License, the Licensor hereby grants to you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license to use the Software solely for your non-commercial research purposes.
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
3. Restriction
You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any commercial, military, or illegal purposes.
You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
4. Disclaimer
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
5. Limitation of Liability
EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
6. Dispute Resolution
This license shall be governed and construed in accordance with the laws of the People's Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at glm-130b@googlegroups.com.


@@ -0,0 +1,20 @@
# Modified by Zhipu.AI
# Original Copyright (c) Alibaba, Inc. and its affiliates.
from typing import TYPE_CHECKING, Union
from modelscope.utils.import_utils import LazyImportModule
if TYPE_CHECKING:
from .text_generation import GLM130bForTextGeneration
else:
_import_structure = {'text_generation': ['GLM130bForTextGeneration']}
import sys
sys.modules[__name__] = LazyImportModule(
__name__,
globals()['__file__'],
_import_structure,
module_spec=__spec__,
extra_objects={},
)


@@ -0,0 +1,2 @@
# Copyright (c) 2022 Zhipu.AI
from .strategies import BaseStrategy, BeamSearchStrategy


@@ -0,0 +1,240 @@
# Copyright (c) 2022 Zhipu.AI
import numpy as np
import torch
import torch.nn.functional as F
from SwissArmyTransformer.generation.sampling_strategies.base_strategy import \
top_k_logits
class BaseStrategy:
def __init__(self,
batch_size,
invalid_slices=[],
temperature=1.,
top_k=200,
eps=1e-4,
top_p=0.0,
end_tokens=None):
self.batch_size = batch_size
self.invalid_slices = invalid_slices
self.temperature = temperature
self.topk = top_k
self.top_p = top_p
self.eps = eps
if end_tokens is None:
end_tokens = []
self.end_tokens = end_tokens
self._is_done = np.zeros(self.batch_size, dtype=bool)
@property
def is_done(self) -> bool:
return self._is_done.all()
def forward(self, logits, tokens, mems, temperature=None):
logits = logits.view(-1, logits.size(-1))
batch_size = tokens.shape[0]
if temperature is None:
temperature = self.temperature
logits = logits / temperature
for invalid_slice in self.invalid_slices:
logits[..., invalid_slice] = -65504
logits = top_k_logits(logits, self.topk, self.top_p)
probs = F.softmax(
logits.float(),
            dim=-1)  # float is essential, due to a bug in PyTorch
pred = torch.multinomial(probs, num_samples=1)
for i in range(self.batch_size):
if i >= batch_size:
self._is_done[i] = True
elif self._is_done[i]:
pred[i] = -1
elif pred[i].item() in self.end_tokens:
self._is_done[i] = True
tokens = torch.cat((tokens, pred.view(tokens.shape[:-1] + (1, ))),
dim=-1)
return tokens, mems
def finalize(self, tokens, mems):
self._is_done = np.zeros(self.batch_size, dtype=bool)
return tokens, mems
class BeamSearchStrategy:
def __init__(
self,
batch_size,
num_beams,
length_penalty=1.0,
consider_end=False,
end_tokens=[],
invalid_slices=[],
no_repeat_ngram_size=0,
min_gen_length=0,
deterministic=False,
):
self.batch_size = batch_size
self.num_beams = num_beams
self.length_penalty = length_penalty
self.end_tokens = end_tokens
self.ngram = no_repeat_ngram_size
self.min_gen_length = min_gen_length
self.invalid_slices = invalid_slices
self.consider_end = consider_end
self.deterministic = deterministic
self._init_cache()
def _init_cache(self):
self.end_beams = [[] for _ in range(self.batch_size)
] # list of LongTensors
self.end_beams_penalized_scores = [[] for _ in range(self.batch_size)
] # list of LongTensors
self.cached_beam_scores = 0 # [batch_size]
self.cached_beam_ngram_bans = [[{} for _ in range(self.num_beams)]
for _ in range(self.batch_size)]
self.length_generated = 0
self._is_done = np.zeros(self.batch_size, dtype=bool)
def _add_end_beams(self, score, beam, batch_idx):
score = score / ((5.0 + len(beam))
/ 6)**self.length_penalty # Magic number for OpenNMT
for i in range(len(self.end_beams[batch_idx]), -1, -1):
if i == 0 or score < self.end_beams_penalized_scores[batch_idx][
i - 1]:
break
self.end_beams[batch_idx].insert(i, beam)
self.end_beams_penalized_scores[batch_idx].insert(i, score)
self.end_beams[batch_idx] = self.end_beams[batch_idx][:self.num_beams]
self.end_beams_penalized_scores[
batch_idx] = self.end_beams_penalized_scores[batch_idx][:self.
num_beams]
@property
def is_done(self) -> bool:
return self._is_done.all()
def forward(self, logits, tokens, mems):
batch_size, num_beams, vocab_size = logits.shape
seq_len = tokens.shape[-1]
logits = logits.float()
for invalid_slice in self.invalid_slices:
logits[..., invalid_slice] = -65504
if self.min_gen_length > self.length_generated:
for end_token in self.end_tokens:
logits[..., end_token] = -65504
if self.ngram > 0 and seq_len > self.ngram:
for batch_idx in range(batch_size):
for i in range(num_beams):
ngram_prefix = tokens[batch_idx, i,
-(self.ngram
- 1):].tolist() # TODO ngram=1
for banned_index in self.cached_beam_ngram_bans[batch_idx][
i].get(tuple(ngram_prefix), []):
logits[batch_idx, i, banned_index] = -65504
next_token_scores = F.log_softmax(
logits, dim=-1) # [batch_size, vocab_size]
prev_scores = self.cached_beam_scores
if isinstance(prev_scores, torch.Tensor):
prev_scores = prev_scores[..., None].expand_as(next_token_scores)
next_token_scores = next_token_scores + prev_scores
next_token_scores = next_token_scores.view(batch_size,
num_beams * vocab_size)
probs = F.softmax(next_token_scores, dim=-1)
if num_beams < self.num_beams: # First token
probs = probs[..., :vocab_size]
if self.deterministic:
next_tokens = torch.topk(
probs, k=(max(1, len(self.end_tokens)) + 1)
* self.num_beams).indices # [2*nb]
else:
next_tokens = torch.multinomial(
probs,
num_samples=(max(1, len(self.end_tokens)) + 1)
* self.num_beams) # [2*nb]
next_token_scores = next_token_scores[
torch.arange(batch_size).unsqueeze(1), next_tokens]
next_token_scores, _indices = torch.sort(
next_token_scores, descending=True, dim=1)
next_tokens = next_tokens[torch.arange(batch_size).unsqueeze(1),
_indices]
next_indices = torch.div(
next_tokens, vocab_size, rounding_mode='trunc')
next_tokens = next_tokens % vocab_size
# select out end beams or continue beams
beam_continue_batch, score_continue_batch, mems_continue_batch = [], [], []
for batch_idx in range(batch_size):
beam_continue = []
scores_continue = []
bans_continue = []
mems_contiue = []
for i in range(len(next_tokens[batch_idx])):
beam = torch.cat(
(tokens[batch_idx, next_indices[batch_idx,
i]], next_tokens[batch_idx,
i:i + 1]))
if not self._is_done[batch_idx] and int(
next_tokens[batch_idx, i]) in self.end_tokens:
self._add_end_beams(next_token_scores[batch_idx, i], beam,
batch_idx)
elif len(beam_continue) < self.num_beams:
beam_continue.append(beam)
mems_contiue.append(mems[:, batch_idx,
next_indices[batch_idx, i]])
# update caches
scores_continue.append(next_token_scores[batch_idx, i])
if self.ngram > 0:
bans = self.cached_beam_ngram_bans[batch_idx][
next_indices[batch_idx, i]].copy()
# TODO ngram=1
ngram_prefix = tuple(
tokens[batch_idx, next_indices[batch_idx, i],
-(self.ngram - 1):].tolist())
bans[ngram_prefix] = bans.get(
ngram_prefix, tuple()) + (next_tokens[batch_idx,
i], )
bans_continue.append(bans)
else:
break
beam_continue_batch.append(torch.stack(beam_continue))
mems_continue_batch.append(torch.stack(mems_contiue, dim=1))
score_continue_batch.append(scores_continue)
self.cached_beam_ngram_bans[batch_idx] = bans_continue
tokens = torch.stack(beam_continue_batch)
mems = torch.stack(mems_continue_batch, dim=1)
self.cached_beam_scores = torch.tensor(
score_continue_batch, device=logits.device)
self.length_generated += 1
for batch_idx in range(self.batch_size):
if batch_idx >= batch_size:
self._is_done[batch_idx] = True
elif (len(self.end_beams[batch_idx]) == self.num_beams
and self.end_beams_penalized_scores[batch_idx][-1] >= # noqa
self.cached_beam_scores[batch_idx].max() / # noqa
((5.0 + (seq_len + 1)) / 6)**self.length_penalty): # noqa
self._is_done[batch_idx] = True
return tokens, mems
def finalize(self, tokens, mems):
if self.consider_end:
batch_size, num_beams = tokens.shape[:2]
for batch_idx in range(batch_size):
if not self._is_done[batch_idx]:
for i in range(num_beams):
self._add_end_beams(
self.cached_beam_scores[batch_idx, i],
tokens[batch_idx, i], batch_idx)
mems = None
ret = self.end_beams[:batch_size]
else:
ret = tokens
self._init_cache()
return ret, mems
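The finished-beam ranking above hinges on the OpenNMT-style length penalty applied in `_add_end_beams` (the "magic number" comment). A minimal standalone sketch of that normalization, using made-up scores purely for illustration:
```python
# Standalone sketch of the length penalty used in BeamSearchStrategy._add_end_beams:
# the summed log-probability of a finished beam is divided by ((5 + len) / 6) ** alpha,
# so longer beams are penalized less for alpha > 0.
def length_penalized_score(log_prob_sum: float, beam_len: int, alpha: float = 1.0) -> float:
    return log_prob_sum / (((5.0 + beam_len) / 6.0) ** alpha)


if __name__ == '__main__':
    # Two finished beams with the same raw score: the longer one ranks higher.
    print(length_penalized_score(-12.0, beam_len=10))  # -4.8
    print(length_penalized_score(-12.0, beam_len=20))  # approximately -2.88
```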


@@ -0,0 +1,161 @@
# Copyright (c) 2022 Zhipu.AI
import argparse
import time
import torch
from SwissArmyTransformer import get_args, get_tokenizer
from SwissArmyTransformer.arguments import initialize_distributed
from SwissArmyTransformer.model import GLM130B
from SwissArmyTransformer.mpu import (get_model_parallel_group,
get_model_parallel_rank,
get_model_parallel_world_size)
from SwissArmyTransformer.training import load_checkpoint
from .quantization import quantize
def add_bminf_args(parser):
"""Arguments for BMInf"""
group = parser.add_argument_group('BMInf')
group.add_argument(
'--bminf',
action='store_true',
help='Use BMInf to support low resource evaluation')
group.add_argument(
'--bminf-memory-limit',
type=int,
default=20,
help='Max memory for model per GPU (in GB)')
return parser
def add_quantization_args(parser):
group = parser.add_argument_group('Quantization')
group.add_argument('--quantization-bit-width', type=int, default=4)
group.add_argument(
'--from-quantized-checkpoint',
type=bool,
default=True,
help='Loading from a quantized checkpoint')
def add_initialization_args(parser):
group = parser.add_argument_group('Initialization')
group.add_argument(
'--sequential-initialization',
action='store_true',
help=
'Initialize sequentially in tensor parallel group (reduce CPU RAM for initialization)',
)
def set_up_model_args(args):
args.model_parallel_size = 4
args.num_layers = 70
args.hidden_size = 12288
args.inner_hidden_size = 32768
args.vocab_size = 150528
args.num_attention_heads = 96
args.max_sequence_length = 2048
args.tokenizer_type = 'icetk-glm-130B'
args.layernorm_order = 'post'
args.skip_init = True
args.fp16 = True
args.mode = 'inference'
return args
def initialize(extra_args_provider):
parser = argparse.ArgumentParser(add_help=False)
add_bminf_args(parser)
add_quantization_args(parser)
add_initialization_args(parser)
GLM130B.add_model_specific_args(parser)
extra_args_provider(parser)
known, args_list = parser.parse_known_args()
args_list += ['--model-parallel-size', '4', '--mode', 'inference']
args = get_args(args_list)
args = set_up_model_args(args)
args = argparse.Namespace(**vars(args), **vars(known))
args.do_train = False
initialize_distributed(args)
return args
def initialize_model_and_tokenizer(args):
tokenizer = get_tokenizer(args)
torch.distributed.barrier()
start = time.time()
for i in range(get_model_parallel_world_size()):
if get_model_parallel_rank() == i:
# Initialize model
model = GLM130B(args).half()
if args.from_quantized_checkpoint:
assert args.quantization_bit_width is not None
# Quantize model before moving to GPU
model = quantize(model, args.quantization_bit_width)
# Load checkpoint
load_checkpoint(model, args)
if args.quantization_bit_width is not None and not args.from_quantized_checkpoint:
# Quantize model before moving to GPU
model = quantize(model, args.quantization_bit_width)
if args.bminf:
import bminf
if torch.distributed.get_rank() == 0:
print(
f'> BMInf activated, memory limit: {args.bminf_memory_limit} GB'
)
with torch.cuda.device(args.device):
model = bminf.wrapper(
model,
quantization=False,
memory_limit=args.bminf_memory_limit << 30)
else:
model = model.to(args.device)
if args.sequential_initialization:
torch.distributed.barrier(group=get_model_parallel_group())
torch.distributed.barrier()
if torch.distributed.get_rank() == 0:
print(f'> Model initialized in {time.time() - start:.1f}s')
torch.cuda.empty_cache()
model.eval()
# generate rotary embedding cache
original_parallel_output = model.transformer.parallel_output
model.transformer.parallel_output = True
with torch.no_grad():
_, *_ = model(
torch.ones(
1,
args.max_sequence_length,
device=torch.cuda.current_device(),
dtype=torch.int64),
torch.arange(
args.max_sequence_length,
device=torch.cuda.current_device(),
dtype=torch.int64).view(1, -1),
torch.randn(
1,
1,
args.max_sequence_length,
args.max_sequence_length,
device=torch.cuda.current_device(),
) < 0.5,
)
model.transformer.parallel_output = original_parallel_output
torch.distributed.barrier()
return model, tokenizer


@@ -0,0 +1,111 @@
# Copyright (c) 2022 Zhipu.AI
import ctypes
from typing import List
import pkg_resources
import torch
from cpm_kernels.kernels.base import (KernelFunction, LazyKernelCModule,
round_up)
RESOURCE_PACKAGE_NAME = __name__
class Kernel:
def __init__(self, filename: str, function_names: List[str]):
filename = filename + '.fatbin'
if not pkg_resources.resource_exists(RESOURCE_PACKAGE_NAME, filename):
raise RuntimeError('File `%s` not found in `%s`' %
(filename, RESOURCE_PACKAGE_NAME))
self.filename = filename
self.code = pkg_resources.resource_string(RESOURCE_PACKAGE_NAME,
filename)
self._function_names = function_names
self._cmodule = LazyKernelCModule(self.code)
for name in self._function_names:
setattr(self, name, KernelFunction(self._cmodule, name))
kernels = Kernel(
'quantization',
[
'int4WeightCompression',
'int4WeightExtractionFloat',
'int4WeightExtractionHalf',
'int8WeightExtractionFloat',
'int8WeightExtractionHalf',
],
)
def compress_int4_weight(weight: torch.Tensor): # (n, m)
with torch.cuda.device(weight.device):
n, m = weight.size(0), weight.size(1)
assert m % 2 == 0
m = m // 2
out = torch.empty(n, m, dtype=torch.int8, device='cuda')
stream = torch.cuda.current_stream()
gridDim = (n, 1, 1)
blockDim = (min(round_up(m, 32), 1024), 1, 1)
kernels.int4WeightCompression(
gridDim,
blockDim,
0,
stream,
[
ctypes.c_void_p(weight.data_ptr()),
ctypes.c_void_p(out.data_ptr()),
ctypes.c_int32(n),
ctypes.c_int32(m)
],
)
return out
def extract_weight_to_half(weight: torch.Tensor, scale_list: torch.Tensor,
source_bit_width: int):
if source_bit_width == 8:
func = kernels.int8WeightExtractionHalf
elif source_bit_width == 4:
func = kernels.int4WeightExtractionHalf
else:
assert False, 'Unsupported bit-width'
with torch.cuda.device(weight.device):
n, m = weight.size(0), weight.size(1)
out = torch.empty(
n, m * (8 // source_bit_width), dtype=torch.half, device='cuda')
stream = torch.cuda.current_stream()
gridDim = (n, 1, 1)
blockDim = (min(round_up(m, 32), 1024), 1, 1)
func(
gridDim,
blockDim,
0,
stream,
[
ctypes.c_void_p(weight.data_ptr()),
ctypes.c_void_p(scale_list.data_ptr()),
ctypes.c_void_p(out.data_ptr()),
ctypes.c_int32(n),
ctypes.c_int32(m),
],
)
return out
if __name__ == '__main__':
weight = torch.randn(4, 32).to(torch.int8).cuda()
scale = torch.ones(weight.size(0)).to(torch.half).cuda()
print(weight)
b = compress_int4_weight(weight)
print(b)
a = extract_weight_to_half(b, scale, source_bit_width=4)
print(a)


@@ -0,0 +1,67 @@
# Copyright (c) 2022 Zhipu.AI
import torch
from .layers import QuantizedColumnParallelLinear, QuantizedRowParallelLinear
def quantize(model, weight_bit_width):
"""Replace fp16 linear with quantized linear"""
if torch.distributed.get_rank() == 0:
print(f'> Quantizing model weight to {weight_bit_width} bits')
for layer in model.transformer.layers:
layer.attention.query_key_value = QuantizedColumnParallelLinear(
weight_bit_width=weight_bit_width,
weight=layer.attention.query_key_value.weight.to(
torch.cuda.current_device()),
input_size=layer.attention.query_key_value.input_size,
output_size=layer.attention.query_key_value.output_size,
bias=True,
gather_output=False,
params_dtype=torch.half,
name='query_key_value',
skip_init=True,
device=layer.attention.query_key_value.weight.device,
)
layer.attention.dense = QuantizedRowParallelLinear(
weight_bit_width=weight_bit_width,
weight=layer.attention.dense.weight.to(
torch.cuda.current_device()),
input_size=layer.attention.dense.input_size,
output_size=layer.attention.dense.output_size,
bias=True,
input_is_parallel=True,
params_dtype=torch.half,
name='dense',
skip_init=True,
device=layer.attention.dense.weight.device,
)
layer.mlp.dense_h_to_4h = QuantizedColumnParallelLinear(
weight_bit_width=weight_bit_width,
weight=layer.mlp.dense_h_to_4h.weight.to(
torch.cuda.current_device()),
input_size=layer.mlp.dense_h_to_4h.input_size,
output_size=layer.mlp.dense_h_to_4h.output_size,
bias=True,
gather_output=False,
params_dtype=torch.half,
name='dense_h_to_4h',
skip_init=True,
device=layer.mlp.dense_h_to_4h.weight.device,
)
layer.mlp.dense_4h_to_h = QuantizedRowParallelLinear(
weight_bit_width=weight_bit_width,
weight=layer.mlp.dense_4h_to_h.weight.to(
torch.cuda.current_device()),
input_size=layer.mlp.dense_4h_to_h.input_size,
output_size=layer.mlp.dense_4h_to_h.output_size,
bias=True,
input_is_parallel=True,
params_dtype=torch.half,
name='dense_h_to_4h',
skip_init=True,
device=layer.mlp.dense_4h_to_h.weight.device,
)
return model


@@ -0,0 +1,30 @@
# Copyright (c) 2022 Zhipu.AI
import torch
from ..kernels import extract_weight_to_half
class W8A16Linear(torch.autograd.Function):
@staticmethod
def forward(ctx, inp: torch.Tensor, quant_w: torch.Tensor,
scale_w: torch.Tensor, weight_bit_width):
ctx.inp_shape = inp.size()
ctx.weight_shape = quant_w.size()
ctx.weight_bit_width = weight_bit_width
out_features = quant_w.size(0)
inp = inp.contiguous().view(-1, inp.size(-1))
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
output = inp.mm(weight.t())
ctx.save_for_backward(inp, quant_w, scale_w)
return output.view(*(ctx.inp_shape[:-1] + (out_features, )))
@staticmethod
def backward(ctx, grad_output: torch.Tensor):
inp, quant_w, scale_w = ctx.saved_tensors
weight = extract_weight_to_half(quant_w, scale_w, ctx.weight_bit_width)
grad_output = grad_output.contiguous().view(-1, weight.size(0))
grad_input = grad_output.mm(weight)
grad_weight = grad_output.t().mm(inp)
return grad_input.view(ctx.inp_shape), grad_weight.view(
ctx.weight_shape), None


@@ -0,0 +1,113 @@
# Copyright (c) 2022 Zhipu.AI
import torch
from SwissArmyTransformer.mpu import (ColumnParallelLinear, RowParallelLinear,
copy_to_model_parallel_region,
gather_from_model_parallel_region,
reduce_from_model_parallel_region,
scatter_to_model_parallel_region)
from torch.nn.parameter import Parameter
from ..kernels import compress_int4_weight
from .functional import W8A16Linear
class QuantizedColumnParallelLinear(ColumnParallelLinear):
def __init__(self, weight_bit_width: int, weight=None, *args, **kwargs):
super(QuantizedColumnParallelLinear, self).__init__(*args, **kwargs)
self.weight_bit_width = weight_bit_width
shape = self.weight.shape
del self.weight
if weight is None:
self.weight = torch.empty(
shape[0],
shape[1] * weight_bit_width // 8,
dtype=torch.int8,
device=kwargs['device'])
self.weight_scale = torch.empty(
shape[0],
dtype=kwargs['params_dtype'],
device=kwargs['device'])
else:
self.weight_scale = (
weight.abs().max(dim=-1).values / ( # noqa
(2**(weight_bit_width - 1)) - 1)).half() # noqa
self.weight = torch.round(weight / self.weight_scale[:, None]).to(
torch.int8)
if weight_bit_width == 4:
self.weight = compress_int4_weight(self.weight)
self.weight = Parameter(
self.weight.to(kwargs['device']), requires_grad=False)
self.weight_scale = Parameter(
self.weight_scale.to(kwargs['device']), requires_grad=False)
def forward(self, input_):
# Set up backprop all-reduce.
input_parallel = copy_to_model_parallel_region(input_)
# Matrix multiply.
output_parallel = W8A16Linear.apply(input_parallel, self.weight,
self.weight_scale,
self.weight_bit_width)
if self.bias is not None:
output_parallel = output_parallel + self.bias
if self.gather_output:
# All-gather across the partitions.
output = gather_from_model_parallel_region(output_parallel)
else:
output = output_parallel
return output
class QuantizedRowParallelLinear(RowParallelLinear):
def __init__(self, weight_bit_width: int, weight=None, *args, **kwargs):
super(QuantizedRowParallelLinear, self).__init__(*args, **kwargs)
self.weight_bit_width = weight_bit_width
shape = self.weight.shape
del self.weight
if weight is None:
self.weight = torch.empty(
shape[0],
shape[1] * weight_bit_width // 8,
dtype=torch.int8,
device=kwargs['device'])
self.weight_scale = torch.empty(
shape[0],
dtype=kwargs['params_dtype'],
device=kwargs['device'])
else:
self.weight_scale = (
weight.abs().max(dim=-1).values / ( # noqa
(2**(weight_bit_width - 1)) - 1)).half() # noqa
self.weight = torch.round(weight / self.weight_scale[:, None]).to(
torch.int8)
if weight_bit_width == 4:
self.weight = compress_int4_weight(self.weight)
self.weight = Parameter(
self.weight.to(kwargs['device']), requires_grad=False)
self.weight_scale = Parameter(
self.weight_scale.to(kwargs['device']), requires_grad=False)
def forward(self, input_):
# Set up backprop all-reduce.
if self.input_is_parallel:
input_parallel = input_
else:
input_parallel = scatter_to_model_parallel_region(input_)
# Matrix multiply.
output_parallel = W8A16Linear.apply(input_parallel, self.weight,
self.weight_scale,
self.weight_bit_width)
# All-reduce across all the partitions.
output_ = reduce_from_model_parallel_region(output_parallel)
if self.bias is not None:
output = output_ + self.bias
else:
output = output_
return output
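Both quantized linear layers above compute a per-row symmetric scale (`abs-max / (2**(bit_width - 1) - 1)`), round the weights to int8, and later dequantize them back to floating point inside `W8A16Linear`. A CPU-only sketch of that round trip (int8 path only; the int4 packing done by `compress_int4_weight` is omitted):
```python
import torch


def quantize_rowwise(weight: torch.Tensor, bit_width: int = 8):
    # Per-row symmetric scale, mirroring the weight_scale computation above.
    scale = weight.abs().max(dim=-1).values / ((2 ** (bit_width - 1)) - 1)
    quantized = torch.round(weight / scale[:, None]).to(torch.int8)
    return quantized, scale


def dequantize_rowwise(quantized: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # CPU stand-in for what extract_weight_to_half computes on the GPU (int8 case).
    return quantized.float() * scale[:, None]


if __name__ == '__main__':
    w = torch.randn(4, 16)
    q, s = quantize_rowwise(w)
    print((w - dequantize_rowwise(q, s)).abs().max())  # small reconstruction error
```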


@@ -0,0 +1,354 @@
# Copyright (c) 2022 Zhipu.AI
import copy
import os
import random
import re
import stat
import sys
import time
from functools import partial
from typing import Any, Dict, List, Tuple
import torch
from SwissArmyTransformer import mpu
from SwissArmyTransformer.generation.autoregressive_sampling import (
get_masks_and_position_ids_default, update_mems)
from SwissArmyTransformer.generation.utils import (generate_continually,
timed_name)
from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import MODELS
from modelscope.outputs import OutputKeys
from modelscope.utils.config import Config
from modelscope.utils.constant import ModelFile, Tasks
from modelscope.utils.logger import get_logger
from .generation import BaseStrategy, BeamSearchStrategy
from .initialize import initialize, initialize_model_and_tokenizer
torch.set_num_threads(24)
logger = get_logger()
def batch_filling_sequence(
model,
seqs,
context_lengths,
strategy,
max_memory_length=100000,
get_masks_and_position_ids=get_masks_and_position_ids_default,
mems=None,
**kw_args):
'''
seq: [2, 3, 5, ..., -1(to be generated), -1, ...]
mems: [num_layers, batch_size, len_mems(index), mem_hidden_size]
cache, should be first mems.shape[1] parts of context_tokens.
mems are the first-level citizens here, but we don't assume what is memorized.
input mems are used when multi-phase generation.
'''
assert len(seqs.shape) == 2
# building the initial tokens, attention_mask, and position_ids
batch_size, context_length = seqs.shape
seqs, attention_mask, position_ids = get_masks_and_position_ids(seqs)
tokens = seqs[..., :context_length]
if attention_mask.dtype != torch.bool:
attention_mask = attention_mask.type_as(next(
model.parameters())) # if fp16
# initialize generation
counter = context_length - 1 # Last fixed index is ``counter''
index = 0 if mems is None else mems.shape[
2] # Next forward starting index, also the length of cache.
num_beams = 1
# step-by-step generation
while counter < seqs.shape[1] - 1:
# Now, we want to generate seq[counter + 1],
# token[:, index: counter+1] needs forwarding.
# forward
tokens = tokens.reshape(batch_size * num_beams, -1)
mems = mems.reshape(mems.shape[0], batch_size
* num_beams, mems.shape[-2],
mems.shape[-1]) if mems is not None else None
logits, *output_per_layers = model(
tokens[:, index:],
position_ids[..., index:counter + 1],
attention_mask[...,
index:counter + 1, :counter + 1], # TODO memlen
mems=mems,
**kw_args)
mem_kv = [o['mem_kv'] for o in output_per_layers]
mems = update_mems(mem_kv, mems, max_memory_length=max_memory_length)
if counter == context_length - 1:
logits = logits[torch.arange(batch_size), context_lengths - 1]
else:
logits = logits[:, -1]
counter += 1
index = counter
# sampling
logits = logits.reshape(batch_size, num_beams, -1)
tokens = tokens.reshape(batch_size, num_beams, -1)
mems = mems.reshape(mems.shape[0], batch_size, num_beams,
mems.shape[-2], mems.shape[-1])
tokens, mems = strategy.forward(logits, tokens, mems)
if len(tokens.shape) == 3 and num_beams == 1:
num_beams = tokens.shape[1]
position_ids = position_ids.unsqueeze(1).expand(
batch_size, num_beams, -1).reshape(batch_size * num_beams, -1)
attention_mask_shape = attention_mask.shape[-3:]
attention_mask = attention_mask.unsqueeze(1).expand(
batch_size, num_beams, -1, -1,
-1).reshape(batch_size * num_beams, *attention_mask_shape)
if strategy.is_done:
break
return strategy.finalize(tokens, mems)
def add_generation_specific_args(parser):
parser.add_argument(
'--sampling-strategy',
type=str,
default='BaseStrategy',
help='Type of sampling strategy.')
parser.add_argument(
'--min-gen-length',
type=int,
default=0,
help='The minimum length each blank should generate.')
parser.add_argument(
'--print-all-beams',
action='store_true',
help='Print all output generated by beam search strategy.')
def isEnglish(s):
try:
s.encode(encoding='utf-8').decode('ascii')
except UnicodeDecodeError:
return False
else:
return True
def get_masks_and_position_ids(seq,
mask_position,
max_gen_length,
gmask=False):
context_length = seq.shape[1]
tokens = torch.nn.functional.pad(
seq, (0, max_gen_length), mode='constant', value=-1)
attention_mask = torch.ones((1, tokens.shape[-1], tokens.shape[-1]),
device=tokens.device)
attention_mask.tril_()
attention_mask[..., :context_length - 1] = 1
attention_mask.unsqueeze_(1)
attention_mask = (attention_mask < 0.5).bool()
position_ids = torch.arange(
tokens.shape[-1], dtype=torch.long, device=tokens.device)
if not gmask:
position_ids[context_length - 1:] = mask_position
position_ids = position_ids.unsqueeze(0)
return tokens, attention_mask, position_ids
def fill_blanks(args, raw_text: str, model, tokenizer,
strategy) -> Tuple[List[str], List[str], List[List[str]]]:
# add MASK
generation_mask = '[gMASK]'
if '[MASK]' in raw_text:
generation_mask = '[MASK]'
elif '[sMASK]' in raw_text:
generation_mask = '[sMASK]'
use_gmask = '[MASK]' not in raw_text and '[sMASK]' not in raw_text
mask_pattern = r'\[[sg]?MASK\]'
text_list = re.split(mask_pattern, raw_text)
pattern_list = re.compile(mask_pattern).findall(raw_text)
seq = []
for i in range(len(pattern_list)):
pattern = pattern_list[i]
sub_text = text_list[i]
seq.extend(tokenizer.tokenize(sub_text))
seq.append(tokenizer.get_command(pattern))
seq.extend(tokenizer.tokenize(text_list[-1]))
if 'MASK]' not in raw_text:
seq += [tokenizer.get_command(generation_mask)]
raw_text += ' ' + generation_mask
if not raw_text.endswith('MASK]'):
seq = seq + [tokenizer.get_command('eos')]
if mpu.get_model_parallel_rank() == 0:
logger.info('\nInput: {}\n'.format(raw_text))
if len(seq) > args.max_sequence_length:
raise ValueError('text too long.')
# generation
is_english = isEnglish(raw_text)
output_list = [seq]
num_output = args.num_beams if args.sampling_strategy == 'BeamSearchStrategy' else 1
last_pos, answers, answers_with_style, blanks = (
[0] * num_output,
['' for _ in range(num_output)],
['' for _ in range(num_output)],
[[] for _ in range(num_output)],
)
# continually detect the first mark position
while True:
seq = output_list[0]
# detect mask position
mask_token = tokenizer.get_command(generation_mask)
if mask_token not in seq:
break
mask_position = seq.index(mask_token)
output_list = []
input_seq = torch.cuda.LongTensor(
[seq + [tokenizer.get_command('sop')]],
device=args.device,
)
output, _ = batch_filling_sequence(
model,
input_seq,
torch.cuda.LongTensor([input_seq.shape[-1]], device=args.device),
strategy=strategy,
get_masks_and_position_ids=partial(
get_masks_and_position_ids,
mask_position=mask_position,
max_gen_length=args.out_seq_length - input_seq.shape[-1],
gmask=use_gmask,
),
)
if isinstance(output, torch.Tensor): # different strategies
output = output.tolist()
output = output[0] # batch_size = 1
output_list.extend(output)
# clip -1s and fill back generated things into seq
for i in range(len(output_list)):
output = output_list[i].tolist() if isinstance(
output_list[i], torch.Tensor) else output_list[i]
try:
unfinished = output.index(-1)
except ValueError:
unfinished = len(output)
if output[unfinished - 1] in strategy.end_tokens:
unfinished -= 1
bog = output.index(tokenizer.get_command('sop'))
prefix = tokenizer.detokenize(output[last_pos[i]:mask_position])
blank = tokenizer.detokenize(output[bog + 1:unfinished])
answers_with_style[i] += (
prefix + (' ' if is_english else '') + # noqa
('\033[4m' if use_gmask else '\x1b[0;32m\033[4m') + blank
+ # noqa
('\033[0m' if use_gmask else '\033[0m\x1b[0m') + # noqa
(' ' if is_english else '')) # noqa
blanks[i].append(blank)
last_pos[i] = mask_position + unfinished - (bog + 1)
output_list[i] = output[:mask_position] + output[
bog + 1:unfinished] + output[mask_position + 1:bog]
for i, output in enumerate(output_list):
if output[-1] == tokenizer.get_command('eos'):
output = output[:-1]
answers_with_style[i] += tokenizer.detokenize(output[last_pos[i]:])
answers[i] = tokenizer.detokenize(output)
return answers, answers_with_style, blanks
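# A minimal usage sketch (the prompt is illustrative; args, model, tokenizer and
# strategy come from the initialization code in GLM130bForTextGeneration below):
#     answers, styled, blanks = fill_blanks(
#         args, 'Tsinghua University is located in [MASK].',
#         model, tokenizer, strategy)
#     print(answers[0])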
@MODELS.register_module(Tasks.text_generation, module_name=Models.glm130b)
class GLM130bForTextGeneration(TorchModel):
def __init__(self, model_dir: str, *args, **kwargs):
# """initialize the glm130b model from the `model_dir` path.
# Args:
# model_dir (str): the model path.
# """
super().__init__(model_dir, *args, **kwargs)
self.cfg = Config.from_file(model_dir + '/' + ModelFile.CONFIGURATION)
args = initialize(extra_args_provider=add_generation_specific_args)
args.seed = random.randint(1, sys.maxsize - 1)
args.sampling_strategy = self.cfg.model.sampling_strategy
args.out_seq_length = self.cfg.model.out_seq_length
args.min_gen_length = self.cfg.model.min_gen_length
args.num_beams = self.cfg.model.num_beams
args.length_penalty = self.cfg.model.length_penalty
args.no_repeat_ngram_size = self.cfg.model.no_repeat_ngram_size
args.temperature = self.cfg.model.temperature
args.top_k = self.cfg.model.top_k
args.top_p = self.cfg.model.top_p
args.load = model_dir
logger.info('Loading model and tokenizer ...')
self.model, self.tokenizer = initialize_model_and_tokenizer(args)
end_tokens = [
self.tokenizer.get_command('eop'),
self.tokenizer.get_command('eos')
]
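# Generation terminates on either 'eop' (end of piece) or 'eos', whichever the
# chosen strategy produces first.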
if args.sampling_strategy == 'BaseStrategy':
self.strategy = BaseStrategy(
batch_size=1,
temperature=args.temperature,
top_k=args.top_k,
top_p=args.top_p,
end_tokens=end_tokens)
elif args.sampling_strategy == 'BeamSearchStrategy':
self.strategy = BeamSearchStrategy(
1,
args.num_beams,
length_penalty=args.length_penalty,
consider_end=True,
end_tokens=end_tokens,
no_repeat_ngram_size=args.no_repeat_ngram_size,
min_gen_length=args.min_gen_length,
)
else:
raise ValueError(f'unknown strategy {args.sampling_strategy}')
self.args = args
def func(self, raw_text):
answers, answers_with_style, blanks = fill_blanks(
self.args, raw_text, self.model, self.tokenizer, self.strategy)
if mpu.get_model_parallel_rank() == 0:
logger.info('Output:' + str(answers_with_style[0]))
return str(answers_with_style[0])
def forward(self, input: str) -> Dict[str, str]:
raw_text, is_stop = '', False
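# Only rank 0 receives the user input; it is broadcast below so that every
# model-parallel rank runs the same generation step.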
if torch.distributed.get_rank() == 0:
raw_text = input
if not raw_text:
return {OutputKeys.TEXT: 'Query should not be empty!'}
if raw_text == 'stop':
is_stop = True
torch.distributed.broadcast_object_list([raw_text, is_stop])
else:
info = [raw_text, is_stop]
torch.distributed.broadcast_object_list(info)
raw_text, is_stop = info
if is_stop:
return
try:
start_time = time.time()
res = self.func(raw_text)
if torch.distributed.get_rank() == 0:
logger.info('\nTime taken: {:.2f}s\n'.format(time.time() - start_time))
except (ValueError, FileNotFoundError) as e:
return {OutputKeys.TEXT: str(e)}
logger.info('Generation finished.')
return {OutputKeys.TEXT: res}

@@ -35,6 +35,7 @@ if TYPE_CHECKING:
from .mglm_text_summarization_pipeline import MGLMTextSummarizationPipeline
from .codegeex_code_translation_pipeline import CodeGeeXCodeTranslationPipeline
from .codegeex_code_generation_pipeline import CodeGeeXCodeGenerationPipeline
from .glm130b_text_generation_pipeline import GLM130bTextGenerationPipeline
from .translation_evaluation_pipeline import TranslationEvaluationPipeline
from .user_satisfaction_estimation_pipeline import UserSatisfactionEstimationPipeline
from .siamese_uie_pipeline import SiameseUiePipeline
@@ -89,6 +90,7 @@ else:
['CodeGeeXCodeTranslationPipeline'],
'codegeex_code_generation_pipeline':
['CodeGeeXCodeGenerationPipeline'],
'glm130b_text_generation_pipeline': ['GLM130bTextGenerationPipeline'],
'translation_evaluation_pipeline': ['TranslationEvaluationPipeline'],
'user_satisfaction_estimation_pipeline':
['UserSatisfactionEstimationPipeline'],

@@ -0,0 +1,29 @@
# Copyright (c) 2022 Zhipu.AI
from typing import Any, Dict, Union
from modelscope.metainfo import Pipelines
from modelscope.models.nlp import GLM130bForTextGeneration
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.utils.constant import Tasks
@PIPELINES.register_module(
group_key=Tasks.text_generation,
module_name=Pipelines.glm130b_text_generation)
class GLM130bTextGenerationPipeline(Pipeline):
def __init__(self, model: Union[GLM130bForTextGeneration, str], *args,
**kwargs):
model = GLM130bForTextGeneration(model) if isinstance(model,
str) else model
self.model = model
def __call__(self, input: str, **forward_params) -> Dict[str, Any]:
return self.model(input)
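# A minimal usage sketch (the model id is a placeholder, resolved through the
# standard pipeline factory):
#     from modelscope.pipelines import pipeline
#     pipe = pipeline(Tasks.text_generation, model='<glm130b-model-id>')
#     print(pipe('Who is the greatest artist? The greatest artist is [MASK].'))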
def postprocess(self, input, **kwargs) -> Dict[str, Any]:
"""This method will not be called.
"""
return input

@@ -34,7 +34,7 @@ if TYPE_CHECKING:
from .siamese_uie_preprocessor import SiameseUiePreprocessor
from .document_grounded_dialog_generate_preprocessor import DocumentGroundedDialogGeneratePreprocessor
from .document_grounded_dialog_retrieval_preprocessor import DocumentGroundedDialogRetrievalPreprocessor
from .document_grounded_dialog_retrieval_preprocessor import DocumentGroundedDialogRerankPreprocessor
from .document_grounded_dialog_rerank_preprocessor import DocumentGroundedDialogRerankPreprocessor
else:
_import_structure = {
'bert_seq_cls_tokenizer': ['Tokenize'],

@@ -1,9 +1,12 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
# This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp
# Part of the implementation is borrowed from wimglenn/johnnydep
import copy
import importlib
import os
import pkgutil
import shutil
import sys
import venv
from contextlib import contextmanager
@@ -263,14 +266,22 @@ def install_module_from_requirements(requirement_path, ):
"""
install_args = ['-r', requirement_path]
status_code, _, args = PluginsManager.pip_command(
'install',
install_args,
)
if status_code != 0:
raise ImportError(
f'Failed to install requirements from {requirement_path}')
install_list = []
with open(requirement_path, 'r', encoding='utf-8') as f:
requirements = f.read().splitlines()
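# Only hand pip the requirements that are not already satisfied, instead of
# passing the whole requirements file unconditionally.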
for req in requirements:
installed, _ = PluginsManager.check_plugin_installed(req)
if not installed:
install_list.append(req)
if len(install_list) > 0:
status_code, _, args = PluginsManager.pip_command(
'install',
install_list,
)
if status_code != 0:
raise ImportError(
f'Failed to install requirements from {requirement_path}')
def import_module_from_file(module_name, file_path):
@@ -298,18 +309,6 @@ def import_module_from_model_dir(model_dir):
import_module_from_file(module_name, file)
def install_modelscope_if_need():
plugin_installed, version = PluginsManager.check_plugin_installed(
'modelscope')
if not plugin_installed:
status_code, _, args = PluginsManager.pip_command(
'install',
['modelscope'],
)
if status_code != 0:
raise ImportError('Failed to install package modelscope')
def install_requirements_by_names(plugins: List[str]):
plugins_manager = PluginsManager()
uninstalled_plugins = []
@@ -324,20 +323,21 @@ def install_requirements_by_names(plugins: List[str]):
f'The required packages {",".join(uninstalled_plugins)} are not installed.',
f'Please run the command `modelscope plugin install {" ".join(uninstalled_plugins)}` to install them.'
)
install_modelscope_if_need()
def install_requirements_by_files(requirements: List[str]):
for requirement in requirements:
install_module_from_requirements(requirement)
install_modelscope_if_need()
def register_plugins_repo(plugins: List[str]) -> None:
""" Try to install and import plugins from repo"""
if plugins is not None:
install_requirements_by_names(plugins)
import_plugins(plugins)
modules = []
for plugin in plugins:
modules.extend(get_modules_from_package(plugin))
import_plugins(modules)
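# e.g. register_plugins_repo(['adaseq']) installs the distribution if it is
# missing and then imports the top-level modules its wheel actually exposes
# (the import name can differ from the pip name, as with 'pai-easycv' -> 'easycv').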
def register_modelhub_repo(model_dir, allow_remote=False) -> None:
@@ -351,6 +351,256 @@ def register_modelhub_repo(model_dir, allow_remote=False) -> None:
pass
DEFAULT_INDEX = 'https://pypi.org/simple/'
def get_modules_from_package(package):
""" to get the modules from a installed package
Args:
package: The distribution name or package name
Returns:
"""
from zipfile import ZipFile
from tempfile import mkdtemp
from subprocess import CalledProcessError, check_output, STDOUT
from glob import glob
import hashlib
from urllib.parse import urlparse
from urllib import request as urllib2
from pip._internal.utils.packaging import get_requirement
req = get_requirement(package)
package = req.name
def urlretrieve(url, filename, data=None, auth=None):
if auth is not None:
# https://docs.python.org/2.7/howto/urllib2.html#id6
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we could use it instead of None.
username, password = auth
top_level_url = urlparse(url).netloc
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)
else:
opener = urllib2.build_opener()
res = opener.open(url, data=data)
headers = res.info()
with open(filename, 'wb') as fp:
fp.write(res.read())
return filename, headers
def compute_checksum(target, algorithm='sha256', blocksize=2**13):
hashtype = getattr(hashlib, algorithm)
hash_ = hashtype()
logger.debug('computing checksum for %s with %s', target, algorithm)
with open(target, 'rb') as f:
for chunk in iter(lambda: f.read(blocksize), b''):
hash_.update(chunk)
result = hash_.hexdigest()
logger.debug('computed checksum: %s', result)
return result
def _get_pip_version():
# try to get pip version without actually importing pip
# setuptools gets upset if you import pip before importing setuptools..
try:
import importlib.metadata # Python 3.8+
return importlib.metadata.version('pip')
except Exception:
pass
import pip
return pip.__version__
def _download_dist(url, scratch_file, index_url, extra_index_url):
auth = None
if index_url:
parsed = urlparse(index_url)
if parsed.username and parsed.password and parsed.hostname == urlparse(
url).hostname:
# handling private PyPI credentials in index_url
auth = (parsed.username, parsed.password)
if extra_index_url:
parsed = urlparse(extra_index_url)
if parsed.username and parsed.password and parsed.hostname == urlparse(
url).hostname:
# handling private PyPI credentials in extra_index_url
auth = (parsed.username, parsed.password)
target, _headers = urlretrieve(url, scratch_file, auth=auth)
return target, _headers
def _get_wheel_args(index_url, env, extra_index_url):
args = [
sys.executable,
'-m',
'pip',
'wheel',
'-vvv', # --verbose x3
'--no-deps',
'--no-cache-dir',
'--disable-pip-version-check',
]
if index_url is not None:
args += ['--index-url', index_url]
if index_url != DEFAULT_INDEX:
hostname = urlparse(index_url).hostname
if hostname:
args += ['--trusted-host', hostname]
if extra_index_url is not None:
args += [
'--extra-index-url', extra_index_url, '--trusted-host',
urlparse(extra_index_url).hostname
]
if env is None:
pip_version = _get_pip_version()
else:
pip_version = dict(env)['pip_version']
args[0] = dict(env)['python_executable']
pip_major, pip_minor = pip_version.split('.')[0:2]
pip_major = int(pip_major)
pip_minor = int(pip_minor)
if pip_major >= 10:
args.append('--progress-bar=off')
if (20, 3) <= (pip_major, pip_minor) < (21, 1):
# See https://github.com/pypa/pip/issues/9139#issuecomment-735443177
args.append('--use-deprecated=legacy-resolver')
return args
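# With no index overrides the assembled command is roughly (interpreter path
# illustrative):
#   /usr/bin/python -m pip wheel -vvv --no-deps --no-cache-dir \
#       --disable-pip-version-check --progress-bar=off
# get() appends the requested distribution name to this list.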
def get(dist_name,
index_url=None,
env=None,
extra_index_url=None,
tmpdir=None,
ignore_errors=False):
args = _get_wheel_args(index_url, env, extra_index_url) + [dist_name]
scratch_dir = mkdtemp(dir=tmpdir)
logger.debug('wheeling and dealing in %s: %s',
os.path.abspath(scratch_dir), ' '.join(args))
try:
out = check_output(
args, stderr=STDOUT, cwd=scratch_dir).decode('utf-8')
except CalledProcessError as err:  # raised by check_output when pip wheel fails
out = getattr(err, 'output', b'').decode('utf-8')
logger.warning(out)
if not ignore_errors:
raise
logger.debug('wheel command completed ok for %s', dist_name)
links = []
local_links = []
lines = out.splitlines()
for i, line in enumerate(lines):
line = line.strip()
if line.startswith('Downloading from URL '):
parts = line.split()
link = parts[3]
links.append(link)
elif line.startswith('Downloading '):
parts = line.split()
last = parts[-1]
if len(parts) == 3 and last.startswith('(') and last.endswith(
')'):
link = parts[-2]
elif len(parts) == 4 and parts[-2].startswith(
'(') and last.endswith(')'):
link = parts[-3]
if not urlparse(link).scheme:
# newest pip versions have changed to not log the full url
# in the download event. it is becoming more and more annoying
# to preserve compatibility across a wide range of pip versions
next_line = lines[i + 1].strip()
if next_line.startswith(
'Added ') and ' to build tracker' in next_line:
link = next_line.split(
' to build tracker')[0].split()[-1]
else:
link = last
links.append(link)
elif line.startswith(
'Source in ') and 'which satisfies requirement' in line:
link = line.split()[-1]
links.append(link)
elif line.startswith('Added ') and ' from file://' in line:
[link] = [x for x in line.split() if x.startswith('file://')]
local_links.append(link)
if not links:
# prefer http scheme over file
links += local_links
links = list(dict.fromkeys(links)) # order-preserving dedupe
if not links:
logger.warning('could not find download link in pip output:\n%s', out)
raise Exception('failed to collect dist')
if len(links) == 2:
# sometimes we collect the same link, once with a url fragment/checksum and once without
first, second = links
if first.startswith(second):
del links[1]
elif second.startswith(first):
del links[0]
if len(links) > 1:
logger.debug('more than 1 link collected: %s\n%s', links, out)
# Since PEP 517, maybe an sdist will also need to collect other distributions
# for the build system, even with --no-deps specified. pendulum==1.4.4 is one
# example, which uses poetry and doesn't publish any python37 wheel to PyPI.
# However, the dist itself should still be the first one downloaded.
link = links[0]
whls = glob(os.path.join(os.path.abspath(scratch_dir), '*.whl'))
try:
[whl] = whls
except ValueError:
if ignore_errors:
whl = ''
else:
raise
url, _sep, checksum = link.partition('#')
url = url.replace(
'/%2Bf/', '/+f/'
) # some versions of pip did not unquote this fragment in the log
if not checksum.startswith('md5=') and not checksum.startswith(
'sha256='):
# PyPI gives you the checksum in url fragment, as a convenience. But not all indices are so kind.
algorithm = 'md5'
if os.path.basename(whl).lower() == url.rsplit('/', 1)[-1].lower():
target = whl
else:
scratch_file = os.path.join(scratch_dir, os.path.basename(url))
target, _headers = _download_dist(url, scratch_file, index_url,
extra_index_url)
checksum = compute_checksum(target=target, algorithm=algorithm)
checksum = '='.join([algorithm, checksum])
result = {'path': whl, 'url': url, 'checksum': checksum}
return result
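# The returned mapping looks like (values illustrative):
#   {'path': '/tmp/.../<name>-<version>-py3-none-any.whl',
#    'url': 'https://files.pythonhosted.org/.../<name>-<version>-py3-none-any.whl',
#    'checksum': 'sha256=<hex digest>'}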
def discover_import_names(whl_file):
logger.debug('finding import names')
zipfile = ZipFile(file=whl_file)
namelist = zipfile.namelist()
[top_level_fname
] = [x for x in namelist if x.endswith('top_level.txt')]
all_names = zipfile.read(top_level_fname).decode(
'utf-8').strip().splitlines()
public_names = [n for n in all_names if not n.startswith('_')]
return public_names
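# top_level.txt lists the wheel's importable top-level packages, which is what
# import_plugins() ultimately needs rather than the distribution name.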
tmpdir = mkdtemp()
data = get(package, tmpdir=tmpdir)
import_names = discover_import_names(data['path'])
shutil.rmtree(tmpdir)
return import_names
class PluginsManager(object):
def __init__(self,
@@ -370,11 +620,31 @@ class PluginsManager(object):
@staticmethod
def check_plugin_installed(package):
""" Check if the plugin is installed, and if the version is valid
Args:
package: the package name need to be installed
Returns:
"""
from pip._internal.utils.packaging import get_requirement, specifiers
req = get_requirement(package)
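# `package` may carry a version specifier (e.g. 'adaseq>=0.6.0', illustrative);
# req.name is the bare distribution name used as the key into the working set.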
try:
importlib.reload(pkg_resources)
package_meta_info = pkg_resources.working_set.by_key[package]
package_meta_info = pkg_resources.working_set.by_key[req.name]
version = package_meta_info.version
# To test if the package is installed
installed = True
# If installed, test if the version is correct
for spec in req.specifier:
installed_valid_version = spec.contains(version)
if not installed_valid_version:
installed = False
break
except KeyError:
version = ''
installed = False
@@ -402,6 +672,10 @@ class PluginsManager(object):
options, args = command.parse_args(command_args)
status_code = command.main(command_args)
# reload pkg_resources so that the latest package information is picked up
importlib.reload(pkg_resources)
return status_code, options, args
def install_plugins(self,
@@ -722,3 +996,4 @@ class EnvsManager(object):
if __name__ == '__main__':
install_requirements_by_files(['adaseq'])
import_name = get_modules_from_package('pai-easycv')

@@ -0,0 +1 @@
{"framework":"pytorch","task":"bilibili","model":{"type":"my-custom-model","scale":2,"weight_path":"weights_v3/up2x-latest-denoise3x.pth","half":true},"pipeline":{"type":"my-custom-pipeline"}}

@@ -15,8 +15,6 @@ class PluginModelTest(unittest.TestCase, DemoCompatibilityCheck):
def tearDown(self):
# make sure uninstalled after installing
uninstall_args = [self.package, '-y']
PluginsManager.pip_command('uninstall', uninstall_args)
super().tearDown()
import subprocess
result = subprocess.run(