tests/preprocessors/test_nlp.py

# Copyright (c) Alibaba, Inc. and its affiliates.

import unittest

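# `nlp` is imported explicitly so the NLP preprocessors are registered and the
# test runs even if AST-based module scanning fails.
from modelscope.preprocessors import build_preprocessor, nlp  # noqa: F401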
from modelscope.utils.constant import Fields, InputFields
from modelscope.utils.logger import get_logger

logger = get_logger()


class NLPPreprocessorTest(unittest.TestCase):

    def test_tokenize(self):
        cfg = dict(type='Tokenize', tokenizer_name='bert-base-cased')
        preprocessor = build_preprocessor(cfg, Fields.nlp)
        data = {
            InputFields.text:
            'Do not meddle in the affairs of wizards, '
            'for they are subtle and quick to anger.'
        }
        output = preprocessor(data)
        self.assertIn(InputFields.text, output)
        # 101 and 102 are the [CLS] and [SEP] ids in the bert-base-cased vocab.
        self.assertEqual(output['input_ids'], [
            101, 2091, 1136, 1143, 13002, 1107, 1103, 5707, 1104, 16678, 1116,
            117, 1111, 1152, 1132, 11515, 1105, 3613, 1106, 4470, 119, 102
        ])
        # A single unpadded sentence: every token is in segment 0 and attended.
        num_tokens = len(output['input_ids'])
        self.assertEqual(output['token_type_ids'], [0] * num_tokens)
        self.assertEqual(output['attention_mask'], [1] * num_tokens)
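

# Illustrative sketch, not part of modelscope: a hypothetical helper that
# spells out the structural invariants the expected values in test_tokenize
# rely on. It assumes the standard bert-base-cased vocabulary ([CLS] = 101,
# [SEP] = 102) and a single sentence encoded without padding or truncation.
BERT_CLS_ID = 101  # assumed [CLS] id in the bert-base-cased vocabulary
BERT_SEP_ID = 102  # assumed [SEP] id in the bert-base-cased vocabulary


def check_single_sentence_encoding(encoding):
    """Return True if `encoding` looks like one unpadded BERT sentence.

    Checks that the three sequences have equal length, that input_ids are
    framed by [CLS] ... [SEP], that every token belongs to segment 0, and
    that no position is masked out as padding.
    """
    ids = encoding['input_ids']
    type_ids = encoding['token_type_ids']
    mask = encoding['attention_mask']
    if not (len(ids) == len(type_ids) == len(mask)):
        return False
    if len(ids) < 2 or ids[0] != BERT_CLS_ID or ids[-1] != BERT_SEP_ID:
        return False
    # A single sentence uses segment 0 everywhere.
    if any(t != 0 for t in type_ids):
        return False
    # Without padding, every position is attended to.
    return all(m == 1 for m in mask)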


if __name__ == '__main__':
    unittest.main()