{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Baichuan + Lora + Agent\n",
"baichuan-7B是由百川智能开发的一个开源的大规模预训练模型。基于Transformer结构,在大约1.2万亿tokens上训练的70亿参数模型,支持中英双语,上下文窗口长度为4096。在标准的中文和英文权威benchmark(C-EVAL/MMLU)上均取得同尺寸最好的效果。"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Ref: https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary\n",
"2. 以下脚本可以在2*A10环境下正常运行, 大概占用40G显存\n",
"3. python>=3.8"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 配置实验环境"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# !pip install modelscope\n",
"# !pip install numpy pandas matplotlib scikit-learn\n",
"# !pip install transformers datasets\n",
"# !conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia\n",
"# !pip install tqdm tensorboard torchmetrics sentencepiece charset_normalizer accelerate\n",
"\n",
"# !pip install numpy -U # Resolve torchmetrics dependencies and update numpy"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2023-07-02 17:24:09,391] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/hackathon/miniconda3/envs/hackathon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"2023-07-02 17:24:09,870 - modelscope - INFO - PyTorch version 2.0.1 Found.\n",
"2023-07-02 17:24:09,871 - modelscope - INFO - Loading ast index from /home/hackathon/.cache/modelscope/ast_indexer\n",
"2023-07-02 17:24:09,895 - modelscope - INFO - Loading done! Current index file version is 1.6.2, with md5 ddf811ee982377c1357284a2bfda3dec and a total number of 861 components indexed\n",
"2023-07-02 17:24:10,570 - modelscope - INFO - [0, 1]\n",
"2023-07-02 17:24:10,719 - modelscope - INFO - Using device: cuda:0,1\n",
"2023-07-02 17:24:10,720 - modelscope - INFO - Global seed set to 42\n"
]
}
],
"source": [
"from _common import *\n",
"device_ids = [0, 1]\n",
"select_device(device_ids)\n",
"_ = seed_everything(42)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 导入Model, Tokenizer"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-07-02 17:24:11,036 - modelscope - INFO - Model revision not specified, use default: master in development mode\n",
"2023-07-02 17:24:11,037 - modelscope - INFO - Development mode use revision: master\n",
"2023-07-02 17:24:11,364 - modelscope - INFO - model_config: BaiChuanConfig {\n",
" \"architectures\": [\n",
" \"BaiChuanForCausalLM\"\n",
" ],\n",
" \"auto_map\": {\n",
" \"AutoConfig\": \"configuration_baichuan.BaiChuanConfig\",\n",
" \"AutoModelForCausalLM\": \"modeling_baichuan.BaiChuanForCausalLM\"\n",
" },\n",
" \"bos_token_id\": 1,\n",
" \"eos_token_id\": 2,\n",
" \"hidden_act\": \"silu\",\n",
" \"hidden_size\": 4096,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 11008,\n",
" \"max_position_embeddings\": 4096,\n",
" \"model_type\": \"baichuan\",\n",
" \"num_attention_heads\": 32,\n",
" \"num_hidden_layers\": 32,\n",
" \"pad_token_id\": 0,\n",
" \"rms_norm_eps\": 1e-06,\n",
" \"tie_word_embeddings\": false,\n",
" \"torch_dtype\": \"float16\",\n",
" \"transformers_version\": \"4.30.2\",\n",
" \"use_cache\": true,\n",
" \"vocab_size\": 64000\n",
"}\n",
"\n",
"The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.\n"
]
}
],
"source": [
"WORK_DIR = 'runs/baichuan'\n",
"LORA_TARGET_MODULES = ['W_pack']\n",
"#\n",
"model_dir = snapshot_download('baichuan-inc/baichuan-7B', 'v1.0.5')\n",
"model, tokenizer = get_baichuan7B_model_tokenizer(model_dir)\n",
"#\n",
"GRADIENT_CHECKPOINTING = True\n",
"if GRADIENT_CHECKPOINTING:\n",
" model.gradient_checkpointing_enable()\n",
" model.enable_input_require_grads()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 准备Lora"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-07-02 17:24:21,741 - modelscope - INFO - lora_config: LoRAConfig(rank=8, replace_modules=['W_pack'], lora_alpha=32, lora_dropout=0.1, merge_weights=True, use_merged_linear=False, enable_lora=None, fan_in_fan_out=False, bias='none', only_lora_trainable=True, pretrained_weights=None)\n",
"2023-07-02 17:24:36,360 - modelscope - INFO - model.embed_tokens.weight: requires_grad=False\n",
"2023-07-02 17:24:36,360 - modelscope - INFO - model.layers.0.self_attn.W_pack.weight: requires_grad=False\n",
"2023-07-02 17:24:36,361 - modelscope - INFO - model.layers.0.self_attn.W_pack.lora_A: requires_grad=True\n",
"2023-07-02 17:24:36,361 - modelscope - INFO - model.layers.0.self_attn.W_pack.lora_B: requires_grad=True\n",
"2023-07-02 17:24:36,361 - modelscope - INFO - model.layers.0.self_attn.o_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,362 - modelscope - INFO - model.layers.0.mlp.gate_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,362 - modelscope - INFO - model.layers.0.mlp.down_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.0.mlp.up_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.0.input_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.0.post_attention_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.1.self_attn.W_pack.weight: requires_grad=False\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.self_attn.W_pack.lora_A: requires_grad=True\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.self_attn.W_pack.lora_B: requires_grad=True\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.self_attn.o_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.mlp.gate_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.mlp.down_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.mlp.up_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.input_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.post_attention_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.2.self_attn.W_pack.weight: requires_grad=False\n",
"2023-07-02 17:24:36,366 - modelscope - INFO - ...\n",
"2023-07-02 17:24:36,368 - modelscope - INFO - BaiChuanForCausalLM: 7004.7539M Params (4.1943M Trainable), 33.5565M Buffers.\n",
"2023-07-02 17:24:36,370 - modelscope - INFO - device: cuda:0, dtype: torch.float16\n"
]
},
{
"data": {
"text/plain": [
"BaiChuanForCausalLM(\n",
" (model): Model(\n",
" (embed_tokens): Embedding(64000, 4096, padding_idx=0)\n",
" (layers): ModuleList(\n",
" (0-31): 32 x DecoderLayer(\n",
" (self_attn): Attention(\n",
" (W_pack): Linear(\n",
" in_features=4096, out_features=12288, bias=False\n",
" (lora_dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (o_proj): Linear(in_features=4096, out_features=4096, bias=False)\n",
" (rotary_emb): RotaryEmbedding()\n",
" )\n",
" (mlp): MLP(\n",
" (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)\n",
" (down_proj): Linear(in_features=11008, out_features=4096, bias=False)\n",
" (up_proj): Linear(in_features=4096, out_features=11008, bias=False)\n",
" (act_fn): SiLUActivation()\n",
" )\n",
" (input_layernorm): RMSNorm()\n",
" (post_attention_layernorm): RMSNorm()\n",
" )\n",
" )\n",
" (norm): RMSNorm()\n",
" )\n",
" (lm_head): Linear(in_features=4096, out_features=64000, bias=False)\n",
")"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"LORA_RANK = 8\n",
"LORA_ALPHA = 32\n",
"LORA_DROPOUT_P = 0.1\n",
"lora_config = LoRAConfig(\n",
" target_modules=LORA_TARGET_MODULES,\n",
" r=LORA_RANK,\n",
" lora_alpha=LORA_ALPHA,\n",
" lora_dropout=LORA_DROPOUT_P)\n",
"logger.info(f'lora_config: {lora_config}')\n",
"Swift.prepare_model(model, lora_config)\n",
"#\n",
"show_freeze_layers(model)\n",
"print_model_info(model)\n",
"_p = list(model.parameters())[100]\n",
"logger.info(f'device: {_p.device}, dtype: {_p.dtype}')\n",
"model.bfloat16()"
]
},
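{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (a sketch added for illustration, not part of the original pipeline): LoRA adds `rank * (in_features + out_features)` trainable parameters per adapted module. With rank 8 on each `W_pack` projection (4096 -> 12288) across 32 layers, this should reproduce the `4.1943M` trainable-parameter count logged above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: verify the logged LoRA trainable-parameter count by hand.\n",
"# lora_A is (rank x in_features), lora_B is (out_features x rank) per adapted module.\n",
"rank, in_features, out_features, num_layers = 8, 4096, 12288, 32\n",
"trainable = rank * (in_features + out_features) * num_layers\n",
"print(f'{trainable / 1e6:.4f}M Trainable')  # 4.1943M Trainable, matching the log above"
]
},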
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 导入Dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5036/5036 [00:12<00:00, 398.82it/s]\n",
"100%|██████████| 285/285 [00:00<00:00, 383.15it/s]\n",
"2023-07-02 17:24:49,863 - modelscope - INFO - Dataset Token Length: 958.649707±371.357483, min=44.000000, max=2045.000000, size=4953\n",
"2023-07-02 17:24:49,864 - modelscope - INFO - Dataset Token Length: 993.447653±337.821458, min=75.000000, max=1946.000000, size=277\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[INPUT_IDS] 你是达摩院的ModelScopeGPT(魔搭助手),你是个大语言模型, 是2023年达摩院的工程师训练得到的。你有多种能力,可以通过插件集成魔搭社区的模型api来回复用户的问题,还能解答用户使用模型遇到的问题和模型知识相关问答。1. {\"plugin_name\": \"modelscope_text-ie\", \"plugin_owner\": \"ModelScopeGPT\", \"plugin_type\": \"default\", \"plugin_schema_for_model\": {\"name\": \"modelscope_text-ie\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"url\": \"http://109.199.101.10:1485/\", \"paths\": [{\"name\": \"modelscope_text-ie\", \"model_id\": \"/damo/nlp_structbert_siamese-uie_chinese-base\", \"method\": \"post\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"parameters\": [{\"name\": \"text\", \"description\": \"用户输入的文本\", \"required\": \"True\"}, {\"name\": \"schema\", \"description\": \"要抽取信息的json表示\", \"required\": \"True\"}]}]}}\n",
"\n",
"2. {\"plugin_name\": \"modelscope_text-ie\", \"plugin_owner\": \"ModelScopeGPT\", \"plugin_type\": \"default\", \"plugin_schema_for_model\": {\"name\": \"modelscope_text-ie\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"url\": \"http://9.32.64.200:5873/\", \"paths\": [{\"name\": \"modelscope_text-ie\", \"model_id\": \"/damo/nlp_structbert_siamese-uie_chinese-base\", \"method\": \"post\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"parameters\": [{\"name\": \"text\", \"description\": \"用户输入的文本\", \"required\": \"True\"}, {\"name\": \"schema\", \"description\": \"要抽取信息的json表示\", \"required\": \"True\"}]}]}}\n",
"\n",
"3. {\"plugin_name\": \"modelscope_text-ie\", \"plugin_owner\": \"ModelScopeGPT\", \"plugin_type\": \"default\", \"plugin_schema_for_model\": {\"name\": \"modelscope_text-ie\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"url\": \"http://54.149.78.185:3979/\", \"paths\": [{\"name\": \"modelscope_text-ie\", \"model_id\": \"/damo/nlp_structbert_siamese-uie_chinese-base\", \"method\": \"post\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"parameters\": [{\"name\": \"text\", \"description\": \"用户输入的文本\", \"required\": \"True\"}, {\"name\": \"schema\", \"description\": \"要抽取信息的json表示\", \"required\": \"True\"}]}]}} \n",
"\n",
"### 用户\n",
"按照给定的schema抽取出下面文本对应的信息\n",
"schema:{\"人物\": null, \"地理位置\": null, \"组织机构\": null}\n",
"近日,美国政府宣布将对中国1000多种商品加征关税,并威胁进一步加征关税。 \n",
"\n",
"### 助手\n",
" <|startofthink|>```JSON\n",
"{\"api_name\": \"modelscope_text-ie\", \"url\": \"http://9.32.64.200:5873/damo/nlp_structbert_siamese-uie_chinese-base\", \"parameters\": {\"text\": \"近日,美国政府宣布将对中国1000多种商品加征关税,并威胁进一步加征关税。\", \"schema\": \"{\\\"人物\\\": null, \\\"地理位置\\\": null, \\\"组织机构\\\": null}\"}}\n",
"```<|endofthink|>\n",
"\n",
"<|startofexec|>```JSON\n",
"{\"人物\": [], \"地理位置\": [\"中国\", \"美国\"], \"组织机构\": []}\n",
"```<|endofexec|>\n",
"信息抽取结果:{\"人物\": [], \"地理位置\": [\"中国\", \"美国\"], \"组织机构\": []}。我使用的模型是ModelScope的'damo/nlp_structbert_siamese-uie_chinese-base'模型。这是一个基于StructBERT预训练模型微调训练的通用信息抽取模型。\n",
"\n",
"[LABLES]