{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Baichuan + Lora + Agent\n",
"baichuan-7B是由百川智能开发的一个开源的大规模预训练模型。基于Transformer结构,在大约1.2万亿tokens上训练的70亿参数模型,支持中英双语,上下文窗口长度为4096。在标准的中文和英文权威benchmark(C-EVAL/MMLU)上均取得同尺寸最好的效果。"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Ref: https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary\n",
"2. 以下脚本可以在2*A10环境下正常运行, 大概占用40G显存\n",
"3. python>=3.8"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 配置实验环境"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# !pip install modelscope\n",
"# !pip install numpy pandas matplotlib scikit-learn\n",
"# !pip install transformers datasets\n",
"# !conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia\n",
"# !pip install tqdm tensorboard torchmetrics sentencepiece charset_normalizer accelerate\n",
"\n",
"# !pip install numpy -U # Resolve torchmetrics dependencies and update numpy"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2023-07-02 17:24:09,391] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/hackathon/miniconda3/envs/hackathon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"2023-07-02 17:24:09,870 - modelscope - INFO - PyTorch version 2.0.1 Found.\n",
"2023-07-02 17:24:09,871 - modelscope - INFO - Loading ast index from /home/hackathon/.cache/modelscope/ast_indexer\n",
"2023-07-02 17:24:09,895 - modelscope - INFO - Loading done! Current index file version is 1.6.2, with md5 ddf811ee982377c1357284a2bfda3dec and a total number of 861 components indexed\n",
"2023-07-02 17:24:10,570 - modelscope - INFO - [0, 1]\n",
"2023-07-02 17:24:10,719 - modelscope - INFO - Using device: cuda:0,1\n",
"2023-07-02 17:24:10,720 - modelscope - INFO - Global seed set to 42\n"
]
}
],
"source": [
"from _common import *\n",
"device_ids = [0, 1]\n",
"select_device(device_ids)\n",
"_ = seed_everything(42)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 导入Model, Tokenizer"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-07-02 17:24:11,036 - modelscope - INFO - Model revision not specified, use default: master in development mode\n",
"2023-07-02 17:24:11,037 - modelscope - INFO - Development mode use revision: master\n",
"2023-07-02 17:24:11,364 - modelscope - INFO - model_config: BaiChuanConfig {\n",
" \"architectures\": [\n",
" \"BaiChuanForCausalLM\"\n",
" ],\n",
" \"auto_map\": {\n",
" \"AutoConfig\": \"configuration_baichuan.BaiChuanConfig\",\n",
" \"AutoModelForCausalLM\": \"modeling_baichuan.BaiChuanForCausalLM\"\n",
" },\n",
" \"bos_token_id\": 1,\n",
" \"eos_token_id\": 2,\n",
" \"hidden_act\": \"silu\",\n",
" \"hidden_size\": 4096,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 11008,\n",
" \"max_position_embeddings\": 4096,\n",
" \"model_type\": \"baichuan\",\n",
" \"num_attention_heads\": 32,\n",
" \"num_hidden_layers\": 32,\n",
" \"pad_token_id\": 0,\n",
" \"rms_norm_eps\": 1e-06,\n",
" \"tie_word_embeddings\": false,\n",
" \"torch_dtype\": \"float16\",\n",
" \"transformers_version\": \"4.30.2\",\n",
" \"use_cache\": true,\n",
" \"vocab_size\": 64000\n",
"}\n",
"\n",
"The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.\n"
]
}
],
"source": [
"WORK_DIR = 'runs/baichuan'\n",
"LORA_TARGET_MODULES = ['W_pack']\n",
"#\n",
"model_dir = snapshot_download('baichuan-inc/baichuan-7B', 'v1.0.5')\n",
"model, tokenizer = get_baichuan7B_model_tokenizer(model_dir)\n",
"#\n",
"GRADIENT_CHECKPOINTING = True\n",
"if GRADIENT_CHECKPOINTING:\n",
" model.gradient_checkpointing_enable()\n",
" model.enable_input_require_grads()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 准备Lora"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-07-02 17:24:21,741 - modelscope - INFO - lora_config: LoRAConfig(rank=8, replace_modules=['W_pack'], lora_alpha=32, lora_dropout=0.1, merge_weights=True, use_merged_linear=False, enable_lora=None, fan_in_fan_out=False, bias='none', only_lora_trainable=True, pretrained_weights=None)\n",
"2023-07-02 17:24:36,360 - modelscope - INFO - model.embed_tokens.weight: requires_grad=False\n",
"2023-07-02 17:24:36,360 - modelscope - INFO - model.layers.0.self_attn.W_pack.weight: requires_grad=False\n",
"2023-07-02 17:24:36,361 - modelscope - INFO - model.layers.0.self_attn.W_pack.lora_A: requires_grad=True\n",
"2023-07-02 17:24:36,361 - modelscope - INFO - model.layers.0.self_attn.W_pack.lora_B: requires_grad=True\n",
"2023-07-02 17:24:36,361 - modelscope - INFO - model.layers.0.self_attn.o_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,362 - modelscope - INFO - model.layers.0.mlp.gate_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,362 - modelscope - INFO - model.layers.0.mlp.down_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.0.mlp.up_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.0.input_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.0.post_attention_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,363 - modelscope - INFO - model.layers.1.self_attn.W_pack.weight: requires_grad=False\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.self_attn.W_pack.lora_A: requires_grad=True\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.self_attn.W_pack.lora_B: requires_grad=True\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.self_attn.o_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,364 - modelscope - INFO - model.layers.1.mlp.gate_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.mlp.down_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.mlp.up_proj.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.input_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.1.post_attention_layernorm.weight: requires_grad=False\n",
"2023-07-02 17:24:36,365 - modelscope - INFO - model.layers.2.self_attn.W_pack.weight: requires_grad=False\n",
"2023-07-02 17:24:36,366 - modelscope - INFO - ...\n",
"2023-07-02 17:24:36,368 - modelscope - INFO - BaiChuanForCausalLM: 7004.7539M Params (4.1943M Trainable), 33.5565M Buffers.\n",
"2023-07-02 17:24:36,370 - modelscope - INFO - device: cuda:0, dtype: torch.float16\n"
]
},
{
"data": {
"text/plain": [
"BaiChuanForCausalLM(\n",
" (model): Model(\n",
" (embed_tokens): Embedding(64000, 4096, padding_idx=0)\n",
" (layers): ModuleList(\n",
" (0-31): 32 x DecoderLayer(\n",
" (self_attn): Attention(\n",
" (W_pack): Linear(\n",
" in_features=4096, out_features=12288, bias=False\n",
" (lora_dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (o_proj): Linear(in_features=4096, out_features=4096, bias=False)\n",
" (rotary_emb): RotaryEmbedding()\n",
" )\n",
" (mlp): MLP(\n",
" (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)\n",
" (down_proj): Linear(in_features=11008, out_features=4096, bias=False)\n",
" (up_proj): Linear(in_features=4096, out_features=11008, bias=False)\n",
" (act_fn): SiLUActivation()\n",
" )\n",
" (input_layernorm): RMSNorm()\n",
" (post_attention_layernorm): RMSNorm()\n",
" )\n",
" )\n",
" (norm): RMSNorm()\n",
" )\n",
" (lm_head): Linear(in_features=4096, out_features=64000, bias=False)\n",
")"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"LORA_RANK = 8\n",
"LORA_ALPHA = 32\n",
"LORA_DROPOUT_P = 0.1\n",
"lora_config = LoRAConfig(\n",
" target_modules=LORA_TARGET_MODULES,\n",
" r=LORA_RANK,\n",
" lora_alpha=LORA_ALPHA,\n",
" lora_dropout=LORA_DROPOUT_P)\n",
"logger.info(f'lora_config: {lora_config}')\n",
"Swift.prepare_model(model, lora_config)\n",
"#\n",
"show_freeze_layers(model)\n",
"print_model_info(model)\n",
"_p = list(model.parameters())[100]\n",
"logger.info(f'device: {_p.device}, dtype: {_p.dtype}')\n",
"model.bfloat16()"
]
},
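{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (a sketch added for illustration, not part of the original pipeline): LoRA adds `rank * (in_features + out_features)` trainable parameters per adapted module. With rank 8 on each `W_pack` projection (4096 -> 12288) across 32 layers, this should reproduce the `4.1943M` trainable-parameter count logged above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: verify the logged LoRA trainable-parameter count by hand.\n",
"# lora_A is (rank x in_features), lora_B is (out_features x rank) per adapted module.\n",
"rank, in_features, out_features, num_layers = 8, 4096, 12288, 32\n",
"trainable = rank * (in_features + out_features) * num_layers\n",
"print(f'{trainable / 1e6:.4f}M Trainable')  # 4.1943M Trainable, matching the log above"
]
},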
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 导入Dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5036/5036 [00:12<00:00, 398.82it/s]\n",
"100%|██████████| 285/285 [00:00<00:00, 383.15it/s]\n",
"2023-07-02 17:24:49,863 - modelscope - INFO - Dataset Token Length: 958.649707±371.357483, min=44.000000, max=2045.000000, size=4953\n",
"2023-07-02 17:24:49,864 - modelscope - INFO - Dataset Token Length: 993.447653±337.821458, min=75.000000, max=1946.000000, size=277\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[INPUT_IDS] 你是达摩院的ModelScopeGPT(魔搭助手),你是个大语言模型, 是2023年达摩院的工程师训练得到的。你有多种能力,可以通过插件集成魔搭社区的模型api来回复用户的问题,还能解答用户使用模型遇到的问题和模型知识相关问答。1. {\"plugin_name\": \"modelscope_text-ie\", \"plugin_owner\": \"ModelScopeGPT\", \"plugin_type\": \"default\", \"plugin_schema_for_model\": {\"name\": \"modelscope_text-ie\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"url\": \"http://109.199.101.10:1485/\", \"paths\": [{\"name\": \"modelscope_text-ie\", \"model_id\": \"/damo/nlp_structbert_siamese-uie_chinese-base\", \"method\": \"post\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"parameters\": [{\"name\": \"text\", \"description\": \"用户输入的文本\", \"required\": \"True\"}, {\"name\": \"schema\", \"description\": \"要抽取信息的json表示\", \"required\": \"True\"}]}]}}\n",
"\n",
"2. {\"plugin_name\": \"modelscope_text-ie\", \"plugin_owner\": \"ModelScopeGPT\", \"plugin_type\": \"default\", \"plugin_schema_for_model\": {\"name\": \"modelscope_text-ie\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"url\": \"http://9.32.64.200:5873/\", \"paths\": [{\"name\": \"modelscope_text-ie\", \"model_id\": \"/damo/nlp_structbert_siamese-uie_chinese-base\", \"method\": \"post\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"parameters\": [{\"name\": \"text\", \"description\": \"用户输入的文本\", \"required\": \"True\"}, {\"name\": \"schema\", \"description\": \"要抽取信息的json表示\", \"required\": \"True\"}]}]}}\n",
"\n",
"3. {\"plugin_name\": \"modelscope_text-ie\", \"plugin_owner\": \"ModelScopeGPT\", \"plugin_type\": \"default\", \"plugin_schema_for_model\": {\"name\": \"modelscope_text-ie\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"url\": \"http://54.149.78.185:3979/\", \"paths\": [{\"name\": \"modelscope_text-ie\", \"model_id\": \"/damo/nlp_structbert_siamese-uie_chinese-base\", \"method\": \"post\", \"description\": \"针对中文的文本,根据schema要抽取的内容,找出其中对应信息,并用json格式展示\", \"parameters\": [{\"name\": \"text\", \"description\": \"用户输入的文本\", \"required\": \"True\"}, {\"name\": \"schema\", \"description\": \"要抽取信息的json表示\", \"required\": \"True\"}]}]}} \n",
"\n",
"### 用户\n",
"按照给定的schema抽取出下面文本对应的信息\n",
"schema:{\"人物\": null, \"地理位置\": null, \"组织机构\": null}\n",
"近日,美国政府宣布将对中国1000多种商品加征关税,并威胁进一步加征关税。 \n",
"\n",
"### 助手\n",
" <|startofthink|>```JSON\n",
"{\"api_name\": \"modelscope_text-ie\", \"url\": \"http://9.32.64.200:5873/damo/nlp_structbert_siamese-uie_chinese-base\", \"parameters\": {\"text\": \"近日,美国政府宣布将对中国1000多种商品加征关税,并威胁进一步加征关税。\", \"schema\": \"{\\\"人物\\\": null, \\\"地理位置\\\": null, \\\"组织机构\\\": null}\"}}\n",
"```<|endofthink|>\n",
"\n",
"<|startofexec|>```JSON\n",
"{\"人物\": [], \"地理位置\": [\"中国\", \"美国\"], \"组织机构\": []}\n",
"```<|endofexec|>\n",
"信息抽取结果:{\"人物\": [], \"地理位置\": [\"中国\", \"美国\"], \"组织机构\": []}。我使用的模型是ModelScope的'damo/nlp_structbert_siamese-uie_chinese-base'模型。这是一个基于StructBERT预训练模型微调训练的通用信息抽取模型。\n",
"\n",
"[LABLES]