Fine-Tuning a Llama-3 Chinese Chat Model with LLaMA Factory

Published: 2024-04-26

Original: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing#scrollTo=gf60HoT633NY

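Request a free T4 GPU in Colab to run this script.

See the link above for details. A VPN may be needed to open it.

Fine-tuning takes about 50 minutes.

Training script: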

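from llmtuner import run_exp

# IPython magic (Colab/Jupyter only): switch to the cloned LLaMA-Factory repository
%cd /content/LLaMA-Factory/

run_exp(dict(
  stage="sft",  # supervised fine-tuning
  do_train=True,
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # pre-quantized 4-bit Llama-3-8B-Instruct
  dataset="identity,alpaca_gpt4_en,alpaca_gpt4_zh",  # identity data plus English and Chinese Alpaca-GPT4 data
  template="llama3",  # Llama-3 chat prompt template
  finetuning_type="lora",  # train LoRA adapters only
  lora_target="all",  # attach LoRA to every linear module
  output_dir="llama3_lora",  # where the adapter is saved
  per_device_train_batch_size=2,
  gradient_accumulation_steps=4,  # total batch size = 2 * 4 = 8
  lr_scheduler_type="cosine",
  logging_steps=10,  # log the training loss every 10 steps
  warmup_ratio=0.1,
  save_steps=1000,
  learning_rate=5e-5,
  num_train_epochs=3.0,
  max_samples=500,  # use at most 500 examples from each dataset
  max_grad_norm=1.0,  # gradient clipping threshold
  quantization_bit=4,  # train on top of 4-bit quantized weights
  loraplus_lr_ratio=16.0,  # LoRA+ learning-rate ratio
  use_unsloth=True,  # enable Unsloth acceleration
  fp16=True,  # the T4 does not support bfloat16
))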
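To make the `lr_scheduler_type="cosine"`, `warmup_ratio=0.1` and `learning_rate=5e-5` settings more concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule. It is an illustration, not the library's own code: `lr_at_step` is a hypothetical helper, the total of 408 steps is taken from the training log in this post, and rounding the warmup length up is an assumption. The last constant reflects the usual reading of LoRA+, in which the LoRA "B" matrices are trained with a learning rate `loraplus_lr_ratio` times larger than the base rate; treat that as an assumption as well.

```python
import math

TOTAL_STEPS = 408   # "Total steps = 408" as reported in the training log
BASE_LR = 5e-5      # learning_rate
WARMUP_RATIO = 0.1  # warmup_ratio


def lr_at_step(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate after `step` updates: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# LoRA+ (assumption about how the ratio is applied): the LoRA "B" matrices
# peak at loraplus_lr_ratio times the base learning rate.
LORA_B_PEAK_LR = BASE_LR * 16.0

for step in (0, 10, 41, 200, 408):
    print(f"step {step:3d}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the rate climbs for the first 41 steps, peaks at 5e-5, and then decays smoothly to zero at step 408.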

Training log

04/22/2024 04:10:40 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
WARNING:llmtuner.hparams.parser:We recommend enable `upcast_layernorm` in quantized training.
04/22/2024 04:10:40 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
INFO:llmtuner.hparams.parser:Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,979 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,980 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,982 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,984 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:10:42,384 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 04:10:42 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
INFO:llmtuner.data.template:Replace eos token: <|eot_id|>
04/22/2024 04:10:42 - INFO - llmtuner.data.loader - Loading dataset identity.json...
INFO:llmtuner.data.loader:Loading dataset identity.json...
04/22/2024 04:10:42 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at data/identity.json.
WARNING:llmtuner.data.utils:Checksum failed: mismatched SHA-1 hash value at data/identity.json.

Generating train split: 91/0 [00:00<00:00, 1640.44 examples/s]
Converting format of dataset: 100% 91/91 [00:00<00:00, 2822.67 examples/s]

04/22/2024 04:10:42 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
INFO:llmtuner.data.loader:Loading dataset alpaca_gpt4_data_en.json...

Generating train split: 52002/0 [00:00<00:00, 117346.95 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 14816.36 examples/s]

04/22/2024 04:10:43 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
INFO:llmtuner.data.loader:Loading dataset alpaca_gpt4_data_zh.json...

Generating train split: 48818/0 [00:00<00:00, 91511.83 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 11785.79 examples/s]
Running tokenizer on dataset: 100% 1091/1091 [00:00<00:00, 1358.62 examples/s]

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,417 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,419 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am Llama-Chinese, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am Llama-Chinese, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
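The sample above shows how supervised fine-tuning builds its labels: the prompt tokens (system and user turns plus the assistant header) are replaced by -100 in `label_ids`, so only the assistant's reply contributes to the loss. The sketch below reproduces that masking with the token ids printed in the log. `build_labels` is a hypothetical helper written for illustration, not LLaMA Factory's own function; -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask the prompt part of the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Token ids copied from the log: everything up to the assistant header ...
prompt_ids = [128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328,
              13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006,
              78191, 128007, 271]
# ... and the assistant reply, ending with <|eot_id|> (128009).
response_ids = [9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592,
                18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358,
                7945, 499, 3432, 30, 128009]

input_ids, label_ids = build_labels(prompt_ids, response_ids)
print(len(input_ids), "tokens,", label_ids.count(IGNORE_INDEX), "masked")
```

Because the final `<|eot_id|>` stays unmasked, the model is also trained to end its turn.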
04/22/2024 04:10:45 - INFO - llmtuner.model.patcher - Loading ?-bit BITSANDBYTES-quantized model.
INFO:llmtuner.model.patcher:Loading ?-bit BITSANDBYTES-quantized model.
[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,579 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,581 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,634 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,636 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,702 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,704 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
[INFO|modeling_utils.py:3257] 2024-04-22 04:10:45,813 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-04-22 04:10:45,863 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-04-22 04:10:45,871 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

[INFO|modeling_utils.py:3992] 2024-04-22 04:11:13,469 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4000] 2024-04-22 04:11:13,472 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at unsloth/llama-3-8b-Instruct-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-04-22 04:11:13,539 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/generation_config.json
[INFO|configuration_utils.py:845] 2024-04-22 04:11:13,540 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

tokenizer_config.json: 100% 51.0k/51.0k [00:00<00:00, 2.14MB/s]
tokenizer.json: 100% 9.08M/9.08M [00:00<00:00, 60.7MB/s]
special_tokens_map.json: 100% 449/449 [00:00<00:00, 31.3kB/s]

[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,466 >> loading file tokenizer.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,468 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,469 >> loading file special_tokens_map.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,472 >> loading file tokenizer_config.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:11:14,881 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,935 >> loading file tokenizer.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,936 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,937 >> loading file special_tokens_map.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,939 >> loading file tokenizer_config.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:11:15,312 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 04:11:16 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
INFO:llmtuner.model.patcher:Gradient checkpointing enabled.
04/22/2024 04:11:16 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
04/22/2024 04:11:16 - INFO - llmtuner.model.utils - Found linear modules: k_proj,o_proj,down_proj,v_proj,up_proj,q_proj,gate_proj
INFO:llmtuner.model.utils:Found linear modules: k_proj,o_proj,down_proj,v_proj,up_proj,q_proj,gate_proj
[WARNING|logging.py:329] 2024-04-22 04:11:16,731 >> Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
04/22/2024 04:11:16 - INFO - llmtuner.model.loader - trainable params: 20971520 || all params: 8051232768 || trainable%: 0.2605
INFO:llmtuner.model.loader:trainable params: 20971520 || all params: 8051232768 || trainable%: 0.2605
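The `trainable params: 20971520` figure can be reproduced by hand from the model config printed earlier and the seven linear modules listed in the log. A LoRA adapter of rank r on a layer with `in_features` inputs and `out_features` outputs adds r * (in_features + out_features) weights. The sketch below assumes a LoRA rank of 8, which is not set in the training script; it is used here because it matches the logged count.

```python
# Values from the LlamaConfig dump in the log
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
LORA_RANK = 8  # assumption: not specified in the script

head_dim = HIDDEN_SIZE // NUM_HEADS  # 128
kv_size = NUM_KV_HEADS * head_dim    # 1024, grouped-query attention

# (in_features, out_features) of each module that receives a LoRA adapter
shapes = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, kv_size),
    "v_proj": (HIDDEN_SIZE, kv_size),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_param_count(in_features, out_features, rank):
    """Weights in the two low-rank matrices A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, LORA_RANK) for i, o in shapes.values())
trainable = per_layer * NUM_LAYERS
all_params = 8051232768  # "all params" from the log

print(trainable, f"{100 * trainable / all_params:.4f}%")
```

Each decoder layer contributes 655,360 adapter weights, and 32 layers give the 20,971,520 reported above, about 0.26% of all parameters.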
[INFO|trainer.py:601] 2024-04-22 04:11:16,796 >> Using auto half precision backend
04/22/2024 04:11:17 - INFO - llmtuner.train.utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
INFO:llmtuner.train.utils:Using LoRA+ optimizer with loraplus lr ratio 16.00.
[WARNING|logging.py:329] 2024-04-22 04:11:17,203 >> ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,091 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 408
 "-____-"     Number of trainable parameters = 20,971,520

 [408/408 48:57, Epoch 2/3]

Step Training Loss
10 1.568300
20 1.478600
30 1.298700
40 1.188600
50 1.185700
60 1.200300
70 1.249100
80 1.213600
90 1.255900
100 1.186000
110 1.210600
120 1.216200
130 1.111400
140 1.077700
150 0.906100
160 0.895100
170 0.981500
180 0.759400
190 0.834800
200 0.816900
210 0.773200
220 0.946500
230 0.764600
240 0.914700
250 0.864800
260 0.840600
270 0.853600
280 0.745800
290 0.500800
300 0.597600
310 0.616400
320 0.574100
330 0.490300
340 0.602800
350 0.563700
360 0.552900
370 0.574400
380 0.468200
390 0.549200
400 0.528500
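The loss does not fall evenly: it drops noticeably around steps 140 to 150 and again around step 290, which lines up with the epoch boundaries (408 total steps over 3 epochs is 136 steps per epoch). The sketch below groups the logged values by epoch to make that visible. `mean_loss_per_epoch` is a hypothetical helper; note also that each logged value presumably averages the preceding 10 steps (`logging_steps=10`), so a value near a boundary mixes two epochs slightly.

```python
LOSS_LOG = """\
10 1.568300
20 1.478600
30 1.298700
40 1.188600
50 1.185700
60 1.200300
70 1.249100
80 1.213600
90 1.255900
100 1.186000
110 1.210600
120 1.216200
130 1.111400
140 1.077700
150 0.906100
160 0.895100
170 0.981500
180 0.759400
190 0.834800
200 0.816900
210 0.773200
220 0.946500
230 0.764600
240 0.914700
250 0.864800
260 0.840600
270 0.853600
280 0.745800
290 0.500800
300 0.597600
310 0.616400
320 0.574100
330 0.490300
340 0.602800
350 0.563700
360 0.552900
370 0.574400
380 0.468200
390 0.549200
400 0.528500
"""
STEPS_PER_EPOCH = 136  # 408 total steps / 3 epochs, both from the log


def mean_loss_per_epoch(log_text, steps_per_epoch):
    """Average the logged losses, bucketed by the epoch each step falls in."""
    buckets = {}
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        step, loss = int(parts[0]), float(parts[1])
        epoch = (step - 1) // steps_per_epoch + 1
        buckets.setdefault(epoch, []).append(loss)
    return {epoch: sum(v) / len(v) for epoch, v in sorted(buckets.items())}


for epoch, mean in mean_loss_per_epoch(LOSS_LOG, STEPS_PER_EPOCH).items():
    print(f"epoch {epoch}: mean logged loss = {mean:.4f}")
```

A step-wise drop at each new epoch is common when a small dataset (1,091 examples here) is seen several times, so the training loss alone says little about how well the model generalizes.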

[INFO|<string>:460] 2024-04-22 05:00:27,815 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3067] 2024-04-22 05:00:27,822 >> Saving model checkpoint to llama3_lora
[INFO|configuration_utils.py:728] 2024-04-22 05:00:28,263 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 05:00:28,266 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|tokenization_utils_base.py:2459] 2024-04-22 05:00:28,538 >> tokenizer config file saved in llama3_lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2468] 2024-04-22 05:00:28,541 >> Special tokens file saved in llama3_lora/special_tokens_map.json
[INFO|modelcard.py:450] 2024-04-22 05:00:28,827 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
***** train metrics *****
  epoch                    =       2.99
  total_flos               = 32079633GF
  train_loss               =     0.8929
  train_runtime            = 0:49:10.61
  train_samples_per_second =      1.109
  train_steps_per_second   =      0.138
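The numbers in the summary above are consistent with the hyperparameters in the training script. The following back-of-the-envelope check reproduces the step count, the fractional epoch and the throughput figures. It assumes the trainer rounds the number of batches up, drops the incomplete accumulation group at the end of each epoch, and counts samples as examples times epochs; these assumptions match the logged values but are not taken from the library's source.

```python
import math

num_examples = 1091      # "Num examples = 1,091" in the log
per_device_batch = 2     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
epochs = 3.0             # num_train_epochs

batches_per_epoch = math.ceil(num_examples / per_device_batch)  # 546
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)     # 136
total_steps = math.ceil(epochs * updates_per_epoch)             # 408

# Fraction of the data actually consumed, expressed in epochs
epochs_seen = total_steps * grad_accum / batches_per_epoch

runtime_s = 49 * 60 + 10.61  # train_runtime = 0:49:10.61
steps_per_second = total_steps / runtime_s
samples_per_second = num_examples * epochs / runtime_s

print(total_steps, round(epochs_seen, 2),
      round(steps_per_second, 3), round(samples_per_second, 3))
```

The two leftover batches per epoch (546 is not a multiple of 4) explain why the log reports `epoch = 2.99` rather than 3.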

Inference:

from llmtuner import ChatModel
from llmtuner.extras.misc import torch_gc

%cd /content/LLaMA-Factory/

# Load the 4-bit base model together with the LoRA adapter trained above
chat_model = ChatModel(dict(
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
  adapter_name_or_path="llama3_lora",
  finetuning_type="lora",
  template="llama3",
))

messages = []  # conversation history, alternating user and assistant turns
while True:
  query = input("\nUser: ")
  if query.strip() == "exit":
    torch_gc()  # release cached GPU memory before leaving
    break
  if query.strip() == "clear":
    messages = []
    torch_gc()
    print("History has been removed.")
    continue

  messages.append({"role": "user", "content": query})
  print("Assistant: ", end="", flush=True)

  # Print the reply as it streams in while accumulating the full text
  response = ""
  for new_text in chat_model.stream_chat(messages):
    print(new_text, end="", flush=True)
    response += new_text
  print()
  messages.append({"role": "assistant", "content": response})
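The `messages` list is turned into a single prompt string according to `template="llama3"`. The training log in this post prints one rendered example, and the sketch below reproduces that layout for illustration. `render_llama3` is a hypothetical helper, not the function LLaMA Factory uses internally, and the optional trailing assistant header is an assumption about how a prompt is left open for generation.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Render chat messages in the Llama-3 layout shown in the training log."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so that the model writes the reply
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]
print(render_llama3(history, add_generation_prompt=True))
```

Every turn ends with `<|eot_id|>`, which is why the logs report `Replace eos token: <|eot_id|>`: generation stops as soon as the model emits that token.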

Inference log

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,951 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,953 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,957 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,959 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 05:12:14,407 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 05:12:14 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
INFO:llmtuner.data.template:Replace eos token: <|eot_id|>
[INFO|configuration_utils.py:728] 2024-04-22 05:12:14,462 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 05:12:14,464 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

04/22/2024 05:12:14 - INFO - llmtuner.model.patcher - Loading ?-bit BITSANDBYTES-quantized model.
INFO:llmtuner.model.patcher:Loading ?-bit BITSANDBYTES-quantized model.
04/22/2024 05:12:14 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
INFO:llmtuner.model.patcher:Using KV cache for faster generation.
[INFO|modeling_utils.py:3257] 2024-04-22 05:12:14,509 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-04-22 05:12:14,560 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-04-22 05:12:14,569 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

[INFO|modeling_utils.py:3992] 2024-04-22 05:12:21,290 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4000] 2024-04-22 05:12:21,293 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at unsloth/llama-3-8b-Instruct-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-04-22 05:12:21,357 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/generation_config.json
[INFO|configuration_utils.py:845] 2024-04-22 05:12:21,359 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

04/22/2024 05:12:21 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
04/22/2024 05:12:22 - INFO - llmtuner.model.adapter - Loaded adapter(s): llama3_lora
INFO:llmtuner.model.adapter:Loaded adapter(s): llama3_lora
04/22/2024 05:12:22 - INFO - llmtuner.model.loader - all params: 8051232768
INFO:llmtuner.model.loader:all params: 8051232768
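User: 你是谁 (Who are you?)
Assistant: 您好,我是 Llama-Chinese,一个由 LLaMA Factory 开发的人工智能助手。我可以根据用户的需求提供相应的回答和帮助。 (Hello, I am Llama-Chinese, an AI assistant developed by LLaMA Factory. I can provide answers and help according to users' needs.)

User: 我的蓝牙耳机坏了,我该去看牙科还是耳鼻喉科? (My Bluetooth earphones are broken. Should I see a dentist or an ENT doctor?)
Assistant: 很抱歉听到您的小配件出了问题。蓝牙耳机的问题与牙科或耳鼻喉科无关。您可以尝试与制造商或当地的维修店联系,进行维修或更换。 (Sorry to hear that your gadget is having problems. A Bluetooth earphone issue has nothing to do with dentistry or ENT. You can try contacting the manufacturer or a local repair shop for a repair or replacement.)

User: 谢谢你的帮助 (Thank you for your help.)
Assistant: 不客气。我是人工智能助手 Llama-Chinese,很高兴能帮到您。 (You're welcome. I am the AI assistant Llama-Chinese, and I'm glad I could help.)

User: exit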