LLaMA-2¶
LLaMA-2 represents a significant advancement in open-source language modeling.
LLaMA-2-7B Models¶
The LLaMA-2-7B model family provides a good balance between performance and computational efficiency. In FusionBench, we offer pre-configured model pools that include various specialized variants for different domains.
This configuration includes the base LLaMA-2-7B model along with specialized variants for chat, mathematics, and coding:
config/modelpool/CausalLMPool/llama-7b_3-models_v1.yaml
_target_: fusion_bench.modelpool.CausalLMPool
_recursive_: false
enable_lazy_loading: true

models:
  _pretrained_: meta-llama/Llama-2-7b-hf
  chat: meta-llama/Llama-2-7b-chat-hf
  math: WizardLMTeam/WizardMath-7B-V1.0
  code: codellama/CodeLlama-7b-hf

model_kwargs:
  torch_dtype: bfloat16

tokenizer: meta-llama/Llama-2-7b-hf
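To see what the pool resolves to, the sketch below loads the same YAML with OmegaConf and pulls one variant directly through Hugging Face Transformers. This bypasses FusionBench's own loading machinery (the _target_ class and lazy loading), so treat it as an illustration of the config's contents rather than of the library's API.

# Minimal sketch: resolving one entry of the model pool by hand.
# FusionBench normally does this for you via the `_target_` class above.
import torch
from omegaconf import OmegaConf
from transformers import AutoModelForCausalLM, AutoTokenizer

cfg = OmegaConf.load("config/modelpool/CausalLMPool/llama-7b_3-models_v1.yaml")

# Pick the math-specialized variant; "_pretrained_", "chat", or "code" work the same way.
model_name = cfg.models.math
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=getattr(torch, cfg.model_kwargs.torch_dtype),
)
tokenizer = AutoTokenizer.from_pretrained(cfg.tokenizer)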
Model Fusion Experiments¶
Simple Average¶
Merge all models using simple parameter averaging:
fusion_bench path.log_dir=outputs/llama-2/3-models_v1/simple_average \
method=linear/simple_average_for_causallm \
modelpool=CausalLMPool/llama-7b_3-models_v1
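Simple averaging takes, for every parameter tensor, the element-wise mean across all models in the pool. The sketch below illustrates the idea with plain PyTorch and Transformers; it is not the fusion_bench implementation, and it assumes all checkpoints share identical parameter names and shapes (in practice CodeLlama uses an extended vocabulary, so its embedding matrices differ, a detail the library's merge method has to reconcile).

# Minimal sketch of simple parameter averaging, independent of the CLI above:
# load each pool model, average parameters of the same name, and write the
# result back into the base model.
import torch
from transformers import AutoModelForCausalLM

model_names = [
    "meta-llama/Llama-2-7b-hf",         # _pretrained_ (base)
    "meta-llama/Llama-2-7b-chat-hf",    # chat
    "WizardLMTeam/WizardMath-7B-V1.0",  # math
    "codellama/CodeLlama-7b-hf",        # code
]

state_dicts = [
    AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).state_dict()
    for name in model_names
]

# Element-wise mean over all models, computed in float32 for stability,
# then cast back to bfloat16.
merged = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0).to(torch.bfloat16)
    for key in state_dicts[0]
}

base = AutoModelForCausalLM.from_pretrained(model_names[0], torch_dtype=torch.bfloat16)
base.load_state_dict(merged)
base.save_pretrained("outputs/llama-2/3-models_v1/simple_average/merged_model")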
Citation¶
If you use LLaMA-2 models in your research, please cite:
@article{touvron2023llama,
title={Llama 2: Open foundation and fine-tuned chat models},
author={Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others},
journal={arXiv preprint arXiv:2307.09288},
year={2023}
}