Qwen2.5¶
Qwen2.5 is a series of large language models developed by Alibaba Cloud's Qwen team, targeting tasks such as general text generation, code completion, and mathematical reasoning. The series spans open-weight models from 0.5B to 72B parameters to fit a range of computational budgets and use cases.
The following table shows the architecture details and licensing information for all Qwen2.5 open-weight models:
Models | Layers | Heads (Q / KV) | Tie Embedding | Context / Generation Length | License |
---|---|---|---|---|---|
0.5B | 24 | 14 / 2 | Yes | 32K / 8K | Apache 2.0 |
1.5B | 28 | 12 / 2 | Yes | 32K / 8K | Apache 2.0 |
3B | 36 | 16 / 2 | Yes | 32K / 8K | Qwen Research |
7B | 28 | 28 / 4 | No | 128K / 8K | Apache 2.0 |
14B | 48 | 40 / 8 | No | 128K / 8K | Apache 2.0 |
32B | 64 | 40 / 8 | No | 128K / 8K | Apache 2.0 |
72B | 80 | 64 / 8 | No | 128K / 8K | Qwen |
Qwen2.5-1.5B Models¶
In FusionBench, we provide several pre-configured model pools for Qwen2.5-1.5B models that are commonly used for model fusion experiments. These configurations include base models and their fine-tuned variants specialized for different domains.
The `CausalLMPool/Qwen2.5-1.5B_three_models` configuration includes the base model along with the instruction, math, and code variants:
```yaml
_target_: fusion_bench.modelpool.CausalLMPool
_recursive_: false
enable_lazy_loading: true

models:
  _pretrained_: Qwen/Qwen2.5-1.5B
  math: Qwen/Qwen2.5-Math-1.5B
  code: Qwen/Qwen2.5-Coder-1.5B
  instruction: Qwen/Qwen2.5-1.5B-Instruct

model_kwargs:
  torch_dtype: bfloat16

tokenizer: Qwen/Qwen2.5-1.5B
```
The `CausalLMPool/Qwen2.5-1.5B_math_and_code` configuration focuses specifically on the mathematical and coding variants:
```yaml
_target_: fusion_bench.modelpool.CausalLMPool
_recursive_: false
enable_lazy_loading: true

models:
  _pretrained_: Qwen/Qwen2.5-1.5B
  math: Qwen/Qwen2.5-Math-1.5B
  code: Qwen/Qwen2.5-Coder-1.5B

model_kwargs:
  torch_dtype: bfloat16

tokenizer: Qwen/Qwen2.5-1.5B
```
Model Fusion Experiments¶
Simple Average¶
Merge all three specialized models using simple parameter averaging:
```bash
fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/three_models/simple_average \
    method=linear/simple_average_for_causallm \
    modelpool=CausalLMPool/Qwen2.5-1.5B_three_models
```
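In broad strokes, simple averaging takes the elementwise mean of the pooled checkpoints; assuming the method averages the three fine-tuned variants above (and not the pretrained base), the merged weights are roughly:

$$
\theta_{\text{merged}} = \tfrac{1}{3}\left(\theta_{\text{instruction}} + \theta_{\text{math}} + \theta_{\text{code}}\right)
$$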
Example of evaluating the merged model using lm-eval-harness on the gsm8k and gsm8k_cot tasks:
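A minimal sketch, assuming the merged checkpoint is exported in Hugging Face format under the log directory above; the `pretrained=` path is an assumption and should point at wherever your merged model was actually saved:

```bash
# Evaluate the merged checkpoint on GSM8K, with and without chain-of-thought prompting.
# NOTE: the checkpoint path below is hypothetical; replace it with your actual output path.
lm_eval --model hf \
    --model_args pretrained=outputs/Qwen2.5-1.5B/three_models/simple_average/merged_model,dtype=bfloat16 \
    --tasks gsm8k,gsm8k_cot \
    --batch_size auto
```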
Merge math and code models using simple parameter averaging:
```bash
fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/math_and_code/simple_average \
    method=linear/simple_average_for_causallm \
    modelpool=CausalLMPool/Qwen2.5-1.5B_math_and_code
```
Task Arithmetic¶
Merge all three specialized models using task arithmetic:
```bash
scaling_factor=0.8
fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/three_models/task_arithmetic/${scaling_factor} \
    method=linear/task_arithmetic_for_causallm \
    method.scaling_factor=${scaling_factor} \
    modelpool=CausalLMPool/Qwen2.5-1.5B_three_models
```
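As a general description (not specific to the FusionBench implementation), task arithmetic builds a task vector for each fine-tuned model as its delta from the pretrained weights, sums the task vectors, scales the sum by `method.scaling_factor` (λ, set to 0.8 above), and adds it back to the base model:

$$
\theta_{\text{merged}} = \theta_{\text{pre}} + \lambda \sum_{t \in \{\text{instruction},\, \text{math},\, \text{code}\}} \left(\theta_t - \theta_{\text{pre}}\right)
$$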
TIES-Merging¶
Merge all three specialized models using TIES-Merging:
```bash
scaling_factor=0.8
fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/three_models/ties_merging/${scaling_factor} \
    method=linear/ties_merging_for_causallm \
    method.scaling_factor=${scaling_factor} \
    modelpool=CausalLMPool/Qwen2.5-1.5B_three_models
```
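Roughly, TIES-Merging (Yadav et al., 2023) trims each task vector to its largest-magnitude entries, elects an elementwise majority sign, averages only the entries that agree with the elected sign, and adds the scaled result to the base model; the exact trimming behavior depends on the FusionBench method defaults:

$$
\tau_t = \operatorname{trim}\left(\theta_t - \theta_{\text{pre}}\right), \qquad
\gamma = \operatorname{sgn}\Big(\textstyle\sum_t \tau_t\Big), \qquad
\theta_{\text{merged}} = \theta_{\text{pre}} + \lambda \cdot \operatorname{mean}_{\,t:\, \operatorname{sgn}(\tau_t) = \gamma}\, \tau_t
$$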
Citation¶
If you use Qwen2.5 models in your research, please cite:
```bibtex
@misc{qwen2025qwen25technicalreport,
  title={Qwen2.5 Technical Report},
  author={Qwen and An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tianyi Tang and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
  year={2025},
  eprint={2412.15115},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.15115},
}
```