
Qwen2.5

Qwen2.5 is a series of large language models developed by Alibaba Cloud's Qwen team. The models target natural language processing tasks such as text generation, code completion, and mathematical reasoning. The open-weight series spans seven sizes, from 0.5B to 72B parameters, to fit different compute budgets and use cases.

The following table shows the architecture details and licensing information for all Qwen2.5 open-weight models:

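| Models | Layers | Heads (Q / KV) | Tie Embedding | Context / Generation Length | License |
|--------|--------|----------------|---------------|-----------------------------|---------|
| 0.5B | 24 | 14 / 2 | Yes | 32K / 8K | Apache 2.0 |
| 1.5B | 28 | 12 / 2 | Yes | 32K / 8K | Apache 2.0 |
| 3B | 36 | 16 / 2 | Yes | 32K / 8K | Qwen Research |
| 7B | 28 | 28 / 4 | No | 128K / 8K | Apache 2.0 |
| 14B | 48 | 40 / 8 | No | 128K / 8K | Apache 2.0 |
| 32B | 64 | 40 / 8 | No | 128K / 8K | Apache 2.0 |
| 72B | 80 | 64 / 8 | No | 128K / 8K | Qwen |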
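
The Heads (Q / KV) column reflects grouped-query attention, where several query heads share one key/value head. As a small illustration, the sketch below derives the number of query heads per KV head from the table values. The names `HEADS` and `queries_per_kv_head` are hypothetical and introduced only for this example.

```python
# (query heads, key/value heads) per model size, copied from the table above.
HEADS = {
    "0.5B": (14, 2),
    "1.5B": (12, 2),
    "3B": (16, 2),
    "7B": (28, 4),
    "14B": (40, 8),
    "32B": (40, 8),
    "72B": (64, 8),
}


def queries_per_kv_head(num_q_heads: int, num_kv_heads: int) -> int:
    """Return how many query heads share each key/value head."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    return num_q_heads // num_kv_heads


GROUP_SIZES = {size: queries_per_kv_head(q, kv) for size, (q, kv) in HEADS.items()}

if __name__ == "__main__":
    for size, group in GROUP_SIZES.items():
        print(f"Qwen2.5-{size}: {group} query heads per KV head")
```

A larger group size means fewer key/value heads relative to query heads, which reduces the size of the key/value cache during generation.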

Qwen2.5-1.5B Models

In FusionBench, we provide several pre-configured model pools for Qwen2.5-1.5B models that are commonly used for model fusion experiments. These configurations include base models and their fine-tuned variants specialized for different domains.

This configuration includes the base model along with instruction, math, and code variants:

config/modelpool/CausalLMPool/Qwen2.5-1.5B_three_models.yaml
_target_: fusion_bench.modelpool.CausalLMPool
_recursive_: false
enable_lazy_loading: true
models:
  _pretrained_: Qwen/Qwen2.5-1.5B
  math: Qwen/Qwen2.5-Math-1.5B
  code: Qwen/Qwen2.5-Coder-1.5B
  instruction: Qwen/Qwen2.5-1.5B-Instruct
model_kwargs:
  torch_dtype: bfloat16
tokenizer: Qwen/Qwen2.5-1.5B
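
To make the role of the `models` mapping concrete, the sketch below separates the base model from the fine-tuned variants. It assumes that names wrapped in underscores, such as `_pretrained_`, are reserved for special entries and that every other key names a model to be merged. The helper `split_pool` is hypothetical and is not part of the FusionBench API.

```python
# The `models` mapping from the configuration above, written as a plain dict.
MODELS = {
    "_pretrained_": "Qwen/Qwen2.5-1.5B",
    "math": "Qwen/Qwen2.5-Math-1.5B",
    "code": "Qwen/Qwen2.5-Coder-1.5B",
    "instruction": "Qwen/Qwen2.5-1.5B-Instruct",
}


def split_pool(models: dict) -> tuple:
    """Split a model mapping into (base model, fine-tuned models).

    Assumption: underscore-wrapped names mark special entries.
    """
    base = models.get("_pretrained_")
    finetuned = {
        name: path
        for name, path in models.items()
        if not (name.startswith("_") and name.endswith("_"))
    }
    return base, finetuned


if __name__ == "__main__":
    base, finetuned = split_pool(MODELS)
    print("base:", base)
    print("fine-tuned:", sorted(finetuned))
```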

This configuration focuses specifically on mathematical and coding capabilities:

config/modelpool/CausalLMPool/Qwen2.5-1.5B_math_and_code.yaml
_target_: fusion_bench.modelpool.CausalLMPool
_recursive_: false
enable_lazy_loading: true
models:
  _pretrained_: Qwen/Qwen2.5-1.5B
  math: Qwen/Qwen2.5-Math-1.5B
  code: Qwen/Qwen2.5-Coder-1.5B
model_kwargs:
  torch_dtype: bfloat16
tokenizer: Qwen/Qwen2.5-1.5B

Model Fusion Experiments

Simple Average

Merge all three specialized models using simple parameter averaging:

fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/three_models/simple_average \
    method=linear/simple_average_for_causallm \
    modelpool=CausalLMPool/Qwen2.5-1.5B_three_models

Evaluate the merged model with lm-eval-harness on the gsm8k and gsm8k_cot tasks:

scripts/lm_eval/evaluate_task.sh \
    outputs/Qwen2.5-1.5B/three_models/simple_average/checkpoint \
    --tasks 'gsm8k,gsm8k_cot' --output_path outputs/lm_eval

Merge math and code models using simple parameter averaging:

fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/math_and_code/simple_average \
    method=linear/simple_average_for_causallm \
    modelpool=CausalLMPool/Qwen2.5-1.5B_math_and_code
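
Conceptually, simple parameter averaging takes the element-wise mean of matching parameters across models. The following is a minimal sketch of that idea, not FusionBench's implementation. It assumes each model is represented as a dict mapping parameter names to flat lists of floats (real checkpoints hold tensors), and the function name `simple_average` is hypothetical.

```python
def simple_average(state_dicts: list) -> dict:
    """Average parameters element-wise across models with identical shapes."""
    if not state_dicts:
        raise ValueError("need at least one model")
    n = len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # Group the i-th element of this parameter from every model.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(values) / n for values in columns]
    return merged


if __name__ == "__main__":
    math_model = {"w": [1.0, 2.0]}
    code_model = {"w": [3.0, 6.0]}
    print(simple_average([math_model, code_model]))
```

Averaging only makes sense when all models share the same architecture and parameter names, which holds here because every variant derives from Qwen2.5-1.5B.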

Task Arithmetic

Merge all three specialized models using task arithmetic:

scaling_factor=0.8
fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/three_models/task_arithmetic/${scaling_factor} \
    method=linear/task_arithmetic_for_causallm \
    method.scaling_factor=${scaling_factor} \
    modelpool=CausalLMPool/Qwen2.5-1.5B_three_models
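
Task arithmetic forms a task vector for each fine-tuned model (its parameters minus the pretrained parameters), sums the task vectors, and adds the scaled sum back to the pretrained model: merged = pretrained + scaling_factor * sum(finetuned_i - pretrained). The sketch below illustrates the general technique under the same simplifying assumption of parameters stored as flat lists of floats. It is not FusionBench's implementation, and the function name `task_arithmetic` is hypothetical.

```python
def task_arithmetic(pretrained: dict, finetuned: list, scaling_factor: float) -> dict:
    """Add the scaled sum of task vectors to the pretrained parameters."""
    merged = {}
    for name, base in pretrained.items():
        # Sum of task vectors (fine-tuned minus pretrained) for this parameter.
        summed = [
            sum(ft[name][i] - base[i] for ft in finetuned)
            for i in range(len(base))
        ]
        merged[name] = [b + scaling_factor * d for b, d in zip(base, summed)]
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    math_model = {"w": [2.0, 2.0]}
    code_model = {"w": [1.0, 4.0]}
    print(task_arithmetic(base, [math_model, code_model], scaling_factor=0.5))
```

A scaling factor of 0 returns the pretrained model unchanged, while larger values move the merged model further along the combined task directions.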

Ties-Merging

Merge all three specialized models using TIES merging:

scaling_factor=0.8
fusion_bench path.log_dir=outputs/Qwen2.5-1.5B/three_models/ties_merging/${scaling_factor} \
    method=linear/ties_merging_for_causallm \
    method.scaling_factor=${scaling_factor} \
    modelpool=CausalLMPool/Qwen2.5-1.5B_three_models
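
TIES merging refines task arithmetic in three steps: trim each task vector to its largest-magnitude entries, elect a sign per parameter from the sum of the trimmed values, and average only the entries that agree with the elected sign. The sketch below illustrates these steps on a single flat parameter vector. It is not FusionBench's implementation; the names `trim` and `ties_merge` and the `keep_fraction` parameter with its default are assumptions made for this example.

```python
def trim(vec: list, keep_fraction: float) -> list:
    """Zero all but the largest-magnitude entries of a task vector.

    Entries tied with the threshold magnitude are all kept.
    """
    k = max(1, round(len(vec) * keep_fraction))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]


def ties_merge(pretrained: list, finetuned: list,
               scaling_factor: float = 0.8, keep_fraction: float = 0.2) -> list:
    """Merge flat parameter vectors via trim, sign election, disjoint mean."""
    # Step 1: build and trim one task vector per fine-tuned model.
    task_vectors = [
        trim([f - p for f, p in zip(ft, pretrained)], keep_fraction)
        for ft in finetuned
    ]
    merged = []
    for i, base in enumerate(pretrained):
        column = [tv[i] for tv in task_vectors]
        # Step 2: elect the sign from the sum of trimmed values.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Step 3: average only the entries that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base + scaling_factor * delta)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    models = [[1.0, 0.1, -2.0, 0.0], [3.0, -0.1, 1.0, 0.0]]
    print(ties_merge(base, models, scaling_factor=0.5, keep_fraction=0.5))
```

In the usage example, the third entry receives conflicting updates (-2.0 and 1.0). The elected sign is negative, so only -2.0 contributes, instead of the two values partially cancelling as they would under plain averaging.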

Citation

If you use Qwen2.5 models in your research, please cite:

@misc{qwen2025qwen25technicalreport,
      title={Qwen2.5 Technical Report}, 
      author={Qwen and An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tianyi Tang and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
      year={2025},
      eprint={2412.15115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.15115}, 
}