Reading Lists

Info

Work in progress. Any suggestions are welcome.

I've been compiling a list of papers and resources that have been instrumental in my research. It is meant as a starting point for anyone interested in deep model fusion. If you have suggestions for papers to add, please feel free to raise an issue or submit a pull request.

Note

Meaning of the symbols in the list:

  • ⭐ Highly recommended
  • 🦙 LLaMA model-related or Mistral-related work
  • Code available on GitHub
  • 🤗 models or datasets available on Hugging Face

Survey Papers

Findings on Model Fusion

  • 🦙 Aakanksha et al. Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning arXiv:2410.10801
  • ⭐ Yadav et al. What Matters for Model Merging at Scale? arXiv:2410.03617

Model Ensemble

  • Liu T Y, Soatto S. Tangent Model Composition for Ensembling and Continual Fine-tuning. arXiv, 2023.
  • 🦙 Wan et al. Knowledge Fusion of Large Language Models arXiv:2401.10491

  • 🦙 Wan F, Yang Z, Zhong L, et al. FuseChat: Knowledge Fusion of Chat Models. arXiv, 2024.

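A common baseline behind the ensemble papers above is to keep every fine-tuned model and combine their predictions at inference time. Below is a minimal, illustrative PyTorch sketch of output-space ensembling; the toy linear classifiers stand in for real fine-tuned checkpoints and are not taken from any of the papers listed.

```python
import torch
import torch.nn as nn


def ensemble_predict(models: list[nn.Module], x: torch.Tensor) -> torch.Tensor:
    """Average the output probabilities of several fine-tuned models.

    Plain output-space ensembling: every model is kept and run at inference
    time, so compute scales linearly with the ensemble size.
    """
    probs = []
    with torch.no_grad():
        for model in models:
            model.eval()
            probs.append(torch.softmax(model(x), dim=-1))
    return torch.stack(probs).mean(dim=0)


if __name__ == "__main__":
    # Toy example: three randomly initialized classifiers standing in for
    # three fine-tuned checkpoints of the same architecture.
    models = [nn.Linear(16, 4) for _ in range(3)]
    x = torch.randn(8, 16)
    print(ensemble_predict(models, x).shape)  # torch.Size([8, 4])
```
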
Model Merging

Mode Connectivity

Mode connectivity is such an important concept in model merging that it deserves its own page.

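As a quick illustration of the concept: two solutions are said to be (linearly) mode connected when the loss stays low along the straight path between their weights, which is exactly the regime in which naive weight interpolation works. The sketch below evaluates the loss along that path; the toy models, data, and step count are placeholder assumptions, not drawn from any of the papers listed here.

```python
import copy
import torch
import torch.nn as nn


def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Pointwise interpolation (1 - alpha) * theta_a + alpha * theta_b."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}


def loss_along_path(model_a, model_b, loss_fn, x, y, steps=11):
    """Evaluate the loss at evenly spaced points on the linear path between
    two checkpoints: a flat curve suggests linear mode connectivity, while a
    large bump in the middle is the "loss barrier"."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    with torch.no_grad():
        for i in range(steps):
            alpha = i / (steps - 1)
            probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
            losses.append(loss_fn(probe(x), y).item())
    return losses


if __name__ == "__main__":
    # Toy stand-ins for two fine-tuned checkpoints of the same architecture.
    model_a, model_b = nn.Linear(16, 4), nn.Linear(16, 4)
    x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
    print(loss_along_path(model_a, model_b, nn.CrossEntropyLoss(), x, y))
```
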
Weight Interpolation

  • Osowiechi et al. WATT: Weight Average Test-Time Adaptation of CLIP arXiv:2406.13875

  • Jiang et al. ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning

  • Chronopoulou et al. Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization arXiv:2311.09344

  • 🦙 L. Yu, B. Yu, H. Yu, F. Huang, and Y. Li, “Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch,” Nov. 06, 2023, arXiv: arXiv:2311.03099. Available: http://arxiv.org/abs/2311.03099

  • ⭐ E. Yang et al., “AdaMerging: Adaptive Model Merging for Multi-Task Learning,” ICLR 2024, arXiv: arXiv:2310.02575. doi: 10.48550/arXiv.2310.02575.
  • ⭐ P. Yadav, D. Tam, L. Choshen, C. Raffel, and M. Bansal, “Resolving Interference When Merging Models,” Jun. 02, 2023, arXiv: arXiv:2306.01708. Available: http://arxiv.org/abs/2306.01708
  • ⭐ Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard, “Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models,” May 30, 2023, arXiv: arXiv:2305.12827. doi: 10.48550/arXiv.2305.12827.
  • ⭐ G. Ilharco et al., “Editing Models with Task Arithmetic,” Mar. 31, 2023, arXiv: arXiv:2212.04089. doi: 10.48550/arXiv.2212.04089.
  • Tang et al. Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

  • Rame et al. Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards arXiv:2306.04488

  • Huang et al. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition arXiv:2307.13269

  • Wu et al. Pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

  • Chronopoulou et al. AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models arXiv:2302.07027

  • Zimmer et al. Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging arXiv:2306.16788

  • ⭐ Wortsman et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time arXiv:2203.05482

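Two recipes recur throughout this subsection: uniform weight averaging as in the model soups paper (Wortsman et al.), and task arithmetic (Ilharco et al.), where task vectors θ_ft − θ_pre are scaled and added back onto the pre-trained weights; methods such as TIES refine how those task vectors are combined. A minimal state-dict-level sketch of the two basic operations is given below; the toy models and the scaling coefficient of 0.3 are assumptions for illustration.

```python
import torch
import torch.nn as nn


def model_soup(state_dicts):
    """Uniform "model soup": element-wise average of fine-tuned checkpoints."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0) for k in keys}


def task_arithmetic(pretrained_sd, finetuned_sds, scaling=0.3):
    """Task arithmetic: add scaled task vectors (theta_ft - theta_pre) for
    several tasks back onto the pre-trained weights."""
    merged = {}
    for k in pretrained_sd:
        task_vectors = [sd[k].float() - pretrained_sd[k].float() for sd in finetuned_sds]
        merged[k] = pretrained_sd[k].float() + scaling * torch.stack(task_vectors).sum(0)
    return merged


if __name__ == "__main__":
    # Toy stand-ins: one "pre-trained" model and two "fine-tuned" copies.
    pre = nn.Linear(16, 4)
    ft_a, ft_b = nn.Linear(16, 4), nn.Linear(16, 4)
    soup = model_soup([ft_a.state_dict(), ft_b.state_dict()])
    merged = task_arithmetic(pre.state_dict(), [ft_a.state_dict(), ft_b.state_dict()])
    pre.load_state_dict(merged)  # merged weights drop back into the same architecture
```
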
Alignment-based Methods

  • Kinderman et al. Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks arXiv:2410.01483

  • S. K. Ainsworth, J. Hayase, and S. Srinivasa, “Git Re-Basin: Merging Models modulo Permutation Symmetries,” ICLR 2023. Available: http://arxiv.org/abs/2209.04836

  • George Stoica, Daniel Bolya, Jakob Bjorner, Taylor Hearn, and Judy Hoffman, “ZipIt! Merging Models from Different Tasks without Training,” May 04, 2023, arXiv: arXiv:2305.03053. Available: http://arxiv.org/abs/2305.03053

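Git Re-Basin and ZipIt! both rest on the observation that hidden units of independently trained networks match only up to a permutation, so units are aligned before weights are merged. Below is a heavily simplified sketch for a two-layer toy MLP that matches hidden units by first-layer weight similarity using the Hungarian algorithm; this is an illustration of the general idea under assumed toy settings (and assumes SciPy is available), not the actual procedure from either paper.

```python
import copy
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment


def align_hidden_units(model_a: nn.Sequential, model_b: nn.Sequential) -> nn.Sequential:
    """Permute the hidden units of model_b so they line up with model_a,
    matching units by similarity of their first-layer weight rows."""
    wa = model_a[0].weight.detach()              # (hidden, in)
    wb = model_b[0].weight.detach()
    sim = wa @ wb.t()                            # unit-to-unit similarity
    _, perm = linear_sum_assignment(sim.numpy(), maximize=True)
    perm = torch.as_tensor(perm)

    aligned = copy.deepcopy(model_b)             # functionally identical to model_b
    with torch.no_grad():
        aligned[0].weight.copy_(model_b[0].weight[perm])      # permute hidden rows
        aligned[0].bias.copy_(model_b[0].bias[perm])
        aligned[2].weight.copy_(model_b[2].weight[:, perm])   # undo permutation on the next layer's inputs
    return aligned


if __name__ == "__main__":
    def mlp():
        return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    a, b = mlp(), mlp()
    b_aligned = align_hidden_units(a, b)
    # Average only after alignment; averaging unaligned weights mixes unrelated units.
    merged = {k: 0.5 * (a.state_dict()[k] + b_aligned.state_dict()[k])
              for k in a.state_dict()}
```
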
Subspace-based Methods

Online Model Merging

  • 🦙 Alexandrov et al. Mitigating Catastrophic Forgetting in Language Transfer via Model Merging arXiv:2407.08699

  • 🦙 Lu et al. Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment arXiv:2405.17931

  • Izmailov et al. Averaging Weights Leads to Wider Optima and Better Generalization
  • Kaddour et al. Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging arXiv:2209.14981
  • Zhang et al. Lookahead Optimizer: k steps forward, 1 step back arXiv:1907.08610

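The entries above merge checkpoints produced within a single training or alignment run rather than independently fine-tuned models. A minimal sketch of keeping a uniform running average of the weights during training, in the spirit of stochastic weight averaging, is shown below; the toy model, optimizer, and update schedule are assumptions for illustration (PyTorch also ships torch.optim.swa_utils.AveragedModel for this purpose).

```python
import copy
import torch
import torch.nn as nn


class RunningWeightAverage:
    """Keep a uniform running average of model weights across checkpoints,
    updated every time `update` is called (e.g. once per epoch).
    Buffers such as BatchNorm statistics are not averaged here."""

    def __init__(self, model: nn.Module):
        self.avg_model = copy.deepcopy(model)
        self.n = 1

    @torch.no_grad()
    def update(self, model: nn.Module):
        self.n += 1
        for p_avg, p in zip(self.avg_model.parameters(), model.parameters()):
            p_avg.mul_((self.n - 1) / self.n).add_(p, alpha=1.0 / self.n)


if __name__ == "__main__":
    model = nn.Linear(16, 4)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    swa = RunningWeightAverage(model)
    for _ in range(5):  # stand-in for epochs of real training
        x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        swa.update(model)  # fold the latest weights into the running average
```
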
Model Mixing/Upscaling/Expansion

  • 🦙 Samragh et al. Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization arXiv:2409.12903

  • Zhao et al. Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering arXiv:2409.16167

  • 🦙 🤗 Tang et al. SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models arXiv:2408.10174

  • 🦙 🤗 C. Chen et al., “Model Composition for Multimodal Large Language Models.” arXiv, Feb. 20, 2024. doi: 10.48550/arXiv.2402.12750.

  • A. Tang, L. Shen, Y. Luo, N. Yin, L. Zhang, and D. Tao, “Merging Multi-Task Models via Weight-Ensembling Mixture of Experts,” Feb. 01, 2024, arXiv: arXiv:2402.00433. doi: 10.48550/arXiv.2402.00433.
  • 🦙 Lu et al. Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging arXiv:2406.15479

  • 🦙 Kim et al. SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling arXiv:2312.15166

  • Komatsuzaki et al. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints arXiv:2212.05055

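Depth up-scaling (SOLAR) and sparse upcycling both grow a larger model out of an existing dense checkpoint instead of merging weights element-wise. The sketch below illustrates the depth up-scaling idea on a toy layer stack: duplicate the stack, drop the overlapping layers, and concatenate; the layer counts are placeholders, and the continual pre-training of the up-scaled model is omitted.

```python
import copy
import torch.nn as nn


def depth_upscale(layers: nn.ModuleList, overlap: int) -> nn.ModuleList:
    """Depth up-scaling in the spirit of SOLAR: duplicate the layer stack,
    drop `overlap` layers from the end of the first copy and the start of
    the second, and concatenate, giving 2 * n - 2 * overlap layers."""
    n = len(layers)
    first = [copy.deepcopy(layer) for layer in layers[: n - overlap]]
    second = [copy.deepcopy(layer) for layer in layers[overlap:]]
    return nn.ModuleList(first + second)


if __name__ == "__main__":
    # Toy stand-in for a transformer's decoder layer stack.
    small = nn.ModuleList(nn.Linear(16, 16) for _ in range(8))
    big = depth_upscale(small, overlap=2)
    print(len(small), "->", len(big))  # 8 -> 12
```
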
Benchmarks

  • Tam et al. Realistic Evaluation of Model Merging for Compositional Generalization arXiv:2409.18314
  • ⭐ Tang et al. FusionBench: A Comprehensive Benchmark of Deep Model Fusion.

Libraries and Tools

Fine-tuning, Preparing models for fusion

  • PyTorch Classification: A PyTorch library for training/fine-tuning models (CNN, ViT, CLIP) on image classification tasks
  • ⭐ LLaMA Factory: A PyTorch library for fine-tuning LLMs

Model Fusion

  • ⭐ 🤗 FusionBench: A Comprehensive Benchmark of Deep Model Fusion.
  • ⭐ 🦙 MergeKit: A PyTorch library for merging large language models.

Version Control

  • Kandpal et al. Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models arXiv:2306.04529

Other Applications of Model Fusion

Applications in Reinforcement Learning (RL)

  • (Survey Paper) Song Y, Suganthan P N, Pedrycz W, et al. Ensemble reinforcement learning: A survey. Applied Soft Computing, 2023.
  • ⭐ Lee K, Laskin M, Srinivas A, et al. “Sunrise: A simple unified framework for ensemble learning in deep reinforcement learning”. ICML, 2021.
  • Ren J, Li Y, Ding Z, et al. “Probabilistic mixture-of-experts for efficient deep reinforcement learning”. arXiv:2104.09122, 2021.
  • ⭐ Celik O, Taranovic A, Neumann G. “Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts”. arXiv:2403.06966, 2024.