Reading Lists
Info
Work in progress. Any suggestions are welcome.
I've been compiling a comprehensive list of papers and resources that have been instrumental in my research journey. It is intended as a starting point for anyone interested in deep model fusion. If you have suggestions for papers to add, please feel free to raise an issue or submit a pull request.
Note
Meaning of the symbols in the list:
- Highly recommended
- LLaMA model-related or Mistral-related work
- Code available on GitHub
- Models or datasets available on Hugging Face
Survey Papers
- E. Yang et al., “Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities.” arXiv, Aug. 14, 2024.
- Yadav et al. A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning. arXiv:2408.07057
- W. Li, Y. Peng, M. Zhang, L. Ding, H. Hu, and L. Shen, “Deep Model Fusion: A Survey.” arXiv, Sep. 27, 2023. doi: 10.48550/arXiv.2309.15698.
- H. Zheng et al., “Learn From Model Beyond Fine-Tuning: A Survey.” arXiv, Oct. 12, 2023.
Findings on Model Fusion
- Aakanksha et al. Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning arXiv:2410.10801
- Yadav et al. What Matters for Model Merging at Scale? arXiv:2410.03617
Model Ensemble
- Liu T Y, Soatto S. Tangent Model Composition for Ensembling and Continual Fine-tuning. arXiv, 2023.
- Wan et al. Knowledge Fusion of Large Language Models. arXiv:2401.10491
- Wan F, Yang Z, Zhong L, et al. FuseChat: Knowledge Fusion of Chat Models. arXiv, 2024.
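The entries above combine models at the prediction or output level rather than in weight space. As a rough illustration of the general idea, here is a minimal sketch of plain probability averaging over models that share an output space; this is a generic baseline, not the specific method of any paper above, and the models and inputs are placeholders.

```python
# Minimal sketch of prediction-level ensembling (contrast with the weight-space
# merging methods in the next section). Models and inputs are dummies.
import torch


@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of several models with a shared label space."""
    probs = [torch.softmax(m(x), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)


# Usage with placeholder classifiers:
models = [torch.nn.Linear(16, 4) for _ in range(3)]
x = torch.randn(8, 16)
avg_probs = ensemble_predict(models, x)   # shape: (8, 4)
prediction = avg_probs.argmax(dim=-1)     # predicted class per example
```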
Model Merging
Mode Connectivity
Mode connectivity is such an important concept in model merging that it deserves its own page.
Weight Interpolation
- Osowiechi et al. WATT: Weight Average Test-Time Adaptation of CLIP. arXiv:2406.13875
- Jiang et al. ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning
- Chronopoulou et al. Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization. arXiv:2311.09344
- L. Yu, B. Yu, H. Yu, F. Huang, and Y. Li, “Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch,” Nov. 06, 2023, arXiv: arXiv:2311.03099. Available: http://arxiv.org/abs/2311.03099
- E. Yang et al., “AdaMerging: Adaptive Model Merging for Multi-Task Learning,” ICLR 2024, arXiv: arXiv:2310.02575. doi: 10.48550/arXiv.2310.02575.
- P. Yadav, D. Tam, L. Choshen, C. Raffel, and M. Bansal, “Resolving Interference When Merging Models,” Jun. 02, 2023, arXiv: arXiv:2306.01708. Available: http://arxiv.org/abs/2306.01708
- Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard, “Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models,” May 30, 2023, arXiv: arXiv:2305.12827. doi: 10.48550/arXiv.2305.12827.
- G. Ilharco et al., “Editing Models with Task Arithmetic,” Mar. 31, 2023, arXiv: arXiv:2212.04089. doi: 10.48550/arXiv.2212.04089.
- Tang et al. Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
- Rame et al. Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards. arXiv:2306.04488
- Huang et al. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition. arXiv:2307.13269
- Wu et al. Pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
- Chronopoulou et al. AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models. arXiv:2302.07027
- Zimmer et al. Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging. arXiv:2306.16788
- Wortsman et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. arXiv:2203.05482 (a minimal weight-averaging sketch follows this list)
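To make the core ideas behind the weight-averaging and task-arithmetic entries above concrete, here is a minimal sketch over PyTorch state dicts. It assumes all checkpoints are fine-tuned from the same pre-trained model and share parameter names, shapes, and floating-point dtypes; the function names and checkpoint paths are illustrative and not taken from any of the cited codebases.

```python
# Minimal sketch of weight-space fusion over PyTorch state dicts.
# Assumes all checkpoints share the same architecture (same keys/shapes) and
# floating-point parameters; names here are illustrative only.
from typing import Dict, List

import torch

StateDict = Dict[str, torch.Tensor]


def uniform_soup(state_dicts: List[StateDict]) -> StateDict:
    """Uniform 'model soup': element-wise average of fine-tuned checkpoints."""
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in state_dicts[0]}


def task_arithmetic(pretrained: StateDict,
                    finetuned: List[StateDict],
                    scaling: float = 0.3) -> StateDict:
    """Task arithmetic: add scaled task vectors (fine-tuned minus pre-trained)
    back onto the pre-trained weights."""
    merged = {key: value.clone() for key, value in pretrained.items()}
    for sd in finetuned:
        for key in merged:
            merged[key] += scaling * (sd[key] - pretrained[key])
    return merged


# Hypothetical usage (checkpoint paths are placeholders):
# checkpoints = [torch.load(p) for p in ["ft_task_a.pt", "ft_task_b.pt"]]
# model.load_state_dict(uniform_soup(checkpoints))
# model.load_state_dict(task_arithmetic(torch.load("pretrained.pt"), checkpoints))
```

The scaling coefficient is a hyperparameter and is typically tuned on held-out validation data rather than fixed in advance.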
Alignment-based Methods
- Kinderman et al. Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks. arXiv:2410.01483
- S. K. Ainsworth, J. Hayase, and S. Srinivasa, “Git Re-Basin: Merging Models modulo Permutation Symmetries,” ICLR 2023. Available: http://arxiv.org/abs/2209.04836
- George Stoica, Daniel Bolya, Jakob Bjorner, Taylor Hearn, and Judy Hoffman, “ZipIt! Merging Models from Different Tasks without Training,” May 04, 2023, arXiv: arXiv:2305.03053. Available: http://arxiv.org/abs/2305.03053
Subspace-based Methods
- Tang A, Shen L, Luo Y, et al. Concrete subspace learning based interference elimination for multi-task model fusion. arXiv preprint arXiv:2312.06173, 2023.
- X. Yi, S. Zheng, L. Wang, X. Wang, and L. He, “A safety realignment framework via subspace-oriented model fusion for large language models.” arXiv, May 14, 2024. doi: 10.48550/arXiv.2405.09055.
- Wang K, Dimitriadis N, Ortiz-Jimenez G, et al. Localizing Task Information for Improved Model Merging and Compression. arXiv preprint arXiv:2405.07813, 2024.
Online Model Merging
- Alexandrov et al. Mitigating Catastrophic Forgetting in Language Transfer via Model Merging. arXiv:2407.08699
- Lu et al. Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment. arXiv:2405.17931
- Izmailov et al. Averaging Weights Leads to Wider Optima and Better Generalization (see the SWA-style sketch after this list)
- Kaddour et al. Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging arXiv:2209.14981
- Zhang et al. Lookahead Optimizer: k steps forward, 1 step back http://arxiv.org/abs/1907.08610
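For the running-average methods listed above (SWA and latest-weight averaging), PyTorch ships built-in helpers. Below is a minimal, hedged sketch of maintaining an online weight average during training with torch.optim.swa_utils; the model, data, and hyperparameters are placeholders, not a reproduction of any cited paper's setup.

```python
# Minimal sketch of online weight averaging during training, in the spirit of
# SWA / latest-weight averaging, using PyTorch's torch.optim.swa_utils helpers.
# Model, data loader, and schedule are dummies chosen only for illustration.
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Linear(16, 2)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)                    # keeps a running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)       # constant learning rate during averaging
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(10)]  # dummy data

swa_start = 5                                       # epoch at which averaging begins
for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)          # fold current weights into the average
        swa_scheduler.step()

update_bn(loader, swa_model)                        # recompute BatchNorm stats (no-op here: no BN layers)
```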
Model Mixing/Upscaling/Expansion
- Samragh et al. Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization. arXiv:2409.12903
- Zhao et al. Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering. arXiv:2409.16167
- Tang et al. SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models. arXiv:2408.10174
- C. Chen et al., “Model Composition for Multimodal Large Language Models.” arXiv, Feb. 20, 2024. doi: 10.48550/arXiv.2402.12750.
- A. Tang, L. Shen, Y. Luo, N. Yin, L. Zhang, and D. Tao, “Merging Multi-Task Models via Weight-Ensembling Mixture of Experts,” Feb. 01, 2024, arXiv: arXiv:2402.00433. doi: 10.48550/arXiv.2402.00433.
- Zhenyi Lu et al., "Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging." doi: 10.48550/arXiv.2406.15479
- Kim et al. SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling. arXiv:2312.15166
- Komatsuzaki et al. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. arXiv:2212.05055
Benchmarks
- Tam et al. Realistic Evaluation of Model Merging for Compositional Generalization arXiv:2409.18314
- Tang et al. FusionBench: A Comprehensive Benchmark of Deep Model Fusion.
Libraries and Tools
Fine-tuning: preparing models for fusion
- PyTorch Classification: A PyTorch library for training/fine-tuning models (CNN, ViT, CLIP) on image classification tasks
- LLaMA Factory: A PyTorch library for fine-tuning LLMs
Model Fusion
- FusionBench: A Comprehensive Benchmark of Deep Model Fusion.
- MergeKit: A PyTorch library for merging large language models.
Version Control
- Kandpal et al. Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models arXiv:2306.04529
Other Applications of Model Fusion
Applications in Reinforcement Learning (RL)
- (Survey Paper) Song Y, Suganthan P N, Pedrycz W, et al. Ensemble reinforcement learning: A survey. Applied Soft Computing, 2023.
- Lee K, Laskin M, Srinivas A, et al. “Sunrise: A simple unified framework for ensemble learning in deep reinforcement learning", ICML, 2021.
- Ren J, Li Y, Ding Z, et al. “Probabilistic mixture-of-experts for efficient deep reinforcement learning". arXiv:2104.09122, 2021.
- Celik O, Taranovic A, Neumann G. “Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts". arXiv preprint arXiv:2403.06966, 2024.