Reading Lists
Info
Work in progress. Any suggestions are welcome.
I've been compiling a comprehensive list of papers and resources that have been instrumental in my research journey. It is intended as a starting point for anyone interested in deep model fusion. If you have suggestions for papers to add, please feel free to raise an issue or submit a pull request.
Note
Meaning of the symbols in the list:
- Highly recommended
- LLaMA model-related or Mistral-related work
- Code available on GitHub
- Models or datasets available on Hugging Face
Survey Papers
- E. Yang et al., “Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities.” arXiv, Aug. 14, 2024.
- Yadav et al. A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning. arXiv:2408.07057
- W. Li, Y. Peng, M. Zhang, L. Ding, H. Hu, and L. Shen, “Deep Model Fusion: A Survey.” arXiv, Sep. 27, 2023. doi: 10.48550/arXiv.2309.15698.
- H. Zheng et al., “Learn From Model Beyond Fine-Tuning: A Survey.” arXiv, Oct. 12, 2023.
Findings on Model Fusion
- Aakanksha et al. Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning arXiv:2410.10801
- Yadav et al. What Matters for Model Merging at Scale? arXiv:2410.03617
Model Ensemble
- Liu T Y, Soatto S. Tangent Model Composition for Ensembling and Continual Fine-tuning. arXiv, 2023.
- Wan et al. Knowledge Fusion of Large Language Models. arXiv:2401.10491
- Wan F, Yang Z, Zhong L, et al. FuseChat: Knowledge Fusion of Chat Models. arXiv, 2024.
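The entries above combine models at the prediction or output level rather than in weight space. As a rough illustration of the general idea, here is a minimal sketch of plain probability averaging over models that share an output space; this is a generic baseline, not the specific method of any paper above, and the models and inputs are placeholders.

```python
# Minimal sketch of prediction-level ensembling (contrast with the weight-space
# merging methods in the next section). Models and inputs are dummies.
import torch


@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of several models with a shared label space."""
    probs = [torch.softmax(m(x), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)


# Usage with placeholder classifiers:
models = [torch.nn.Linear(16, 4) for _ in range(3)]
x = torch.randn(8, 16)
avg_probs = ensemble_predict(models, x)   # shape: (8, 4)
prediction = avg_probs.argmax(dim=-1)     # predicted class per example
```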
Model Merging
Mode Connectivity
Mode connectivity is such an important concept in model merging that it deserves its own page.
Weight Interpolation
- Osowiechi et al. WATT: Weight Average Test-Time Adaptation of CLIP. arXiv:2406.13875
- Jiang et al. ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning
- Chronopoulou et al. Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization. arXiv:2311.09344
- L. Yu, B. Yu, H. Yu, F. Huang, and Y. Li, “Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch,” Nov. 06, 2023, arXiv: arXiv:2311.03099. Available: http://arxiv.org/abs/2311.03099
- E. Yang et al., “AdaMerging: Adaptive Model Merging for Multi-Task Learning,” ICLR 2024, arXiv: arXiv:2310.02575. doi: 10.48550/arXiv.2310.02575.
- P. Yadav, D. Tam, L. Choshen, C. Raffel, and M. Bansal, “Resolving Interference When Merging Models,” Jun. 02, 2023, arXiv: arXiv:2306.01708. Available: http://arxiv.org/abs/2306.01708
- Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard, “Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models,” May 30, 2023, arXiv: arXiv:2305.12827. doi: 10.48550/arXiv.2305.12827.
- G. Ilharco et al., “Editing Models with Task Arithmetic,” Mar. 31, 2023, arXiv: arXiv:2212.04089. doi: 10.48550/arXiv.2212.04089.
- Tang et al. Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
- Rame et al. Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards. arXiv:2306.04488
- Huang et al. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition. arXiv:2307.13269
- Wu et al. Pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
- Chronopoulou et al. AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models. arXiv:2302.07027
- Zimmer et al. Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging. arXiv:2306.16788
- Wortsman et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. arXiv:2203.05482 (a minimal weight-averaging sketch follows this list)
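To make the core ideas behind the weight-averaging and task-arithmetic entries above concrete, here is a minimal sketch over PyTorch state dicts. It assumes all checkpoints are fine-tuned from the same pre-trained model and share parameter names, shapes, and floating-point dtypes; the function names and checkpoint paths are illustrative and not taken from any of the cited codebases.

```python
# Minimal sketch of weight-space fusion over PyTorch state dicts.
# Assumes all checkpoints share the same architecture (same keys/shapes) and
# floating-point parameters; names here are illustrative only.
from typing import Dict, List

import torch

StateDict = Dict[str, torch.Tensor]


def uniform_soup(state_dicts: List[StateDict]) -> StateDict:
    """Uniform 'model soup': element-wise average of fine-tuned checkpoints."""
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in state_dicts[0]}


def task_arithmetic(pretrained: StateDict,
                    finetuned: List[StateDict],
                    scaling: float = 0.3) -> StateDict:
    """Task arithmetic: add scaled task vectors (fine-tuned minus pre-trained)
    back onto the pre-trained weights."""
    merged = {key: value.clone() for key, value in pretrained.items()}
    for sd in finetuned:
        for key in merged:
            merged[key] += scaling * (sd[key] - pretrained[key])
    return merged


# Hypothetical usage (checkpoint paths are placeholders):
# checkpoints = [torch.load(p) for p in ["ft_task_a.pt", "ft_task_b.pt"]]
# model.load_state_dict(uniform_soup(checkpoints))
# model.load_state_dict(task_arithmetic(torch.load("pretrained.pt"), checkpoints))
```

The scaling coefficient is a hyperparameter and is typically tuned on held-out validation data rather than fixed in advance.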
Alignment-based Methods
- Kinderman et al. Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks. arXiv:2410.01483
- S. K. Ainsworth, J. Hayase, and S. Srinivasa, “Git Re-Basin: Merging Models modulo Permutation Symmetries,” ICLR 2023. Available: http://arxiv.org/abs/2209.04836
- George Stoica, Daniel Bolya, Jakob Bjorner, Taylor Hearn, and Judy Hoffman, “ZipIt! Merging Models from Different Tasks without Training,” May 04, 2023, arXiv: arXiv:2305.03053. Available: http://arxiv.org/abs/2305.03053
Subspace-based Methods
- Tang A, Shen L, Luo Y, et al. Concrete subspace learning based interference elimination for multi-task model fusion. arXiv preprint arXiv:2312.06173, 2023.
- X. Yi, S. Zheng, L. Wang, X. Wang, and L. He, “A safety realignment framework via subspace-oriented model fusion for large language models.” arXiv, May 14, 2024. doi: 10.48550/arXiv.2405.09055.
- Wang K, Dimitriadis N, Ortiz-Jimenez G, et al. Localizing Task Information for Improved Model Merging and Compression. arXiv preprint arXiv:2405.07813, 2024.
Online Model Merging
- Alexandrov et al. Mitigating Catastrophic Forgetting in Language Transfer via Model Merging. arXiv:2407.08699
- Lu et al. Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment. arXiv:2405.17931
- Izmailov et al. Averaging Weights Leads to Wider Optima and Better Generalization (see the SWA-style sketch after this list)
- Kaddour et al. Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging arXiv:2209.14981
- Zhang et al. Lookahead Optimizer: k steps forward, 1 step back http://arxiv.org/abs/1907.08610
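For the running-average methods listed above (SWA and latest-weight averaging), PyTorch ships built-in helpers. Below is a minimal, hedged sketch of maintaining an online weight average during training with torch.optim.swa_utils; the model, data, and hyperparameters are placeholders, not a reproduction of any cited paper's setup.

```python
# Minimal sketch of online weight averaging during training, in the spirit of
# SWA / latest-weight averaging, using PyTorch's torch.optim.swa_utils helpers.
# Model, data loader, and schedule are dummies chosen only for illustration.
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Linear(16, 2)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)                    # keeps a running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)       # constant learning rate during averaging
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(10)]  # dummy data

swa_start = 5                                       # epoch at which averaging begins
for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)          # fold current weights into the average
        swa_scheduler.step()

update_bn(loader, swa_model)                        # recompute BatchNorm stats (no-op here: no BN layers)
```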
Model Mixing/Upscaling/Expansion
- Samragh et al. Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization. arXiv:2409.12903
- Zhao et al. Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering. arXiv:2409.16167
- Tang et al. SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models. arXiv:2408.10174
- C. Chen et al., “Model Composition for Multimodal Large Language Models.” arXiv, Feb. 20, 2024. doi: 10.48550/arXiv.2402.12750.
- A. Tang, L. Shen, Y. Luo, N. Yin, L. Zhang, and D. Tao, “Merging Multi-Task Models via Weight-Ensembling Mixture of Experts,” Feb. 01, 2024, arXiv: arXiv:2402.00433. doi: 10.48550/arXiv.2402.00433.
- Zhenyi Lu et al., "Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging." doi: 10.48550/arXiv.2406.15479
- Kim et al. SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling. arXiv:2312.15166
- Komatsuzaki et al. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. arXiv:2212.05055
Benchmarks
- Tam et al. Realistic Evaluation of Model Merging for Compositional Generalization arXiv:2409.18314
- Tang et al. FusionBench: A Comprehensive Benchmark of Deep Model Fusion.
Libraries and Tools
Fine-tuning: preparing models for fusion
- PyTorch Classification: A PyTorch library for training/fine-tuning models (CNN, ViT, CLIP) on image classification tasks
- LLaMA Factory: A PyTorch library for fine-tuning LLMs
Model Fusion
- FusionBench: A Comprehensive Benchmark of Deep Model Fusion.
- MergeKit: A PyTorch library for merging large language models.
Version Control
- Kandpal et al. Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models arXiv:2306.04529
Other Applications of Model Fusion
Applications in Reinforcement Learning (RL)
- (Survey Paper) Song Y, Suganthan P N, Pedrycz W, et al. Ensemble reinforcement learning: A survey. Applied Soft Computing, 2023.
- Lee K, Laskin M, Srinivas A, et al. “Sunrise: A simple unified framework for ensemble learning in deep reinforcement learning", ICML, 2021.
- Ren J, Li Y, Ding Z, et al. “Probabilistic mixture-of-experts for efficient deep reinforcement learning". arXiv:2104.09122, 2021.
- Celik O, Taranovic A, Neumann G. “Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts". arXiv preprint arXiv:2403.06966, 2024.