CLIP Simple Average¶
This tutorial demonstrates how to merge the vision encoders of CLIP (Contrastive Language-Image Pre-training) models using the Simple Average algorithm - a straightforward, hyperparameter-free approach to model fusion that combines multiple task-specific models into a single unified model.
Mathematically, the Simple Average algorithm can be expressed as:
where \( \theta_{merged} \) is the set of parameters for the merged model, \( N \) is the number of source models, and \( \theta_{i} \) are the parameters of the individual models.
This method works especially well for large models that have been fine-tuned on distinct downstream tasks3, or for models trained on the same task but with varying hyperparameter settings such as learning rate or batch size12.
π§ Standalone YAML Configuration¶
The example uses the following standalone configuration file that demonstrates merging CLIP models fine-tuned on different image classification datasets:
- This is the program to handle the model fusion workflow.
- This is the method config to perform model fusion.
- This is the model pool config containing the base and fine-tuned models.
- This is the task pool config defining evaluation datasets.
Configuration Breakdown¶
- Program: This is the top level configuration that specifies the program to run.
Here we specify the main program as
FabricModelFusionProgram
which handles the model fusion workflow. - Method: This is the method config to perform model fusion.
Here we specify the method as
SimpleAverageAlgorithm
, which performs model fusion. - Model Pool: This is the model pool config containing the base and fine-tuned models.
In this example, it contains two fine-tuned models:
sun397
: Fine-tuned on SUN397 scene recognition datasetstanford-cars
: Fine-tuned on Stanford Cars dataset
- Task Pool: A task pool object is responsible for evaluating the merged model's performance. In this example, we specify t
π Running the Example¶
Execute the model merging process with the following command:
This command will:
- Load the specified CLIP models from the model pool
- Apply the Simple Average algorithm to merge their parameters
- Evaluate the merged model on the specified test datasets
- Generate performance reports comparing the merged model against individual models
π Key Learning Points¶
This example teaches you:
- Basic Configuration: How to structure a FusionBench configuration file
- Model Pool Setup: How to specify multiple models for merging
- Task Pool Configuration: How to define evaluation datasets
- Simple Execution: How to run model merging with a single command
π Debugging Configuration (VS Code)¶
{
"name": "clip_simple_average",
"type": "debugpy",
"request": "launch",
"module": "fusion_bench.scripts.cli",
"args": [
"--config-path",
"${workspaceFolder}/config/_get_started",
"--config-name",
"clip_simple_average"
],
"console": "integratedTerminal",
"justMyCode": true,
"env": {
"HYDRA_FULL_ERROR": "1"
}
}
-
M. Wortsman et al., βModel soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time,β July 01, 2022, arXiv: arXiv:2203.05482. Accessed: May 08, 2023. Available: http://arxiv.org/abs/2203.05482 ↩
-
A. Chegini et al., βModel Soup for Better RLHF: Weight Space Averaging to Improve Alignment in LLMsβ. ↩
-
P. Yadav et al., βWhat Matters for Model Merging at Scale?,β Oct. 04, 2024, arXiv: arXiv:2410.03617. Accessed: Oct. 11, 2024. Available: http://arxiv.org/abs/2410.03617 ↩