CLIP Simple Average¶

This tutorial demonstrates how to merge the vision encoders of CLIP (Contrastive Language-Image Pre-training) models using the Simple Average algorithm, a straightforward, hyperparameter-free approach to model fusion that combines multiple task-specific models into a single unified model.

Mathematically, the Simple Average algorithm can be expressed as:

\[ \theta_{merged} = \frac{1}{N} \sum_{i=1}^{N} \theta_{i} \]

where \( \theta_{merged} \) is the set of parameters for the merged model, \( N \) is the number of source models, and \( \theta_{i} \) are the parameters of the individual models.
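
The formula above can be sketched in a few lines of plain Python. This is only a toy illustration, not FusionBench's implementation: parameters are represented as flat lists of floats keyed by name, and `simple_average`, `theta_1`, and `theta_2` are hypothetical names introduced here. It assumes all models share the same architecture, so every parameter name and shape matches.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def simple_average(state_dicts: List[StateDict]) -> StateDict:
    """Average same-named parameters element-wise across all models."""
    if not state_dicts:
        raise ValueError("need at least one model to average")
    n = len(state_dicts)
    merged: StateDict = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter from every model together.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(values) / n for values in columns]
    return merged


theta_1 = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
theta_2 = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

merged = simple_average([theta_1, theta_2])
print(merged)  # {'layer.weight': [2.0, 4.0], 'layer.bias': [0.5]}
```

Because every model contributes with equal weight \( 1/N \), there is nothing to tune, which is why the method is described as hyperparameter-free.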

This method works especially well for large models that have been fine-tuned on distinct downstream tasks³, or for models trained on the same task but with varying hyperparameter settings such as learning rate or batch size¹,².

🔧 Standalone YAML Configuration¶

The example uses the following standalone configuration file that demonstrates merging CLIP models fine-tuned on different image classification datasets:

config/_get_started/clip_simple_average.yaml
_target_: fusion_bench.programs.FabricModelFusionProgram # (1)!
_recursive_: false
method: # (2)!
  _target_: fusion_bench.method.SimpleAverageAlgorithm
modelpool: # (3)!
  _target_: fusion_bench.modelpool.CLIPVisionModelPool
  models:
    _pretrained_: openai/clip-vit-base-patch32
    sun397: tanganke/clip-vit-base-patch32_sun397
    stanford-cars: tanganke/clip-vit-base-patch32_stanford-cars
taskpool: # (4)!
  _target_: fusion_bench.taskpool.CLIPVisionModelTaskPool
  test_datasets:
    sun397:
      _target_: datasets.load_dataset
      path: tanganke/sun397
      split: test
    stanford-cars:
      _target_: datasets.load_dataset
      path: tanganke/stanford_cars
      split: test
  clip_model: openai/clip-vit-base-patch32
  processor: openai/clip-vit-base-patch32

1. This is the program to handle the model fusion workflow.
2. This is the method config to perform model fusion.
3. This is the model pool config containing the base and fine-tuned models.
4. This is the task pool config defining evaluation datasets.

Configuration Breakdown¶

1. Program: The top-level configuration that specifies the program to run. Here the main program is FabricModelFusionProgram, which handles the model fusion workflow.
2. Method: The method config that performs model fusion. Here the method is SimpleAverageAlgorithm, which averages the parameters of the source models.
3. Model Pool: The model pool config containing the base and fine-tuned models. In this example, it contains the pre-trained base model (openai/clip-vit-base-patch32) and two fine-tuned models:
   - sun397: Fine-tuned on the SUN397 scene recognition dataset
   - stanford-cars: Fine-tuned on the Stanford Cars dataset
4. Task Pool: A task pool object is responsible for evaluating the merged model's performance. In this example, we specify the task pool as CLIPVisionModelTaskPool, which evaluates the merged model on the test splits of the sun397 and stanford-cars datasets.
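
The `_target_` keys in the configuration follow Hydra's instantiation convention: the dotted path names a class or callable, and the remaining keys are passed to it as keyword arguments. The sketch below mimics that idea with the standard library only. It is a simplified stand-in for `hydra.utils.instantiate`, not the real implementation; `instantiate_flat` is a hypothetical helper, and `fractions.Fraction` is used as the target purely so the example runs without FusionBench installed.

```python
import importlib
from typing import Any, Dict


def instantiate_flat(config: Dict[str, Any]) -> Any:
    """Resolve `_target_` to a callable and call it with the other keys."""
    params = dict(config)  # copy so the caller's config is not mutated
    target = params.pop("_target_")
    params.pop("_recursive_", None)  # control key, not a constructor argument
    module_name, _, attr_name = target.rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr_name)
    # Nested configs are passed through untouched, similar in spirit to
    # `_recursive_: false`, where the callee decides when to build them.
    return factory(**params)


half = instantiate_flat(
    {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
)
print(half)  # 1/2
```

Read this way, the configuration is a tree of constructor calls: the program receives the method, model pool, and task pool sections and builds each one from its own `_target_`.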

🚀 Running the Example¶

Execute the model merging process with the following command:

fusion_bench --config-path $PWD/config/_get_started --config-name clip_simple_average

This command will:

1. Load the specified CLIP models from the model pool
2. Apply the Simple Average algorithm to merge their parameters
3. Evaluate the merged model on the specified test datasets
4. Generate performance reports comparing the merged model against individual models
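
If you prefer to launch the run from a Python script (for example, inside a larger experiment driver), the same command line can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, and it assumes the `fusion_bench` executable is on your `PATH` and that the script is started from the repository root, mirroring the `$PWD` in the shell command.

```python
import shlex
from pathlib import Path
from typing import List


def build_command(config_dir: Path, config_name: str) -> List[str]:
    """Assemble the argument list equivalent to the shell command."""
    return [
        "fusion_bench",
        "--config-path", str(config_dir),
        "--config-name", config_name,
    ]


command = build_command(Path.cwd() / "config" / "_get_started", "clip_simple_average")
print(shlex.join(command))

# To actually start the run, pass the list to subprocess:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the working directory contains spaces.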

🎓 Key Learning Points¶

This example teaches you:

1. Basic Configuration: How to structure a FusionBench configuration file
2. Model Pool Setup: How to specify multiple models for merging
3. Task Pool Configuration: How to define evaluation datasets
4. Simple Execution: How to run model merging with a single command

🐛 Debugging Configuration (VS Code)¶

.vscode/launch.json

{
    "name": "clip_simple_average",
    "type": "debugpy",
    "request": "launch",
    "module": "fusion_bench.scripts.cli",
    "args": [
        "--config-path",
        "${workspaceFolder}/config/_get_started",
        "--config-name",
        "clip_simple_average"
    ],
    "console": "integratedTerminal",
    "justMyCode": true,
    "env": {
        "HYDRA_FULL_ERROR": "1"
    }
}
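
The JSON object above is a single debug configuration entry. VS Code expects such entries inside the `configurations` array of `.vscode/launch.json`. The sketch below shows one way to wrap the entry into a complete file using Python's `json` module; the `"version": "0.2.0"` field is the value VS Code conventionally generates, and the write step is left commented out so that an existing file is not overwritten.

```python
import json

# The debug entry from the snippet above, reproduced as a Python dict.
entry = {
    "name": "clip_simple_average",
    "type": "debugpy",
    "request": "launch",
    "module": "fusion_bench.scripts.cli",
    "args": [
        "--config-path",
        "${workspaceFolder}/config/_get_started",
        "--config-name",
        "clip_simple_average",
    ],
    "console": "integratedTerminal",
    "justMyCode": True,
    "env": {"HYDRA_FULL_ERROR": "1"},
}

# A launch.json file holds a list of entries under "configurations".
launch = {"version": "0.2.0", "configurations": [entry]}
launch_text = json.dumps(launch, indent=4)
print(launch_text)

# To save it (this overwrites any existing launch.json):
#   from pathlib import Path
#   Path(".vscode").mkdir(exist_ok=True)
#   Path(".vscode/launch.json").write_text(launch_text + "\n")
```

Setting `HYDRA_FULL_ERROR` to `1` makes Hydra print complete stack traces, which is usually what you want while stepping through a failing configuration.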

1. M. Wortsman et al., “Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time,” July 01, 2022, arXiv:2203.05482. Accessed: May 08, 2023. Available: http://arxiv.org/abs/2203.05482
2. A. Chegini et al., “Model Soup for Better RLHF: Weight Space Averaging to Improve Alignment in LLMs.”
3. P. Yadav et al., “What Matters for Model Merging at Scale?,” Oct. 04, 2024, arXiv:2410.03617. Accessed: Oct. 11, 2024. Available: http://arxiv.org/abs/2410.03617