CLIP Simple Average

This tutorial demonstrates how to merge the vision encoders of CLIP (Contrastive Language-Image Pre-training) models using the Simple Average algorithm - a straightforward, hyperparameter-free approach to model fusion that combines multiple task-specific models into a single unified model.

Mathematically, the Simple Average algorithm can be expressed as:

\[ \theta_{merged} = \frac{1}{N} \sum_{i=1}^{N} \theta_{i} \]

where \( \theta_{merged} \) is the set of parameters for the merged model, \( N \) is the number of source models, and \( \theta_{i} \) are the parameters of the individual models.
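In code, the formula is just an element-wise mean over the models' state dictionaries. The following minimal sketch uses toy nn.Linear modules as stand-ins for the fine-tuned encoders and mirrors the equation directly:

import torch
import torch.nn as nn

# Two hypothetical fine-tuned models sharing the same architecture (N = 2).
models = [nn.Linear(4, 2), nn.Linear(4, 2)]

# theta_merged = (1/N) * sum_i theta_i, computed parameter by parameter.
state_dicts = [m.state_dict() for m in models]
merged_state_dict = {
    key: torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}

# Load the averaged parameters into a fresh copy of the architecture.
merged_model = nn.Linear(4, 2)
merged_model.load_state_dict(merged_state_dict)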

This method works especially well for large models that have been fine-tuned on distinct downstream tasks [3], or for models trained on the same task but with varying hyperparameter settings such as learning rate or batch size [1][2].

🔧 Standalone YAML Configuration

This example uses the following standalone configuration file, which merges CLIP vision models fine-tuned on different image classification datasets:

config/_get_started/clip_simple_average.yaml
_target_: fusion_bench.programs.FabricModelFusionProgram # (1)!
_recursive_: false
method: # (2)!
  _target_: fusion_bench.method.SimpleAverageAlgorithm
modelpool: # (3)!
  _target_: fusion_bench.modelpool.CLIPVisionModelPool
  models:
    _pretrained_: openai/clip-vit-base-patch32
    sun397: tanganke/clip-vit-base-patch32_sun397
    stanford-cars: tanganke/clip-vit-base-patch32_stanford-cars
taskpool: # (4)!
  _target_: fusion_bench.taskpool.CLIPVisionModelTaskPool
  test_datasets:
    sun397:
      _target_: datasets.load_dataset
      path: tanganke/sun397
      split: test
    stanford-cars:
      _target_: datasets.load_dataset
      path: tanganke/stanford_cars
      split: test
  clip_model: openai/clip-vit-base-patch32
  processor: openai/clip-vit-base-patch32

1. This is the program to handle the model fusion workflow.
2. This is the method config to perform model fusion.
3. This is the model pool config containing the base and fine-tuned models.
4. This is the task pool config defining evaluation datasets.

Configuration Breakdown

  • Program: The top-level configuration that specifies which program to run. Here we use FabricModelFusionProgram, which orchestrates the whole model fusion workflow.
  • Method: The method config that selects the fusion algorithm. Here we specify SimpleAverageAlgorithm, which averages the parameters of all models in the pool.
  • Model Pool: The model pool config containing the pretrained base model and the fine-tuned models (a hand-rolled equivalent is sketched after this list). In this example, it contains two fine-tuned models:
    • sun397: Fine-tuned on the SUN397 scene recognition dataset
    • stanford-cars: Fine-tuned on the Stanford Cars dataset
  • Task Pool: The task pool is responsible for evaluating the merged model's performance. In this example, we specify the SUN397 and Stanford Cars test splits as evaluation datasets, together with the CLIP model and processor used during evaluation.
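
For intuition, the sketch below hand-rolls what the model pool and the Simple Average method do together, using the Hugging Face transformers library and the checkpoints named in the config. It assumes the fine-tuned repositories load as CLIPVisionModel, and it skips everything the FusionBench program adds on top (evaluation, logging, reporting):

import torch
from transformers import CLIPVisionModel

# The fine-tuned vision encoders listed under `models` in the config above.
finetuned_repos = [
    "tanganke/clip-vit-base-patch32_sun397",
    "tanganke/clip-vit-base-patch32_stanford-cars",
]
state_dicts = [
    CLIPVisionModel.from_pretrained(repo).state_dict() for repo in finetuned_repos
]

# Average floating-point parameters; copy non-float buffers (e.g. position ids) as-is.
merged_state_dict = {}
for key, value in state_dicts[0].items():
    if value.is_floating_point():
        merged_state_dict[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    else:
        merged_state_dict[key] = value

# Start from the pretrained base architecture and load the averaged weights.
merged_model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
merged_model.load_state_dict(merged_state_dict)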

🚀 Running the Example

Execute the model merging process with the following command:

fusion_bench --config-path $PWD/config/_get_started --config-name clip_simple_average

This command will:

  1. Load the specified CLIP models from the model pool
  2. Apply the Simple Average algorithm to merge their parameters
  3. Evaluate the merged model on the specified test datasets
  4. Generate performance reports comparing the merged model against individual models
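
You can also drive the same merge from Python instead of the CLI. The sketch below mirrors the method and modelpool sections of the YAML; the class paths are taken from the config above, but the exact constructor arguments and the run() call are assumptions that should be checked against the FusionBench API documentation:

from fusion_bench.method import SimpleAverageAlgorithm
from fusion_bench.modelpool import CLIPVisionModelPool

# Mirror the `modelpool` section of the YAML config (assumed constructor signature).
modelpool = CLIPVisionModelPool(
    models={
        "_pretrained_": "openai/clip-vit-base-patch32",
        "sun397": "tanganke/clip-vit-base-patch32_sun397",
        "stanford-cars": "tanganke/clip-vit-base-patch32_stanford-cars",
    }
)

# Apply Simple Average to every model in the pool (assumed run() interface).
algorithm = SimpleAverageAlgorithm()
merged_model = algorithm.run(modelpool)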

🎓 Key Learning Points

This example teaches you:

  1. Basic Configuration: How to structure a FusionBench configuration file
  2. Model Pool Setup: How to specify multiple models for merging
  3. Task Pool Configuration: How to define evaluation datasets
  4. Simple Execution: How to run model merging with a single command

πŸ› Debugging Configuration (VS Code)

.vscode/launch.json
{
    "name": "clip_simple_average",
    "type": "debugpy",
    "request": "launch",
    "module": "fusion_bench.scripts.cli",
    "args": [
        "--config-path",
        "${workspaceFolder}/config/_get_started",
        "--config-name",
        "clip_simple_average"
    ],
    "console": "integratedTerminal",
    "justMyCode": true,
    "env": {
        "HYDRA_FULL_ERROR": "1"
    }
}

  1. M. Wortsman et al., "Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time," 2022. arXiv: 2203.05482. Available: http://arxiv.org/abs/2203.05482

  2. A. Chegini et al., "Model Soup for Better RLHF: Weight Space Averaging to Improve Alignment in LLMs."

  3. P. Yadav et al., "What Matters for Model Merging at Scale?," 2024. arXiv: 2410.03617. Available: http://arxiv.org/abs/2410.03617