CLIP Task Arithmetic

This tutorial demonstrates how to merge CLIP (Contrastive Language-Image Pre-training) models using the Task Arithmetic algorithm¹, a model fusion technique that combines multiple task-specific models by adding their "task vectors", scaled by a configurable factor, to a shared pretrained model.

Task Arithmetic operates on task vectors: the parameter-wise differences between a fine-tuned model and its pretrained base model. Because these differences are rescaled before being added back to the base model, the approach gives finer control over the fusion than simple weight averaging, which it recovers as a special case when the scaling factor equals 1/N for N models.

Mathematically, Task Arithmetic can be expressed as:

Step 1: Compute Task Vectors

\[ \tau_i = \theta_i - \theta_0 \]

Step 2: Scale and Combine Task Vectors

\[ \theta_{merged} = \theta_0 + \lambda \sum_{i=1}^{N} \tau_i \]

where:

\( \theta_{merged} \) are the parameters of the final merged model
\( \theta_0 \) are the parameters of the pretrained base model
\( \theta_i \) are the parameters of the \( i \)-th fine-tuned model
\( \tau_i \) is the task vector of the \( i \)-th model (its learned adaptation)
\( \lambda \) is the scaling factor that controls how strongly the task vectors are applied
\( N \) is the number of task-specific models
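The two steps can be made concrete with a minimal sketch in plain Python. It assumes parameters are stored as dictionaries mapping a parameter name to a flat list of floats, a hypothetical stand-in for real PyTorch state dicts, and the helper names task_vector and task_arithmetic are illustrative only; this is not FusionBench's implementation.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
# Real CLIP checkpoints hold tensors, but the per-element arithmetic is the same.
StateDict = Dict[str, List[float]]


def task_vector(finetuned: StateDict, pretrained: StateDict) -> StateDict:
    """Step 1: tau_i = theta_i - theta_0, computed element by element."""
    return {
        name: [ft - pt for ft, pt in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def task_arithmetic(
    pretrained: StateDict, finetuned_models: List[StateDict], scaling_factor: float
) -> StateDict:
    """Step 2: theta_merged = theta_0 + lambda * sum_i tau_i."""
    # Start from a copy of theta_0 so the inputs are left untouched.
    merged = {name: list(values) for name, values in pretrained.items()}
    for finetuned in finetuned_models:
        tau = task_vector(finetuned, pretrained)
        for name, delta in tau.items():
            # Adding lambda * tau_i for each model equals adding lambda * (sum of all tau_i).
            merged[name] = [m + scaling_factor * d for m, d in zip(merged[name], delta)]
    return merged


# Toy "models" named after the entries in the model pool of the example configuration.
theta_0 = {"w": [1.0, 2.0]}
models = {
    "sun397": {"w": [2.0, 2.0]},         # tau = [1.0, 0.0]
    "stanford-cars": {"w": [1.0, 4.0]},  # tau = [0.0, 2.0]
}

# With N = 2 and lambda = 0.5 the result equals the plain average of the fine-tuned models.
print(task_arithmetic(theta_0, list(models.values()), scaling_factor=0.5))  # {'w': [1.5, 3.0]}
```

In general, λ = 1/N reproduces simple averaging, since θ_0 + (1/N) Σ (θ_i − θ_0) = (1/N) Σ θ_i; the value 0.7 used in the example configuration applies each task vector more strongly than averaging two models would.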

🔧 Standalone YAML Configuration

The example uses the following configuration, which merges two CLIP ViT-B/32 vision models fine-tuned on the SUN397 and Stanford Cars image classification datasets and evaluates the merged model on both test sets:

config/_get_started/clip_task_arithmetic.yaml
_target_: fusion_bench.programs.FabricModelFusionProgram
_recursive_: false
method:
  _target_: fusion_bench.method.TaskArithmeticAlgorithm
  scaling_factor: 0.7
modelpool:
  _target_: fusion_bench.modelpool.CLIPVisionModelPool
  models:
    _pretrained_: openai/clip-vit-base-patch32
    sun397: tanganke/clip-vit-base-patch32_sun397
    stanford-cars: tanganke/clip-vit-base-patch32_stanford-cars
taskpool:
  _target_: fusion_bench.taskpool.CLIPVisionModelTaskPool
  test_datasets:
    sun397:
      _target_: datasets.load_dataset
      path: tanganke/sun397
      split: test
    stanford-cars:
      _target_: datasets.load_dataset
      path: tanganke/stanford_cars
      split: test
  clip_model: openai/clip-vit-base-patch32
  processor: openai/clip-vit-base-patch32

Program Configuration: Specifies FabricModelFusionProgram, which runs the fusion workflow.

Method Configuration: Uses TaskArithmeticAlgorithm with scaling_factor set to 0.7 in this configuration (the constructor itself defines no default value). The option names in the configuration file are the same as the constructor's argument names.

TaskArithmeticAlgorithm.__init__()

Initializes the TaskArithmeticAlgorithm with the given scaling factor.

Parameters:

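scaling_factor (int) –

The factor by which the task vectors will be scaled before merging. Despite the int annotation, fractional values such as the 0.7 used in the example configuration are expected.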

Source code in fusion_bench/method/task_arithmetic/task_arithmetic.py

def __init__(self, scaling_factor: int, **kwargs):
    """
    Initializes the TaskArithmeticAlgorithm with the given scaling factor.

    Args:
        scaling_factor (int): The factor by which the task vectors will be scaled before merging.
    """
    super().__init__(**kwargs)

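Model Pool: Contains the pretrained base model (under the _pretrained_ key) and the fine-tuned variants, keyed by task name.
Task Pool: Defines the evaluation datasets (the SUN397 and Stanford Cars test splits) and the CLIP model and processor used to assess the merged model.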
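Why the option names must match the code can be seen in a simplified imitation of how a _target_ entry becomes an object: the class is imported from its dotted path and every remaining key is passed to it as a keyword argument. This is a sketch assuming Hydra-like, non-recursive semantics (as with _recursive_: false), not Hydra's actual instantiate function; resolve_target and instantiate_shallow are hypothetical helpers, and the demo uses a standard-library class as a stand-in for TaskArithmeticAlgorithm so it runs without FusionBench installed.

```python
import importlib
from typing import Any, Mapping


def resolve_target(path: str) -> Any:
    """Import 'package.module.Name' and return the attribute 'Name'."""
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ValueError(f"_target_ must be a dotted path, got {path!r}")
    return getattr(importlib.import_module(module_path), attr)


def instantiate_shallow(cfg: Mapping[str, Any]) -> Any:
    """Simplified, non-recursive imitation of Hydra-style instantiation.

    Top-level keys of the form _name_ (e.g. _target_, _recursive_) are treated
    as directives; every other key is forwarded unchanged as a keyword argument,
    so a key that does not match a constructor parameter raises TypeError.
    """
    target = resolve_target(cfg["_target_"])
    kwargs = {
        key: value
        for key, value in cfg.items()
        if not (key.startswith("_") and key.endswith("_"))
    }
    return target(**kwargs)


if __name__ == "__main__":
    # Stand-in for the method block of the YAML (a stdlib class, so the sketch runs anywhere).
    cfg = {"_target_": "datetime.timedelta", "_recursive_": False, "hours": 2, "minutes": 30}
    print(instantiate_shallow(cfg))  # 2:30:00

    # A misspelled option name is rejected by the constructor.
    try:
        instantiate_shallow({"_target_": "datetime.timedelta", "hourz": 2})
    except TypeError as err:
        print("rejected:", err)
```

Forwarding keys verbatim keeps the configuration and the constructor signature in one-to-one correspondence, which is also why a misspelled override key cannot silently take effect.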

🚀 Running the Example

Execute the task arithmetic fusion with the following command:

fusion_bench --config-path $PWD/config/_get_started --config-name clip_task_arithmetic

Hyperparameter Tuning

You can experiment with different scaling factors by overriding the configuration:

# More conservative fusion (less task-specific influence);
# with two models, 0.5 is equivalent to simple averaging
fusion_bench --config-path $PWD/config/_get_started --config-name clip_task_arithmetic \
    method.scaling_factor=0.5

# More aggressive fusion (stronger task-specific influence)
fusion_bench --config-path $PWD/config/_get_started --config-name clip_task_arithmetic \
    method.scaling_factor=1.0
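To compare several values in one go, the CLI can be called once per scaling factor. The following sketch assumes the fusion_bench executable is on PATH and that it is launched from the repository root; it uses only the flags and the method.scaling_factor override shown in the commands above. The helpers build_command and sweep are hypothetical, and the list of factors is arbitrary.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List

CONFIG_NAME = "clip_task_arithmetic"


def build_command(scaling_factor: float, config_dir: Path) -> List[str]:
    """Assemble one fusion_bench call that overrides method.scaling_factor."""
    return [
        "fusion_bench",
        "--config-path", str(config_dir),
        "--config-name", CONFIG_NAME,
        f"method.scaling_factor={scaling_factor}",
    ]


def sweep(factors: Iterable[float], config_dir: Path, dry_run: bool = True) -> None:
    """Run one fusion per scaling factor, or only print the commands when dry_run=True."""
    for factor in factors:
        cmd = build_command(factor, config_dir)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the sweep as soon as one run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Absolute path, mirroring $PWD/config/_get_started in the shell commands.
    config_dir = Path.cwd() / "config" / "_get_started"
    sweep([0.3, 0.5, 0.7, 1.0], config_dir, dry_run=True)
```

Passing an absolute config directory mirrors the $PWD prefix in the shell examples; the dry-run default lets you inspect the commands before launching several full fusion and evaluation runs.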

🐛 Debugging Configuration (VS Code)

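Add the following entry to the "configurations" list in .vscode/launch.json to step through the example in the VS Code debugger:

.vscode/launch.json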

{
    "name": "clip_task_arithmetic",
    "type": "debugpy",
    "request": "launch",
    "module": "fusion_bench.scripts.cli",
    "args": [
        "--config-path",
        "${workspaceFolder}/config/_get_started",
        "--config-name",
        "clip_task_arithmetic"
    ],
    "console": "integratedTerminal",
    "justMyCode": true,
    "env": {
        "HYDRA_FULL_ERROR": "1"
    }
}
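Outside VS Code, the same launch can be reproduced from Python, for example under pdb. The launch.json entry runs the module fusion_bench.scripts.cli with the listed arguments and HYDRA_FULL_ERROR=1; the sketch below does the equivalent in-process with the standard-library runpy module. The helper names cli_argv and run_cli_in_process are hypothetical, and the workspace path is assumed to be the repository root.

```python
import os
import runpy
import sys
from pathlib import Path
from typing import List, Sequence


def cli_argv(workspace: Path, overrides: Sequence[str] = ()) -> List[str]:
    """Mirror the "args" list of the launch.json entry, plus optional Hydra overrides."""
    return [
        # Placeholder: with alter_sys=True, runpy replaces argv[0] with the module's file path.
        "fusion_bench",
        "--config-path", str(workspace / "config" / "_get_started"),
        "--config-name", "clip_task_arithmetic",
        *overrides,
    ]


def run_cli_in_process(workspace: Path, overrides: Sequence[str] = ()) -> None:
    """Run fusion_bench.scripts.cli as `python -m` would, inside the current process."""
    os.environ["HYDRA_FULL_ERROR"] = "1"  # same as the "env" block: full Hydra tracebacks
    saved_argv = sys.argv
    sys.argv = cli_argv(workspace, overrides)
    try:
        runpy.run_module("fusion_bench.scripts.cli", run_name="__main__", alter_sys=True)
    finally:
        sys.argv = saved_argv  # restore the caller's command-line arguments
```

For instance, run_cli_in_process(Path.cwd(), ["method.scaling_factor=0.5"]) called from a debugger session repeats the launch with a different scaling factor.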

¹ G. Ilharco et al., “Editing Models with Task Arithmetic,” arXiv:2212.04089, Mar. 31, 2023. doi: 10.48550/arXiv.2212.04089.