Task Arithmetic

Multi-task learning lets a model draw on information from several tasks to improve performance and generalization. Task Arithmetic is one approach in this area: it builds a multi-task model by combining task-specific vectors derived from model parameters.

Figure: Task Arithmetic. This figure is credited to [2].

Task Vector. A task vector is used to encapsulate the adjustments needed by a model to specialize in a specific task. It is derived from the differences between a pre-trained model's parameters and those fine-tuned for a particular task. Formally, if \(\theta_i\) represents the model parameters fine-tuned for the i-th task and \(\theta_0\) denotes the parameters of the pre-trained model, the task vector for the i-th task is defined as:

\[\tau_i = \theta_i - \theta_0\]

This representation is crucial for methods like Task Arithmetic, where multiple task vectors are aggregated and scaled to form a comprehensive multi-task model.
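
As a small worked example, consider a model with three parameters. The numbers below are made up purely for illustration and do not come from any real model:

\[\theta_0 = (0.2,\ -1.0,\ 0.5), \qquad \theta_1 = (0.5,\ -1.0,\ 0.1)\]

\[\tau_1 = \theta_1 - \theta_0 = (0.3,\ 0.0,\ -0.4)\]

The second entry is zero because fine-tuning left that parameter unchanged, so it contributes nothing to the task vector.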

Task Arithmetic [1] begins by computing a task vector \(\tau_i\) for each individual task from the set of model parameters \(\{\theta_0\} \cup \{\theta_i\}_i\), where \(\theta_0\) denotes the pre-trained parameters and \(\theta_i\) the parameters fine-tuned for the i-th task. The task vectors are summed to form a multi-task vector. This vector is then multiplied by a scalar scaling coefficient \(\lambda\), shared across all parameters, and added to the pre-trained model parameters. The resulting multi-task model is given by

\[ \theta = \theta_0 + \lambda \sum_{i} \tau_i. \]
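
The update above can be sketched in a few lines of PyTorch. This is a minimal illustration that operates directly on state dicts, not the fusion_bench implementation. The helper name `task_arithmetic_merge` and the `nn.Linear` stand-ins are hypothetical, and the sketch assumes that every state-dict entry is a floating-point tensor and that all models share one architecture.

```python
import torch
from torch import nn


def task_arithmetic_merge(pretrained, finetuned, scaling_factor):
    """Return a state dict equal to theta_0 + lambda * sum_i (theta_i - theta_0).

    Hypothetical helper for illustration; assumes floating-point parameters only.
    """
    theta_0 = pretrained.state_dict()
    merged = {}
    for name, base in theta_0.items():
        # Sum the task vectors tau_i = theta_i - theta_0 for this parameter.
        multi_task_vector = torch.zeros_like(base)
        for model in finetuned:
            multi_task_vector += model.state_dict()[name] - base
        # Scale by lambda and add back to the pre-trained parameter.
        merged[name] = base + scaling_factor * multi_task_vector
    return merged


torch.manual_seed(0)
pretrained = nn.Linear(4, 2)
# Randomly initialized stand-ins for two fine-tuned models.
finetuned = [nn.Linear(4, 2), nn.Linear(4, 2)]

merged_model = nn.Linear(4, 2)
merged_model.load_state_dict(task_arithmetic_merge(pretrained, finetuned, 0.3))
```

With `scaling_factor=0` the helper returns the pre-trained parameters unchanged, which is a quick sanity check on any implementation of the formula.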

The choice of the scaling coefficient \(\lambda\) plays a crucial role in the final model performance. Typically, \(\lambda\) is chosen based on validation set performance.
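
One simple way to choose \(\lambda\) on a validation set is a grid search over candidate values. The sketch below assumes a callable that merges the models with a given \(\lambda\) and returns a validation score (higher is better). Both `select_scaling_coefficient` and `toy_validation_score` are hypothetical names, and the toy score is a stand-in for a real merge-and-evaluate step, not a measured result.

```python
from typing import Callable, Iterable, Tuple


def select_scaling_coefficient(
    candidates: Iterable[float],
    validation_score: Callable[[float], float],
) -> Tuple[float, float]:
    """Return the candidate lambda with the highest validation score, and that score."""
    best_lam, best_score = None, float("-inf")
    for lam in candidates:
        score = validation_score(lam)
        if score > best_score:
            best_lam, best_score = lam, score
    if best_lam is None:
        raise ValueError("candidates must not be empty")
    return best_lam, best_score


# Toy stand-in for "merge with this lambda, then evaluate on held-out data".
def toy_validation_score(lam: float) -> float:
    return 1.0 - (lam - 0.3) ** 2


candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
best_lam, best_score = select_scaling_coefficient(candidates, toy_validation_score)
```

In practice the selected value would be passed to the method as its scaling factor, and the grid can be made coarser or finer depending on how expensive each evaluation is.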

Recent work has also explored task arithmetic in the tangent space, which can provide improved editing of pre-trained models [2].

Examples

CLI Usage

Configuration template for the Task Arithmetic algorithm:

config/method/task_arithmetic.yaml
_target_: fusion_bench.method.TaskArithmeticAlgorithm
scaling_factor: 0.3

Use the following command to run the Task Arithmetic algorithm:

fusion_bench method=task_arithmetic ...

For example, to run the Task Arithmetic algorithm on two models with scaling factor 0.5:

fusion_bench method=task_arithmetic \
  method.scaling_factor=0.5 \
  modelpool=CLIPVisionModelPool/clip-vit-base-patch32_svhn_and_mnist \
  taskpool=CLIPVisionModelTaskPool/clip-vit-base-patch32_svhn_and_mnist

where the configuration for the model pool is:

config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_svhn_and_mnist.yaml
_target_: fusion_bench.modelpool.CLIPVisionModelPool
_recursive_: False
processor: openai/clip-vit-base-patch32
models:
  _pretrained_: openai/clip-vit-base-patch32
  svhn: tanganke/clip-vit-base-patch32_svhn
  mnist: tanganke/clip-vit-base-patch32_mnist
platform: hf

and the configuration for the task pool:

config/taskpool/CLIPVisionModelTaskPool/clip-vit-base-patch32_svhn_and_mnist.yaml
defaults:
  - /dataset/image_classification/test@test_datasets:
      - svhn
      - mnist
_target_: fusion_bench.taskpool.CLIPVisionModelTaskPool
_recursive_: false
test_datasets: ??? # The datasets to evaluate the model on
base_model: openai/clip-vit-base-patch32
clip_model: ${.base_model} # The base model to use
processor: ${.base_model} # The base model to use
data_processor: ${.processor}
dataloader_kwargs:
  batch_size: 128 # The batch size for the data loader
  num_workers: 8 # The number of worker processes for data loading
  pin_memory: True # Whether to pin memory in data loader
  drop_last: False # Whether to drop the last incomplete batch
  shuffle: False # Whether to shuffle the data
# === layer-wise feature saving ===
# The path to save the features to, if none then the features are not saved
# This is the path to a directory, the features of task `task_name` will be saved in `feature_save_path/task_name.csv`
layer_wise_feature_save_path: null
layer_wise_feature_first_token_only: true # Whether to save only the first token of the features
# The maximum number of samples to save the features for
layer_wise_feature_max_num: 1000

Use Task Arithmetic to merge 8 CLIP-ViT-B-32 models from different image classification tasks and evaluate the performance of the merged model.

fusion_bench \
  method=task_arithmetic \
  modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8_model_only \
  taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

API Usage

To use the Task Arithmetic algorithm, you can use the TaskArithmeticAlgorithm class from the fusion_bench.method module.

from torch import nn
from fusion_bench.method.task_arithmetic import TaskArithmeticAlgorithm

# Instantiate the TaskArithmeticAlgorithm
algorithm = TaskArithmeticAlgorithm(scaling_factor=0.5)

# Assume we have a dict of PyTorch models (nn.Module instances) that we want to merge.
# The models must all have the same architecture.
# The dict must contain the pre-trained model under the key '_pretrained_'
# and any number of fine-tuned models.
models = {
    '_pretrained_': nn.Linear(10, 10),
    'model_1': nn.Linear(10, 10),
    'model_2': nn.Linear(10, 10),
}

# Run the algorithm on the models.
# This will return a new model that is the result of task arithmetic on the input models.
merged_model = algorithm.run(models)

Implementation Details


  1. (ICLR 2023) Editing Models with Task Arithmetic. http://arxiv.org/abs/2212.04089 

  2. (NeurIPS 2023 Oral) Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard. Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models. http://arxiv.org/abs/2305.12827

  3. (ICLR 2024) AdaMerging: Adaptive Model Merging for Multi-Task Learning. http://arxiv.org/abs/2310.02575