OPCM (Orthogonal Projection for Continual Merging)

OPCM addresses the continual model merging problem, where models arrive sequentially and must be merged one by one into a single model. The core insight is that when merging a new task model into an existing merged model, the task vectors often have overlapping components that cause negative interference. OPCM uses Singular Value Decomposition (SVD) to project each new task vector into the subspace orthogonal to the dominant directions of the current merged task vector¹.

The Continual Merging Setting. In the continual (or online) setting, models $\theta_1, \theta_2, \dots, \theta_K$ arrive sequentially. After merging the first $t$ models, the merged model $\theta^{(t)}$ must be updated to incorporate $\theta_{t+1}$ without reprocessing $\theta_1, \dots, \theta_t$. OPCM maintains a running merge that preserves previous task knowledge while incorporating new tasks.

Mathematical Formulation

Let $\theta_0$ be the pretrained (anchor) model. Given the current merged model $\theta^{(t-1)}$ and a new task model $\theta_t$, OPCM computes:

Task Vectors: $$\tau_{merged}^{(t-1)} = \theta^{(t-1)} - \theta_0$$ $$\tau_{new} = \theta_t - \theta_0$$
SVD of Merged Task Vector (for linear weight matrices): $$\tau_{merged}^{(t-1)} = U \Sigma V^\top$$
Projection and Zeroing: Project the new task vector into the SVD basis of the merged task vector: $$P = U^\top \tau_{new} V$$ Zero out the diagonal of $P$, which removes the components of $\tau_{new}$ along each rank-one direction $u_i v_i^\top$ of the merged task vector: $$P_{ii} \leftarrow 0$$ Then zero out the top-left block, which corresponds to the dominant singular directions, up to a split rank $r$: $$P_{ij} \leftarrow 0 \quad \text{for } i, j \leq r$$

The split rank $r$ is determined by the parameter $\alpha$: it is the smallest index such that the cumulative sum of singular values exceeds a fraction $\alpha$ of the total: $$r = \min \left\{ k : \frac{\sum_{i=1}^k \sigma_i}{\sum_{i} \sigma_i} > \alpha \right\}$$
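As a worked example with illustrative singular values $\sigma = (5, 3, 1, 1)$ (so $\sum_i \sigma_i = 10$) and $\alpha = 0.6$:

$$\frac{\sum_{i=1}^{k} \sigma_i}{\sum_i \sigma_i} = 0.5,\; 0.8,\; 0.9,\; 1.0 \quad \text{for } k = 1, 2, 3, 4$$

$$0.5 \le 0.6 < 0.8 \quad \Longrightarrow \quad r = 2$$

The top-left block therefore covers $P_{11}, P_{12}, P_{21}, P_{22}$. The diagonal zeroing additionally removes $P_{33}$ and $P_{44}$, while all remaining off-diagonal entries (such as $P_{13}$ or $P_{31}$) are kept.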

Reconstruct the Cleaned Task Vector: $$\tilde{\tau}_{new} = U P V^\top$$
Update the Merged Model: $$\theta^{(t)} = \theta_0 + \frac{\lambda_{t-1} \tau_{merged}^{(t-1)} + \tilde{\tau}_{new}}{\lambda_t}$$

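where $\lambda_t$ is a scaling factor chosen to keep the norm of the merged task vector stable across steps. It grows approximately as $\sqrt{t}$: the norm of a sum of $t$ mutually orthogonal task vectors with similar norms grows as $\sqrt{t}$, so dividing the accumulated sum by $\lambda_t \approx \sqrt{t}$ keeps the merged task vector norm roughly constant.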

For parameters that are not linear-layer weight matrices (e.g., biases and LayerNorm weights), OPCM skips the SVD projection and applies the same scaled update directly to the raw task vector: $$\theta^{(t)} = \theta_0 + \frac{\lambda_{t-1} (\theta^{(t-1)} - \theta_0) + (\theta_t - \theta_0)}{\lambda_t}$$
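The per-step update can be sketched in NumPy for a single layer. This is a minimal illustration, not FusionBench's implementation: the function names are hypothetical, the $\lambda$ values are passed in by the caller instead of being computed with FusionBench's norm-stabilizing rule, and the sketch assumes $t \ge 2$ so the merged task vector is nonzero (how the first model seeds the merge is not covered). It uses a full SVD so that $U P V^\top$ reconstructs every component of $\tau_{new}$ that is not zeroed.

```python
import numpy as np


def split_rank(s: np.ndarray, alpha: float) -> int:
    """Smallest k (1-based) such that the top-k singular values exceed a fraction alpha of their sum."""
    fractions = np.cumsum(s) / np.sum(s)
    return int(np.argmax(fractions > alpha)) + 1


def opcm_linear_update(theta0, merged, task, alpha, lam_prev, lam_t):
    """One OPCM-style step for a linear weight matrix (hypothetical helper, assumes t >= 2)."""
    tau_merged = merged - theta0                               # tau_merged^{(t-1)}
    tau_new = task - theta0                                    # tau_new
    u, s, vh = np.linalg.svd(tau_merged, full_matrices=True)  # tau_merged = U diag(s) V^T, vh = V^T
    p = u.T @ tau_new @ vh.T                                   # P = U^T tau_new V
    np.fill_diagonal(p, 0.0)                                   # P_ii <- 0
    r = split_rank(s, alpha)
    p[:r, :r] = 0.0                                            # P_ij <- 0 for 1-based i, j <= r
    tau_clean = u @ p @ vh                                     # cleaned task vector U P V^T
    return theta0 + (lam_prev * tau_merged + tau_clean) / lam_t


def opcm_other_update(theta0, merged, task, lam_prev, lam_t):
    """Biases, LayerNorm weights, etc.: the same scaled update without any projection."""
    return theta0 + (lam_prev * (merged - theta0) + (task - theta0)) / lam_t
```

Passing, for example, `lam_prev=np.sqrt(t - 1)` and `lam_t=np.sqrt(t)` mimics the approximate $\sqrt{t}$ growth of $\lambda_t$; with both set to 1 the linear update simply adds the cleaned task vector to the current merge, which makes the zeroed entries of $P$ easy to check.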

OPCM Variants in FusionBench

1. OPCM (General)

The OPCM class in opcm_general.py is the general-purpose implementation with support for Ray-based distributed merging.

OPCM(
    alpha=0.5,                    # SVD projection threshold
    shuffle_order=True,           # shuffle model order
    seed=None,                    # random seed
    save_on_every_step=True,      # save checkpoint at each step
    evaluate_on_every_step=False, # evaluate at each step
    num_ray_actors=0,             # parallel actors for distributed merge
)

2. OPCM for CLIP

The OPCMForCLIP class in opcm.py is a CLIP-specific implementation that includes task pool integration for evaluation at each merging step.

OPCMForCLIP(
    alpha=0.5,
    shuffle_order=True,
    seed=None,
    save_on_every_step=True,
    evaluate_on_every_step=True,
)

3. Continual Task Arithmetic

A baseline continual merging method that applies task arithmetic incrementally without SVD projection.

ContinualTaskArithmeticForCLIP(
    scaling_factor=0.3,
    shuffle_order=True,
    seed=None,
    save_on_every_step=True,
    evaluate_on_every_step=True,
)
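A minimal sketch of the incremental update, assuming continual task arithmetic adds the scaled task vector of each arriving model, $\theta^{(t)} = \theta^{(t-1)} + \lambda(\theta_t - \theta_0)$ with $\lambda$ = `scaling_factor`. After $K$ models this equals ordinary task arithmetic, $\theta_0 + \lambda \sum_i (\theta_i - \theta_0)$. The generator name is hypothetical, and parameters are plain NumPy arrays rather than models.

```python
import numpy as np


def continual_task_arithmetic(theta0, model_stream, scaling_factor=0.3):
    """Fold each arriving model into the merge by adding its scaled task vector."""
    merged = np.array(theta0, dtype=float)
    for theta_t in model_stream:
        # theta^(t) = theta^(t-1) + lambda * (theta_t - theta_0); a new array is created each step
        merged = merged + scaling_factor * (theta_t - theta0)
        yield merged
```

Unlike OPCM, nothing here rescales the accumulated sum, so the merged task vector keeps growing as more models arrive.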

4. Continual TIES-Merging

A continual variant of TIES-Merging that resolves sign conflicts at each step.

ContinualTiesMergingForCLIP(
    scaling_factor=0.5,
    threshold=20,
    remove_keys=[],
    merge_func="sum",
    shuffle_order=True,
    seed=None,
    save_on_every_step=True,
    evaluate_on_every_step=True,
)
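For reference, one TIES-Merging pass over the task vectors available at a given step (trim, elect sign, disjoint merge) can be sketched as follows. Assumptions: task vectors are flattened into the rows of a NumPy array, `threshold` is read as the percentage of largest-magnitude entries kept per task vector, and the merged model would be $\theta_0 + \lambda \cdot$ (result) with $\lambda$ = `scaling_factor`. The function name is hypothetical, and how ContinualTiesMergingForCLIP carries state from one step to the next is not shown.

```python
import numpy as np


def ties_merge(task_vectors, threshold=20, merge_func="sum"):
    """Trim, elect sign, and disjoint-merge task vectors of shape [num_tasks, num_params]."""
    tv = np.asarray(task_vectors, dtype=float)
    num_params = tv.shape[1]
    keep = max(1, int(round(num_params * threshold / 100)))  # assumption: threshold = % of entries kept

    # 1) Trim: keep only the `keep` largest-magnitude entries of each task vector.
    trimmed = np.zeros_like(tv)
    for i, row in enumerate(tv):
        top = np.argsort(-np.abs(row))[:keep]
        trimmed[i, top] = row[top]

    # 2) Elect: per-parameter sign of the summed trimmed values.
    elected = np.sign(trimmed.sum(axis=0))

    # 3) Disjoint merge: combine only nonzero entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    selected = np.where(agree, trimmed, 0.0)
    if merge_func == "sum":
        return selected.sum(axis=0)
    if merge_func == "mean":
        counts = np.maximum(agree.sum(axis=0), 1)  # avoid division by zero where nothing agrees
        return selected.sum(axis=0) / counts
    if merge_func == "max":
        return np.where(agree, np.abs(trimmed), 0.0).max(axis=0) * elected
    raise ValueError(f"unknown merge_func: {merge_func}")
```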

5. Continual Weight Average

A simple incremental averaging baseline: $\theta^{(t)} = \frac{(t-1)\,\theta^{(t-1)} + \theta_t}{t}$, so that $\theta^{(t)}$ is the uniform average of the first $t$ models.

ContinualWeightAverageForCLIP(
    shuffle_order=True,
    seed=None,
    save_on_every_step=True,
    evaluate_on_every_step=True,
)
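A minimal sketch of the running average, assuming parameters are plain NumPy arrays (the generator name is hypothetical). With the indexing used on this page, where $\theta^{(t)}$ is the merge of the first $t$ models, the update $\theta^{(t)} = \theta^{(t-1)} + (\theta_t - \theta^{(t-1)})/t$ is algebraically the same as $((t-1)\,\theta^{(t-1)} + \theta_t)/t$ and only needs the current average in memory.

```python
import numpy as np


def continual_weight_average(model_stream):
    """Running mean of a stream of parameter arrays; only the current average is stored."""
    merged = None
    for t, theta_t in enumerate(model_stream, start=1):
        if merged is None:
            merged = np.array(theta_t, dtype=float)  # theta^(1) = theta_1
        else:
            merged = merged + (theta_t - merged) / t  # theta^(t) = ((t-1) theta^(t-1) + theta_t) / t
        yield merged
```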

SVD Projection Details

The SVD projection is the core of OPCM. The key idea is:

Dominant directions of the merged task vector represent the "shared" knowledge accumulated so far.
New task vectors projected onto these dominant directions cause interference.
By zeroing out the projection onto the top-$r$ singular directions, we keep the new information that is orthogonal (novel) to what has been merged.
The diagonal zeroing step removes alignment with each singular vector individually, preventing self-reinforcement.

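The parameter $\alpha$ controls how aggressive the projection is:
- $\alpha = 0.5$ zeroes out the leading singular directions whose singular values account for just over half of the total sum of singular values (a fraction of the singular-value sum, not of the variance, which scales with $\sigma_i^2$).
- Higher $\alpha$ zeroes out more directions, making the merge more conservative about what it accepts from the new task.
- Lower $\alpha$ zeroes out fewer directions, allowing more overlap with the existing merged task vector.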
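The two zeroing steps can be checked numerically. In this sketch (the helper name `clean_task_vector` is hypothetical, the matrices are random, and a full SVD is used), a new task vector that merely rescales the merged one lies entirely on the diagonal of $P$ and is removed completely, while a random new task vector keeps only part of its norm because the projection is orthogonal in the Frobenius sense.

```python
import numpy as np


def clean_task_vector(tau_merged, tau_new, alpha):
    """Remove tau_new's diagonal and top-left r x r components in tau_merged's SVD basis."""
    u, s, vh = np.linalg.svd(tau_merged, full_matrices=True)
    p = u.T @ tau_new @ vh.T
    np.fill_diagonal(p, 0.0)
    r = int(np.argmax(np.cumsum(s) / np.sum(s) > alpha)) + 1
    p[:r, :r] = 0.0
    return u @ p @ vh


rng = np.random.default_rng(0)
tau_merged = rng.normal(size=(5, 5))

# A new task vector that only rescales the merged one: P = 2 * diag(s), so nothing survives.
aligned = clean_task_vector(tau_merged, 2.0 * tau_merged, alpha=0.5)

# A random new task vector keeps the entries outside the diagonal and the top-left block.
novel = rng.normal(size=(5, 5))
kept = clean_task_vector(tau_merged, novel, alpha=0.5)
print(np.linalg.norm(aligned), np.linalg.norm(kept) / np.linalg.norm(novel))
```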

Examples

CLI Usage

Configuration template for OPCM:

config/method/opcm/opcm.yaml

# =============================================================================
# FusionBench Method Configuration: OPCM
# =============================================================================
# Incrementally merges models via SVD projection and evaluation per step.
# =============================================================================
_target_: fusion_bench.method.opcm.opcm.OPCMForCLIP
# shuffle the order of the models
shuffle_order: true
# threshold on the cumulative singular-value fraction that sets the split rank r
alpha: 0.5
# the random seed to use
seed: null
# save the merged model on every step
save_on_every_step: true
# evaluate the merged model on every step
evaluate_on_every_step: true

Run OPCM for CLIP:

fusion_bench method=opcm/opcm \
    method.alpha=0.5 \
    method.shuffle_order=true \
    method.evaluate_on_every_step=true \
    modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8 \
    taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

Configuration for the general OPCM (with Ray support):

config/method/opcm/opcm_general.yaml

# =============================================================================
# FusionBench Method Configuration: OPCM
# =============================================================================
# Incrementally merges models via SVD projection and evaluation per step.
# =============================================================================
_target_: fusion_bench.method.opcm.opcm_general.OPCM
# shuffle the order of the models
shuffle_order: true
# threshold on the cumulative singular-value fraction that sets the split rank r
alpha: 0.5
# the random seed to use
seed: null
# save the merged model on every step
save_on_every_step: true
# evaluate the merged model on every step
evaluate_on_every_step: true
# the number of ray actors to use for distributed merging
num_ray_actors: 0

Run general OPCM with distributed merging:

fusion_bench method=opcm/opcm_general \
    method.alpha=0.5 \
    method.num_ray_actors=4 \
    modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8 \
    taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

Configuration for continual task arithmetic:

config/method/opcm/task_arithmetic.yaml

# =============================================================================
# FusionBench Method Configuration: Continual Task Arithmetic
# =============================================================================
# Applies task arithmetic incrementally across a stream of models.
# Maintains per-step save/eval similar to OPCM.
# =============================================================================
_target_: fusion_bench.method.opcm.task_arithmetic.ContinualTaskArithmeticForCLIP
scaling_factor: 0.3
# shuffle the order of the models
shuffle_order: true
# the random seed to use
seed: null
# save the merged model on every step
save_on_every_step: true
# evaluate the merged model on every step
evaluate_on_every_step: true

Run continual task arithmetic:

fusion_bench method=opcm/task_arithmetic \
    method.scaling_factor=0.3 \
    method.shuffle_order=true \
    modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8 \
    taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

Configuration for continual TIES-Merging:

config/method/opcm/ties_merging.yaml

# =============================================================================
# FusionBench Method Configuration: Continual TIES Merging
# =============================================================================
# Continual variant of TIES merging with per-step save/eval instrumentation.
# =============================================================================
_target_: fusion_bench.method.opcm.ties_merging.ContinualTiesMergingForCLIP
# Scaling factor $\lambda$
scaling_factor: 0.5
# trimming threshold: percentage of largest-magnitude task-vector entries kept (TIES trim step)
threshold: 20
# List of keys to remove from the state dict, default is empty
remove_keys: []
# Function to merge the models, default is sum. Options are 'sum', 'mean', and 'max'
merge_func: sum
# shuffle the order of the models
shuffle_order: true
# the random seed to use
seed: null
# save the merged model on every step
save_on_every_step: true
# evaluate the merged model on every step
evaluate_on_every_step: true

Run continual TIES-Merging:

fusion_bench method=opcm/ties_merging \
    method.scaling_factor=0.5 \
    method.threshold=20 \
    method.merge_func=sum \
    modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8 \
    taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

Configuration for continual weight average:

config/method/opcm/weight_average.yaml

# =============================================================================
# FusionBench Method Configuration: Continual Weighted Average
# =============================================================================
# Incrementally averages model weights as new models arrive.
# =============================================================================
_target_: fusion_bench.method.opcm.weight_average.ContinualWeightAverageForCLIP
# shuffle the order of the models
shuffle_order: true
# the random seed to use
seed: null
# save the merged model on every step
save_on_every_step: true
# evaluate the merged model on every step
evaluate_on_every_step: true

Run continual weight average:

fusion_bench method=opcm/weight_average \
    method.shuffle_order=true \
    modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8 \
    taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

API Usage

from fusion_bench.method.opcm import OPCM

# Instantiate the algorithm
algorithm = OPCM(
    alpha=0.5,
    shuffle_order=True,
    save_on_every_step=True,
    evaluate_on_every_step=False,
)

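# Run continual merging on a model pool.
# `modelpool` must already be a FusionBench model pool instance (for example, the
# CLIP vision model pool used in the CLI examples); its construction is not shown here.
merged_model = algorithm.run(modelpool)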

Runtime Behavior

All OPCM variants share these behavioral features:

Sequential Processing: Models are processed one at a time. The order can be shuffled (shuffle_order=True) or fixed.
Checkpoint Saving: When save_on_every_step=True, intermediate merged models are saved to {log_dir}/checkpoints/merged_model_{step}/.
Evaluation: When evaluate_on_every_step=True, the merged model is evaluated after each step. Reports are saved as report_{step}.json.
Logging: TensorBoard logs track metrics such as task vector norms and, for OPCM, the $\lambda_t$ values.
Progress Bars: Uses tqdm for both model-level and layer-level progress tracking.
Profiling: The built-in SimpleProfilerMixin reports timings for loading, merging, saving, and evaluation.

Distributed Merging with Ray

The general OPCM class supports distributed merging via Ray actors. Set num_ray_actors > 0 to spawn Ray actors that process layers in parallel:

fusion_bench method=opcm/opcm_general \
    method.num_ray_actors=8 \
    ...

Each Ray actor independently computes the SVD projection for its assigned linear layers, while the main process handles biases and the other parameters that are not linear-layer weight matrices.
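The layer-parallel idea can be illustrated without Ray. The sketch below swaps in Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for Ray actors, uses a hypothetical round-robin assignment of linear layers to workers (FusionBench's actual assignment strategy is not described here), and treats every 2-D array as a linear weight. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def clean_task_vector(tau_merged, tau_new, alpha):
    """OPCM-style cleaning of one linear-layer task vector (see the formulation section)."""
    u, s, vh = np.linalg.svd(tau_merged, full_matrices=True)
    p = u.T @ tau_new @ vh.T
    np.fill_diagonal(p, 0.0)
    r = int(np.argmax(np.cumsum(s) / np.sum(s) > alpha)) + 1
    p[:r, :r] = 0.0
    return u @ p @ vh


def clean_linear_layers_in_parallel(merged_tvs, new_tvs, alpha, num_workers):
    """Round-robin the 2-D (linear weight) entries across workers; other entries are left to the caller."""
    linear_names = sorted(name for name, w in new_tvs.items() if w.ndim == 2)
    shards = [linear_names[i::num_workers] for i in range(num_workers)]

    def process_shard(names):
        # Each worker only touches its own layers, so no coordination between workers is needed.
        return {n: clean_task_vector(merged_tvs[n], new_tvs[n], alpha) for n in names}

    cleaned = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(process_shard, shards):
            cleaned.update(partial)
    return cleaned
```

Because every layer's projection depends only on that layer's own merged and new task vectors, the work splits cleanly; the main process can apply the unprojected update to the remaining 1-D parameters.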

Utility Functions

Key utilities in fusion_bench/method/opcm/utils.py:

svd(w, full_matrices, accelerator): Performs SVD on a weight tensor with optional device transfer.
frobenius_inner_product(w1, w2): Computes the Frobenius inner product of two matrices.
get_task_vector_norm(model, pretrained_model): Computes the L2 norm of the task vector.
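The two norm-related quantities can be sketched on plain NumPy state dicts. These are illustrative stand-ins, not the utilities in utils.py: `task_vector_norm` takes dicts of arrays instead of models (a hypothetical signature) and assumes the norm is taken over all parameters concatenated into one vector.

```python
import numpy as np


def frobenius_inner_product(w1, w2):
    """<W1, W2>_F = sum_ij W1_ij * W2_ij, which equals trace(W1^T W2)."""
    return float(np.sum(w1 * w2))


def task_vector_norm(state, pretrained_state):
    """L2 norm of the task vector, treating all parameters as one flattened vector."""
    squared = sum(np.sum((state[k] - pretrained_state[k]) ** 2) for k in pretrained_state)
    return float(np.sqrt(squared))
```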

Hyperparameter Guidelines

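alpha: Controls how many leading singular directions are zeroed out, through the split rank $r$ (the smallest number of leading singular values whose sum exceeds a fraction $\alpha$ of the total). Typical values: 0.3-0.7. Higher values are more conservative: a larger $r$ removes more of the new task's overlap with the knowledge already merged.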
shuffle_order: Randomizing model order can help prevent bias toward earlier models. Set seed for reproducibility.
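To see how $\alpha$ maps to the split rank $r$, the definition can be evaluated directly on an illustrative, made-up spectrum (the `split_rank` helper here is hypothetical and uses only the standard library):

```python
from itertools import accumulate


def split_rank(singular_values, alpha):
    """Smallest k such that the top-k singular values exceed a fraction alpha of their sum."""
    total = sum(singular_values)
    for k, partial in enumerate(accumulate(singular_values), start=1):
        if partial / total > alpha:
            return k
    return len(singular_values)  # only reached when alpha >= 1


sigma = [4.0, 3.0, 2.0, 1.0]  # illustrative spectrum, sorted in decreasing order
for alpha in (0.3, 0.5, 0.8):
    print(alpha, split_rank(sigma, alpha))
```

The cumulative fractions are 0.4, 0.7, 0.9, and 1.0, so raising $\alpha$ from 0.3 to 0.5 to 0.8 increases $r$ from 1 to 2 to 3.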

Implementation Details

OPCM: fusion_bench.method.opcm.OPCM
OPCMForCLIP: fusion_bench.method.opcm.opcm.OPCMForCLIP
ContinualTaskArithmeticForCLIP: fusion_bench.method.opcm.ContinualTaskArithmeticForCLIP
ContinualTiesMergingForCLIP: fusion_bench.method.opcm.ContinualTiesMergingForCLIP
ContinualWeightAverageForCLIP: fusion_bench.method.opcm.ContinualWeightAverageForCLIP

¹ Tang et al. Merging Multi-Task Models via Weight-Ensembling Mixture of Experts. ICML 2024. https://arxiv.org/abs/2402.00433