RegMean¶

RegMean (Regression Mean) is a dataless knowledge fusion approach that formulates model merging as a linear regression problem¹. The algorithm aims to find optimal weights for each linear layer in the merged model by minimizing the discrepancy in predictions between the merge and candidate models.

Algorithm Overview¶

For a transformer layer \(l\), to obtain the merge weights for a linear layer \(W^{(l)}_{M}\), RegMean provides a precise closed-form solution for merging those from \(K\) candidate models:

\[W^{(l)}_{M} = \left[\sum_{i=1}^{K} (X^{(l)}_i)^{\top} X^{(l)}_i\right]^{-1} \sum_{i=1}^{K} (X^{(l)}_i)^{\top} X^{(l)}_i W^{(l)}_i\]

where:

\(W^{(l)}_i\) is the weight matrix of the \(i\)-th candidate model at layer \(l\)
\(X^{(l)}_i\) represents the input activations to layer \(l\) for model \(i\)
The formula computes a weighted combination that minimizes prediction discrepancy

Examples¶

CLI Usage¶

Configuration templates for RegMean:

config/method/regmean/clip_regmean.yaml

# =============================================================================
# FusionBench Method Configuration: RegMean (CLIP)
# =============================================================================
_target_: fusion_bench.method.RegMeanAlgorithmForCLIP
# list, regular expression of names of parameters that need to be excluded
exclude_param_names_regex: []
# numbers of examples to compute regmean weights
num_regmean_examples: 256
weight_transpose: true
# float, reduce non-diagonal elements in regmean weights by multiplying this scalar
reduce_non_diagonal_ratio: 0.95
dataloader_kwargs:
  batch_size: 32
  num_workers: 0

config/method/regmean/gpt2_regmean.yaml

# =============================================================================
# FusionBench Method Configuration: RegMean (GPT-2)
# =============================================================================
_target_: fusion_bench.method.RegMeanAlgorithmForGPT2
# list, regular expression of names of parameters that need to be excluded
exclude_param_names_regex: []
# numbers of examples to compute regmean weights
num_regmean_examples: 256
# float, reduce non-diagonal elements in regmean weights by multiplying this scalar
reduce_non_diagonal_ratio: 0.6
weight_transpose: false
cache_dir: outputs
batch_size: 32
num_workers: 0

CLIP Models¶

Merge CLIP-ViT-B/32 models on eight image classification tasks:

fusion_bench method=regmean/clip_regmean \
  modelpool=CLIPVisionModelPool/clip-vit-base-patch32_TA8 \
  taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8

Merge CLIP-ViT-L/14 models on eight image classification tasks:

fusion_bench \
  method=regmean/clip_regmean \
    method.dataloader_kwargs.batch_size=8 \
  modelpool=CLIPVisionModelPool/clip-vit-large-patch14_TA8 \
  taskpool=CLIPVisionModelTaskPool/clip-vit-classification_TA8 \
    taskpool.base_model=openai/clip-vit-large-patch14

Language Models¶

Merge GPT-2 models for text classification tasks:

fusion_bench \
  method=regmean/gpt2_regmean \
  modelpool=gpt-2_glue \
  taskpool=gpt-2_glue

API Usage¶

To use RegMean programmatically:

CLIP Models¶

from fusion_bench.method.regmean import RegMeanAlgorithmForCLIP
from omegaconf import DictConfig

# Configuration for CLIP RegMean
config = DictConfig({
    'exclude_param_names_regex': [],
    'num_regmean_examples': 256,
    'weight_transpose': True,
    'reduce_non_diagonal_ratio': 0.95,
    'dataloader_kwargs': {
        'batch_size': 32,
        'num_workers': 0
    }
})

# Initialize the algorithm
algorithm = RegMeanAlgorithmForCLIP(**config)

# Run the algorithm with a model pool
merged_model = algorithm.run(modelpool)

GPT-2 Models¶

from fusion_bench.method.regmean import RegMeanAlgorithmForGPT2
from omegaconf import DictConfig

# Configuration for GPT-2 RegMean
config = DictConfig({
    'exclude_param_names_regex': [],
    'num_regmean_examples': 256,
    'reduce_non_diagonal_ratio': 0.6,
    'weight_transpose': False,
    'cache_dir': 'outputs',
    'batch_size': 32,
    'num_workers': 0
})

# Initialize the algorithm
algorithm = RegMeanAlgorithmForGPT2(**config)

# Run the algorithm
merged_model = algorithm.run(modelpool)

Implementation Details¶

Xisen Jin, et al. "Dataless Knowledge Fusion by Merging Weights of Language Models." http://arxiv.org/abs/2212.09849 ↩