Merge Large Language Models using SLERP

This tutorial demonstrates how to merge Large Language Models (LLMs) using the SLERP (Spherical Linear Interpolation) ¹ algorithm. SLERP is a model fusion technique that interpolates between the parameters of two models along an arc on the surface of a high-dimensional sphere, rather than along the straight line between them, so that the angular geometry of the parameter space is respected.

SLERP is well suited to merging language models because it accounts for the angle between corresponding parameter vectors, which can matter for preserving semantic representations and learned behaviors. Simple linear interpolation (LERP) follows the straight chord between two points, which shrinks the norm of the intermediate vectors whenever the endpoints are not parallel. SLERP instead follows the curved path on the sphere's surface and moves along it at a constant angular rate, which avoids this shrinkage.
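The standard SLERP formula makes this concrete. For two parameter vectors $p_0$ and $p_1$ separated by the angle $\Omega$ (assumed here to satisfy $0 < \Omega < \pi$), with interpolation parameter $t \in [0, 1]$, $t = 0$ yields $p_0$ and $t = 1$ yields $p_1$. The last two lines compare the midpoints of two unit vectors: the LERP midpoint has norm $\cos(\Omega/2) \le 1$, while the SLERP midpoint stays on the unit sphere.

```latex
\begin{aligned}
\cos\Omega &= \frac{p_0 \cdot p_1}{\lVert p_0 \rVert \, \lVert p_1 \rVert}, \\
\operatorname{slerp}(p_0, p_1; t) &= \frac{\sin\big((1 - t)\,\Omega\big)}{\sin\Omega}\, p_0 + \frac{\sin(t\,\Omega)}{\sin\Omega}\, p_1, \\
\lVert p_0 \rVert = \lVert p_1 \rVert = 1:\quad \Big\lVert \tfrac{1}{2}(p_0 + p_1) \Big\rVert &= \sqrt{\tfrac{1 + \cos\Omega}{2}} = \cos\tfrac{\Omega}{2} \le 1, \\
\Big\lVert \operatorname{slerp}\big(p_0, p_1; \tfrac{1}{2}\big) \Big\rVert &= \frac{\sin(\Omega/2)}{\sin\Omega} \cdot 2\cos\tfrac{\Omega}{2} = 1.
\end{aligned}
```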

🔧 Standalone YAML Configuration

This example uses the following configuration to merge two LLMs with SLERP:

config/_get_started/llm_slerp.yaml
_target_: fusion_bench.programs.FabricModelFusionProgram
_recursive_: false
method:
  _target_: fusion_bench.method.SlerpForCausalLM
  t: 0.5
modelpool:
  _target_: fusion_bench.modelpool.CausalLMPool
  models:
    model_1: ibivibiv/alpaca-dragon-72b-v1
    model_2: moreh/MoMo-72B-lora-1.8.7-DPO
  tokenizer: ibivibiv/alpaca-dragon-72b-v1
  enable_lazy_loading: true # load model as LazyStateDict

Program Configuration: Specifies FabricModelFusionProgram, which runs the fusion workflow.

Method Configuration: Uses SlerpForCausalLM with the interpolation parameter t, set to 0.5 in this configuration. The constructor has no default for t, so it must always be specified. The option names in the configuration file match the constructor's argument names.

SlerpForCausalLM.__init__()

Initialize the SlerpForCausalLM algorithm.

Parameters:

t (float) – The interpolation parameter. Must be in the range [0, 1]. t=0 returns the first model, t=1 returns the second model, t=0.5 provides balanced interpolation.

DOT_THRESHOLD (float, default: 0.9995) – The threshold for the dot product of normalized vectors. When the absolute dot product exceeds this threshold, vectors are considered nearly collinear and linear interpolation (LERP) is used instead of SLERP for numerical stability. Defaults to 0.9995.

epsilon (float, default: 1e-08) – Small value used for numerical stability to avoid division by zero during vector normalization. Defaults to 1e-8.

model_save_path (Optional[str], default: None) – Path where the merged model should be saved. If None, the model is not saved to disk. Defaults to None.

show_pbar (bool, default: False) – Whether to display a progress bar during the interpolation process. Useful for debugging or monitoring progress with large models. Defaults to False.

**kwargs – Additional keyword arguments passed to the parent BaseAlgorithm class.

Source code in fusion_bench/method/slerp/slerp.py

def __init__(
    self,
    t: float,
    DOT_THRESHOLD: float = 0.9995,
    epsilon: float = 1e-8,
    model_save_path: Optional[str] = None,
    show_pbar: bool = False,
    **kwargs,
):
    """
    Initialize the SlerpForCausalLM algorithm.

    Args:
        t (float): The interpolation parameter. Must be in the range [0, 1].
                  t=0 returns the first model, t=1 returns the second model,
                  t=0.5 provides balanced interpolation.
        DOT_THRESHOLD (float, optional): The threshold for the dot product of normalized vectors.
                                       When the absolute dot product exceeds this threshold,
                                       vectors are considered nearly collinear and linear
                                       interpolation (LERP) is used instead of SLERP for
                                       numerical stability. Defaults to 0.9995.
        epsilon (float, optional): Small value used for numerical stability to avoid
                                 division by zero during vector normalization.
                                 Defaults to 1e-8.
        model_save_path (Optional[str], optional): Path where the merged model should be saved.
                                                 If None, the model is not saved to disk.
                                                 Defaults to None.
        show_pbar (bool, optional): Whether to display a progress bar during the interpolation
                                  process. Useful for debugging or monitoring progress with
                                  large models. Defaults to False.
        **kwargs: Additional keyword arguments passed to the parent BaseAlgorithm class.
    """
    super().__init__(**kwargs)

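Model Pool: Contains exactly two LLMs (model_1 and model_2) to be merged by spherical interpolation, plus the tokenizer, which here is taken from model_1. Setting enable_lazy_loading: true loads each model as a LazyStateDict, so weights are loaded on demand rather than all at once.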
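As a rough illustration of what merging the two pooled models involves, the sketch below applies SLERP tensor by tensor to two state dicts with matching keys, including the LERP fallback controlled by DOT_THRESHOLD and the epsilon guard during normalization. It is written in NumPy rather than PyTorch, and the helpers `slerp` and `slerp_state_dicts` are hypothetical names. The assumption that SlerpForCausalLM processes each parameter tensor independently in this way is ours; this is not the library's implementation.

```python
import numpy as np


def slerp(t, v0, v1, dot_threshold=0.9995, epsilon=1e-8):
    """Spherically interpolate between two same-shaped arrays (illustrative sketch)."""
    a = v0.ravel().astype(np.float64)
    b = v1.ravel().astype(np.float64)
    # Normalized copies are used only to measure the angle; epsilon avoids division by zero.
    a_unit = a / (np.linalg.norm(a) + epsilon)
    b_unit = b / (np.linalg.norm(b) + epsilon)
    dot = float(np.dot(a_unit, b_unit))
    if abs(dot) > dot_threshold:
        # Nearly collinear vectors: fall back to plain linear interpolation (LERP).
        out = (1.0 - t) * a + t * b
    else:
        omega = np.arccos(np.clip(dot, -1.0, 1.0))  # angle between the two tensors
        sin_omega = np.sin(omega)
        w0 = np.sin((1.0 - t) * omega) / sin_omega
        w1 = np.sin(t * omega) / sin_omega
        # Weights are applied to the original (unnormalized) tensors.
        out = w0 * a + w1 * b
    return out.reshape(v0.shape).astype(v0.dtype)


def slerp_state_dicts(t, sd0, sd1, **kwargs):
    """Merge two state dicts with identical keys and shapes, one tensor at a time."""
    if sd0.keys() != sd1.keys():
        raise ValueError("both models must have the same parameter names")
    return {name: slerp(t, sd0[name], sd1[name], **kwargs) for name in sd0}


# Toy usage: an orthogonal pair (true SLERP) and a collinear pair (LERP fallback).
sd_a = {"layer.weight": np.array([[1.0, 0.0]]), "layer.bias": np.array([2.0, 2.0])}
sd_b = {"layer.weight": np.array([[0.0, 1.0]]), "layer.bias": np.array([4.0, 4.0])}
merged = slerp_state_dicts(0.5, sd_a, sd_b)
print(merged["layer.weight"], merged["layer.bias"])
```

In this sketch the tensors are normalized only to measure the angle, and the SLERP weights are applied to the original tensors, so each merged tensor stays on the same scale as its inputs.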

🚀 Running the Example

Execute the SLERP fusion with the following command:

fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp

Hyperparameter Tuning

You can experiment with different interpolation factors by overriding method.t on the command line:

# Favor the first model more (closer to model_1)
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp \
    method.t=0.3

# Balanced interpolation (default)
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp \
    method.t=0.5

# Favor the second model more (closer to model_2)
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp \
    method.t=0.7
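To get a feel for what these overrides change, the sketch below computes the weights SLERP would place on a model_1 tensor and the corresponding model_2 tensor for t = 0.3, 0.5 and 0.7. It assumes a hypothetical angle of 60 degrees between the two tensors; real angles differ from tensor to tensor. The helper `slerp_weights` is illustrative and not part of fusion_bench.

```python
import math


def slerp_weights(t, omega):
    """Return (weight on model_1, weight on model_2) for parameter t and angle omega (radians)."""
    s = math.sin(omega)
    return math.sin((1 - t) * omega) / s, math.sin(t * omega) / s


omega = math.radians(60)  # hypothetical angle between two corresponding tensors
for t in (0.3, 0.5, 0.7):
    w1, w2 = slerp_weights(t, omega)
    print(f"t={t}: model_1 weight={w1:.3f}, model_2 weight={w2:.3f}, sum={w1 + w2:.3f}")
```

Unlike the LERP weights (1 - t, t), the SLERP weights sum to more than 1 for 0 < t < 1 when the tensors are not parallel. This compensates for the norm shrinkage that a straight-line blend would cause. Swapping t for 1 - t simply swaps the two weights.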

🐛 Debugging Configuration (VS Code)

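To debug the run in VS Code, add the following entry to the "configurations" array of your launch file. Setting HYDRA_FULL_ERROR=1 makes Hydra print the complete stack trace when an error occurs.

.vscode/launch.json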

{
    "name": "llm_slerp",
    "type": "debugpy",
    "request": "launch",
    "module": "fusion_bench.scripts.cli",
    "args": [
        "--config-path",
        "${workspaceFolder}/config/_get_started",
        "--config-name",
        "llm_slerp"
    ],
    "console": "integratedTerminal",
    "justMyCode": true,
    "env": {
        "HYDRA_FULL_ERROR": "1"
    }
}

¹ SLERP For Model Merging – A Primer. https://www.coinfeeds.ai/ai-blog/slerp-model-merging-primer