Merge Large Language Models using SLERP¶
This tutorial demonstrates how to merge Large Language Models (LLMs) using the SLERP (Spherical Linear Interpolation)[^1] algorithm, a model fusion technique that interpolates between two models along the surface of a high-dimensional sphere, preserving the geometric properties of the parameter space.
SLERP is particularly effective for merging language models because it maintains the angular relationships between model parameters, which can be crucial for preserving semantic representations and learned behaviors. Unlike simple linear interpolation (LERP), SLERP follows a curved path on the sphere's surface, ensuring consistent interpolation speed and avoiding potential distortions.
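To make the curved-path idea concrete, here is a minimal sketch of SLERP on a single pair of parameter vectors, written with plain Python lists for readability (the actual FusionBench implementation operates on PyTorch tensors; the function below is illustrative, not the library's code). It includes the near-collinear fallback to LERP described later in the parameter list:

```python
import math

def slerp(t, v0, v1, DOT_THRESHOLD=0.9995, epsilon=1e-8):
    """Spherical linear interpolation between two parameter vectors.

    Falls back to plain linear interpolation (LERP) when the vectors
    are nearly collinear, for numerical stability.
    """
    # Compute the cosine of the angle between the normalized vectors.
    norm0 = math.sqrt(sum(x * x for x in v0)) + epsilon
    norm1 = math.sqrt(sum(x * x for x in v1)) + epsilon
    dot = sum((a / norm0) * (b / norm1) for a, b in zip(v0, v1))

    if abs(dot) > DOT_THRESHOLD:
        # Nearly collinear vectors: sin(theta) ~ 0, so use LERP instead.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    theta_0 = math.acos(dot)            # angle between the two vectors
    sin_theta_0 = math.sin(theta_0)
    s0 = math.sin((1 - t) * theta_0) / sin_theta_0  # weight on v0
    s1 = math.sin(t * theta_0) / sin_theta_0        # weight on v1
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

Note that at `t=0` the weights reduce to `s0=1, s1=0` (the first model), and at `t=1` to `s0=0, s1=1` (the second model), matching the behavior of the `t` parameter described below.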
🔧 Standalone YAML Configuration¶
This example uses a configuration with the following components:

- **Program Configuration**: Specifies `FabricModelFusionProgram` to handle the fusion workflow.
- **Method Configuration**: Uses `SlerpForCausalLM` with an interpolation factor (the `t` parameter), which defaults to 0.5. The option names in the configuration file match the constructor arguments of `SlerpForCausalLM`:

    - `t` (`float`) – The interpolation parameter, which must lie in the range [0, 1]. `t=0` returns the first model, `t=1` returns the second model, and `t=0.5` gives a balanced interpolation.
    - `DOT_THRESHOLD` (`float`, default: `0.9995`) – Threshold on the absolute dot product of the normalized parameter vectors. When the dot product exceeds this threshold, the vectors are considered nearly collinear and linear interpolation (LERP) is used instead of SLERP for numerical stability.
    - `epsilon` (`float`, default: `1e-8`) – Small value used to avoid division by zero during vector normalization.
    - `model_save_path` (`Optional[str]`, default: `None`) – Path where the merged model should be saved. If `None`, the model is not saved to disk.
    - `show_pbar` (`bool`, default: `False`) – Whether to display a progress bar during interpolation. Useful for debugging or monitoring progress with large models.
    - `**kwargs` – Additional keyword arguments passed to the parent `BaseAlgorithm` class.

    Source code: `fusion_bench/method/slerp/slerp.py`

- **Model Pool**: Contains exactly two LLMs to be merged using spherical interpolation.
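Put together, the configuration might look roughly like the sketch below. This is an illustrative outline assuming FusionBench's Hydra-style `_target_` convention; the exact keys, model names, and paths are placeholders, not the file shipped with the repository:

```yaml
# Illustrative sketch of config/_get_started/llm_slerp.yaml (keys may differ)
method:
  _target_: fusion_bench.method.SlerpForCausalLM
  t: 0.5
  DOT_THRESHOLD: 0.9995
  epsilon: 1e-8
  model_save_path: null
  show_pbar: false
modelpool:
  # Exactly two causal LMs to merge; names and paths are placeholders.
  models:
    model_1: path/to/first/model
    model_2: path/to/second/model
```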
🚀 Running the Example¶
Execute the SLERP fusion with the following command:
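Assuming you are in the repository root (so that `$PWD/config/_get_started` resolves correctly), the basic invocation with the default `t=0.5` is:

```shell
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp
```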
Hyperparameter Tuning¶
You can experiment with different interpolation factors by overriding the configuration:
```bash
# Favor the first model more (closer to model_1)
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp \
    method.t=0.3

# Balanced interpolation (default)
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp \
    method.t=0.5

# Favor the second model more (closer to model_2)
fusion_bench --config-path $PWD/config/_get_started --config-name llm_slerp \
    method.t=0.7
```
🐛 Debugging Configuration (VS Code)¶
To debug the run in VS Code, add the following entry to the `configurations` array of `.vscode/launch.json`:

```json
{
    "name": "llm_slerp",
    "type": "debugpy",
    "request": "launch",
    "module": "fusion_bench.scripts.cli",
    "args": [
        "--config-path",
        "${workspaceFolder}/config/_get_started",
        "--config-name",
        "llm_slerp"
    ],
    "console": "integratedTerminal",
    "justMyCode": true,
    "env": {
        "HYDRA_FULL_ERROR": "1"
    }
}
```
[^1]: SLERP For Model Merging – A Primer. https://www.coinfeeds.ai/ai-blog/slerp-model-merging-primer