MoE-based Model Merging¶

MoE-based model merging is a technique that combines multiple fine-tuned dense models into a single Mixture-of-Experts (MoE) model. It leverages the specialization of the individual models by treating each fine-tuned model as an expert in the resulting MoE architecture: the dense architecture is upscaled to MoE format, and the experts are initialized with the weights of the different specialized models.
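For intuition, here is a minimal sketch (plain PyTorch, not the FusionBench implementation) of the upscaling step: each dense feed-forward block becomes a routed block whose experts are copies of the corresponding blocks from the fine-tuned models, while the router is newly initialized. The function name and signature are hypothetical.

```python
import copy
from typing import List

import torch.nn as nn


def upscale_mlp_to_moe(expert_mlps: List[nn.Module], hidden_size: int) -> nn.ModuleDict:
    """Build a MoE block: a fresh router plus experts copied from dense models."""
    return nn.ModuleDict({
        # The router (gate) is newly initialized and learns to dispatch tokens.
        "gate": nn.Linear(hidden_size, len(expert_mlps), bias=False),
        # Each expert starts from one specialized model's feed-forward weights.
        "experts": nn.ModuleList([copy.deepcopy(mlp) for mlp in expert_mlps]),
    })
```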
Examples¶
API Usage¶
Basic Example¶
Here's an example demonstrating how to merge multiple fine-tuned models into a Mixtral MoE model:
```python
from fusion_bench.method import (
    MixtralForCausalLMMergingAlgorithm,
    MixtralMoEMergingAlgorithm,
)
from fusion_bench.modelpool import CausalLMPool
from fusion_bench.utils import print_parameters

# Create a model pool with your fine-tuned expert models
model_pool = CausalLMPool(
    models={
        "_pretrained_": "path_to_base_model",
        "expert_1": "path_to_finetuned_model_1",
        "expert_2": "path_to_finetuned_model_2",
        "expert_3": "path_to_finetuned_model_3",
        "expert_4": "path_to_finetuned_model_4",
    },
    tokenizer="path_to_base_model",
    model_kwargs={"torch_dtype": "bfloat16"},
)

# Initialize the merging algorithm with direct parameters
merging_algorithm = MixtralForCausalLMMergingAlgorithm(
    experts_per_token=2,  # number of experts to activate per token
    save_checkpoint=None,  # optional: path to save the merged model
)

# Run the merging process to obtain a MoE model
moe_model = merging_algorithm.run(model_pool)

print("Merged MoE model:")
print_parameters(moe_model)

# Save the merged MoE model
moe_model.save_pretrained("path_to_save_moe_model")
```
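Assuming the merged model is a standard `transformers` Mixtral model (as the class name suggests), the checkpoint saved above can be reloaded with `transformers`. A minimal sketch, reusing the placeholder paths from the example above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged Mixtral-style checkpoint saved above.
moe_model = AutoModelForCausalLM.from_pretrained(
    "path_to_save_moe_model", torch_dtype=torch.bfloat16
)
# The tokenizer comes from the base model, as configured in the model pool.
tokenizer = AutoTokenizer.from_pretrained("path_to_base_model")
```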
CLI Usage¶
This section shows how to use the `fusion_bench` command-line interface to merge models with MoE-based merging.
Configuration Files¶
Configuration template for the MoE merging method:
```yaml
name: mixtral_moe_upscaling # or "mixtral_for_causal_lm_moe_upscaling"
experts_per_token: 2
# path to save the upscaled model
save_checkpoint: null
```
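These fields correspond to the `experts_per_token` and `save_checkpoint` arguments passed to the merging algorithm in the API example above.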
Configuration template for the model pool:
```yaml
_target_: fusion_bench.modelpool.CausalLMPool
models:
  _pretrained_: path_to_your_pretrained_model
  expert_1: path_to_your_expert_model_1
  expert_2: path_to_your_expert_model_2
  expert_3: path_to_your_expert_model_3
  expert_4: path_to_your_expert_model_4
tokenizer: ${.models._pretrained_}
model_kwargs:
  torch_dtype: bfloat16
```
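The `_target_` field follows the Hydra convention, so the same YAML can also be instantiated programmatically. A minimal sketch, assuming the template above is saved as `modelpool.yaml` (the file name is arbitrary) and that `CausalLMPool` accepts these fields as constructor arguments, as in the API example:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Load the YAML template; interpolations such as ${.models._pretrained_}
# are resolved by OmegaConf when the values are accessed.
cfg = OmegaConf.load("modelpool.yaml")
# Hydra builds the object named by `_target_` with the remaining keys as kwargs.
model_pool = instantiate(cfg)
```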
Running MoE Merging¶
Run the fusion_bench command with MoE merging configuration:
```bash
fusion_bench \
    method=mixtral_moe_merging \
    modelpool=CausalLMPool/mixtral_moe_merging \
    taskpool=dummy # this evaluates parameter counts of the merged model
```
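Since `fusion_bench` is driven by Hydra-style configuration, individual fields of the selected method config can typically also be overridden on the command line; the output path below is only an illustrative example:

```bash
fusion_bench \
    method=mixtral_moe_merging \
    method.experts_per_token=2 \
    method.save_checkpoint=outputs/mixtral_moe_merged \
    modelpool=CausalLMPool/mixtral_moe_merging \
    taskpool=dummy
```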