MoE-based Model Merging¶
 
MoE-based model merging is a technique that combines multiple fine-tuned dense models into a single Mixture-of-Experts (MoE) model. This approach leverages the specialization of different expert models by treating each fine-tuned model as an expert in the resulting MoE architecture. The method first upscales the dense base model to an MoE architecture and then substitutes the experts with weights from the different specialized models.
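To make the construction concrete, here is a minimal, illustrative sketch (not FusionBench's actual implementation) of how a single transformer layer is upscaled: shared weights such as attention and normalization come from the base model, each fine-tuned model's dense MLP becomes one expert, and the router over the experts is typically newly initialized. The parameter-name prefixes "mlp." and "experts.{i}." are placeholders chosen for illustration.
import copy

def upscale_layer_to_moe(base_layer_state, expert_layer_states):
    """Build the state dict of one MoE layer from a base layer and N fine-tuned layers.

    base_layer_state: state dict of one transformer layer of the base model.
    expert_layer_states: the corresponding layer state dicts of the fine-tuned models.
    """
    # Shared (non-MLP) weights, e.g. attention and layer norms, come from the base model.
    moe_layer = {
        k: copy.deepcopy(v)
        for k, v in base_layer_state.items()
        if not k.startswith("mlp.")
    }
    # Each fine-tuned model's dense MLP becomes one expert of the MoE layer.
    for i, expert_state in enumerate(expert_layer_states):
        for key, value in expert_state.items():
            if key.startswith("mlp."):
                # e.g. "mlp.up_proj.weight" -> "experts.2.up_proj.weight"
                moe_layer[f"experts.{i}.{key[len('mlp.'):]}"] = copy.deepcopy(value)
    # The router/gate that selects experts per token has no dense counterpart;
    # it is newly initialized by the MoE architecture and therefore omitted here.
    return moe_layer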
Examples¶
API Usage¶
Basic Example¶
Here's an example demonstrating how to merge multiple fine-tuned models into a Mixtral MoE model:
from fusion_bench.method import (
    MixtralForCausalLMMergingAlgorithm,
    MixtralMoEMergingAlgorithm,
)
from fusion_bench.modelpool import CausalLMPool
from fusion_bench.utils import print_parameters
# Create a model pool with your fine-tuned expert models
model_pool = CausalLMPool(
    models={
        "_pretrained_": "path_to_base_model",
        "expert_1": "path_to_finetuned_model_1",
        "expert_2": "path_to_finetuned_model_2",
        "expert_3": "path_to_finetuned_model_3",
        "expert_4": "path_to_finetuned_model_4",
    },
    tokenizer="path_to_base_model",
    model_kwargs={"torch_dtype": "bfloat16"},
)
# Initialize the merging algorithm with direct parameters
merging_algorithm = MixtralForCausalLMMergingAlgorithm(
    experts_per_token=2,  # Number of experts to activate per token
    save_checkpoint=None  # Optional: path to save the merged model
)
# Run the merging process to get a MoE model
moe_model = merging_algorithm.run(model_pool)
print("Merged MoE model:")
print_parameters(moe_model)
# Save the merged MoE model
moe_model.save_pretrained("path_to_save_moe_model")
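Since the merged model is saved with save_pretrained, it can be reloaded like any other Hugging Face checkpoint. A minimal sketch, reusing the placeholder paths from the example above:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged MoE model and the base model's tokenizer.
moe_model = AutoModelForCausalLM.from_pretrained(
    "path_to_save_moe_model",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("path_to_base_model")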
CLI Usage¶
This section shows how to use the fusion_bench command-line interface to perform MoE-based merging.
Configuration Files¶
Configuration template for the MoE merging method:
# =============================================================================
# FusionBench Method Configuration: Mixtral MoE Merging/Upscaling
# =============================================================================
name: mixtral_moe_merging # or "mixtral_for_causal_lm_moe_merging"
experts_per_token: 2
# path to save the upscaled model
save_checkpoint: null
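The experts_per_token and save_checkpoint fields map directly onto the constructor arguments used in the API example above, while name selects which merging algorithm is instantiated. A sketch of the equivalent construction in Python:
# Equivalent to the YAML method configuration above, reusing the class from the API example.
from fusion_bench.method import MixtralForCausalLMMergingAlgorithm

merging_algorithm = MixtralForCausalLMMergingAlgorithm(
    experts_per_token=2,   # experts_per_token in the YAML
    save_checkpoint=None,  # save_checkpoint in the YAML
)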
Configuration template for the model pool:
_target_: fusion_bench.modelpool.CausalLMPool
models:
  _pretrained_: path_to_your_pretrained_model
  expert_1: path_to_your_expert_model_1
  expert_2: path_to_your_expert_model_2
  expert_3: path_to_your_expert_model_3
  expert_4: path_to_your_expert_model_4
tokenizer: ${.models._pretrained_}
model_kwargs:
  torch_dtype: bfloat16
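The _target_ key lets Hydra instantiate the model pool from this YAML, which is roughly what the fusion_bench CLI does when it loads a modelpool configuration. A minimal sketch, assuming the file above is saved as mixtral_moe_merging.yaml (a placeholder name):
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Load the model pool YAML shown above (the file name is a placeholder).
modelpool_cfg = OmegaConf.load("mixtral_moe_merging.yaml")

# Hydra resolves `_target_` to fusion_bench.modelpool.CausalLMPool and calls it
# with the remaining keys (models, tokenizer, model_kwargs) as keyword arguments.
model_pool = instantiate(modelpool_cfg)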
Running MoE Merging¶
Run the fusion_bench command with the MoE merging configuration:
fusion_bench \
    method=mixtral_moe_merging \
    modelpool=CausalLMPool/mixtral_moe_merging \
    taskpool=dummy # the dummy task pool only reports the parameter count of the merged model
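Since fusion_bench is built on Hydra, individual fields of the method and model pool configurations can also be overridden directly on the command line. For example (the output path is a placeholder):
fusion_bench \
    method=mixtral_moe_merging \
    method.experts_per_token=2 \
    method.save_checkpoint=outputs/merged_mixtral_model \
    modelpool=CausalLMPool/mixtral_moe_merging \
    taskpool=dummy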