
# MoE-based Model Merging


MoE-based model merging is a technique that combines multiple fine-tuned dense models into a single Mixture-of-Experts (MoE) model. It leverages the specialization of the individual fine-tuned models by treating each one as an expert in the resulting MoE architecture: the dense architecture is first upscaled to an MoE format, and the expert modules are then populated with the weights of the specialized models.
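Concretely, each dense MLP block is turned into a group of experts, and a routing layer selects a fixed number of experts per token (the `experts_per_token` option used below). The following is a minimal sketch of the expert-substitution step only, not the fusion_bench implementation; the helper name `build_moe_experts` is hypothetical, and router initialization is left out.

```python
# Minimal sketch of expert substitution (hypothetical helper, not the fusion_bench API).
import copy

import torch.nn as nn


def build_moe_experts(base_mlp: nn.Module, finetuned_mlps: list[nn.Module]) -> nn.ModuleList:
    """Create one expert per fine-tuned model by copying its MLP weights."""
    experts = nn.ModuleList()
    for finetuned_mlp in finetuned_mlps:
        expert = copy.deepcopy(base_mlp)                    # same architecture as the dense MLP
        expert.load_state_dict(finetuned_mlp.state_dict())  # weights from one specialized model
        experts.append(expert)
    return experts
```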

## Examples

### API Usage

#### Basic Example

Here's an example demonstrating how to merge multiple fine-tuned models into a Mixtral MoE model:

```python
from fusion_bench.method import (
    MixtralForCausalLMMergingAlgorithm,
    MixtralMoEMergingAlgorithm,
)
from fusion_bench.modelpool import CausalLMPool
from fusion_bench.utils import print_parameters

# Create a model pool with your fine-tuned expert models
model_pool = CausalLMPool(
    models={
        "_pretrained_": "path_to_base_model",
        "expert_1": "path_to_finetuned_model_1",
        "expert_2": "path_to_finetuned_model_2",
        "expert_3": "path_to_finetuned_model_3",
        "expert_4": "path_to_finetuned_model_4",
    },
    tokenizer="path_to_base_model",
    model_kwargs={"torch_dtype": "bfloat16"},
)

# Initialize the merging algorithm with direct parameters
merging_algorithm = MixtralForCausalLMMergingAlgorithm(
    experts_per_token=2,   # Number of experts to activate per token
    save_checkpoint=None,  # Optional: path to save the merged model
)

# Run the merging process to get a MoE model
moe_model = merging_algorithm.run(model_pool)

print("Merged MoE model:")
print_parameters(moe_model)

# Save the merged MoE model
moe_model.save_pretrained("path_to_save_moe_model")
```
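
Since `save_pretrained` writes a standard Hugging Face checkpoint, the merged model can later be reloaded with `transformers`. A short usage sketch (note that the snippet above does not save the tokenizer, so it is loaded from the base model path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged MoE model from the directory written by save_pretrained above.
moe_model = AutoModelForCausalLM.from_pretrained(
    "path_to_save_moe_model",
    torch_dtype="bfloat16",
)
# The tokenizer was not saved above, so reuse the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("path_to_base_model")
```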

### CLI Usage

This section shows how to use the `fusion_bench` command-line interface to merge models with MoE-based merging.

#### Configuration Files

Configuration template for the MoE merging method:

`config/method/mixtral_moe_merging.yaml`:

```yaml
name: mixtral_moe_upscaling # or "mixtral_for_causal_lm_moe_upscaling"
experts_per_token: 2
# path to save the upscaled model
save_checkpoint: null
```

Configuration template for the model pool:

`config/modelpool/CausalLMPool/mixtral_moe_merging.yaml`:

```yaml
_target_: fusion_bench.modelpool.CausalLMPool
models:
  _pretrained_: path_to_your_pretrained_model
  expert_1: path_to_your_expert_model_1
  expert_2: path_to_your_expert_model_2
  expert_3: path_to_your_expert_model_3
  expert_4: path_to_your_expert_model_4
tokenizer: ${.models._pretrained_}
model_kwargs:
  torch_dtype: bfloat16
```
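
The `_target_` field follows the Hydra instantiation convention, so the same pool can, in principle, be built programmatically from this file. A sketch under that assumption (loading the config standalone so the relative `${.models._pretrained_}` interpolation resolves against the file's own `models` section):

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Load the model pool config and instantiate the class named in `_target_`.
cfg = OmegaConf.load("config/modelpool/CausalLMPool/mixtral_moe_merging.yaml")
model_pool = instantiate(cfg)
```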

#### Running MoE Merging

Run the `fusion_bench` command with the MoE merging configuration:

```bash
fusion_bench \
    method=mixtral_moe_merging \
    modelpool=CausalLMPool/mixtral_moe_merging \
    taskpool=dummy # this evaluates parameter counts of the merged model
```
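
Because the CLI is driven by Hydra-style configuration groups, individual method options can presumably also be overridden from the command line with dotted keys rather than editing the YAML. A hedged example, where the save path is a placeholder:

```bash
# Assumes standard Hydra-style dotted overrides; the checkpoint path is a placeholder.
fusion_bench \
    method=mixtral_moe_merging \
    method.experts_per_token=2 \
    method.save_checkpoint=outputs/mixtral_moe_merged \
    modelpool=CausalLMPool/mixtral_moe_merging \
    taskpool=dummy
```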

## Implementation Details