Depth Upscaling¶
Depth Upscaling is a model scaling technique that increases the depth (number of layers) of neural networks by duplicating and concatenating existing layers1. This approach has been shown to be effective for scaling large language models, as demonstrated in the SOLAR 10.7B model.
Usage¶
The DepthUpscalingAlgorithm
is used to upscale the depth of PyTorch models. Here's a basic guide on how to use it:
First, import the necessary modules:
from omegaconf import DictConfig
from torch import nn
from fusion_bench.method.depth_upscaling import DepthUpscalingAlgorithm
Create an instance of DepthUpscalingAlgorithm
by passing the layer indices directly to the constructor.
The layer indices determine the upscaling pattern.
Assume we have a list of PyTorch models (nn.ModuleList
instances) that we want to upscale. Here, we're creating a list of linear models as an example:
Then, we can pass the layers to the run
method of our algorithm:
The run
method will return an upscaled model. The type of the returned model will be the same as the input models (in this case, nn.ModuleList
), and its length will be determined by the layer indices specified in the configuration.
Layer Index Patterns¶
The layer_indices
parameter supports flexible specifications:
- Integer indices: Direct layer references (0-indexed)
- String expressions: Python expressions that evaluate to lists of integers
- Mixed patterns: Combination of integers and strings
# Example patterns:
# [0, 1, 1, 0] - Use layers 0, 1, 1, 0 (4 layers total)
# ["range(0,12)", "range(6,12)"] - First 12 layers + layers 6-11 (18 layers total)
# [0, 2, 4, "range(6,12)"] - Layers 0, 2, 4, then layers 6-11 (9 layers total)
Examples¶
Basic Example¶
Here's a simple example of depth upscaling with basic layers:
from torch import nn
from fusion_bench.method.depth_upscaling import DepthUpscalingAlgorithm
# Create a simple model with 4 layers
model = nn.ModuleList([
nn.Linear(100, 100),
nn.ReLU(),
nn.Linear(100, 50),
nn.Linear(50, 10)
])
# Create upscaling algorithm with specific pattern
# This will create: layer0, layer1, layer2, layer1, layer3
algorithm = DepthUpscalingAlgorithm(layer_indices=[0, 1, 2, 1, 3])
# Apply depth upscaling
upscaled_model = algorithm.run(model)
print(f"Original model layers: {len(model)}")
print(f"Upscaled model layers: {len(upscaled_model)}")
# Outputs:
# Original model layers: 4
# Upscaled model layers: 5
SOLAR-style Mistral Model Upscaling¶
Here we provide an example of how to use the DepthUpscalingAlgorithm
to upscale the depth of a Mistral model 1.

from omegaconf import DictConfig
from torch import nn
from transformers import AutoModelForCausalLM, MistralConfig, MistralForCausalLM
from fusion_bench.method.depth_upscaling import DepthUpscalingAlgorithm
# create a Mistral model
# here we randomly initialize the model for demonstration purposes
# in practice, you would load a pretrained model
model_config = MistralConfig(
# https://huggingface.co/mistralai/Mistral-7B-v0.1/resolve/main/config.json
**{
"architectures": ["MistralForCausalLM"],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": False,
"torch_dtype": "bfloat16",
"transformers_version": "4.34.0.dev0",
"use_cache": True,
"vocab_size": 32000,
}
)
print('creating model')
model: MistralForCausalLM = AutoModelForCausalLM.from_config(model_config)
# Initialize the algorithm with layer indices
algorithm = DepthUpscalingAlgorithm(layer_indices=["range(0,24)", "range(8,32)"])
print('upscaling model')
upscaled_model = algorithm.run(model.model.layers)
# substitute the model with the upscaled model
model.model.layers = upscaled_model
CLI Usage¶
The DepthUpscalingAlgorithm
is integrated into the fusion_bench
package. You can use it by specifying "depth_upscaling"
as the method name in the command line or configuration file.
_target_: DepthUpscalingAlgorithm
# this should be a list of integers or string, indicating the sequence of layers.
# If the entry is an integer, it will use the n-th layer of the model.
# If the entry is a string, it will use the layers specified by the string.
# The string should be a valid python expression that evaluates to a list of integers.
# for example, ["range(0,12)", "range(6,12)"] will use the first 12 layers and the last 6 layers of the model to construct the new model
# [0, 2, 4, "range(6,12)"] will use the 1st, 3rd, 5th, and the 7th to 12th layers of the model to construct the new model
layer_indices: null
You can then run the fusion_bench
command with the specified configuration file: