Model Mixing¶
Layer-level Mixing¶
Depth Upscaling¶
DepthUpscalingAlgorithm ¶
Bases: BaseAlgorithm
Implements the Depth Upscaling algorithm.
- Kim et al. SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling. http://arxiv.org/abs/2312.15166
This class extends BaseAlgorithm to handle depth upscaling of models. It upscales the depth of a model by duplicating specified layers.
Parameters:
- layer_indices (list) – List of layer indices to duplicate.
- **kwargs – Additional keyword arguments.
Source code in fusion_bench/method/depth_upscaling/depth_upscaling.py
run(modelpool) ¶
Executes the depth upscaling algorithm on a given model pool.
This method checks the type of the model pool, ensures that it contains only one model, and verifies that the model is an instance of nn.ModuleList.
Parameters:
- modelpool (ModuleList | ModelPool) – The pool of models to upscale. Must contain only one model.
Returns:
- ModuleList – nn.ModuleList: The upscaled model.
Raises:
- AssertionError – If the model pool contains more than one model or if the model is not an instance of nn.ModuleList.
- ValueError – If an invalid layer specification is provided in the configuration.
Source code in fusion_bench/method/depth_upscaling/depth_upscaling.py
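A minimal usage sketch, assuming a toy model represented as an nn.ModuleList; the layer indices below are illustrative:

```python
import torch.nn as nn

from fusion_bench.method.depth_upscaling.depth_upscaling import (
    DepthUpscalingAlgorithm,
)

# A toy "model": four layers held in an nn.ModuleList.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])

# Duplicate the middle layers to deepen the model (indices are illustrative).
algorithm = DepthUpscalingAlgorithm(layer_indices=[0, 1, 2, 1, 2, 3])
upscaled = algorithm.run(layers)  # returns an nn.ModuleList with 6 layers
```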
DepthUpscalingForLlama ¶
Bases: DepthUpscalingAlgorithm
Implements depth upscaling for Llama models.
This class extends DepthUpscalingAlgorithm to handle Llama models specifically. It supports saving the upscaled model to a specified path.
Parameters:
- layer_indices (list) – List of layer indices to upscale.
- model_save_path (Optional[str]) – Path to save the upscaled model.
- **kwargs – Additional keyword arguments.
Source code in fusion_bench/method/depth_upscaling/depth_upscaling_for_llama.py
run(modelpool) ¶
Executes the depth upscaling algorithm on a given model pool.
This method loads the pretrained model or the first model in the pool, applies the depth upscaling algorithm, and updates the number of hidden layers in the model configuration. If a save path is provided, it saves the upscaled model and tokenizer to the specified path.
Parameters:
- modelpool (CausalLMPool) – The pool of models to upscale.
Returns:
- CausalLM – The upscaled model.
Source code in fusion_bench/method/depth_upscaling/depth_upscaling_for_llama.py
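A hedged sketch of upscaling a Llama model from a CausalLMPool; the pool construction is omitted and the save path is a hypothetical example:

```python
from fusion_bench.method.depth_upscaling.depth_upscaling_for_llama import (
    DepthUpscalingForLlama,
)

algorithm = DepthUpscalingForLlama(
    layer_indices=[0, 1, 2, 3],                # layers to duplicate (illustrative)
    model_save_path="outputs/llama-upscaled",  # hypothetical save location
)
# Assume `modelpool` is a CausalLMPool holding the model to upscale.
upscaled_model = algorithm.run(modelpool)
```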
Model Recombination¶
ModelRecombinationAlgorithm ¶
Bases: BaseAlgorithm
Model recombination shuffles the layers of the given models to create a new set of models.
Source code in fusion_bench/method/model_recombination.py
run(modelpool, return_modelpool=True) ¶
Executes the model recombination algorithm on a given model pool.
This method loads models from the model pool, determines their type, and applies the appropriate recombination method. It then creates a new model pool with the recombined models. Depending on the return_modelpool flag, it either returns the entire new model pool or just the first model from it.
- If the models in the model pool are of type nn.ModuleList, the recombination method recombine_modellist is used, where each module in the list is shuffled across the models (see the sketch after this section).
- If the models are of type nn.ModuleDict, the recombination method recombine_modeldict is used, where each module in the dictionary is shuffled across the models.
- If the models are of type nn.Module, the recombination method recombine_state_dict is used, where the state dictionaries of the models are shuffled across the models.
Parameters:
- modelpool (BaseModelPool) – The pool of models to recombine.
- return_modelpool (bool, default: True) – Flag indicating whether to return the entire model pool or just the first model. Defaults to True. If this algorithm is initialized with a config, the value of return_modelpool in the config will be used and this argument will be ignored.
Returns:
- Union[Module, BaseModelPool] – Union[nn.Module, BaseModelPool]: The recombined model pool or the first model from the recombined pool, depending on the return_modelpool flag.
Raises:
- ValueError – If the models in the model pool are of an unsupported type.
Source code in fusion_bench/method/model_recombination.py
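A conceptual sketch of the nn.ModuleList code path, assuming models of identical architecture; the helper below is illustrative, not the library's exact implementation:

```python
import random

import torch.nn as nn


def recombine_modellists(models: list[nn.ModuleList]) -> list[nn.ModuleList]:
    """Shuffle each layer slot across models to form recombined models."""
    num_models, num_layers = len(models), len(models[0])
    new_models = [[] for _ in range(num_models)]
    for layer_idx in range(num_layers):
        # Gather the layer at this depth from every model, then shuffle the slot.
        candidates = [model[layer_idx] for model in models]
        random.shuffle(candidates)
        for slot, layer in zip(new_models, candidates):
            slot.append(layer)
    return [nn.ModuleList(layers) for layers in new_models]
```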
MoE-based Mixing¶
MoE Upscaling¶
MixtralUpscalingAlgorithm ¶
Bases: BaseAlgorithm
This class is responsible for upscaling a model to a MixtralModel. It inherits from the BaseAlgorithm class.
Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
__init__(num_experts, experts_per_token, save_checkpoint, **kwargs) ¶
Initialize the MixtralUpscalingAlgorithm.
Parameters:
- num_experts (int) – The number of experts in the Mixtral model.
- experts_per_token (int) – The number of experts per token.
- save_checkpoint (str) – The path to save the checkpoint.
- **kwargs – Additional keyword arguments.
Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
run(modelpool) ¶
Runs the upscaling process.
Parameters:
- modelpool (ModelPool | LlamaModel | MistralModel) – The model to be upscaled.
Returns:
- MixtralModel (MixtralModel) – The upscaled model.
Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
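A hedged usage sketch that upcycles a small randomly initialized Mistral backbone into a Mixtral MoE; the model sizes and expert counts are illustrative, and passing None to skip checkpoint saving is an assumption:

```python
from transformers import MistralConfig, MistralModel

from fusion_bench.method.mixture_of_experts.mixtral_upcycling import (
    MixtralUpscalingAlgorithm,
)

# A tiny Mistral backbone for illustration (sizes are arbitrary).
config = MistralConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
)
dense_model = MistralModel(config)

algorithm = MixtralUpscalingAlgorithm(
    num_experts=8,         # experts per MoE layer
    experts_per_token=2,   # top-k routing
    save_checkpoint=None,  # assumed to skip saving; or a checkpoint path
)
mixtral_model = algorithm.run(dense_model)  # also accepts a ModelPool
```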
MixtralForCausalLMUpscalingAlgorithm ¶
Bases: BaseAlgorithm
This class is responsible for upscaling a model to a MixtralForCausalLM. It inherits from the BaseAlgorithm class.
Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
__init__(num_experts, experts_per_token, save_checkpoint, **kwargs) ¶
Initialize the MixtralForCausalLMUpscalingAlgorithm.
Parameters:
- num_experts (int) – The number of experts in the Mixtral model.
- experts_per_token (int) – The number of experts per token.
- save_checkpoint (str) – The path to save the checkpoint.
- **kwargs – Additional keyword arguments.
Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
run(modelpool) ¶
Runs the upscaling process.
Parameters:
- modelpool (ModelPool | LlamaForCausalLM | MistralForCausalLM) – The model to be upscaled.
Returns:
- MixtralForCausalLM (MixtralForCausalLM) – The upscaled model.
Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
MixtralMoEMergingAlgorithm ¶
Bases: MixtralUpscalingAlgorithm
This class is responsible for merging models into a MixtralModel.
Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py
run(modelpool) ¶
Runs the merging process.
Parameters:
- modelpool (ModelPool) – The pool of models to be merged. Each model in the pool will be treated as an expert, and should be a MistralModel or LlamaModel.
Returns:
- MixtralModel (MixtralModel) – The merged model.
Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py
MixtralForCausalLMMergingAlgorithm ¶
Bases: MixtralForCausalLMUpscalingAlgorithm
This class is responsible for merging models into a MixtralForCausalLM.
Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py
run(modelpool) ¶
Runs the merging process. It first upscales the models to MixtralForCausalLM, then substitutes the experts of the MixtralForCausalLM with the models from the model pool.
Parameters:
- modelpool (ModelPool) – The pool of models to be merged. Each model in the pool will be treated as an expert, and should be a MistralForCausalLM or LlamaForCausalLM.
Returns:
- MixtralForCausalLM (MixtralForCausalLM) – The merged model.
Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py
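A hedged sketch of merging several fine-tuned causal LMs into one Mixtral MoE; the constructor parameters follow the inherited signature documented above, and the model pool construction is omitted:

```python
from fusion_bench.method.mixture_of_experts.mixtral_merging import (
    MixtralForCausalLMMergingAlgorithm,
)

algorithm = MixtralForCausalLMMergingAlgorithm(
    num_experts=4,         # one expert slot per pooled model (illustrative)
    experts_per_token=2,
    save_checkpoint=None,  # assumed to skip saving; or a checkpoint path
)
# Assume `modelpool` holds MistralForCausalLM/LlamaForCausalLM experts.
merged = algorithm.run(modelpool)
```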
Weight-Ensembling Mixture of Experts (WE-MoE)¶
CLIPWeightEnsemblingMoEAlgorithm ¶
Bases: WeightEnsemblingMoEAlgorithm, CLIPClassificationMixin
CLIPWeightEnsemblingMoEAlgorithm implements the WeightEnsemblingMoEAlgorithm for CLIP models. It extends the WeightEnsemblingMoEAlgorithm and CLIPClassificationMixin classes.
Attributes:
- modelpool (CLIPVisionModelPool) – The model pool containing the CLIP models.
Source code in fusion_bench/method/we_moe/clip_we_moe.py
compute_logits(module, batch, task) ¶
Compute the logits for the given batch and task.
Parameters:
- module – The model module.
- batch – The input batch.
- task – The task name.
Returns:
- Tensor (Tensor) – The computed logits.
Source code in fusion_bench/method/we_moe/clip_we_moe.py
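An illustrative sketch of CLIP-style logit computation, where image features from the vision module are compared against a per-task zero-shot text classifier; the zeroshot_heads mapping and the scaling factor are hypothetical, not the library's API:

```python
import torch

# Hypothetical per-task zero-shot heads: task name -> (num_classes, embed_dim)
# text-embedding matrix built from that task's class names.
zeroshot_heads: dict[str, torch.Tensor] = {"cifar10": torch.randn(10, 512)}


def compute_logits(module, batch, task: str) -> torch.Tensor:
    images, _labels = batch
    image_embeds = module(images)  # (batch_size, embed_dim) vision features
    image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
    head = zeroshot_heads[task]
    head = head / head.norm(dim=-1, keepdim=True)
    return 100.0 * image_embeds @ head.T  # scaled cosine-similarity logits
```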
construct_moe_model() ¶
Construct the Mixture of Experts (MoE) model using the models in the model pool.
Returns:
- WeightEnsemblingMoE (WeightEnsemblingMoE) – The constructed MoE model.
Source code in fusion_bench/method/we_moe/clip_we_moe.py
get_shuffled_test_loader_iter(tta_dataset) cached ¶
Get an iterator for the shuffled test data loader.
Parameters:
- tta_dataset (str) – The name of the test-time adaptation dataset.
Returns:
- Iterator – An iterator for the shuffled test data loader.
Source code in fusion_bench/method/we_moe/clip_we_moe.py
load_checkpoint(model, checkpoint) ¶
Load the checkpoint file.
Parameters:
- model – The model to load the checkpoint into.
- checkpoint – The path to the checkpoint file.
Source code in fusion_bench/method/we_moe/clip_we_moe.py
on_test_time_adaptation_start() ¶
Load the CLIP processor and construct the zero-shot classification head for each task.
save_checkpoint(model, checkpoint) ¶
Save the checkpoint file.
Parameters:
- model – The model to save the checkpoint from.
- checkpoint – The path to the checkpoint file.
Source code in fusion_bench/method/we_moe/clip_we_moe.py
Sparse WE-MoE¶
SparseWeightEnsemblingMoEAlgorithm ¶
Bases: ModelFusionAlgorithm
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
__init__(algorithm_config) ¶
Initialize the SparseWeightEnsemblingMoEAlgorithm with the given configuration.
Parameters:
- algorithm_config (DictConfig) – The configuration for the algorithm.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
compute_logits(module, batch, task) abstractmethod ¶
Compute the logits for a given batch and task.
Parameters:
- module (Module) – The model module.
- batch (Any) – The input batch.
- task (str) – The task for which to compute the logits.
Returns:
- Tensor (Tensor) – The computed logits.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
construct_moe_model() abstractmethod ¶
Construct the Mixture of Experts model using the models in the model pool.
Returns:
- SparseWeightEnsemblingMoE (SparseWeightEnsemblingMoE) – The constructed Mixture of Experts model.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
construct_moe_model_sharedgate() abstractmethod ¶
Construct the Mixture of Experts model using the models in the model pool with a shared gate.
Returns:
- SparseWeightEnsemblingMoE_ShardGate (SparseWeightEnsemblingMoE_ShardGate) – The constructed Mixture of Experts model with shared gate.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
construct_post_spare_gate_model(moe_model, gate_prune_ratio) ¶
Construct a (post) sparse gated model.
Parameters:
- moe_model (SparseWeightEnsemblingMoE) – The Mixture of Experts model.
- gate_prune_ratio (float) – The ratio of parameters to prune in the gate.
Returns:
- SparseWeightEnsemblingMoE – The constructed (post) sparse gated model.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
dynamic_prune(module, prune_ratio) ¶
Dynamically prune the parameters of a module based on the given prune ratio.
Parameters:
- module (Module) – The module to prune.
- prune_ratio (float) – The ratio of parameters to prune.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
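A minimal sketch of magnitude-based pruning at a given ratio, as one plausible reading of this operation; the library's exact pruning rule may differ:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def dynamic_prune(module: nn.Module, prune_ratio: float) -> None:
    """Zero out the smallest `prune_ratio` fraction of each parameter tensor."""
    for param in module.parameters():
        k = int(param.numel() * prune_ratio)
        if k == 0:
            continue
        # Threshold at the k-th smallest absolute value, then mask below it.
        threshold = param.abs().flatten().kthvalue(k).values
        param.mul_((param.abs() > threshold).to(param.dtype))
```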
get_shuffled_test_loader_iter(task) abstractmethod ¶
Get an iterator for the shuffled test DataLoader for a specific task.
Parameters:
- task (str) – The task for which to get the DataLoader iterator.
Returns:
- DataLoader (DataLoader) – The DataLoader iterator for the specified task.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
l1_regularization(module, l1_lambda) ¶
Compute the L1 regularization loss for a module.
Parameters:
- module (Module) – The module for which to compute the L1 regularization loss.
- l1_lambda (float) – The L1 regularization coefficient.
Returns:
- Tensor – The L1 regularization loss.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
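For reference, a minimal sketch of an L1 penalty over a module's parameters; the library's implementation may differ in which parameters it includes:

```python
import torch
import torch.nn as nn


def l1_regularization(module: nn.Module, l1_lambda: float) -> torch.Tensor:
    # Sum of absolute parameter values, scaled by the L1 coefficient.
    return l1_lambda * sum(param.abs().sum() for param in module.parameters())
```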
load_checkpoint(model, checkpoint) abstractmethod ¶
Load the checkpoint file.
Parameters:
- model (Module) – The model to load the checkpoint into.
- checkpoint (str) – The path to the checkpoint file.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
on_test_time_adaptation_start() ¶
run(modelpool) ¶
Run the SparseWeightEnsemblingMoEAlgorithm with the given model pool.
Parameters:
- modelpool (BaseModelPool) – The model pool to use for the algorithm.
Returns:
- SparseWeightEnsemblingMoE – The final Mixture of Experts model.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
save_checkpoint(model, checkpoint) abstractmethod ¶
Save the checkpoint file.
Parameters:
- model (Module) – The model to save the checkpoint from.
- checkpoint (str) – The path to the checkpoint file.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
test_time_adaptation(module) ¶
Perform test-time adaptation for the given module.
Parameters:
- module (SparseWeightEnsemblingMoE) – The module to adapt.
Returns:
- SparseWeightEnsemblingMoE – The adapted module.
Source code in fusion_bench/method/sparse_we_moe/sparse_we_moe.py
SparseCLIPWeightEnsemblingMoEAlgorithm ¶
Bases: SparseWeightEnsemblingMoEAlgorithm, CLIPClassificationMixin
Source code in fusion_bench/method/sparse_we_moe/sparse_clip_we_moe.py
compute_logits(module, batch, task) ¶
Compute the logits for the given batch and task.
Parameters:
- module (CLIPVisionModel) – The vision model to use for computing logits.
- batch (Tuple[Tensor, Tensor]) – The batch of data.
- task (str) – The task for which to compute logits.
Returns:
- Tensor (Tensor) – The computed logits.
Source code in fusion_bench/method/sparse_we_moe/sparse_clip_we_moe.py
construct_moe_model() ¶
Construct the Mixture of Experts model using the models in the model pool.
Source code in fusion_bench/method/sparse_we_moe/sparse_clip_we_moe.py
construct_moe_model_sharedgate() ¶
Construct the Mixture of Experts model using the models in the model pool with a shared gate.
Source code in fusion_bench/method/sparse_we_moe/sparse_clip_we_moe.py
get_shuffled_test_loader_iter(tta_dataset) cached ¶
Get an iterator for the shuffled test data loader.
Source code in fusion_bench/method/sparse_we_moe/sparse_clip_we_moe.py
load_checkpoint(model, checkpoint) ¶
on_test_time_adaptation_start() ¶
Here we load the CLIP processor and construct the zero-shot classification head for each task.
Rank-One MoE¶
RankOneMoEAlgorithm ¶
Bases: ModelFusionAlgorithm
Algorithm for fusing models using RankOne-MoE (https://github.com/EnnengYang/RankOne-MoE).
This class provides methods for constructing the MoE model, performing test-time adaptation, and running the fusion process.
Attributes:
- _fabric (Fabric) – The fabric for distributed training.
- modelpool (ModelPool) – The pool of models to be fused.
- profiler (SimpleProfiler) – The profiler for measuring performance.
Source code in fusion_bench/method/rankone_moe/rankone_moe.py
__init__(algorithm_config) ¶
Initialize the RankOneMoEAlgorithm with the given configuration.
Parameters:
- algorithm_config (DictConfig) – The configuration for the algorithm.
Source code in fusion_bench/method/rankone_moe/rankone_moe.py
compute_logits(module, batch, task) abstractmethod ¶
Compute the logits for a given batch and task.
Parameters:
- module – The model module to use for computing logits.
- batch – The batch of data.
- task – The task for which to compute logits.
Returns:
- Tensor (Tensor) – The computed logits.
Source code in fusion_bench/method/rankone_moe/rankone_moe.py
construct_moe_model() abstractmethod ¶
Construct the Mixture of Experts model using the models in the model pool.
Returns:
- RankOneMoE – RankOne-MoE: The constructed MoE model.
get_shuffled_test_loader_iter(task) abstractmethod ¶
Get an iterator for the shuffled test data loader for a specific task.
Parameters:
- task (str) – The task for which to get the test data loader.
Returns:
- DataLoader (DataLoader) – The shuffled test data loader iterator.
Source code in fusion_bench/method/rankone_moe/rankone_moe.py
load_checkpoint(model, checkpoint) abstractmethod ¶
Load the checkpoint file.
Parameters:
- model – The model to load the checkpoint into.
- checkpoint – The checkpoint file to load.
on_test_time_adaptation_start() ¶
run(modelpool) ¶
Run the RankOneMoEAlgorithm to fuse models using RankOne-MoE.
Parameters:
- modelpool (ModelPool) – The pool of models to be fused.
Returns:
- RankOne-MoE: The fused RankOne MoE model.
Source code in fusion_bench/method/rankone_moe/rankone_moe.py
save_checkpoint(model, checkpoint) abstractmethod ¶
Save the checkpoint file.
Parameters:
- model – The model to save the checkpoint from.
- checkpoint – The checkpoint file to save.
test_time_adaptation(module) ¶
Perform test-time adaptation for the given module.
Parameters:
- module (RankOne-MoE) – The MoE module to adapt.
Returns:
- RankOne-MoE: The adapted MoE module.
Source code in fusion_bench/method/rankone_moe/rankone_moe.py
CLIPRankOneMoEAlgorithm ¶
Bases: RankOneMoEAlgorithm, CLIPClassificationMixin
CLIPRankOneMoEAlgorithm implements the RankOneMoEAlgorithm (https://github.com/EnnengYang/RankOne-MoE) for CLIP models. It extends the RankOneMoEAlgorithm and CLIPClassificationMixin classes.
Attributes:
- modelpool (CLIPVisionModelPool) – The model pool containing the CLIP models.
Source code in fusion_bench/method/rankone_moe/clip_rankone_moe.py
compute_logits(module, batch, task) ¶
Compute the logits for the given batch and task.
Parameters:
- module – The model module.
- batch – The input batch.
- task – The task name.
Returns:
- Tensor (Tensor) – The computed logits.
Source code in fusion_bench/method/rankone_moe/clip_rankone_moe.py
construct_moe_model() ¶
Construct the RankOne-MoE model using the models in the model pool.
Returns:
- RankOneMoE – RankOne-MoE: The constructed MoE model.
Source code in fusion_bench/method/rankone_moe/clip_rankone_moe.py
get_shuffled_test_loader_iter(tta_dataset) cached ¶
Get an iterator for the shuffled test data loader.
Parameters:
- tta_dataset (str) – The name of the test-time adaptation dataset.
Returns:
- Iterator – An iterator for the shuffled test data loader.
Source code in fusion_bench/method/rankone_moe/clip_rankone_moe.py
load_checkpoint(model, checkpoint) ¶
Load the checkpoint file.
Parameters:
- model – The model to load the checkpoint into.
- checkpoint – The path to the checkpoint file.
Source code in fusion_bench/method/rankone_moe/clip_rankone_moe.py
on_test_time_adaptation_start() ¶
Load the CLIP processor and construct the zero-shot classification head for each task.
save_checkpoint(model, checkpoint) ¶
Save the checkpoint file.
Parameters:
- model – The model to save the checkpoint from.
- checkpoint – The path to the checkpoint file.
Source code in fusion_bench/method/rankone_moe/clip_rankone_moe.py
Smile Upscaling¶
SmileUpscalingAlgorithm ¶
Bases: SimpleProfilerMixin, BaseAlgorithm
Source code in fusion_bench/method/smile_upscaling/smile_upscaling.py
__init__(*, device='cuda', upscaling_accelerator=None, full_matrices=True, gate_k=256, k=256, top_k=1, routing_use_diff=True, average_experts=False, model_path=None, **kwargs) ¶
Initialize the SmileUpscalingAlgorithm.
Parameters:
- device (str, default: 'cuda') – The device to perform the computation on.
- upscaling_accelerator (str, default: None) – The device to perform the SVD computation on.
- full_matrices (bool, default: True) – Whether to compute the full-sized U and V matrices.
- gate_k (int, default: 256) – The number of singular values to keep for the gate.
- k (int, default: 256) – The number of singular values to keep for the experts.
- top_k (int, default: 1) – The number of top experts to select.
- routing_use_diff (bool, default: True) – Whether to use weight differences for routing.
- average_experts (bool, default: False) – Whether to average the experts.
- model_path (str, default: None) – The path to save/load the model.
- **kwargs – Additional arguments.
Source code in fusion_bench/method/smile_upscaling/smile_upscaling.py
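A hedged usage sketch with the documented defaults; the model pool construction is omitted:

```python
from fusion_bench.method.smile_upscaling.smile_upscaling import (
    SmileUpscalingAlgorithm,
)

algorithm = SmileUpscalingAlgorithm(
    device="cuda",
    gate_k=256,  # singular values kept for the routing gate
    k=256,       # singular values kept for each expert
    top_k=1,     # experts selected per input
)
# Assume `modelpool` holds a pretrained model plus fine-tuned experts.
upscaled = algorithm.run(modelpool)
```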
merge(pretrained_model, finetuned_models, in_place=True) ¶
Merges the pretrained model with the fine-tuned models to create an upscaled model.
Parameters:
- pretrained_model (Module) – The pretrained model.
- finetuned_models (List[Module]) – A list of fine-tuned models.
- in_place (bool, default: True) – If True, modifies the pretrained model in place. Otherwise, creates a copy.
Returns:
- nn.Module: The merged model.
Source code in fusion_bench/method/smile_upscaling/smile_upscaling.py
run(modelpool) ¶
Executes the upscaling process.
Parameters:
- modelpool (ModelPool) – The pool of models to be used for upscaling.
Returns:
- nn.Module: The upscaled model.
Source code in fusion_bench/method/smile_upscaling/smile_upscaling.py
SingularProjectionMergingAlgorithm ¶
Bases: ModelFusionAlgorithm, SimpleProfilerMixin
A model fusion algorithm that projects parameter differences into the SVD subspace of a pretrained model.
This algorithm is experimental and aims to investigate the location of task-specific knowledge.
Source code in fusion_bench/method/smile_upscaling/singular_projection_merging.py
merge(pretrained_model, finetuned_model, in_place=True) ¶
Merges the pretrained model with the fine-tuned model by projecting parameter differences into the SVD subspace of the pretrained model.
Parameters:
- pretrained_model (Module) – The pretrained model.
- finetuned_model (Module) – The fine-tuned model.
- in_place (bool, default: True) – If True, modifies the fine-tuned model in place. Otherwise, creates a copy.
Returns:
- Module – nn.Module: The merged model.
Source code in fusion_bench/method/smile_upscaling/singular_projection_merging.py
projection_merge_linear(pretrained_model, finetuned_model, k) ¶
Projects the parameter differences of linear layers into the SVD subspace of the pretrained model.
Parameters:
- pretrained_model (Linear) – The linear layer of the pretrained model.
- finetuned_model (Linear) – The linear layer of the fine-tuned model.
- k (int) – The number of singular values to keep. If negative, it is determined based on the sum of singular values.
Returns:
- Linear – nn.Linear: The merged linear layer with projected parameter differences.
Source code in fusion_bench/method/smile_upscaling/singular_projection_merging.py
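A conceptual sketch of the projection, assuming the weight difference is kept only inside the top-k singular subspace of the pretrained weight; the library's exact rule, including the negative-k case, may differ:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def projection_merge_linear(
    pretrained: nn.Linear, finetuned: nn.Linear, k: int
) -> nn.Linear:
    w = pretrained.weight  # (out_features, in_features)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    diff = finetuned.weight - w  # task-specific parameter difference
    # Project the difference onto the span of the top-k singular vectors.
    proj = u[:, :k] @ (u[:, :k].T @ diff @ vh[:k].T) @ vh[:k]
    merged = nn.Linear(w.shape[1], w.shape[0], bias=pretrained.bias is not None)
    merged.weight.copy_(w + proj)
    if pretrained.bias is not None:
        merged.bias.copy_(pretrained.bias)
    return merged
```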
run(modelpool) ¶
Run the singular projection merging algorithm on the given model pool.
Parameters:
- modelpool (ModelPool) – The pool of models to merge.
Returns:
- Module – nn.Module: The merged model.