FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion¶
Breaking Changes in v0.2
The recent upgrade to v0.2.0 introduces some breaking changes, so some of the documented instructions may be outdated.
You can install a specific version with pip install fusion-bench==0.1.6, or check out a specific version with git checkout v0.1.6.
If you encounter any issues, please feel free to raise an issue.
We are working on the documentation and will update it as soon as possible. Using version >=0.2.0 is recommended.
Note
- Any questions or comments can be directed to the GitHub Issues page for this project.
- Contributions are welcome. If you find any mistakes or have suggestions for improvements, please feel free to raise an issue or submit a pull request.
Introduction to Deep Model Fusion (The Learn-From-Model Paradigm)¶
Deep model fusion is a technique that merges, ensembles, or fuses multiple deep neural networks to obtain a unified model. It can be used to improve the performance and robustness of a model, or to combine the strengths of different models, such as fusing multiple task-specific models into a single multi-task model. For a more detailed introduction to deep model fusion, you can refer to W. Li, 2023, 'Deep Model Fusion: A Survey'. In this benchmark, we evaluate the performance of different fusion methods on a variety of datasets and tasks. ...
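For instance, the simplest fusion method, weight averaging (the simple averaging algorithm used in the example below), merges $n$ models with parameters $\theta_1, \dots, \theta_n$ into a single model:

$$
\theta_{\text{merged}} = \frac{1}{n} \sum_{i=1}^{n} \theta_i
$$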
Getting Started¶
Installation¶
Install from PyPI:
pip install fusion-bench
# you can also install a specific version
# pip install fusion-bench==0.1.6
Or install the latest development version from the GitHub repository:
git clone https://github.com/tanganke/fusion_bench.git
cd fusion_bench
# check out a specific version, for example v0.1.6
# git checkout v0.1.6
pip install -e . # install the package in editable mode
View documentation locally:
# install mkdocs and the required packages
pip install -r mkdocs-requirements.txt
# serve the documentation, by default it will be available at http://localhost:8000
mkdocs serve
Command Line Interface¶
fusion_bench is the command-line interface for running the benchmark.
It takes a configuration file as input, which specifies the models, the fusion method to be used, and the datasets to be evaluated.
To run the benchmark, you can use the following command:
fusion_bench [--config-path CONFIG_PATH] [--config-name CONFIG_NAME] \
OPTION_1=VALUE_1 OPTION_2=VALUE_2 ...
This program loads the configuration file specified by --config-path and --config-name, and runs the fusion algorithm on the model pool.
The pseudocode is as follows:
# instantiate an algorithm, a modelpool object that manages the models,
# and a taskpool object that manages the tasks (dataset + metrics)
algorithm = load_algorithm(config.algorithm)
modelpool = load_modelpool(config.modelpool)
taskpool = load_taskpool(config.taskpool)
# run the fusion algorithm on the model pool
merged_model = algorithm.run(modelpool)
# evaluate the merged model on the tasks
report = taskpool.evaluate(merged_model)
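The same flow can also be driven programmatically. Below is a minimal sketch, assuming the `SimpleAverageAlgorithm` class exported from `fusion_bench.method` accepts a plain list of PyTorch modules in place of a model pool:

```python
import torch.nn as nn

from fusion_bench.method import SimpleAverageAlgorithm  # assumed import path

# Two toy models with identical architectures, standing in for real checkpoints.
models = [nn.Linear(8, 2) for _ in range(2)]

# Fuse the models into a single module by averaging their weights.
algorithm = SimpleAverageAlgorithm()
merged_model = algorithm.run(models)
```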
For detailed information on the options available, you can refer to this page.
Implementation of Fusion Algorithms¶
Work in progress.
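Until this section is written, here is a rough sketch of the core operation behind simple averaging; this is an illustration of the technique, not FusionBench's actual implementation:

```python
import copy

import torch
import torch.nn as nn


def average_models(models: list[nn.Module]) -> nn.Module:
    """Average the parameters of architecturally identical models."""
    merged = copy.deepcopy(models[0])
    state_dicts = [m.state_dict() for m in models]
    # Average every entry of the state dicts element-wise.
    avg_state = {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
    merged.load_state_dict(avg_state)
    return merged
```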
General Structure of FusionBench¶
FusionBench is a pioneering project that provides a comprehensive benchmark for deep model fusion, facilitating the evaluation and comparison of various model fusion techniques. The project is meticulously designed to support rigorous analysis and experimentation in the field of model fusion, offering a versatile and modular codebase tailored for advanced research and development.
The general structure of the FusionBench project can be visualized through its modular framework, which is divided into several key components:
- Fusion Algorithm: The core component where Model Fusion takes place. It integrates models from the Model Pool and adjusts them according to the specified fusion algorithms. The output is then evaluated for performance and effectiveness.
- Model Pool: A repository of various pre-trained models that can be accessed and utilized for fusion. This pool serves as the foundation for creating new, fused models by leveraging the strengths of each individual model.
- Task Pool: A collection of tasks that the fused models are evaluated on. These tasks help in assessing the practical applicability and robustness of the fused models.
- Models & Wrappers, Datasets, and Metrics: These underlying modules include:
- Models & Wrappers: Tools and scripts for model loading, wrapping, and pre-processing.
- Datasets: The datasets used for training, validation, and testing the fused models.
- Metrics: The performance metrics used to evaluate the models, providing a comprehensive understanding of their capabilities.
- YAML Configurations: Central to the project's modularity, YAML files are used to configure models, datasets, and metrics, allowing seamless customization and scalability. This is built on the Hydra framework (Read More); a minimal top-level configuration sketch is shown below.
By organizing these components into a structured and modular codebase, FusionBench ensures flexibility, ease of use, and scalability for researchers and developers. The project not only serves as a benchmark but also as a robust platform for innovation in the realm of deep model fusion.
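As a rough illustration of this configuration-driven design, a top-level run configuration might compose the three components as follows. This is a minimal sketch: the config-group names match the CLI overrides used in the example below, but the exact file layout is an assumption.

```yaml
# Illustrative top-level Hydra configuration.
defaults:
  - method: simple_averaging
  - modelpool: clip-vit-base-patch32_robustness_corrupted
  - taskpool: clip-vit-base-patch32_robustness_corrupted

corruption: gaussian_noise  # referenced as ${corruption} by the configs below
fast_dev_run: false         # referenced as ${fast_dev_run} by the task pool
```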
Key Features¶
- Comprehensive Benchmark: FusionBench provides a wide range of fusion algorithms, model pools, and tasks for thorough evaluation.
- Modular Design: The project is structured into separate modules for algorithms, model pools, and task pools, allowing easy extension and customization.
- Command-line Interface: A flexible CLI tool, fusion_bench, for running experiments with various configurations.
- Web UI: An interactive web interface, fusion_bench_webui, for easier configuration and command generation.
- Extensive Documentation: Detailed guides, API references, and examples to help users get started quickly.
Basic Example¶
Here we provide a basic example to demonstrate the usage of FusionBench. We choose the simple averaging algorithm as the fusion algorithm and merge 4 fine-tuned CLIP-ViT-B/32 models. We then evaluate the merged model on 4 tasks with data corrupted by Gaussian noise, to assess the robustness of the merged model.
We provide a command-line interface, fusion_bench, to run the example.
The command to run the example is as follows, with each option explained below:
fusion_bench \
    --config-name clip-vit-base-patch32_robustness_corrupted \
    corruption=gaussian_noise \
    method=simple_averaging \
    modelpool=clip-vit-base-patch32_robustness_corrupted \
    taskpool=clip-vit-base-patch32_robustness_corrupted
- --config-name specifies the main configuration file to be used. The corruption option specifies the type of data corruption applied to the evaluation datasets; in this case, we use Gaussian noise. FusionBench currently provides 7 types of data corruption for the image classification tasks Stanford Cars, EuroSAT, RESISC45, and GTSRB. The corruption option can be one of: contrast, gaussian_noise, impulse_noise, jpeg_compression, motion_blur, pixelate, spatter.
- The method option specifies the fusion algorithm to be used. In this case, we use the simple averaging algorithm.
- The modelpool option specifies the model pool to be used. The model pool is responsible for managing the loading, preprocessing, and saving of the models. By passing modelpool=clip-vit-base-patch32_robustness_corrupted, the program instantiates a model pool object that manages 4 task-specific CLIP-ViT-B/32 models fine-tuned on the Stanford Cars, EuroSAT, RESISC45, and GTSRB datasets.
- The taskpool option specifies the task pool to be used. The task pool is responsible for managing the evaluation datasets and metrics. By passing taskpool=clip-vit-base-patch32_robustness_corrupted, the program instantiates a task pool object that manages 4 tasks with data corrupted by Gaussian noise.
The configurations are stored in the configs directory, as follows.
The simple averaging algorithm is very straightforward and requires no additional hyperparameters, so its configuration file contains only the name of the algorithm, which specifies the Python class of the fusion algorithm.
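A minimal sketch of this method configuration (illustrative; the file name and value are assumptions chosen to match the `method=simple_averaging` option above):

```yaml
# config/method/simple_averaging.yaml (illustrative)
name: simple_averaging # (1)
```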
1. Name of the fusion algorithm. The name field specifies the class of the fusion algorithm.
type: huggingface_clip_vision # (1)
models: # (2)
- name: _pretrained_
path: openai/clip-vit-base-patch32
- name: stanford_cars
path: tanganke/clip-vit-base-patch32_stanford-cars
- name: eurosat
path: tanganke/clip-vit-base-patch32_eurosat
- name: resisc45
path: tanganke/clip-vit-base-patch32_resisc45
- name: gtsrb
path: tanganke/clip-vit-base-patch32_gtsrb
# `corruption` can be one of:
# contrast, gaussian_noise, impulse_noise, jpeg_compression, motion_blur, pixelate, spatter
corruption: ${corruption}
# Other configurations to meet other methods' requirements.
# For example, test dataset for test-time adaptation training.
# ...
1. Type of the model pool. The type field specifies the class of the model pool.
2. The models field specifies the models to be used for fusion. In this case, we use 4 task-specific CLIP-ViT-B/32 models fine-tuned on the Stanford Cars, EuroSAT, RESISC45, and GTSRB datasets.
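Once instantiated, the model pool exposes the models by the names declared above. A small sketch (assuming a `load_model` accessor on the model pool object):

```python
# Load individual models from the pool by name (names come from the YAML above).
pretrained_model = modelpool.load_model("_pretrained_")
cars_model = modelpool.load_model("stanford_cars")
```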
type: clip_vit_classification # (1)
name: clip-vit-robustness_clean
# corruption can be one of:
# contrast, gaussian_noise, impulse_noise, jpeg_compression, motion_blur, pixelate, spatter
corruption: ${corruption}
dataset_type: huggingface_image_classification
tasks: # (2)
- name: stanford_cars
dataset:
name: tanganke/stanford_cars
split: ${taskpool.corruption}
- name: eurosat
dataset:
name: tanganke/eurosat
split: ${taskpool.corruption}
- name: resisc45
dataset:
name: tanganke/resisc45
split: ${taskpool.corruption}
- name: gtsrb
dataset:
name: tanganke/gtsrb
split: ${taskpool.corruption}
clip_model: openai/clip-vit-base-patch32 # (3)
batch_size: 128 # (4)
num_workers: 16
fast_dev_run: ${fast_dev_run}
1. Type and name of the task pool. The type field specifies the class of the task pool, and the name field specifies the name of the task pool.
2. The tasks field specifies the tasks to be evaluated. In this case, we evaluate the fused model on 4 tasks: Stanford Cars, EuroSAT, RESISC45, and GTSRB, with data corrupted by ${corruption}.
3. Base model used for initializing the classification head. Here, we need the text encoder of CLIP-ViT-B/32 to initialize the classification head.
4. Batch size and number of workers used for data loading.
A flowchart of the FusionBench command line interface is shown below:
- Fusion Algorithm Module: Implements the fusion algorithms. Receives the model pool and returns the fused model.
- Model Pool Module: Manages the models. Responsible for loading, preprocessing, and saving the models.
- Task Pool Module: Manages the tasks. Responsible for loading evaluation datasets and metrics, and evaluating the fused model.
Citation¶
If you find this benchmark useful, please consider citing our work:
@misc{tangFusionBenchComprehensiveBenchmark2024,
title = {{{FusionBench}}: {{A Comprehensive Benchmark}} of {{Deep Model Fusion}}},
shorttitle = {{{FusionBench}}},
author = {Tang, Anke and Shen, Li and Luo, Yong and Hu, Han and Du, Bo and Tao, Dacheng},
year = {2024},
month = jun,
number = {arXiv:2406.03280},
eprint = {2406.03280},
publisher = {arXiv},
url = {http://arxiv.org/abs/2406.03280},
archiveprefix = {arxiv},
langid = {english},
keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Machine Learning}
}