CLIP Template Factory Documentation
Overview
CLIPTemplateFactory is a factory class for creating and managing dataset templates for CLIP models. It lets users retrieve class names and templates for supported datasets, register new datasets, and list all available datasets.
Usage Example
from fusion_bench.tasks.clip_classification import CLIPTemplateFactory
# List all available datasets
available_datasets = CLIPTemplateFactory.get_available_datasets()
print(available_datasets)
# Get class names and templates for image classification
classnames, templates = CLIPTemplateFactory.get_classnames_and_templates("cifar10")
# classnames: ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
# templates is a list of functions; `templates[0](classnames[0])` returns 'a photo of a airplane.'
# or you can use the `get_classnames_and_templates` function
from fusion_bench.tasks.clip_classification import get_classnames_and_templates
classnames, templates = get_classnames_and_templates("cifar10")
# Or register a new dataset by pointing to the module that defines it
CLIPTemplateFactory.register_dataset(
    "new_dataset",
    dataset_info={
        "module": "module_name",
        "classnames": "classnames",
        "templates": "templates",
    },
)
# Retrieve class names and templates for a registered dataset
# this is equivalent to:
# >>> from module_name import classnames, templates
classnames, templates = CLIPTemplateFactory.get_classnames_and_templates("new_dataset")
# or pass the classnames and templates directly
CLIPTemplateFactory.register_dataset(
    "new_dataset",
    classnames=["class1", "class2", "class3"],
    templates=[
        lambda x: f"a photo of a {x}.",
        lambda x: f"a picture of a {x}.",
        lambda x: f"an image of a {x}.",
    ],
)
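Since each template is a callable that maps a class name to a text prompt, the returned pair can be expanded into one list of prompts per class. The following is a minimal sketch; `build_prompts` is a hypothetical helper, not part of fusion_bench, and the class names and templates are the illustrative values from the example above.

```python
from typing import Callable, Dict, List

# Illustrative values only; in practice these would come from
# CLIPTemplateFactory.get_classnames_and_templates(...).
classnames: List[str] = ["class1", "class2", "class3"]
templates: List[Callable[[str], str]] = [
    lambda x: f"a photo of a {x}.",
    lambda x: f"a picture of a {x}.",
]


def build_prompts(
    classnames: List[str], templates: List[Callable[[str], str]]
) -> Dict[str, List[str]]:
    """Apply every template to every class name (hypothetical helper)."""
    return {name: [template(name) for template in templates] for name in classnames}


prompts = build_prompts(classnames, templates)
print(prompts["class1"])
```

Keeping the prompts grouped by class makes it straightforward to encode them with a text encoder and average the embeddings per class, which is a common way to use multiple templates.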
Reference
CLIPTemplateFactory
A factory class for creating CLIP dataset templates.
This class provides methods to retrieve class names and templates for various datasets, register new datasets, and get a list of all available datasets. It uses a mapping from dataset names to their respective module paths or detailed information, facilitating dynamic import and usage of dataset-specific class names and templates.
Attributes:
- _dataset_mapping (dict) – A mapping from dataset names to their respective module paths.
Methods:
- get_classnames_and_templates(dataset_name: str) – Retrieves class names and templates for the specified dataset.
- register_dataset(dataset_name: str, dataset_info: Dict[str, Any] = None, classnames: List[str] = None, templates: List[Callable] = None) – Registers a new dataset with its associated information.
- get_available_datasets() – Returns a list of all available dataset names.
Source code in fusion_bench/tasks/clip_classification/__init__.py
get_available_datasets() staticmethod
Get a list of all available dataset names.
Returns:
- List[str]: A list of dataset names.
get_classnames_and_templates(dataset_name) staticmethod
Retrieves class names and templates for the specified dataset.
This method looks up the dataset information in the internal mapping and dynamically imports the class names and templates from the specified module. It supports both simple string mappings and detailed dictionary mappings for datasets.
Parameters:
- dataset_name (str) – The name of the dataset for which to retrieve class names and templates.
Returns:
- Tuple[List[str], List[Callable]]: A tuple containing a list of class names and a list of template callables.
Raises:
- ValueError – If the dataset_name is not found in the internal mapping.
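The lookup described above can be sketched as follows. This is not the library's implementation: `lookup`, `_mapping`, and the stand-in module `demo_dataset_module` are hypothetical names used so the example runs without fusion_bench installed. It also assumes that a plain string entry names a module exposing attributes called `classnames` and `templates`.

```python
import importlib
import sys
import types
from typing import Any, Callable, Dict, List, Tuple, Union

# Stand-in module so the sketch runs without fusion_bench installed.
_demo = types.ModuleType("demo_dataset_module")
_demo.classnames = ["cat", "dog"]
_demo.templates = [lambda c: f"a photo of a {c}."]
_demo.my_names = ["red", "blue"]
_demo.my_templates = [lambda c: f"a patch of {c} color."]
sys.modules["demo_dataset_module"] = _demo

# Hypothetical mapping: a value is either a module path (str) or a detailed dict.
_mapping: Dict[str, Union[str, Dict[str, Any]]] = {
    "pets": "demo_dataset_module",
    "colors": {
        "module": "demo_dataset_module",
        "classnames": "my_names",
        "templates": "my_templates",
    },
}


def lookup(dataset_name: str) -> Tuple[List[str], List[Callable]]:
    if dataset_name not in _mapping:
        raise ValueError(f"Unknown dataset: {dataset_name}")
    info = _mapping[dataset_name]
    if isinstance(info, str):
        # Assumed convention: the module exposes `classnames` and `templates`.
        info = {"module": info, "classnames": "classnames", "templates": "templates"}
    module = importlib.import_module(info["module"])
    return getattr(module, info["classnames"]), getattr(module, info["templates"])


names, tmpls = lookup("colors")
print(tmpls[0](names[0]))

try:
    lookup("missing")
except ValueError as err:
    print(err)
```

Normalizing the string form into the dictionary form first keeps a single import path for both kinds of entry.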
register_dataset(dataset_name, *, dataset_info=None, classnames=None, templates=None) staticmethod
Registers a new dataset with its associated information.
This method allows for the dynamic addition of datasets to the internal mapping. It supports registration through either a detailed dictionary (dataset_info) or separate lists of class names and templates. If a dataset with the same name already exists, it will be overwritten.
The expected format and contents of dataset_info can vary depending on the needs of the dataset being registered, but typically, it includes the following keys:
- "module": A string specifying the module path where the dataset's related classes and functions are located. This is used for dynamic import of the dataset's class names and templates.
- "classnames": This key is expected to hold the name of the attribute or variable in the specified module that contains a list of class names relevant to the dataset. These class names are used to label data points in the dataset.
- "templates": Similar to "classnames", this key should specify the name of the attribute or variable in the module that contains a list of template callables. Each template is a function that takes a class name and returns a text prompt, for example 'a photo of a airplane.'.
Parameters:
- dataset_name (str) – The name of the dataset to register.
- dataset_info (Dict[str, Any], default: None) – A dictionary containing the dataset information, including module path, class names, and templates. Defaults to None.
- classnames (List[str], default: None) – A list of class names for the dataset. Required if dataset_info is not provided. Defaults to None.
- templates (List[Callable], default: None) – A list of template callables for the dataset. Required if dataset_info is not provided. Defaults to None.
Raises:
- AssertionError – If neither dataset_info nor both classnames and templates are provided.