Data Utilities¶
Dataset Manipulation¶
fusion_bench.utils.data¶

InfiniteDataLoader¶
A wrapper class around a DataLoader that creates an infinite data loader. This is useful when training is driven by a fixed number of steps rather than a number of epochs.
The wrapper provides an iterator that resets whenever the end of the dataset is reached, yielding batches indefinitely.
Attributes:

- `data_loader` (`DataLoader`): The DataLoader to wrap.
- `data_iter` (`iterator`): An iterator over the DataLoader.
Source code in fusion_bench/utils/data.py
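A minimal usage sketch, assuming the wrapper is directly iterable (its `data_iter` attribute suggests iteration is handled internally); the toy dataset and step count are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from fusion_bench.utils.data import InfiniteDataLoader

# Illustrative toy dataset; substitute your own.
dataset = TensorDataset(torch.randn(100, 16), torch.randint(0, 2, (100,)))
infinite_loader = InfiniteDataLoader(DataLoader(dataset, batch_size=8, shuffle=True))

# Train for a fixed number of steps; the wrapper restarts the underlying
# DataLoader whenever it is exhausted (assumed behavior per the class docs),
# so iteration never stops on its own.
for step, (inputs, labels) in zip(range(1000), infinite_loader):
    pass  # forward/backward pass goes here
```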
load_tensor_from_file(file_path, device=None)¶
Loads a tensor from a file, which can be a .pt, .pth, or .np file. If the file is in none of these formats, it will try to load it as a pickle file.
Parameters:

- `file_path` (`str`): The path to the file to load.
- `device`: The device to move the tensor to. By default, the tensor is loaded on the CPU.
Returns:

- `torch.Tensor`: The tensor loaded from the file.
Source code in fusion_bench/utils/data.py
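A short usage sketch; the file name is a hypothetical placeholder, and we assume `device` accepts the usual torch device specifiers:

```python
from fusion_bench.utils.data import load_tensor_from_file

tensor = load_tensor_from_file("checkpoint.pt")  # loaded on CPU by default
tensor_gpu = load_tensor_from_file("checkpoint.pt", device="cuda")  # moved to GPU
```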
train_validation_split(dataset, validation_fraction=0.1, validation_size=None, random_seed=None, return_split='both')¶
Split a dataset into a training and validation set.
Parameters:

- `dataset` (`Dataset`): The dataset to split.
- `validation_fraction` (`Optional[float]`, default: `0.1`): The fraction of the dataset to use for validation.
- `validation_size` (`Optional[int]`, default: `None`): The number of samples to use for validation. `validation_fraction` must be set to `None` if this is provided.
- `random_seed` (`Optional[int]`, default: `None`): The random seed to use for reproducibility.
- `return_split` (`Literal['all', 'train', 'val']`, default: `'both'`): The split to return.
Returns:

- `Tuple[Dataset, Dataset]`: The training and validation datasets.
Source code in fusion_bench/utils/data.py
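A usage sketch, assuming `dataset` is a PyTorch `Dataset` you have already constructed:

```python
from fusion_bench.utils.data import train_validation_split

# Hold out 10% of the samples for validation; fix the seed for reproducibility.
train_set, val_set = train_validation_split(
    dataset, validation_fraction=0.1, random_seed=42
)
```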
train_validation_test_split(dataset, validation_fraction, test_fraction, random_seed=None, return_spilt='all')¶
Split a dataset into a training, validation and test set.
Parameters:

- `dataset` (`Dataset`): The dataset to split.
- `validation_fraction` (`float`): The fraction of the dataset to use for validation.
- `test_fraction` (`float`): The fraction of the dataset to use for testing.
- `random_seed` (`Optional[int]`, default: `None`): The random seed to use for reproducibility.
- `return_spilt` (`Literal['all', 'train', 'val', 'test']`, default: `'all'`): The split to return.
Returns:

- `Tuple[Dataset, Dataset, Dataset]`: The training, validation, and test datasets.
Source code in fusion_bench/utils/data.py
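A usage sketch for an 80/10/10 split (again assuming an existing `dataset`); note that the parameter is spelled `return_spilt`, exactly as in the signature above:

```python
from fusion_bench.utils.data import train_validation_test_split

train_set, val_set, test_set = train_validation_test_split(
    dataset, validation_fraction=0.1, test_fraction=0.1, random_seed=42
)
```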
JSON Import/Export¶
fusion_bench.utils.json¶

load_from_json(path)¶
Load an object from a JSON file.
Parameters:

- `path` (`Union[str, Path]`): The path to load the object from.
Returns:

- `Union[dict, list]`: The loaded object.
Source code in fusion_bench/utils/json.py
print_json(j, indent=' ', verbose=False, print_type=True)¶
Print an overview of a loaded JSON object.
Parameters:

- `j` (`dict`): The loaded JSON object.
- `indent` (`str`, default: `' '`): The indentation string. Defaults to `' '`.
Source code in fusion_bench/utils/json.py
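A usage sketch; the file name is a hypothetical placeholder:

```python
from fusion_bench.utils.json import load_from_json, print_json

config = load_from_json("config.json")  # hypothetical file
print_json(config)  # prints an overview of the object's structure
```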
save_to_json(obj, path)¶

Save an object to a JSON file.
Parameters:

- `obj` (`Any`): The object to save.
- `path` (`Union[str, Path]`): The path to save the object to.
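A round-trip sketch combining `save_to_json` and `load_from_json`; the data and file name are illustrative:

```python
from fusion_bench.utils.json import load_from_json, save_to_json

results = {"model": "gpt2", "accuracy": 0.93}
save_to_json(results, "results.json")
assert load_from_json("results.json") == results
```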
TensorBoard Data Import¶
fusion_bench.utils.tensorboard¶

Functions for working with TensorBoard logs.

parse_tensorboard_as_dict(path, scalars)¶

Returns a dictionary of pandas DataFrames, one for each requested scalar.
Parameters:

- `path` (`str`): A file path to a directory containing tfevents files, or a single tfevents file. The accumulator will load events from this path.
- `scalars` (`Iterable[str]`): The scalar tags to extract.
Returns:

- `Dict[str, pandas.DataFrame]`: A dictionary of pandas DataFrames, one for each requested scalar.
Source code in fusion_bench/utils/tensorboard.py
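A usage sketch; the log directory and scalar tags are assumptions about your logging setup:

```python
from fusion_bench.utils.tensorboard import parse_tensorboard_as_dict

dfs = parse_tensorboard_as_dict(
    "runs/experiment_1", scalars=["train/loss", "val/accuracy"]
)
loss_df = dfs["train/loss"]  # one DataFrame per requested scalar tag
```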
parse_tensorboard_as_list(path, scalars)¶

Returns a list of pandas DataFrames, one for each requested scalar.

See also: `parse_tensorboard_as_dict`.
Parameters:

- `path` (`str`): A file path to a directory containing tfevents files, or a single tfevents file. The accumulator will load events from this path.
- `scalars` (`Iterable[str]`): The scalar tags to extract.
Returns:

- `List[pandas.DataFrame]`: A list of pandas DataFrames, one for each requested scalar.
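A usage sketch mirroring the dictionary variant; we assume the DataFrames come back in the order the tags were requested:

```python
from fusion_bench.utils.tensorboard import parse_tensorboard_as_list

# Assumed to preserve the order of the requested tags.
loss_df, acc_df = parse_tensorboard_as_list(
    "runs/experiment_1", scalars=["train/loss", "val/accuracy"]
)
```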