Datasets

Fed-ISIC2019

class flamby.datasets.fed_isic2019.FedIsic2019(*args, **kwargs)[source]

PyTorch dataset containing, for each center, the features and associated labels for ISIC2019 federated classification. The dataset can be instantiated with train or test data from any one of the 6 centers it was created from, or with data from all centers pooled. The train/test split is fixed and given in the train_test_split file.

Parameters:

center (int, optional) – Defaults to 0.
train (bool, optional) – Defaults to True.
pooled (bool, optional) – Defaults to False.
debug (bool, optional) – Defaults to False.
X_dtype (torch.dtype, optional) – Defaults to torch.float32.
y_dtype (torch.dtype, optional) – Defaults to torch.int64.
data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.
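
As a rough sketch of how these parameters combine in a federated experiment, the snippet below builds one keyword-argument dictionary per center for local training data, plus one for the pooled test set. The helper name `make_split_kwargs` is hypothetical and not part of FLamby; the dictionaries only use the `center`, `train`, and `pooled` parameters listed above.

```python
def make_split_kwargs(num_centers=6):
    """Build constructor kwargs: one training split per center, one pooled test split.

    Hypothetical helper for illustration, not part of FLamby.
    """
    train_kwargs = [
        {"center": c, "train": True, "pooled": False} for c in range(num_centers)
    ]
    pooled_test_kwargs = {"train": False, "pooled": True}
    return train_kwargs, pooled_test_kwargs


train_kwargs, pooled_test_kwargs = make_split_kwargs()

# With FLamby installed and the dataset downloaded, each dictionary can be
# unpacked into the constructor, for example:
#   from flamby.datasets.fed_isic2019 import FedIsic2019
#   local_train_sets = [FedIsic2019(**kw) for kw in train_kwargs]
#   pooled_test_set = FedIsic2019(**pooled_test_kwargs)
```

Keeping the split description as plain dictionaries makes it easy to log or reuse the same configuration across datasets that share these three parameters.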

Fed-Camelyon16

class flamby.datasets.fed_camelyon16.FedCamelyon16(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32, debug=False, data_path=None)[source]

PyTorch dataset containing, for each center, the features and associated labels for Camelyon16 federated classification. The dataset can be instantiated with train or test data from either of the 2 centers it was created from, or with data from both centers pooled. The train/test split corresponds to the one from the Challenge.

Parameters:

center (int, optional) – Defaults to 0.
train (bool, optional) – Defaults to True.
pooled (bool, optional) – Whether to take all data from the 2 centers into one dataset. Defaults to False.
X_dtype (torch.dtype, optional) – Dtype for inputs X. Defaults to torch.float32.
y_dtype (torch.dtype, optional) – Dtype for labels y. Defaults to torch.float32.
debug (bool, optional) – Whether or not to use only the part of the dataset downloaded in debug mode. Defaults to False.
data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.

Fed-LIDC-IDRI

class flamby.datasets.fed_lidc_idri.FedLidcIdri(X_dtype=torch.float32, y_dtype=torch.int64, out_shape=(384, 384, 384), sampler=<flamby.datasets.fed_lidc_idri.data_utils.Sampler object>, transform=<flamby.datasets.fed_lidc_idri.data_utils.ClipNorm object>, center=0, train=True, pooled=False, debug=False, data_path=None)[source]

PyTorch dataset containing, for each center, the features and associated labels for LIDC-IDRI federated segmentation.

Parameters:

X_dtype (torch.dtype, optional) – Dtype for inputs X. Defaults to torch.float32.
y_dtype (torch.dtype, optional) – Dtype for labels y. Defaults to torch.int64.
out_shape (Tuple or None, optional) – The desired output shape. If None, no padding or cropping is performed. Default is (384, 384, 384).
sampler (flamby.datasets.fed_lidc_idri.data_utils.Sampler) – Patch sampling method.
transform (torchvision transform or None, optional) – Transformation to perform on each data point.
center (int, optional) – Id of the center from which to gather data. Defaults to 0.
train (bool, optional) – Whether to take the train or test split. Defaults to True (train).
pooled (bool, optional) – Whether to take all data from all centers into one dataset. If True, supersedes center argument. Defaults to False.
debug (bool, optional) – Whether the dataset was downloaded in debug mode. Defaults to False.
data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.
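
To make the `out_shape` behavior concrete, here is a minimal sketch of how padding or cropping to a target shape could be planned per axis. It assumes centered cropping and symmetric padding, which this page does not specify; the function name `pad_or_crop_plan` is hypothetical and is not the library's implementation.

```python
def pad_or_crop_plan(in_shape, out_shape):
    """Per axis, return (crop_start, crop_stop, pad_before, pad_after).

    Assumption of this sketch: crops are centered and padding is split
    as evenly as possible. If out_shape is None, nothing is changed.
    """
    if out_shape is None:
        return [(0, n, 0, 0) for n in in_shape]
    plan = []
    for n_in, n_out in zip(in_shape, out_shape):
        if n_in >= n_out:
            # Axis is too long (or already the right size): crop around the center.
            start = (n_in - n_out) // 2
            plan.append((start, start + n_out, 0, 0))
        else:
            # Axis is too short: keep everything and pad both sides.
            total = n_out - n_in
            before = total // 2
            plan.append((0, n_in, before, total - before))
    return plan


# Example: a 512 x 512 x 300 volume brought to the default (384, 384, 384).
plan = pad_or_crop_plan((512, 512, 300), (384, 384, 384))
```

For each axis, the kept length plus the two padding amounts adds up to the requested output size.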

Fed-TCGA_BRCA

class flamby.datasets.fed_tcga_brca.FedTcgaBrca(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32)[source]

PyTorch dataset containing all the clinical features and (event, time) information for TCGA-BRCA survival analysis. The dataset can be instantiated with train or test data from any one of the 6 regions, or with all regions pooled. The train/test split is static and given in the train_test_split file.

Parameters:

center (int, optional) – Between 0 and 5. Designates the region when pooled is False. Defaults to 0.
train (bool, optional) – Whether the dataset is used for training (True) or for testing (False). Defaults to True.
pooled (bool, optional) – Whether the data from all regions is pooled. Defaults to False.
X_dtype (torch.dtype, optional) – Defaults to torch.float32.
y_dtype (torch.dtype, optional) – Defaults to torch.float32.

Fed-Heart-Disease

class flamby.datasets.fed_heart_disease.FedHeartDisease(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32, debug=False, data_path=None, normalize=True)[source]

PyTorch dataset containing, for each center, the features and associated labels for Heart Disease federated classification. The dataset can be instantiated with train or test data from any one of the 4 centers it was created from, or with data from all centers pooled. The train/test split is arbitrarily fixed.

Parameters:

center (int, optional) – Defaults to 0.
train (bool, optional) – Defaults to True.
pooled (bool, optional) – Whether to take all data from the 4 centers into one dataset. Defaults to False.
X_dtype (torch.dtype, optional) – Dtype for inputs X. Defaults to torch.float32.
y_dtype (torch.dtype, optional) – Dtype for labels y. Defaults to torch.float32.
debug (bool, optional) – Whether or not to use only the part of the dataset downloaded in debug mode. Defaults to False.
data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.
normalize (bool) – Whether or not to normalize the features. The mean and standard deviation of each feature are computed on the training data of the corresponding client and used to normalize. When pooled=True, the statistics are computed on the training part of the full dataset instead, in order to reflect the difference in information available in FL and pooled mode. Defaults to True.
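
The difference between per-client and pooled statistics described for `normalize` can be illustrated with a small standard-library sketch. The toy values, the helper names `fit_normalizer` and `apply_normalizer`, and the use of the population standard deviation are all assumptions of this example, not FLamby's implementation.

```python
from statistics import mean, pstdev


def fit_normalizer(train_rows):
    """Compute per-feature mean and std from training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_normalizer(rows, means, stds):
    """Standardize each feature with the given statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy training data for two hypothetical centers (two features each).
center0_train = [[50.0, 120.0], [60.0, 140.0], [70.0, 160.0]]
center1_train = [[30.0, 100.0], [40.0, 110.0]]

# Federated mode: center 0 only sees its own training data.
means0, stds0 = fit_normalizer(center0_train)
center0_normalized = apply_normalizer(center0_train, means0, stds0)

# Pooled mode: statistics come from the training data of all centers.
pooled_train = center0_train + center1_train
pooled_means, pooled_stds = fit_normalizer(pooled_train)
```

In this toy setting the per-client and pooled means differ, so the same raw sample is mapped to different normalized values depending on the mode.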

Fed-IXITiny

class flamby.datasets.fed_ixi.FedIXITiny(transform=None, center=0, train=True, pooled=False, debug=False, data_path=None)[source]

Federated dataset class for the T1 images of the IXI Tiny dataset.

Parameters:

transform – PyTorch Transform to process the data or augment it.
center (int, optional) – Id of the center (hospital) from which to gather data. Defaults to 0.
train (bool, optional) – Whether to take the train or test split. Defaults to True (train).
pooled (bool, optional) – Whether to take all data from the 3 centers into one dataset. If True, supersedes center argument. Defaults to False.
debug (bool, optional) – Defaults to False.
data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.

Fed-Kits19

class flamby.datasets.fed_kits19.FedKits19(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32, debug=False)[source]

PyTorch dataset containing, for each center, the features and associated labels for Kits19 federated segmentation. The dataset can be instantiated with train or test data from any one of the 6 centers it was created from, or with data from all centers pooled. The train/test split corresponds to the one from the Challenge.

Parameters:

center (int, optional) – Center id between 0 and 5. Defaults to 0.
train (bool, optional) – Defaults to True.
pooled (bool, optional) – Defaults to False.
X_dtype (torch.dtype, optional) – Defaults to torch.float32.
y_dtype (torch.dtype, optional) – Defaults to torch.float32.
debug (bool, optional) – Whether or not to use only the part of the dataset downloaded in debug mode. Defaults to False.
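
Since every class on this page takes the same `center` and `pooled` arguments, a small pre-flight check can catch an out-of-range center before any data is loaded. The sketch below is hypothetical and not part of FLamby. It uses the center counts given on this page, takes the Fed-Kits19 count from the `center` range of 0 to 5, and leaves out Fed-LIDC-IDRI. It also assumes that `pooled=True` overrides `center` for every dataset, which the page states explicitly only for some of them.

```python
# Center counts per dataset module, as read from this page (see lead-in).
NUM_CENTERS = {
    "fed_isic2019": 6,
    "fed_camelyon16": 2,
    "fed_tcga_brca": 6,
    "fed_heart_disease": 4,
    "fed_ixi": 3,
    "fed_kits19": 6,
}


def centers_used(dataset, center=0, pooled=False):
    """Return the list of center ids whose data a configuration would use.

    Raises ValueError if `center` is out of range for a non-pooled dataset.
    """
    n = NUM_CENTERS[dataset]
    if pooled:
        # Assumption: pooled=True supersedes the center argument.
        return list(range(n))
    if not 0 <= center < n:
        raise ValueError(
            f"center={center} is out of range for {dataset} (expected 0 to {n - 1})"
        )
    return [center]
```

For example, `centers_used("fed_heart_disease", center=4)` raises a `ValueError`, because that dataset only has centers 0 to 3.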