Datasets

Fed-ISIC2019

class flamby.datasets.fed_isic2019.FedIsic2019(*args, **kwargs)[source]

PyTorch dataset containing, for each center, the features and associated labels for ISIC2019 federated classification. One can instantiate this dataset with train or test data coming from any of the 6 centers it was created from, or with all data pooled. The train/test split is fixed and given in the train_test_split file.

Parameters:
  • center (int, optional) – Defaults to 0.

  • train (bool, optional) – Defaults to True.

  • pooled (bool, optional) – Defaults to False.

  • debug (bool, optional) – Defaults to False.

  • X_dtype (torch.dtype, optional) – Defaults to torch.float32.

  • y_dtype (torch.dtype, optional) – Defaults to torch.int64.

  • data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.
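
The center, train and pooled arguments combine into a small, fixed set of splits. The sketch below enumerates them, assuming centers are numbered from 0 to 5 (the default of 0 suggests zero-based ids, but this is not stated above). The helpers split_configs and load_all are hypothetical names introduced for illustration and are not part of flamby.

```python
from itertools import product

NUM_CENTERS = 6  # Fed-ISIC2019 was built from 6 centers (see the description above)


def split_configs(num_centers=NUM_CENTERS):
    """Keyword arguments for every per-center split, followed by the pooled splits."""
    configs = [
        {"center": c, "train": t, "pooled": False}
        for c, t in product(range(num_centers), (True, False))
    ]
    # With pooled=True the center value is kept at its default of 0.
    configs += [{"center": 0, "train": t, "pooled": True} for t in (True, False)]
    return configs


def load_all(dataset_cls, **extra):
    """Instantiate dataset_cls once per configuration (requires the data on disk)."""
    return [dataset_cls(**cfg, **extra) for cfg in split_configs()]


configs = split_configs()
print(len(configs))  # 6 centers x 2 splits + 2 pooled splits

# Possible usage, assuming flamby is installed and the dataset has been downloaded:
# from flamby.datasets.fed_isic2019 import FedIsic2019
# datasets = load_all(FedIsic2019)
```

Only split_configs runs here; load_all is left uncalled because it needs the downloaded dataset.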

Fed-Camelyon16

class flamby.datasets.fed_camelyon16.FedCamelyon16(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32, debug=False, data_path=None)[source]

PyTorch dataset containing, for each center, the features and associated labels for Camelyon16 federated classification. One can instantiate this dataset with train or test data coming from either of the 2 centers it was created from, or with all data pooled. The train/test split corresponds to the one from the Challenge.

Parameters:
  • center (int, optional) – Defaults to 0.

  • train (bool, optional) – Defaults to True.

  • pooled (bool, optional) – Whether to take all data from the 2 centers into one dataset, by default False

  • X_dtype (torch.dtype, optional) – Dtype for inputs X. Defaults to torch.float32.

  • y_dtype (torch.dtype, optional) – Dtype for labels y. Defaults to torch.float32.

  • debug (bool, optional) – Whether or not to use only the part of the dataset downloaded in debug mode. Defaults to False.

  • data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.

Fed-LIDC-IDRI

class flamby.datasets.fed_lidc_idri.FedLidcIdri(X_dtype=torch.float32, y_dtype=torch.int64, out_shape=(384, 384, 384), sampler=<flamby.datasets.fed_lidc_idri.data_utils.Sampler object>, transform=<flamby.datasets.fed_lidc_idri.data_utils.ClipNorm object>, center=0, train=True, pooled=False, debug=False, data_path=None)[source]

PyTorch dataset containing, for each center, the features and associated labels for LIDC-IDRI federated segmentation.

Parameters:
  • X_dtype (torch.dtype, optional) – Dtype for inputs X. Defaults to torch.float32.

  • y_dtype (torch.dtype, optional) – Dtype for labels y. Defaults to torch.int64.

  • out_shape (Tuple or None, optional) – The desired output shape. If None, no padding or cropping is performed. Default is (384, 384, 384).

  • sampler (flamby.datasets.fed_lidc_idri.data_utils.Sampler) – Patch sampling method.

  • transform (torch.torchvision.Transform or None, optional) – Transformation to perform on each data point.

  • center (int, optional) – Id of the center from which to gather data. Defaults to 0.

  • train (bool, optional) – Whether to take the train or test split. Defaults to True (train).

  • pooled (bool, optional) – Whether to take all data from all centers into one dataset. If True, supersedes the center argument. Defaults to False.

  • debug (bool, optional) – Whether the dataset was downloaded in debug mode. Defaults to False.

  • data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.
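
To make the out_shape behaviour concrete, the sketch below computes, per axis, whether a volume would need cropping or padding to reach the target shape. It assumes the crop or pad is centered, which the description above does not state; the actual placement depends on the sampler and on the library's implementation. The function pad_or_crop_plan is a hypothetical helper, not a flamby function.

```python
def pad_or_crop_plan(shape, out_shape):
    """Describe, per axis, how to bring `shape` to `out_shape`.

    Returns one tuple per axis:
      ("keep", 0, size)       the axis already has the target size
      ("crop", start, stop)   keep indices start .. stop - 1
      ("pad", before, after)  add this many entries on each side
    Returns None when out_shape is None (no padding or cropping).
    """
    if out_shape is None:
        return None
    if len(shape) != len(out_shape):
        raise ValueError("shape and out_shape must have the same number of axes")
    plan = []
    for size, target in zip(shape, out_shape):
        diff = target - size
        if diff == 0:
            plan.append(("keep", 0, size))
        elif diff < 0:
            # Assumption: centered crop.
            start = (-diff) // 2
            plan.append(("crop", start, start + target))
        else:
            # Assumption: centered padding, extra entry goes after.
            before = diff // 2
            plan.append(("pad", before, diff - before))
    return plan


# A toy CT volume shape brought to the default out_shape.
plan = pad_or_crop_plan((512, 512, 130), (384, 384, 384))
print(plan)
```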

Fed-TCGA_BRCA

class flamby.datasets.fed_tcga_brca.FedTcgaBrca(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32)[source]

PyTorch dataset containing all the clinical features and (event, time) information for TCGA-BRCA survival analysis. One can instantiate this dataset with train or test data coming from any of the 6 regions, or with all regions pooled. The train/test split is static and given in the train_test_split file.

Parameters:
  • center (int, optional) – Between 0 and 5; designates the region when pooled==False. Defaults to 0.

  • train (bool, optional) – Whether the dataset is used for training (True) or for testing (False). Defaults to True.

  • pooled (bool, optional) – Whether the data from all regions is pooled. Defaults to False.

  • X_dtype (torch.dtype, optional) – Defaults to torch.float32.

  • y_dtype (torch.dtype, optional) – Defaults to torch.float32.
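
The interaction between center and pooled can be summarized in a few lines. This is a minimal sketch of the selection rule described above, not the library's code: resolve_regions is a hypothetical helper, and raising ValueError for an out-of-range center is an assumption about reasonable behaviour rather than something the documentation confirms.

```python
def resolve_regions(center=0, pooled=False, num_regions=6):
    """Return the region ids a dataset instance would draw its samples from."""
    if pooled:
        # pooled takes precedence: the center value is ignored.
        return list(range(num_regions))
    if not 0 <= center < num_regions:
        # Assumption: invalid ids are rejected rather than silently wrapped.
        raise ValueError(f"center must be between 0 and {num_regions - 1}")
    return [center]


print(resolve_regions(center=3))
print(resolve_regions(center=3, pooled=True))
```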

Fed-Heart-Disease

class flamby.datasets.fed_heart_disease.FedHeartDisease(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32, debug=False, data_path=None, normalize=True)[source]

PyTorch dataset containing, for each center, the features and associated labels for Heart Disease federated classification. One can instantiate this dataset with train or test data coming from any of the 4 centers it was created from, or with all data pooled. The train/test split is arbitrarily fixed.

Parameters:
  • center (int, optional) – Defaults to 0.

  • train (bool, optional) – Defaults to True.

  • pooled (bool, optional) – Whether to take all data from the 4 centers into one dataset, by default False

  • X_dtype (torch.dtype, optional) – Dtype for inputs X. Defaults to torch.float32.

  • y_dtype (torch.dtype, optional) – Dtype for labels y. Defaults to torch.float32.

  • debug (bool, optional) – Whether or not to use only the part of the dataset downloaded in debug mode. Defaults to False.

  • data_path (str) – If data_path is given it will ignore the config file and look for the dataset directly in data_path. Defaults to None.

  • normalize (bool) – Whether or not to normalize the features. The mean and standard deviation of each feature are computed on the training data of the corresponding client. When pooled=True, they are computed on the training part of the full dataset instead, to reflect the difference in the information available in FL and in pooled mode. Defaults to True.
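
The difference between the two normalization modes can be illustrated with toy numbers. The sketch below uses only the standard library and made-up feature rows; fit_stats, normalize and stats_for are hypothetical helpers, and the use of the population standard deviation is an assumption (the description above does not say which estimator the library uses).

```python
from statistics import fmean, pstdev


def fit_stats(rows):
    """Per-feature mean and standard deviation of a list of feature rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population std
    return means, stds


def normalize(rows, means, stds):
    """Standardize each feature; constant features are mapped to 0.0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up training features for two clients (two features each).
train_by_center = {
    0: [[1.0, 10.0], [3.0, 30.0]],
    1: [[5.0, 50.0], [7.0, 70.0]],
}


def stats_for(center, pooled):
    """Statistics from one client's training data, or from all of it if pooled."""
    if pooled:
        rows = [row for rows in train_by_center.values() for row in rows]
    else:
        rows = train_by_center[center]
    return fit_stats(rows)


local_means, local_stds = stats_for(center=0, pooled=False)
pooled_means, pooled_stds = stats_for(center=0, pooled=True)

print(local_means, local_stds)
print(pooled_means, pooled_stds)
print(normalize(train_by_center[0], local_means, local_stds))
```

The same rows are scaled differently depending on whether the statistics come from a single client or from the pooled training set, which is the distinction the normalize parameter describes.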

Fed-IXITiny

Fed-Kits19

class flamby.datasets.fed_kits19.FedKits19(center=0, train=True, pooled=False, X_dtype=torch.float32, y_dtype=torch.float32, debug=False)[source]

PyTorch dataset containing, for each center, the features and associated labels for KiTS19 federated segmentation. One can instantiate this dataset with train or test data coming from any of the 6 centers it was created from, or with all data pooled.

Parameters:
  • center (int, optional) – Center id between 0 and 5. Defaults to 0.

  • train (bool, optional) – Defaults to True.

  • pooled (bool, optional) – Defaults to False.

  • X_dtype (torch.dtype, optional) – Defaults to torch.float32.

  • y_dtype (torch.dtype, optional) – Defaults to torch.float32.

  • debug (bool, optional) – Whether or not to use only the part of the dataset downloaded in debug mode. Defaults to False.