fedeca.fedeca_core

class FedECA(ndim, ds_client=None, train_data_nodes=None, treated_col='treated', event_col='E', duration_col='T', ps_col='propensity_scores', propensity_fit_cols=None, cox_fit_cols=None, num_rounds_list=[10, 10], damping_factor_nr=0.8, l2_coeff_nr=0.0, standardize_data=True, penalizer=0.0, l1_ratio=1.0, initial_step_size=0.95, learning_rate_strategy='lifelines', dtype='float64', training_strategy='iptw', variance_method='naïve', n_bootstrap=200, bootstrap_seeds=None, bootstrap_function='global', clients_sizes=None, indices_in_global_dataset=None, client_identifier='client', clients_names=None, dp_target_epsilon=None, dp_target_delta=None, dp_max_grad_norm=None, dp_propensity_model_optimizer_class=<class 'torch.optim.sgd.SGD'>, dp_propensity_model_optimizer_kwargs=None, dp_propensity_model_training_params=None, seed=42, aggregation_node=None, experiment_folder='./iptw_experiment', clean_models=False, dependencies=None, timeout=3600, sleep_time=30, fedeca_path=None, evaluation_frequency=None, partner_client=None)

Bases: Experiment, BaseSurvivalEstimator, BootstrapMixin

FedECA class that performs Federated IPTW or AIPTW.

Parameters:
  • ndim (int) –

  • train_data_nodes (list[substrafl.nodes.train_data_node.TrainDataNode] | None) –

  • treated_col (str) –

  • event_col (str) –

  • duration_col (str) –

  • propensity_fit_cols (None | list) –

  • cox_fit_cols (None | list) –

  • num_rounds_list (list[int]) –

  • damping_factor_nr (float) –

  • l2_coeff_nr (float) –

  • standardize_data (bool) –

  • penalizer (float) –

  • l1_ratio (float) –

  • initial_step_size (float) –

  • learning_rate_strategy (str) –

  • dtype (str) –

  • training_strategy (str) –

  • variance_method (str) –

  • n_bootstrap (int | None) –

  • bootstrap_seeds (list[int] | None) –

  • bootstrap_function (Callable | str) –

  • clients_sizes (list | None) –

  • indices_in_global_dataset (list | None) –

  • client_identifier (str) –

  • clients_names (list | None) –

  • dp_target_epsilon (float | None) –

  • dp_target_delta (float | None) –

  • dp_max_grad_norm (float | None) –

  • dp_propensity_model_optimizer_class (Optimizer) –

  • dp_propensity_model_optimizer_kwargs (dict | None) –

  • dp_propensity_model_training_params (dict | None) –

  • seed (int) –

  • aggregation_node (AggregationNode | None) –

  • experiment_folder (str) –

  • clean_models (bool) –

  • dependencies (list | None) –

  • timeout (int) –

  • sleep_time (int) –

  • fedeca_path (None | str) –

  • partner_client (None | Client) –
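
A minimal instantiation sketch, assuming the package exposes FedECA at the top level; every column name and value below is illustrative, not prescriptive:

    from fedeca import FedECA

    # Illustrative configuration: 10 covariates, default IPTW training
    # strategy, robust sandwich variance. Adjust the column names to
    # match your own data.
    fed_iptw = FedECA(
        ndim=10,                   # number of covariates used by the propensity model
        treated_col="treated",     # binary treatment-allocation column
        event_col="event",         # event indicator (1 = event observed, 0 = censored)
        duration_col="time",       # time-to-event or censoring time
        variance_method="robust",  # sandwich estimator, suited to reweighted samples
    )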

check_cp_status(idx=0)

Check the status of the compute plan.

compute_propensity_scores(data)

Compute propensity scores and corresponding weights.

Parameters:

data (DataFrame) –
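
A hedged usage sketch. It assumes the estimator has already been fitted, and the two-value return (scores and weights) is inferred from the one-line description above rather than a documented signature:

    # `df` must carry the covariates and the treated/event/duration
    # columns declared at construction time. The unpacking below is an
    # assumption based on "scores and corresponding weights".
    propensity_scores, weights = fed_iptw.compute_propensity_scores(df)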

compute_summary(alpha=0.05)

Compute summary for a given threshold.

Parameters:

alpha (float, default=0.05) – Significance level used when computing confidence intervals
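
A short sketch of recomputing the summary at a stricter threshold; whether the method returns the summary or only stores it is not documented here, so the binding is an assumption:

    # 99% confidence intervals instead of the default 95%.
    summary = fed_iptw.compute_summary(alpha=0.01)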

fit(data, targets=None, n_clients=None, split_method=None, split_method_kwargs=None, data_path=None, variance_method=None, n_bootstrap=None, bootstrap_seeds=None, bootstrap_function=None, dp_target_epsilon=None, dp_target_delta=None, dp_max_grad_norm=None, dp_propensity_model_training_params=None, dp_propensity_model_optimizer_class=None, dp_propensity_model_optimizer_kwargs=None, backend_type='subprocess', urls=None, server_org_id=None, tokens=None)

Fit strategies on global data split across clients.

For evaluation, test_data_nodes are used if provided; otherwise the train_data_nodes are reused, in which case train = test.

Parameters:
  • data (pd.DataFrame) – The global data to be split across clients. It has to be a dataframe, as only one opener type is supported.

  • targets (Optional[pd.DataFrame], optional) – A dataframe containing propensity scores, or None.

  • n_clients (int | None, optional) – The number of clients across which to split the data, by default None

  • split_method (Union[Callable, None], optional) – How to split the data across the n_clients, by default None

  • split_method_kwargs (dict | None, optional) – Arguments of the function used to split the data, by default None

  • data_path (Union[str, None]) – Where to store the data on disk when backend is not remote.

  • variance_method ({"naive", "robust", "bootstrap"}) –

    Method for estimating the variance, and therefore the p-value, of the estimated treatment effect.

    • "naive": Inverse of the Fisher information.

    • "robust": The robust sandwich estimator [1] computed in FL thanks to FedECA. Useful when samples are reweighted.

    • "bootstrap": Bootstrap the given data by sampling each patient with replacement; each time, estimate the treatment effect, then use all repeated estimations to compute the variance. The implementation is efficient in substra and shouldn't induce too much overhead.

    Defaults to "naïve" (as in the constructor signature).

    [1] David A. Binder. Fitting Cox's proportional hazards models from survey data. Biometrika, 79(1):139–147, 1992.

  • n_bootstrap (int | None) – Number of bootstraps to perform. If None, len(bootstrap_seeds) is used instead. If bootstrap_seeds is given, those seeds are used for the generation; otherwise seeds are generated randomly.

  • bootstrap_seeds (Union[list[int], None]) – The list of seeds used for bootstrapping random states. If None, n_bootstrap seeds are generated randomly; if both are given, bootstrap_seeds always takes precedence.

  • bootstrap_function (Union[Callable, None]) – The bootstrap function to use, for instance when it is necessary to mimic a global sampling.

  • dp_target_epsilon (float) – The target epsilon for the (epsilon, delta)-differential privacy guarantee. Defaults to None.

  • dp_target_delta (float) – The target delta for the (epsilon, delta)-differential privacy guarantee. Defaults to None.

  • dp_max_grad_norm (float) – The maximum L2 norm of per-sample gradients; used to enforce differential privacy. Defaults to None.

  • dp_propensity_model_optimizer_class (torch.optim.Optimizer) – The optimizer to use for the training of the propensity model. Defaults to None, in which case the optimizer set at construction (SGD by default) is used.

  • dp_propensity_model_optimizer_kwargs (dict | None) – The parameters to pass to the optimizer class.

  • dp_propensity_model_training_params (dict) – A dict with keys batch_size and num_updates for the DP-SGD training. Defaults to None.

  • backend_type (str) – The backend to use for substra. Can be one of "subprocess", "docker", or "remote". Defaults to "subprocess".

  • urls (Union[list[str], None]) – URLs of the clients' APIs when using the remote backend_type. Defaults to None.

  • server_org_id (Union[str, None]) – Organization ID of the server when using the remote backend_type. Defaults to None.

  • tokens (Union[list[str], None]) – Tokens needed to authenticate each client's API when backend_type is remote. Defaults to None.
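
An end-to-end sketch on a synthetic pooled dataframe with the default subprocess backend. Leaving split_method and split_method_kwargs at their defaults is an assumption; the column names match the instantiation example above:

    import numpy as np
    import pandas as pd

    # Synthetic pooled dataset, for illustration only.
    rng = np.random.default_rng(seed=42)
    n = 200
    df = pd.DataFrame(rng.normal(size=(n, 10)), columns=[f"X_{i}" for i in range(10)])
    df["treated"] = rng.integers(0, 2, size=n)
    df["event"] = rng.integers(0, 2, size=n)
    df["time"] = rng.exponential(scale=10.0, size=n)

    # Split the data across 2 simulated clients and fit. Swapping in
    # variance_method="bootstrap" together with n_bootstrap would yield
    # bootstrap-based variance instead of the sandwich estimator.
    fed_iptw.fit(
        df,
        n_clients=2,
        data_path="./fedeca_data",  # shards are written here for non-remote backends
        variance_method="robust",
        backend_type="subprocess",
    )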

get_final_cox_model()

Retrieve the final Cox model.
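
A one-line sketch; the exact return type (a fitted model object or its parameters) is not specified on this page, so the binding is an assumption:

    # Assumption: returns the final federated Cox model fitted by fit().
    final_cox_model = fed_iptw.get_final_cox_model()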

print_summary()

Print a summary of the FedECA estimation.

reset_experiment()

Remove the propensity model just in case.

run(targets=None)

Run the federated IPTW algorithms.

Parameters:

targets (DataFrame | None) –

set_propensity_model_strategy()

Set FedECA to use DP.

At the end, it sets the attribute self.propensity_model_strategy.
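
This method is invoked internally; differential privacy is driven by the dp_* arguments passed at construction. A hedged configuration sketch in which every numeric value is a placeholder, not a recommendation:

    import torch

    from fedeca import FedECA

    # Illustrative (epsilon, delta)-DP setup for the propensity model training.
    dp_fed_iptw = FedECA(
        ndim=10,
        treated_col="treated",
        event_col="event",
        duration_col="time",
        dp_target_epsilon=10.0,   # target epsilon of the (epsilon, delta) guarantee
        dp_target_delta=1e-5,     # target delta of the (epsilon, delta) guarantee
        dp_max_grad_norm=1.0,     # per-sample gradient clipping norm
        dp_propensity_model_optimizer_class=torch.optim.SGD,
        dp_propensity_model_optimizer_kwargs={"lr": 0.01},
        dp_propensity_model_training_params={"batch_size": 50, "num_updates": 100},
    )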

clients_names

Names of the clients, used for the global bootstrap by passing num_clients.