fedeca.fedeca_core
- class FedECA(ndim, ds_client=None, train_data_nodes=None, treated_col='treated', event_col='E', duration_col='T', ps_col='propensity_scores', propensity_fit_cols=None, cox_fit_cols=None, num_rounds_list=[10, 10], damping_factor_nr=0.8, l2_coeff_nr=0.0, standardize_data=True, penalizer=0.0, l1_ratio=1.0, initial_step_size=0.95, learning_rate_strategy='lifelines', dtype='float64', training_strategy='iptw', variance_method='naïve', n_bootstrap=200, bootstrap_seeds=None, bootstrap_function='global', clients_sizes=None, indices_in_global_dataset=None, client_identifier='client', clients_names=None, dp_target_epsilon=None, dp_target_delta=None, dp_max_grad_norm=None, dp_propensity_model_optimizer_class=<class 'torch.optim.sgd.SGD'>, dp_propensity_model_optimizer_kwargs=None, dp_propensity_model_training_params=None, seed=42, aggregation_node=None, experiment_folder='./iptw_experiment', clean_models=False, dependencies=None, timeout=3600, sleep_time=30, fedeca_path=None, evaluation_frequency=None, partner_client=None)
Bases: Experiment, BaseSurvivalEstimator, BootstrapMixin
FedECA class that performs Federated IPTW or AIPTW.
- Parameters:
ndim (int) –
train_data_nodes (list[substrafl.nodes.train_data_node.TrainDataNode] | None) –
treated_col (str) –
event_col (str) –
duration_col (str) –
propensity_fit_cols (None | list) –
cox_fit_cols (None | list) –
damping_factor_nr (float) –
l2_coeff_nr (float) –
standardize_data (bool) –
penalizer (float) –
l1_ratio (float) –
initial_step_size (float) –
learning_rate_strategy (str) –
dtype (float) –
training_strategy (str) –
variance_method (str) –
n_bootstrap (int | None) –
clients_sizes (list | None) –
indices_in_global_dataset (list | None) –
client_identifier (str) –
clients_names (list | None) –
dp_target_epsilon (float | None) –
dp_target_delta (float | None) –
dp_max_grad_norm (float | None) –
dp_propensity_model_optimizer_class (Optimizer) –
dp_propensity_model_optimizer_kwargs (dict | None) –
dp_propensity_model_training_params (dict | None) –
seed (int) –
aggregation_node (AggregationNode | None) –
experiment_folder (str) –
clean_models (bool) –
dependencies (list | None) –
timeout (int) –
sleep_time (int) –
fedeca_path (None | str) –
partner_client (None | Client) –
- check_cp_status(idx=0)
Check the status of the process.
- compute_propensity_scores(data)
Compute propensity scores and corresponding weights.
- Parameters:
data (DataFrame) –
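As a rough illustration of what "propensity scores and corresponding weights" can mean, the sketch below computes standard inverse-probability-of-treatment weights from already-estimated scores. It is not taken from FedECA's code: the function name `iptw_weights`, the clipping constant `eps`, and the toy inputs are assumptions made for the example.

```python
def iptw_weights(treated, propensity_scores, eps=1e-12):
    """Return one inverse-probability-of-treatment weight per sample.

    Hypothetical helper: uses the textbook IPTW definition, which may
    differ from what FedECA computes internally.
    """
    weights = []
    for t, ps in zip(treated, propensity_scores):
        # Clip scores away from 0 and 1 to avoid division by zero
        # (an illustrative choice, not a documented FedECA behavior).
        ps = min(max(ps, eps), 1.0 - eps)
        # Treated samples are weighted by 1/ps, controls by 1/(1 - ps).
        weights.append(1.0 / ps if t == 1 else 1.0 / (1.0 - ps))
    return weights


treated = [1, 0, 1, 0]
scores = [0.8, 0.2, 0.5, 0.5]
weights = iptw_weights(treated, scores)
```

Samples whose observed treatment was unlikely given their covariates receive larger weights, which is why a weighted analysis benefits from a variance estimator that accounts for reweighting.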
- compute_summary(alpha=0.05)
Compute summary for a given threshold.
- Parameters:
alpha (float, default=0.05) – Significance level used to compute confidence intervals; alpha=0.05 gives 95% CIs.
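To make the role of alpha concrete, here is a minimal sketch of a two-sided Wald interval for a log hazard ratio. It assumes a normal approximation; whether compute_summary builds its intervals exactly this way is not stated here, and `wald_confidence_interval`, `coef` and `se` are names invented for the example.

```python
from math import exp
from statistics import NormalDist


def wald_confidence_interval(coef, se, alpha=0.05):
    """Two-sided (1 - alpha) Wald interval for a coefficient.

    Hypothetical helper based on the normal approximation.
    """
    # Quantile of the standard normal at 1 - alpha/2 (about 1.96 for alpha=0.05).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return coef - z * se, coef + z * se


# Toy values: a log hazard ratio of -0.5 with standard error 0.1.
low, high = wald_confidence_interval(coef=-0.5, se=0.1, alpha=0.05)
# Exponentiate the bounds to express the interval on the hazard-ratio scale.
hazard_ratio_ci = (exp(low), exp(high))
```

A smaller alpha widens the interval: alpha=0.01 corresponds to a 99% interval.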
- fit(data, targets=None, n_clients=None, split_method=None, split_method_kwargs=None, data_path=None, variance_method=None, n_bootstrap=None, bootstrap_seeds=None, bootstrap_function=None, dp_target_epsilon=None, dp_target_delta=None, dp_max_grad_norm=None, dp_propensity_model_training_params=None, dp_propensity_model_optimizer_class=None, dp_propensity_model_optimizer_kwargs=None, backend_type='subprocess', urls=None, server_org_id=None, tokens=None)
Fit strategies on global data split across clients.
For evaluation, the test_data_nodes are used if provided; otherwise the train_data_nodes are reused, in which case train = test.
- Parameters:
data (pd.DataFrame) – The global data to be split. It has to be a dataframe, as only one opener type is supported.
targets (Optional[pd.DataFrame], optional) – A dataframe with propensity scores, or nothing.
n_clients (Union[int, None], optional) – The number of clients to split the data across, by default None.
split_method (Union[Callable, None], optional) – How to split the data across the n_clients, by default None.
split_method_kwargs (Union[Callable, None], optional) – Arguments of the function used to split the data, by default None.
data_path (Union[str, None]) – Where to store the data on disk when the backend is not remote.
variance_method ({"naive", "robust", "bootstrap"}) – Method for estimating the variance, and therefore the p-value, of the estimated treatment effect. Defaults to naïve.
  * “naive”: Inverse of the Fisher information.
  * “robust”: The robust sandwich estimator [1], computed in FL thanks to FedECA. Useful when samples are reweighted.
  * “bootstrap”: Bootstrap the given data by sampling each patient with replacement, estimate the treatment effect each time, then use all repeated estimations to compute the variance. The implementation is efficient in substra and shouldn’t induce too much overhead.
  [1] David A. Binder. Fitting Cox’s proportional hazards models from survey data. Biometrika, 79(1):139–147, 1992.
n_bootstrap (Union[int, None]) – Number of bootstrap repetitions to perform. If None, len(bootstrap_seeds) is used instead. If bootstrap_seeds is given, those seeds are used for the generation; otherwise seeds are generated randomly.
bootstrap_seeds (Union[list[int], None]) – The list of seeds used for bootstrapping random states. If None, n_bootstrap seeds are generated randomly. If both are given, bootstrap_seeds always takes precedence.
bootstrap_function (Union[Callable, None]) – The bootstrap function to use, for instance if it is necessary to mimic a global sampling.
dp_target_epsilon (float) – The target epsilon for the (epsilon, delta)-differential privacy guarantee. Defaults to None.
dp_target_delta (float) – The target delta for the (epsilon, delta)-differential privacy guarantee. Defaults to None.
dp_max_grad_norm (float) – The maximum L2 norm of per-sample gradients; used to enforce differential privacy. Defaults to None.
dp_propensity_model_optimizer_class (torch.optim.Optimizer) – The optimizer to use for the training of the propensity model. Defaults to None in this signature; the class constructor default is torch.optim.SGD.
dp_propensity_model_optimizer_kwargs (dict) – The parameters to pass to the optimizer class.
dp_propensity_model_training_params (dict) – A dict with keys batch_size and num_updates for the DP-SGD training. Defaults to None.
backend_type (str) – The backend to use for substra. Can be one of “subprocess”, “docker” or “remote”. Defaults to “subprocess”.
urls (Union[list[str], None]) – URLs corresponding to the clients’ APIs if using the remote backend_type. Defaults to None.
server_org_id (Union[str, None]) – URL corresponding to the server API if using the remote backend_type. Defaults to None.
tokens (Union[list[str], None]) – Tokens necessary to authenticate against each client API if backend_type is remote. Defaults to None.
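The interaction between n_bootstrap and bootstrap_seeds, and the idea behind the “bootstrap” variance method, can be sketched in plain Python. This is only an illustration on a single machine: `resolve_bootstrap_seeds` and `bootstrap_variance` are hypothetical helpers, the mean stands in for the treatment-effect estimator, and the seeded generator used to draw missing seeds is an assumption rather than FedECA's actual behavior.

```python
import random
import statistics


def resolve_bootstrap_seeds(n_bootstrap=None, bootstrap_seeds=None, seed=42):
    """Hypothetical helper mirroring the documented precedence rule."""
    if bootstrap_seeds is not None:
        # When both arguments are given, bootstrap_seeds takes precedence
        # and its length fixes the number of repetitions.
        return list(bootstrap_seeds)
    if n_bootstrap is None:
        raise ValueError("Provide n_bootstrap or bootstrap_seeds.")
    # Draw n_bootstrap seeds (seeded here only to keep the sketch reproducible).
    rng = random.Random(seed)
    return [rng.randrange(2**32) for _ in range(n_bootstrap)]


def bootstrap_variance(samples, estimator, seeds):
    """Variance of `estimator` over resamples drawn with replacement."""
    estimates = []
    for s in seeds:
        rng = random.Random(s)
        # Resample as many items as the original data, with replacement.
        resample = rng.choices(samples, k=len(samples))
        estimates.append(estimator(resample))
    return statistics.variance(estimates)


# bootstrap_seeds is given, so n_bootstrap=200 is ignored.
seeds = resolve_bootstrap_seeds(n_bootstrap=200, bootstrap_seeds=[0, 1, 2])
observations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
var = bootstrap_variance(observations, statistics.mean, seeds)
```

Passing explicit seeds makes a bootstrap run repeatable, at the cost of fixing the number of repetitions to the length of the seed list.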
- get_final_cox_model()
Retrieve the final Cox model.
- print_summary()
Print a summary of the FedECA estimation.
- reset_experiment()
Remove the propensity model just in case.
- set_propensity_model_strategy()
Set FedECA to use DP.
At the end it sets the parameter self.propensity_model_strateg
- clients_names
of clients for global bootstrap by passing num_clients