Core
aggregation
Aggregation functions.
Copy-pasted from the CancerLINQ repo.
aggregate_means(local_means, n_local_samples, filter_nan=False)
Aggregate local means.
Aggregate the local means into a global mean, weighting each local mean by its local number of samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_means | list[Any] | List of local means. Each can be an array, a float, or a Series. | required |
n_local_samples | list[int] | List of the number of samples used for each local mean. | required |
filter_nan | bool | Whether to filter NaN values in the local means. | False |
Returns:
Type | Description |
---|---|
Any | Aggregated mean, of the same type as the local means. |
Source code in fedpydeseq2/core/utils/aggregation.py
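As a rough illustration of the sample-weighted aggregation described above, here is a minimal NumPy sketch. It is not the library's implementation: the name `aggregate_means_sketch` is hypothetical, the result is always a NumPy value rather than the input type, and the handling of `filter_nan` (NaN entries contribute neither value nor weight) is an assumption.

```python
import numpy as np


def aggregate_means_sketch(local_means, n_local_samples, filter_nan=False):
    """Sample-weighted mean of per-center means (hypothetical helper)."""
    # Stack the per-center means along a new leading "center" axis.
    means = np.stack([np.asarray(m, dtype=float) for m in local_means])
    weights = np.asarray(n_local_samples, dtype=float)
    # Reshape the weights so they broadcast over the trailing dimensions.
    weights = weights.reshape((-1,) + (1,) * (means.ndim - 1))
    weights = np.broadcast_to(weights, means.shape)
    if filter_nan:
        # Assumption: a NaN entry is dropped together with its weight.
        valid = ~np.isnan(means)
        means = np.where(valid, means, 0.0)
        weights = np.where(valid, weights, 0.0)
    return (means * weights).sum(axis=0) / weights.sum(axis=0)


center_means = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
global_mean = aggregate_means_sketch(center_means, [1, 3])
```

With one sample in the first center and three in the second, the second center's mean counts three times as much in the result.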
compute_lfc_utils
get_lfc_utils_from_gene_mask_adata(adata, gene_mask, disp_param_name, beta=None, lfc_param_name=None)
Get the necessary data for LFC computations from the local adata and genes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The local AnnData object. |
required |
gene_mask
|
ndarray
|
The mask of genes to use for the IRLS algorithm. This mask identifies the genes in the non_zero_gene_names. If None, all non zero genes are used. |
required |
disp_param_name
|
str
|
The name of the dispersion parameter in the adata.varm. |
required |
beta
|
Optional[ndarray]
|
The log fold change values, of shape (n_non_zero_genes,). |
None
|
lfc_param_name
|
str | None
|
The name of the lfc parameter in the adata.varm. Is incompatible with beta. |
None
|
Returns:
Name | Type | Description |
---|---|---|
gene_names |
list[str]
|
The names of the genes to use for the IRLS algorithm. |
design_matrix |
ndarray
|
The design matrix. |
size_factors |
ndarray
|
The size factors. |
counts |
ndarray
|
The count matrix from the local adata. |
dispersions |
ndarray
|
The dispersions from the local adata. |
beta_on_mask |
ndarray
|
The log fold change values on the mask. |
Source code in fedpydeseq2/core/utils/compute_lfc_utils.py
design_matrix
build_design_matrix(metadata, design_factors='stage', levels=None, continuous_factors=None, ref_levels=None)
Build the design matrix for DEA.
Unless specified, the reference factor is chosen alphabetically. Copied from PyDESeq2, with some modifications specific to fedomics to ensure that all centers have the same columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metadata
|
DataFrame
|
DataFrame containing metadata information. Must be indexed by sample barcodes. |
required |
design_factors
|
str or list
|
Name of the columns of metadata to be used as design matrix variables (default: 'stage'). |
'stage'
|
levels
|
dict
|
An optional dictionary of lists of strings specifying the levels of each factor
in the global design, e.g. |
None
|
ref_levels
|
dict
|
An optional dictionary of the form |
None
|
continuous_factors
|
list
|
An optional list of continuous (as opposed to categorical) factors, that should
also be in |
None
|
Returns:
Type | Description |
---|---|
DataFrame
|
A DataFrame with experiment design information (to split cohorts). Indexed by sample barcodes. |
Source code in fedpydeseq2/core/utils/design_matrix.py
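The point of passing global levels is that a center which never observes a given level still produces the corresponding column. The sketch below illustrates that idea for a single categorical factor using pandas. It is not the library's code: `design_matrix_sketch`, its arguments, and the `factor_level_vs_ref` column naming are assumptions made for illustration.

```python
import pandas as pd


def design_matrix_sketch(metadata, factor, levels, ref_level=None):
    """One-factor design matrix with columns fixed by global levels (hypothetical)."""
    ordered = sorted(levels)
    # Unless specified, the reference level is the first one alphabetically.
    ref = ref_level if ref_level is not None else ordered[0]
    ordered = [ref] + [lv for lv in ordered if lv != ref]
    # A categorical with explicit categories keeps unobserved levels.
    cat = pd.Series(
        pd.Categorical(metadata[factor], categories=ordered),
        index=metadata.index,
    )
    dummies = pd.get_dummies(cat, drop_first=True, dtype=int)
    dummies.columns = [f"{factor}_{lv}_vs_{ref}" for lv in ordered[1:]]
    dummies.insert(0, "intercept", 1)
    return dummies


# This center only has samples of level "A", yet the "B" column is present.
meta = pd.DataFrame({"stage": ["A", "A"]}, index=["s1", "s2"])
dm = design_matrix_sketch(meta, "stage", levels=["A", "B"])
```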
layers
build_layers
Module to construct the layers.
cooks
Module to set the cooks layer.
can_set_cooks_layer(adata, shared_state, raise_error=False)
Check if the Cook's distance can be set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The local adata. |
required |
shared_state
|
Optional[dict]
|
The shared state containing the Cook's dispersion values. |
required |
raise_error
|
bool
|
Whether to raise an error if the Cook's distance cannot be set. |
False
|
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
Whether the Cook's distance can be set. |
Raises:
Type | Description |
---|---|
ValueError
|
If the Cook's distance cannot be set and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/cooks.py
set_cooks_layer(adata, shared_state)
Compute the Cook's distance from the shared state.
This function computes the Cook's distance from the shared state and stores it in the "cooks" layer of the local adata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The local adata. |
required |
shared_state
|
dict
|
The shared state containing the Cook's dispersion values. |
required |
Source code in fedpydeseq2/core/utils/layers/build_layers/cooks.py
fit_lin_mu_hat
Module to reconstruct the fit_lin_mu_hat layer.
can_get_fit_lin_mu_hat(local_adata, raise_error=False)
Check if the fit_lin_mu_hat layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
raise_error
|
bool
|
If True, raise an error if the fit_lin_mu_hat layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the fit_lin_mu_hat layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the fit_lin_mu_hat layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/fit_lin_mu_hat.py
set_fit_lin_mu_hat(local_adata, min_mu=0.5)
Calculate the _fit_lin_mu_hat layer using the provided local data.
Checks are performed to ensure necessary keys are present in the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local anndata object containing necessary keys for computation. |
required |
min_mu
|
float
|
The minimum value for mu, defaults to 0.5. |
0.5
|
Source code in fedpydeseq2/core/utils/layers/build_layers/fit_lin_mu_hat.py
hat_diagonals
Module to set the hat diagonals layer.
can_set_hat_diagonals_layer(adata, shared_state, raise_error=False)
Check if the hat diagonals layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The AnnData object. |
required |
shared_state
|
Optional[dict]
|
The shared state dictionary. |
required |
raise_error
|
bool
|
If True, raise an error if the hat diagonals layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the hat diagonals layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the hat diagonals layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/hat_diagonals.py
make_hat_diag_batch(beta, global_hat_matrix_inv, design_matrix, size_factors, dispersions, min_mu=0.5)
Compute the H matrix for a batch of LFC estimates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
beta | ndarray | Current LFC estimate, of shape (batch_size, n_params). | required |
global_hat_matrix_inv | ndarray | The inverse of the global hat matrix, of shape (batch_size, n_params, n_params). | required |
design_matrix | ndarray | The design matrix, of shape (n_obs, n_params). | required |
size_factors | ndarray | The size factors, of shape (n_obs). | required |
dispersions | ndarray | The dispersions, of shape (batch_size). | required |
min_mu | float | Lower bound on estimated means, to ensure numerical stability (default: 0.5). | 0.5 |
Returns:
Type | Description |
---|---|
ndarray | The H matrix, of shape (batch_size, n_obs). |
Source code in fedpydeseq2/core/utils/layers/build_layers/hat_diagonals.py
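To make the shapes above concrete, here is a hedged NumPy sketch of how hat-matrix diagonals can be computed for a negative binomial GLM with log link, following the usual DESeq2-style weights `W = mu / (1 + mu * alpha)`. The function name `hat_diag_batch_sketch` is hypothetical, and the sketch assumes that `global_hat_matrix_inv` is the inverse of `X^T W X` for each gene; the library may define it differently (for example with a ridge term).

```python
import numpy as np


def hat_diag_batch_sketch(
    beta, global_hat_matrix_inv, design_matrix, size_factors, dispersions, min_mu=0.5
):
    """Diagonal of the hat matrix per gene, shape (batch_size, n_obs)."""
    # Estimated means, clipped from below for numerical stability.
    mu = np.maximum(size_factors[None, :] * np.exp(beta @ design_matrix.T), min_mu)
    # Negative binomial working weights, shape (batch_size, n_obs).
    W = mu / (1.0 + mu * dispersions[:, None])
    # Quadratic form x_i^T A_b x_i for each gene b and sample i.
    quad = np.einsum(
        "ip,bpq,iq->bi", design_matrix, global_hat_matrix_inv, design_matrix
    )
    return W * quad


# Single-center toy example: 4 samples, 2 parameters, 1 gene.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
size_factors = np.ones(4)
beta = np.array([[1.0, 0.5]])
dispersions = np.array([0.1])
mu = np.maximum(size_factors * np.exp(beta @ X.T), 0.5)
W = mu / (1.0 + mu * dispersions[:, None])
hat_inv = np.linalg.inv(np.einsum("ip,bi,iq->bpq", X, W, X))
H = hat_diag_batch_sketch(beta, hat_inv, X, size_factors, dispersions)
```

A useful sanity check under these assumptions is that the diagonals of each gene sum to the number of parameters, since they form the trace of a projection matrix.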
set_hat_diagonals_layer(adata, shared_state, n_jobs=1, joblib_verbosity=0, joblib_backend='loky', batch_size=100, min_mu=0.5)
Compute the hat diagonals layer from the adata and the shared state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The AnnData object. |
required |
shared_state
|
Optional[dict]
|
The shared state dictionary. This dictionary must contain the global hat matrix inverse. |
required |
n_jobs
|
int
|
The number of jobs to use for parallel processing. |
1
|
joblib_verbosity
|
int
|
The verbosity level of joblib. |
0
|
joblib_backend
|
str
|
The joblib backend to use. |
'loky'
|
batch_size
|
int
|
The batch size for parallel processing. |
100
|
min_mu
|
float
|
Lower bound on estimated means, to ensure numerical stability. |
0.5
|
Returns:
Type | Description |
---|---|
ndarray
|
The hat diagonals layer, of shape (n_obs, n_params). |
Source code in fedpydeseq2/core/utils/layers/build_layers/hat_diagonals.py
mu_hat
Module to build the mu_hat layer.
can_get_mu_hat(local_adata, raise_error=False)
Check if the mu_hat layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
raise_error
|
bool
|
If True, raise an error if the mu_hat layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the mu_hat layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the mu_hat layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/mu_hat.py
set_mu_hat_layer(local_adata)
Reconstruct the mu_hat layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
Source code in fedpydeseq2/core/utils/layers/build_layers/mu_hat.py
mu_layer
Module to construct mu layer from LFC estimates.
can_set_mu_layer(local_adata, lfc_param_name, mu_param_name, raise_error=False)
Check if the mu layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
lfc_param_name
|
str
|
The name of the log fold changes parameter in the adata. |
required |
mu_param_name
|
str
|
The name of the mu parameter in the adata. |
required |
raise_error
|
bool
|
If True, raise an error if the mu layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the mu layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the mu layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/mu_layer.py
make_mu_batch(beta, design_matrix, size_factors)
Compute the mu matrix for a batch of LFC estimates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
beta
|
ndarray
|
Current LFC estimate, of shape (batch_size, n_params). |
required |
design_matrix
|
ndarray
|
The design matrix, of shape (n_obs, n_params). |
required |
size_factors
|
ndarray
|
The size factors, of shape (n_obs). |
required |
Returns:
Name | Type | Description |
---|---|---|
mu |
ndarray
|
The mu matrix, of shape (n_obs, batch_size). |
Source code in fedpydeseq2/core/utils/layers/build_layers/mu_layer.py
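Given the documented shapes, the mean matrix can be sketched as the size factors times the exponential of the linear predictor, which is the standard log-link form used in DESeq2-style models. The name `mu_batch_sketch` is hypothetical and this is only an illustration of the shapes, not the library's code.

```python
import numpy as np


def mu_batch_sketch(beta, design_matrix, size_factors):
    """Mean matrix of shape (n_obs, batch_size) for a batch of LFC estimates."""
    # (n_obs, n_params) @ (n_params, batch_size) -> (n_obs, batch_size)
    linear_predictor = design_matrix @ beta.T
    return size_factors[:, None] * np.exp(linear_predictor)


design = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
sf = np.array([1.0, 2.0, 3.0, 4.0])
mu_zero = mu_batch_sketch(np.zeros((3, 2)), design, sf)
```

With all coefficients at zero, each column of the result reduces to the size factors.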
set_mu_layer(local_adata, lfc_param_name, mu_param_name, n_jobs=1, joblib_verbosity=0, joblib_backend='loky', batch_size=100)
Reconstruct a mu layer from the adata and a given LFC field.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
lfc_param_name
|
str
|
The name of the log fold changes parameter in the adata. |
required |
mu_param_name
|
str
|
The name of the mu parameter in the adata. |
required |
n_jobs
|
int
|
Number of jobs to run in parallel. |
1
|
joblib_verbosity
|
int
|
Verbosity level of joblib. |
0
|
joblib_backend
|
str
|
Joblib backend to use. |
'loky'
|
batch_size
|
int
|
Batch size for parallelization. |
100
|
Source code in fedpydeseq2/core/utils/layers/build_layers/mu_layer.py
normed_counts
Module to construct the normed_counts layer.
can_get_normed_counts(adata, raise_error=False)
Check if the normed_counts layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The local AnnData object. |
required |
raise_error
|
bool
|
If True, raise an error if the normed_counts layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the normed_counts layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the normed_counts layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/normed_counts.py
set_normed_counts(adata)
Reconstruct the normed_counts layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The local AnnData object. |
required |
Source code in fedpydeseq2/core/utils/layers/build_layers/normed_counts.py
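The `can_get_*` / `set_*` pairs on this page follow the same check-then-build pattern. The sketch below illustrates it for normalized counts (counts divided by per-sample size factors) on a lightweight stand-in object rather than a real AnnData. The function names are hypothetical, and storing size factors under `obsm["size_factors"]` is an assumption; only the `normed_counts` layer name comes from the text above.

```python
from types import SimpleNamespace

import numpy as np


def can_get_normed_counts_sketch(adata, raise_error=False):
    """Return True if counts and size factors are both available."""
    ok = adata.X is not None and "size_factors" in adata.obsm
    if not ok and raise_error:
        raise ValueError("Cannot reconstruct normed_counts: missing inputs.")
    return ok


def set_normed_counts_sketch(adata):
    """Divide each sample's counts by its size factor and store the layer."""
    can_get_normed_counts_sketch(adata, raise_error=True)
    adata.layers["normed_counts"] = adata.X / adata.obsm["size_factors"][:, None]


# Stand-in with the three attributes the sketch relies on.
toy_adata = SimpleNamespace(
    X=np.array([[2.0, 4.0], [3.0, 9.0]]),
    obsm={"size_factors": np.array([2.0, 3.0])},
    layers={},
)
set_normed_counts_sketch(toy_adata)
```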
sqerror
Module to construct the sqerror layer.
can_get_sqerror_layer(adata, raise_error=False)
Check if the squared error layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The local AnnData object. |
required |
raise_error
|
bool
|
If True, raise an error if the squared error layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the squared error layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the squared error layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/sqerror.py
set_sqerror_layer(local_adata)
Compute the squared error between the normalized counts and the trimmed mean.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
Local AnnData. It is expected to have the following fields: layers["normed_counts"] (the normalized counts), varm["cell_means"] (the trimmed mean), and obs["cells"] (the cells). |
required |
Source code in fedpydeseq2/core/utils/layers/build_layers/sqerror.py
y_hat
Module containing the necessary functions to reconstruct the y_hat layer.
can_get_y_hat(local_adata, raise_error=False)
Check if the y_hat layer can be reconstructed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
raise_error
|
bool
|
If True, raise an error if the y_hat layer cannot be reconstructed. |
False
|
Returns:
Type | Description |
---|---|
bool
|
True if the y_hat layer can be reconstructed, False otherwise. |
Raises:
Type | Description |
---|---|
ValueError
|
If the y_hat layer cannot be reconstructed and raise_error is True. |
Source code in fedpydeseq2/core/utils/layers/build_layers/y_hat.py
set_y_hat(local_adata)
Reconstruct the y_hat layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_adata
|
AnnData
|
The local AnnData object. |
required |
Source code in fedpydeseq2/core/utils/layers/build_layers/y_hat.py
build_refit_adata
set_basic_refit_adata(self)
Set the basic refit adata from the local adata.
This function checks that the local adata is loaded and the replaced genes are computed and stored in the varm field. It then sets the refit adata from the local adata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The object containing the local adata and the refit adata. |
required |
Source code in fedpydeseq2/core/utils/layers/build_refit_adata.py
set_imputed_counts_refit_adata(self)
Set the imputed counts in the refit adata.
This function checks that the refit adata, the local adata, the replaced genes, the trimmed mean normed counts, the size factors, the cooks G cutoff, and the replaceable genes are computed and stored in the appropriate fields. It then sets the imputed counts in the refit adata.
Note that this function must be run on an object which already contains
a refit_adata, whose counts, obsm and uns have been set with the
set_basic_refit_adata
function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The object containing the refit adata, the local adata, the replaced genes, the trimmed mean normed counts, the size factors, the cooks G cutoff, and the replaceable genes. |
required |
Source code in fedpydeseq2/core/utils/layers/build_refit_adata.py
cooks_layer
can_skip_local_cooks_preparation(self)
Check if the Cook's distance is in the layers to save.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The object. |
required |
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
Whether the Cook's distance is in the layers to save. |
Source code in fedpydeseq2/core/utils/layers/cooks_layer.py
make_hat_matrix_summands_batch(design_matrix, size_factors, beta, dispersions, min_mu)
Make the local hat matrix.
This is quite similar to the make_irls_summands_batch function, but it does not require the counts, and returns only the H matrix.
This is used in the final step of the IRLS algorithm to compute the local hat matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
design_matrix
|
ndarray
|
The design matrix, of shape (n_obs, n_params). |
required |
size_factors
|
ndarray
|
The size factors, of shape (n_obs). |
required |
beta
|
ndarray
|
The log fold change matrix, of shape (batch_size, n_params). |
required |
dispersions
|
ndarray
|
The dispersions, of shape (batch_size). |
required |
min_mu
|
float
|
Lower bound on estimated means, to ensure numerical stability. |
required |
Returns:
Name | Type | Description |
---|---|---|
H |
ndarray
|
The H matrix, of shape (batch_size, n_params, n_params). |
Source code in fedpydeseq2/core/utils/layers/cooks_layer.py
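One way to read "summands" here: a matrix of the form `X^T W X` is a sum over samples, so each center can compute it on its own rows and the results can simply be added. The sketch below illustrates this with NumPy. The name `hat_matrix_summands_sketch` is hypothetical, and the weights `W = mu / (1 + mu * alpha)` are the usual negative binomial working weights, assumed rather than taken from the source.

```python
import numpy as np


def hat_matrix_summands_sketch(design_matrix, size_factors, beta, dispersions, min_mu):
    """Local X^T W X per gene, shape (batch_size, n_params, n_params)."""
    mu = np.maximum(size_factors[None, :] * np.exp(beta @ design_matrix.T), min_mu)
    W = mu / (1.0 + mu * dispersions[:, None])
    # Sum over samples i of W[b, i] * x_i x_i^T for each gene b.
    return np.einsum("ip,bi,iq->bpq", design_matrix, W, design_matrix)


X_all = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
sf_all = np.array([0.8, 1.0, 1.2, 1.5])
betas = np.array([[1.0, 0.5], [0.2, -0.3]])
alphas = np.array([0.1, 0.4])

# Two "centers" holding two samples each.
local_1 = hat_matrix_summands_sketch(X_all[:2], sf_all[:2], betas, alphas, 0.5)
local_2 = hat_matrix_summands_sketch(X_all[2:], sf_all[2:], betas, alphas, 0.5)
pooled = hat_matrix_summands_sketch(X_all, sf_all, betas, alphas, 0.5)
```

Under these assumptions, `local_1 + local_2` matches the matrix computed on the pooled samples.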
prepare_cooks_agg(method)
Decorate the aggregation step to compute the Cook's distance.
This decorator is meant to be placed on the aggregation step just before a local step that needs the "cooks" layer. It checks whether the shared state contains the keys needed for the Cook's distance computation. If this is not the case, then the Cook's distance must have been saved in the layers_to_save. It computes the Cook's dispersion and the hat matrix inverse, and then calls the method.
It adds the following keys to the shared state:
- cooks_dispersions
- global_hat_matrix_inv
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
Callable
|
The aggregation method to decorate. It must have the following signature: method(self, shared_states: Optional[list], **method_parameters). |
required |
Returns:
Name | Type | Description |
---|---|---|
Callable |
The decorated method. |
Source code in fedpydeseq2/core/utils/layers/cooks_layer.py
prepare_cooks_local(method)
Decorate the local method just preceding a local method needing cooks.
This method is only applied if the Cooks layer is not present or must not be saved between steps.
This step is used to compute the local hat matrix and the mean normed counts.
Before the method is called, the varEst must be accessed from the shared state, or from the local adata if it is not present in the shared state.
The local hat matrix and the mean normed counts are computed, and the following keys are added to the shared state:
- local_hat_matrix
- mean_normed_counts
- n_samples
- varEst
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
Callable
|
The remote_data method to decorate. |
required |
Returns:
Name | Type | Description |
---|---|---|
Callable |
The decorated method. |
Source code in fedpydeseq2/core/utils/layers/cooks_layer.py
joblib_utils
get_joblib_parameters(x)
Get the joblib parameters from an object, and return them as a tuple.
If the object has no joblib parameters, default values are returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x
|
Any
|
Object from which to extract the joblib parameters. |
required |
Returns:
Name | Type | Description |
---|---|---|
n_jobs |
int
|
Number of jobs to run in parallel. |
joblib_verbosity |
int
|
Verbosity level of joblib. |
joblib_backend |
str
|
Joblib backend. |
batch_size |
int
|
Batch size for the IRLS algorithm. |
Source code in fedpydeseq2/core/utils/layers/joblib_utils.py
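A minimal sketch of the fallback behaviour described above, using `getattr` with defaults. The attribute names and the name `get_joblib_parameters_sketch` are assumptions; the default values mirror the defaults shown in the function signatures elsewhere on this page (`n_jobs=1`, `joblib_verbosity=0`, `joblib_backend='loky'`, `batch_size=100`).

```python
from types import SimpleNamespace


def get_joblib_parameters_sketch(x):
    """Read joblib settings from an object, falling back to defaults."""
    n_jobs = getattr(x, "n_jobs", 1)
    joblib_verbosity = getattr(x, "joblib_verbosity", 0)
    joblib_backend = getattr(x, "joblib_backend", "loky")
    batch_size = getattr(x, "batch_size", 100)
    return n_jobs, joblib_verbosity, joblib_backend, batch_size


# Only n_jobs is set on this object; the rest fall back to defaults.
params = get_joblib_parameters_sketch(SimpleNamespace(n_jobs=4))
```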
reconstruct_adatas_decorator
Module containing a decorator to handle simple layers.
This wrapper is used to load and save simple layers from the adata object. These simple layers are defined in SIMPLE_LAYERS.
check_and_load_layers(self, adata_name, layers_to_load, shared_state, only_from_disk)
Check and load layers for a given adata_name.
This function checks the availability of the layers to load and loads them, for the adata_name adata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The object containing the adata. |
required |
adata_name
|
str
|
The name of the adata to load the layers into. |
required |
layers_to_load
|
dict[str, Optional[list[str]]]
|
The layers to load for each adata. It must have adata_name as a key. |
required |
shared_state
|
Optional[dict]
|
The shared state. |
required |
only_from_disk
|
bool
|
Whether to load only the layers from disk. |
required |
Source code in fedpydeseq2/core/utils/layers/reconstruct_adatas_decorator.py
reconstruct_adatas(method)
Decorate a method to load layers and remove them before saving the state.
This decorator loads the layers from the data_from_opener and the adata object before calling the method. It then removes the layers from the adata object after the method is called.
The object self can have the following attributes:
- save_layers_to_disk: if this attribute exists or is True, all layers are saved on disk and none are removed at the end of each local step. If it is False, all layers that must be removed are removed at the end of each local step. This attribute takes precedence over all others described below.
- layers_to_save_on_disk: if this attribute exists, it contains the layers that must be saved on disk at every local step. It can be either None (in which case the default behaviour is to save no layers) or a dictionary with a refit_adata key and a local_adata key. The associated values are either None (no layers) or a list of layers to save at each step.
This decorator adds two parameters to each method decorated with it:
- layers_to_load
- layers_to_save_on_disk
If layers_to_load is None, the default is to load all available layers. Otherwise, only the layers specified in the layers_to_load argument are loaded.
The layers_to_save_on_disk argument is added to the layers_to_save_on_disk attribute of self for the duration of the method and then removed. That way, the inner method can access the names of the layers that will effectively be saved at the end of the step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
Callable
|
The method to decorate. This method is expected to have the following signature: method(self, data_from_opener: ad.AnnData, shared_state: Any, **method_parameters). |
required |
Returns:
Type | Description |
---|---|
Callable
|
The decorated method, which loads the simple layers before calling the method and removes the simple layers after the method is called. |
Source code in fedpydeseq2/core/utils/layers/reconstruct_adatas_decorator.py
reconstruct_refit_adata_without_layers(self)
Reconstruct the refit adata without the layers.
This function reconstructs the refit adata without the layers. It is used to avoid loading the counts and the obsm needlessly into the refit_adata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The object containing the adata. |
required |
Source code in fedpydeseq2/core/utils/layers/reconstruct_adatas_decorator.py
utils
get_available_layers(adata, shared_state, refit=False, all_layers_from_disk=False)
Get the available layers in the adata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
Optional[AnnData]
|
The local adata. |
required |
shared_state
|
dict
|
The shared state containing the Cook's dispersion values. |
required |
refit
|
bool
|
Whether to refit the layers. |
False
|
all_layers_from_disk
|
bool
|
Whether to get all layers from disk. |
False
|
Returns:
Type | Description |
---|---|
list[str]
|
List of available layers. |
Source code in fedpydeseq2/core/utils/layers/utils.py
load_layers(adata, shared_state, layers_to_load, n_jobs=1, joblib_verbosity=0, joblib_backend='loky', batch_size=100)
Load the simple layers from the data_from_opener and the adata object.
This function loads the layers listed in layers_to_load into the adata object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The AnnData object to load the layers into. |
required |
shared_state
|
dict
|
The shared state containing the Cook's dispersion values. |
required |
layers_to_load
|
list[str]
|
The list of layers to load. |
required |
n_jobs
|
int
|
The number of jobs to use for parallel processing. |
1
|
joblib_verbosity
|
int
|
The verbosity level of joblib. |
0
|
joblib_backend
|
str
|
The joblib backend to use. |
'loky'
|
batch_size
|
int
|
The batch size for parallel processing. |
100
|
Source code in fedpydeseq2/core/utils/layers/utils.py
remove_layers(adata, layers_to_save_on_disk, refit=False)
Remove the simple layers from the adata object.
This function removes the simple layers from the adata object. The layers_to_save_on_disk parameter can be used to specify which layers to save in the local state. If layers_to_save_on_disk is None, no layers are saved.
This function also adds all present layers to the _available_layers field in the adata object. This field is used to keep track of the layers that are present in the adata object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
The AnnData object to remove the layers from. |
required |
refit
|
bool
|
Whether the adata object is the refit_adata object. |
False
|
layers_to_save_on_disk
|
list[str]
|
The list of layers to save. If None, no layers are saved. |
required |
Source code in fedpydeseq2/core/utils/layers/utils.py
logging
logging_decorators
Module containing decorators to log the input and outputs of a method.
All logging is controlled through a logging configuration file. This configuration file can be either set by the log_config_path attribute of the class, or by the default_config.ini file in the same directory as this module.
get_method_logger(self, method)
Get the method logger from a configuration file.
If the class instance has a log_config_path attribute, the logger is configured with the file at this path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The class instance. |
required |
method
|
Callable
|
The class method. |
required |
Returns:
Type | Description |
---|---|
Logger
|
The logger instance. |
Source code in fedpydeseq2/core/utils/logging/logging_decorators.py
log_remote(method)
Decorate a remote method to log the input and outputs.
This decorator logs the shared state keys with the info level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
Callable
|
The method to decorate. This method is expected to have the following signature: method(self, shared_states: Optional[list], **method_parameters). |
required |
Returns:
Type | Description |
---|---|
Callable
|
The decorated method, which logs the shared state keys with the info level. |
Source code in fedpydeseq2/core/utils/logging/logging_decorators.py
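The general shape of such a decorator can be sketched with the standard library alone. This is an illustration, not the library's code: `log_remote_sketch` is a hypothetical name, the logger comes from `logging.getLogger` rather than from a configuration file, and logging the output keys only when the result is a dict is an assumption.

```python
import functools
import logging


def log_remote_sketch(method):
    """Log the keys of the incoming and outgoing shared states at info level."""

    @functools.wraps(method)
    def wrapper(self, shared_states=None, **method_parameters):
        logger = logging.getLogger(method.__qualname__)
        if shared_states is not None:
            for i, state in enumerate(shared_states):
                logger.info("Input shared state %d keys: %s", i, sorted(state))
        result = method(self, shared_states, **method_parameters)
        if isinstance(result, dict):
            logger.info("Output shared state keys: %s", sorted(result))
        return result

    return wrapper


class ToyAggregator:
    """Hypothetical class with one aggregation step."""

    @log_remote_sketch
    def aggregate(self, shared_states, scale=1):
        return {"total": scale * sum(s["n"] for s in shared_states)}


total_state = ToyAggregator().aggregate([{"n": 1}, {"n": 2}], scale=2)
```

Using `functools.wraps` keeps the decorated method's name and docstring, which matters when several decorators are stacked on the same step.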
log_remote_data(method)
Decorate a remote_data method to log the input and outputs.
This decorator logs the shared state keys with the info level, and the different layers of the local_adata and refit_adata with the debug level.
This is done before and after the method call.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
Callable
|
The method to decorate. This method is expected to have the following signature: method(self, data_from_opener: ad.AnnData, shared_state: Any = None, **method_parameters). |
required |
Returns:
Type | Description |
---|---|
Callable
|
The decorated method, which logs the shared state keys with the info level and the different layers of the local_adata and refit_adata with the debug level. |
Source code in fedpydeseq2/core/utils/logging/logging_decorators.py
log_save_local_state(method)
Decorate a method to log the size of the local state saved.
This function is intended to decorate the save_local_state method of a class.
It logs the size of the local state saved in the local state path, in MB. This is logged as an info message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
Callable
|
The method to decorate. This method is expected to have the following signature: method(self, path: pathlib.Path). |
required |
Returns:
Type | Description |
---|---|
Callable
|
The decorated method, which logs the size of the local state saved. |
Source code in fedpydeseq2/core/utils/logging/logging_decorators.py
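A hedged sketch of a size-logging decorator for a method with the signature `method(self, path)`. The name `log_save_local_state_sketch` is hypothetical, and the sketch assumes that the local state is written to a single file and that 1 MB means 1024 * 1024 bytes; neither detail is stated in the source.

```python
import functools
import logging
import pathlib
import tempfile


def log_save_local_state_sketch(method):
    """After saving, log the size of the saved file in MB at info level."""

    @functools.wraps(method)
    def wrapper(self, path):
        result = method(self, path)
        # Assumption: `path` points to one file written by `method`.
        size_mb = pathlib.Path(path).stat().st_size / (1024**2)
        logging.getLogger(method.__qualname__).info(
            "Local state saved: %.3f MB", size_mb
        )
        return result

    return wrapper


class ToyState:
    """Hypothetical class whose state is 2048 bytes of padding."""

    @log_save_local_state_sketch
    def save_local_state(self, path):
        pathlib.Path(path).write_bytes(b"x" * 2048)


with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = pathlib.Path(tmp_dir) / "state.bin"
    ToyState().save_local_state(state_path)
    saved_size = state_path.stat().st_size
```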
log_shared_state_adatas(self, method, shared_state)
Log the information of the local step.
Precisely, log the shared state keys (info), and the different layers of the local_adata and refit_adata (debug).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self
|
Any
|
The class instance. |
required |
method
|
Callable
|
The class method. |
required |
shared_state
|
Optional[dict]
|
The shared state dictionary, whose keys we log with the info level. |
required |
Source code in fedpydeseq2/core/utils/logging/logging_decorators.py
mle
batch_mle_grad(counts, design, mu, alpha)
Estimate the local gradients wrt dispersions on a batch of genes.
Returns both the gradient of the negative likelihood, and two matrices used to compute the gradient of the Cox-Reid adjustment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Raw counts for a set of genes (n_samples x n_genes). |
required |
design
|
ndarray
|
Design matrix (n_samples x n_params). |
required |
mu
|
ndarray
|
Mean estimation for the NB model (n_samples x n_genes). |
required |
alpha
|
float
|
Initial dispersion estimate (n_genes). |
required |
Returns:
Name | Type | Description |
---|---|---|
grad |
ndarray
|
Gradient of the negative log likelihood of the observations counts following $NB(\mu, \alpha)$. |
M1 |
ndarray
|
First summand for the gradient of the CR adjustment (n_genes x n_params x n_params). |
M2 |
ndarray
|
Second summand for the gradient of the CR adjustment (n_genes x n_params x n_params). |
Source code in fedpydeseq2/core/utils/mle.py
batch_mle_update(log_alpha, global_CR_summand_1, global_CR_summand_2, global_ll_grad, lr, alpha_hat=None, prior_disp_var=None, prior_reg=False)
Perform a global dispersions update on a batch of genes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
log_alpha
|
ndarray
|
Current global log dispersions (n_genes). |
required |
global_CR_summand_1
|
ndarray
|
Global summand 1 for the CR adjustment (n_genes x n_params x n_params). |
required |
global_CR_summand_2
|
ndarray
|
Global summand 2 for the CR adjustment (n_genes x n_params x n_params). |
required |
global_ll_grad
|
ndarray
|
Global gradient of the negative log likelihood (n_genes). |
required |
lr
|
float
|
Learning rate. |
required |
alpha_hat
|
ndarray
|
Reference dispersions (for MAP estimation, n_genes). |
None
|
prior_disp_var
|
float
|
Prior dispersion variance. |
None
|
prior_reg
|
bool
|
Whether to use prior regularization for MAP estimation (default: False). |
False
|
Returns:
Type | Description |
---|---|
ndarray
|
Updated global log dispersions (n_genes). |
Source code in fedpydeseq2/core/utils/mle.py
global_grid_cr_loss(nll, cr_grid)
Compute the global negative log likelihood on a grid.
Sums previously computed local negative log likelihoods and Cox-Reid adjustments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nll
|
ndarray
|
Negative log likelihoods of size (n_genes x grid_length). |
required |
cr_grid
|
ndarray
|
Summands for the Cox-Reid adjustment (n_genes x grid_length x n_params x n_params). |
required |
Returns:
Type | Description |
---|---|
ndarray
|
Adjusted negative log likelihood (n_genes x grid_length). |
Source code in fedpydeseq2/core/utils/mle.py
local_grid_summands(counts, design, mu, alpha_grid)
Compute local summands of the adjusted negative log likelihood on a grid.
Includes the Cox-Reid regularization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Raw counts for a set of genes (n_samples x n_genes). |
required |
design
|
ndarray
|
Design matrix (n_samples x n_params). |
required |
mu
|
ndarray
|
Mean estimation for the NB model (n_samples x n_genes). |
required |
alpha_grid
|
ndarray
|
Dispersion estimates (n_genes x grid_length). |
required |
Returns:
Name | Type | Description |
---|---|---|
nll |
ndarray
|
Negative log likelihoods of size (n_genes x grid_length). |
cr_matrix |
ndarray
|
Summands for the Cox-Reid adjustment (n_genes x grid_length x n_params x n_params). |
Source code in fedpydeseq2/core/utils/mle.py
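To make the split between local summands and the global loss concrete, here is a minimal sketch. It assumes the Cox-Reid adjustment takes the usual form 0.5 * log det(X^T W X) with weights W = mu / (1 + mu * alpha); the function names `toy_local_cr_summands` and `toy_global_cr_loss` are hypothetical and the library's actual code may differ.

```python
import numpy as np


def toy_local_cr_summands(design, mu, alpha_grid):
    """Per-center matrices X^T W X for each gene and grid point.

    design: (n_samples, n_params); mu: (n_samples, n_genes);
    alpha_grid: (n_genes, grid_length).
    Returns an array of shape (n_genes, grid_length, n_params, n_params).
    """
    # Assumed NB working weights: W = mu / (1 + mu * alpha).
    weights = mu[:, :, None] / (1.0 + mu[:, :, None] * alpha_grid[None, :, :])
    return np.einsum("sp,sgl,sq->glpq", design, weights, design)


def toy_global_cr_loss(nll, cr_grid):
    """Add 0.5 * log det of the summed matrices to the summed NLL."""
    _, logdet = np.linalg.slogdet(cr_grid)
    return nll + 0.5 * logdet


rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(5, 2)), rng.normal(size=(7, 2))
mu1 = rng.uniform(1.0, 10.0, size=(5, 3))
mu2 = rng.uniform(1.0, 10.0, size=(7, 3))
grid = np.tile(np.array([0.1, 1.0]), (3, 1))

# Summing the local matrices over centers matches the pooled computation.
federated = toy_local_cr_summands(X1, mu1, grid) + toy_local_cr_summands(X2, mu2, grid)
pooled = toy_local_cr_summands(np.vstack([X1, X2]), np.vstack([mu1, mu2]), grid)
loss = toy_global_cr_loss(np.zeros((3, 2)), federated)
```

Because X^T W X is a sum over samples, each center can share only its (n_params x n_params) matrices rather than sample-level data.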
single_mle_grad(counts, design, mu, alpha)
Estimate the local gradients of a negative binomial GLM wrt dispersions.
Returns both the gradient of the negative log likelihood and two matrices used to compute the gradient of the Cox-Reid adjustment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Raw counts for a given gene (n_samples). |
required |
design
|
ndarray
|
Design matrix (n_samples x n_params). |
required |
mu
|
ndarray
|
Mean estimation for the NB model (n_samples). |
required |
alpha
|
float
|
Initial dispersion estimate (1). |
required |
Returns:
Name | Type | Description |
---|---|---|
grad |
ndarray
|
Gradient of the negative log likelihood of the observations counts following $NB(\mu, \alpha)$. |
M1 |
ndarray
|
First summand for the gradient of the CR adjustment (n_params x n_params). |
M2 |
ndarray
|
Second summand for the gradient of the CR adjustment (n_params x n_params). |
Source code in fedpydeseq2/core/utils/mle.py
vec_loss(counts, design, mu, alpha, cr_reg=True, prior_reg=False, alpha_hat=None, prior_disp_var=None)
Compute the adjusted negative log likelihood of a batch of genes.
Includes Cox-Reid regularization and (optionally) prior regularization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Raw counts for a set of genes (n_samples x n_genes). |
required |
design
|
ndarray
|
Design matrix (n_samples x n_params). |
required |
mu
|
ndarray
|
Mean estimation for the NB model (n_samples x n_genes). |
required |
alpha
|
ndarray
|
Dispersion estimates (n_genes). |
required |
cr_reg
|
bool
|
Whether to include Cox-Reid regularization (default: True). |
True
|
prior_reg
|
bool
|
Whether to include prior regularization (default: False). |
False
|
alpha_hat
|
ndarray
|
Reference dispersions (for MAP estimation, n_genes). |
None
|
prior_disp_var
|
float
|
Prior dispersion variance. |
None
|
Returns:
Type | Description |
---|---|
ndarray
|
Adjusted negative log likelihood (n_genes). |
Source code in fedpydeseq2/core/utils/mle.py
negative_binomial
Gradients and loss functions for the negative binomial distribution.
grid_nb_nll(counts, mu, alpha_grid, mask_nan=None)
Neg log-likelihood of a negative binomial, batched wrt genes on a grid.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Observations, n_samples x n_genes. |
required |
mu
|
ndarray
|
Mean estimation for the NB model (n_samples x n_genes). |
required |
alpha_grid
|
ndarray
|
Dispersions (n_genes x grid_length). |
required |
mask_nan
|
ndarray
|
Mask for the values of the grid where mu should have taken values >> 1. |
None
|
Returns:
Type | Description |
---|---|
ndarray
|
Negative log likelihoods of size (n_genes x grid_length). |
Source code in fedpydeseq2/core/utils/negative_binomial.py
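As an illustration of the shapes involved, the following sketch evaluates a negative binomial negative log-likelihood on a dispersion grid. It assumes the parametrization with mean mu and variance mu + alpha * mu^2, ignores the mask_nan argument, and uses `math.lgamma` to stay within NumPy and the standard library. The name `toy_grid_nb_nll` is hypothetical.

```python
import math

import numpy as np

# Element-wise log-gamma built from the standard library.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def toy_grid_nb_nll(counts, mu, alpha_grid):
    """NB negative log-likelihood per gene and grid point.

    counts, mu: (n_samples, n_genes); alpha_grid: (n_genes, grid_length).
    Returns an array of shape (n_genes, grid_length).
    """
    y = counts[:, :, None]            # (n_samples, n_genes, 1)
    m = mu[:, :, None]                # (n_samples, n_genes, 1)
    r = 1.0 / alpha_grid[None, :, :]  # (1, n_genes, grid_length)
    log_pmf = (
        _lgamma(y + r) - _lgamma(y + 1.0) - _lgamma(r)
        + r * np.log(r / (r + m))
        + y * np.log(m / (r + m))
    )
    # Sum over samples, keeping one value per gene and grid point.
    return -log_pmf.sum(axis=0)


# Sanity check: with one sample per "gene", exp(-nll) is the pmf at that count.
counts = np.arange(300.0)[None, :]
mu = np.full((1, 300), 3.0)
alpha_grid = np.tile(np.array([0.2, 1.0]), (300, 1))
nll = toy_grid_nb_nll(counts, mu, alpha_grid)
total_mass = np.exp(-nll).sum(axis=0)
```

Here `total_mass` sums the probability mass over counts 0 to 299 for each dispersion value, which is a quick way to check the parametrization.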
mu_grid_nb_nll(counts, mu_grid, alpha)
Compute the neg log-likelihood of a negative binomial.
This function is batched wrt genes on a mu grid.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Observations, (n_obs, batch_size). |
required |
mu_grid
|
ndarray
|
Means of the distribution $\mu$. |
required |
alpha
|
ndarray
|
Dispersions of the distribution $\alpha$. |
required |
Returns:
Type | Description |
---|---|
ndarray
|
Negative log likelihoods of the observations counts following $NB(\mu, \alpha)$. |
Notes
[1] https://en.wikipedia.org/wiki/Negative_binomial_distribution
Source code in fedpydeseq2/core/utils/negative_binomial.py
vec_nb_nll_grad(counts, mu, alpha)
Return the gradient of the negative log-likelihood of a negative binomial.
Vectorized version (wrt genes).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counts
|
ndarray
|
Observations, n_samples x n_genes. |
required |
mu
|
ndarray
|
Mean of the distribution. |
required |
alpha
|
Series
|
Dispersion of the distribution, s.t. the variance is $\mu + \alpha \mu^2$. |
required |
Returns:
Type | Description |
---|---|
ndarray
|
Gradient of the negative log likelihood of the observations counts following $NB(\mu, \alpha)$. |
Source code in fedpydeseq2/core/utils/negative_binomial.py
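For reference, the gradient with respect to the dispersion can be written out explicitly. The derivation below is a sketch that assumes the parametrization with mean $\mu_i$ and variance $\mu_i + \alpha \mu_i^2$, with $\psi$ denoting the digamma function; the implementation may group the terms differently.

```latex
% Negative log-likelihood for counts y_i with means \mu_i and dispersion \alpha
\ell(\alpha) = \sum_i \Big[ \ln\Gamma(\alpha^{-1}) + \ln\Gamma(y_i + 1)
      - \ln\Gamma(y_i + \alpha^{-1})
      + (y_i + \alpha^{-1}) \ln(1 + \alpha \mu_i) - y_i \ln(\alpha \mu_i) \Big]

% Gradient with respect to \alpha
\frac{\partial \ell}{\partial \alpha} = \frac{1}{\alpha^{2}} \sum_i \Big[
      \psi(y_i + \alpha^{-1}) - \psi(\alpha^{-1}) - \ln(1 + \alpha \mu_i)
      + \frac{\mu_i - y_i}{\mu_i + \alpha^{-1}} \Big]
```

In a vectorized version, the sum over samples $i$ is taken independently for each gene.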
pass_on_results
Module to implement the passing of the first shared state.
TODO remove after all savings have been factored out, if not needed anymore.
AggPassOnResults
Mixin to pass on the first shared state.
Source code in fedpydeseq2/core/utils/pass_on_results.py
pass_on_results(shared_states)
Pass on the shared state.
This method simply returns the first shared state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
list
|
List of shared states. |
required |
Returns:
Type | Description |
---|---|
dict
|
The first shared state. |
Source code in fedpydeseq2/core/utils/pass_on_results.py
pipe_steps
aggregation_step(aggregation_method, train_data_nodes, aggregation_node, input_shared_states, round_idx, description='', clean_models=True, method_params=None)
Perform an aggregation step of the federated learning strategy.
Used as a wrapper to execute an aggregation method on the data of each organization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
aggregation_method
|
Callable
|
Method to be executed on the shared states. |
required |
train_data_nodes
|
list
|
List of TrainDataNode. |
required |
aggregation_node
|
AggregationNode
|
Aggregation node. |
required |
input_shared_states
|
list
|
List of shared states to be aggregated. |
required |
round_idx
|
int
|
Round index. |
required |
description
|
str
|
Description of the algorithm. |
''
|
clean_models
|
bool
|
Whether to clean the models after the computation. |
True
|
method_params
|
dict
|
Optional keyword arguments to be passed to the aggregation method. |
None
|
Returns:
Name | Type | Description |
---|---|---|
SharedStateRef
|
A shared state containing the results of the aggregation. |
round_idx |
int
|
Round index incremented by 1 |
Source code in fedpydeseq2/core/utils/pipe_steps.py
local_step(local_method, train_data_nodes, output_local_states, round_idx, input_local_states=None, input_shared_state=None, aggregation_id=None, description='', clean_models=True, method_params=None)
Local step of the federated learning strategy.
Used as a wrapper to execute a local method on the data of each organization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_method
|
Callable
|
Method to be executed on the local data. |
required |
train_data_nodes
|
TrainDataNode
|
List of TrainDataNode. |
required |
output_local_states
|
dict
|
Dictionary of local states to be updated. |
required |
round_idx
|
int
|
Round index. |
required |
input_local_states
|
dict
|
Dictionary of local states to be used as input. |
None
|
input_shared_state
|
SharedStateRef
|
Shared state to be used as input. |
None
|
aggregation_id
|
str
|
Aggregation node id. |
None
|
description
|
str
|
Description of the algorithm. |
''
|
clean_models
|
bool
|
Whether to clean the models after the computation. |
True
|
method_params
|
dict
|
Optional keyword arguments to be passed to the local method. |
None
|
Returns:
Name | Type | Description |
---|---|---|
output_local_states |
dict
|
Local states containing the results of the local method, to keep within the training nodes. |
output_shared_states |
list
|
Shared states containing the results of the local method, to be sent to the aggregation node. |
round_idx |
int
|
Round index incremented by 1 |
Source code in fedpydeseq2/core/utils/pipe_steps.py
stat_utils
build_contrast(design_factors, design_columns, continuous_factors=None, contrast=None)
Check the validity of the contrast (if provided).
If no contrast is provided, build a default contrast, corresponding to the last column of the design matrix.
A contrast should be a list of three strings, in the following format: ['variable_of_interest', 'tested_level', 'reference_level'].
Names must correspond to the metadata passed to the FedCenters.
E.g., ['condition', 'B', 'A'] will measure the LFC of 'condition B' compared to 'condition A'.
For continuous variables, the last two strings will be left empty, e.g. ['measurement', '', ''].
If None, the last variable from the design matrix is chosen as the variable of interest, and the reference level is picked alphabetically.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
design_factors
|
list
|
The design factors. |
required |
design_columns
|
list
|
The names of the columns of the design matrices in the centers. |
required |
continuous_factors
|
list
|
The continuous factors in the design, if any (default: None). |
None
|
contrast
|
list
|
A list of three strings, in the following format: ['variable_of_interest', 'tested_level', 'reference_level']. |
None
|
Source code in fedpydeseq2/core/utils/stat_utils.py
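The validation and default rules described above can be sketched as follows. This is an illustration only: the function name `toy_build_contrast` is hypothetical, and the column naming convention `factor_tested_vs_reference` for categorical factors is an assumption that is not stated in this reference.

```python
def toy_build_contrast(design_factors, design_columns,
                       continuous_factors=None, contrast=None):
    """Validate a contrast, or build a default one from the last factor."""
    continuous_factors = continuous_factors or []
    if contrast is not None:
        if len(contrast) != 3 or not all(isinstance(c, str) for c in contrast):
            raise ValueError("contrast must be a list of three strings")
        factor, tested, ref = contrast
        if factor not in design_factors:
            raise KeyError(f"unknown factor: {factor}")
        if factor in continuous_factors:
            if tested or ref:
                raise ValueError("continuous factors take empty levels")
        elif tested == ref:
            raise ValueError("tested and reference levels must differ")
        return list(contrast)
    # Default: use the last design factor as the variable of interest.
    factor = design_factors[-1]
    if factor in continuous_factors:
        return [factor, "", ""]
    # Assumed column naming: "<factor>_<tested>_vs_<reference>".
    column = next(
        c for c in design_columns
        if c.startswith(f"{factor}_") and "_vs_" in c
    )
    tested, ref = column[len(factor) + 1:].split("_vs_")
    return [factor, tested, ref]


columns = ["intercept", "stage_Advanced_vs_Early", "condition_B_vs_A"]
default_contrast = toy_build_contrast(["stage", "condition"], columns)
continuous_contrast = toy_build_contrast(
    ["condition", "age"],
    ["intercept", "condition_B_vs_A", "age"],
    continuous_factors=["age"],
)
```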
build_contrast_vector(contrast, LFC_columns)
Build a vector corresponding to the desired contrast.
Allows testing any pair of levels without refitting LFCs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
contrast
|
list
|
A list of three strings, in the following format: ['variable_of_interest', 'tested_level', 'reference_level']. |
required |
LFC_columns
|
list
|
The names of the columns of the LFC matrices in the centers. |
required |
Returns:
Name | Type | Description |
---|---|---|
contrast_vector |
ndarray
|
The contrast vector, containing multipliers to apply to the LFCs. |
contrast_idx |
(int, optional)
|
The index of the tested contrast in the LFC matrix. |
Source code in fedpydeseq2/core/utils/stat_utils.py
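One way to see how a contrast vector lets any pair of levels be compared without refitting is the sketch below. It assumes LFC columns are named `factor_level_vs_reference` for categorical factors and after the factor itself for continuous ones; that convention and the name `toy_contrast_vector` are assumptions made for illustration.

```python
import numpy as np


def toy_contrast_vector(contrast, lfc_columns):
    """Return (contrast_vector, contrast_idx) for a three-string contrast."""
    factor, tested, ref = contrast
    # Continuous factors have empty levels and a column named after the factor.
    if tested == "" and ref == "":
        target = factor
    else:
        target = f"{factor}_{tested}_vs_{ref}"
    vector = np.zeros(len(lfc_columns))
    if target in lfc_columns:
        idx = lfc_columns.index(target)
        vector[idx] = 1.0
        return vector, idx
    # Otherwise, express the contrast through the reference level used at fit time.
    fitted_ref = next(
        c for c in lfc_columns
        if c.startswith(f"{factor}_") and "_vs_" in c
    ).split("_vs_")[-1]
    if tested != fitted_ref:
        vector[lfc_columns.index(f"{factor}_{tested}_vs_{fitted_ref}")] = 1.0
    if ref != fitted_ref:
        vector[lfc_columns.index(f"{factor}_{ref}_vs_{fitted_ref}")] = -1.0
    return vector, None


cols = ["intercept", "condition_B_vs_A", "condition_C_vs_A"]
direct_vec, direct_idx = toy_contrast_vector(["condition", "B", "A"], cols)
cross_vec, cross_idx = toy_contrast_vector(["condition", "C", "B"], cols)
flipped_vec, flipped_idx = toy_contrast_vector(["condition", "A", "B"], cols)
```

In this sketch, comparing C to B is obtained as (C vs A) minus (B vs A), so no column index is returned because the contrast spans two columns.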
wald_test(M, lfc, ridge_factor, contrast_vector, lfc_null, alt_hypothesis)
Run Wald test for a single gene.
Computes Wald statistics, standard error and p-values from dispersion and LFC estimates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
M
|
ndarray
|
Central parameter in the covariance matrix estimator. |
required |
lfc
|
ndarray
|
Log-fold change estimate (in natural log scale). |
required |
ridge_factor
|
ndarray
|
Regularization factors. |
required |
contrast_vector
|
ndarray
|
Vector encoding the contrast that is being tested. |
required |
lfc_null
|
float
|
The log fold change (in natural log scale) under the null hypothesis. |
required |
alt_hypothesis
|
str
|
The alternative hypothesis for computing wald p-values. |
required |
Returns:
Name | Type | Description |
---|---|---|
wald_p_value |
float
|
Estimated p-value. |
wald_statistic |
float
|
Wald statistic. |
wald_se |
float
|
Standard error of the Wald statistic. |
Source code in fedpydeseq2/core/utils/stat_utils.py
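The computation can be sketched for the two-sided case as follows. This is a minimal illustration under several assumptions: M is taken to be X^T W X, the covariance estimator is the sandwich form H M H with H = (M + ridge_factor)^-1, ridge_factor is a matrix, and only a two-sided alternative is handled. The name `toy_wald_test` is hypothetical.

```python
import math

import numpy as np


def toy_wald_test(M, lfc, ridge_factor, contrast_vector, lfc_null=0.0):
    """Two-sided Wald test for one gene; returns (p_value, statistic, se)."""
    # Assumed sandwich estimator: H M H with H = (M + ridge_factor)^-1.
    H = np.linalg.inv(M + ridge_factor)
    Hc = H @ contrast_vector
    wald_se = float(np.sqrt(Hc @ M @ Hc))
    wald_statistic = float(contrast_vector @ lfc - lfc_null) / wald_se
    # Two-sided normal p-value: 2 * sf(|z|) equals erfc(|z| / sqrt(2)).
    wald_p_value = math.erfc(abs(wald_statistic) / math.sqrt(2.0))
    return wald_p_value, wald_statistic, wald_se


M = 4.0 * np.eye(2)
ridge = np.zeros((2, 2))
contrast_vec = np.array([0.0, 1.0])
p_value, statistic, se = toy_wald_test(M, np.array([0.0, 0.98]), ridge, contrast_vec)
```

With these toy inputs the standard error is 0.5 and the statistic is 1.96, which corresponds to a two-sided p-value close to 0.05.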