Estimating LFCs and dispersions
compute_MAP_dispersions
Module containing the mixin class to compute MAP dispersions.
compute_MAP_dispersions
Main module to compute dispersions by minimizing the MLE using a grid search.
ComputeMAPDispersions
Bases: LocFilterMAPDispersions
, ComputeDispersionsGridSearch
Mixin class to implement the computation of MAP dispersions.
Methods:
Name | Description |
---|---|
fit_MAP_dispersions |
A method to fit the MAP dispersions and filter them. The filtering is done by removing the dispersions that are too far from the trend curve. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_MAP_dispersions/compute_MAP_dispersions.py
fit_MAP_dispersions(train_data_nodes, aggregation_node, local_states, shared_state, round_idx, clean_models, refit_mode=False)
Fit MAP dispersions, and apply filtering.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
List of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
shared_state
|
Contains the output of the trend fitting, that is a dictionary with a "fitted_dispersion" field containing the fitted dispersions from the trend curve, a "prior_disp_var" field containing the prior variance of the dispersions, and a "_squared_logres" field containing the squared residuals of the trend fitting. |
required | |
round_idx
|
The current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required | |
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states. Required to propagate intermediate results. |
round_idx |
int
|
The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_MAP_dispersions/compute_MAP_dispersions.py
substeps
LocFilterMAPDispersions
Mixin to filter MAP dispersions and obtain the final dispersion estimates.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_MAP_dispersions/substeps.py
filter_outlier_genes(data_from_opener, shared_state, refit_mode=False)
Filter out outlier genes.
Avoids shrinking the dispersions of genes that are too far from the trend curve.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
Not used. |
required |
shared_state
|
dict
|
Contains: - "MAP_dispersions": MAP dispersions, |
required |
refit_mode
|
bool
|
Whether to run the pipeline on |
False
|
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_MAP_dispersions/substeps.py
compute_dispersion_prior
compute_dispersion_prior
Module containing the steps for fitting the dispersion trend.
ComputeDispersionPrior
Bases: AggFitDispersionTrendAndPrior
, LocGetMeanDispersionAndMean
, LocUpdateFittedDispersions
Mixin class to implement the fit of the dispersion trend.
Methods:
Name | Description |
---|---|
compute_dispersion_prior |
The method to fit the dispersion trend. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/compute_dispersion_prior.py
compute_dispersion_prior(train_data_nodes, aggregation_node, local_states, genewise_dispersions_shared_state, round_idx, clean_models)
Fit the dispersion trend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
list of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
genewise_dispersions_shared_state
|
Shared state with a "genewise_dispersions" key. |
required | |
round_idx
|
Index of the current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required |
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states. Required to propagate intermediate results. |
dispersion_trend_share_state |
dict
|
Shared states with: - "fitted_dispersions": the fitted dispersions, - "prior_disp_var": the prior dispersion variance. |
round_idx |
int
|
The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/compute_dispersion_prior.py
substeps
Module containing the substeps for the computation of size factors.
AggFitDispersionTrendAndPrior
Mixin class to implement the fit of the dispersion trend.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/substeps.py
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 |
|
agg_fit_dispersion_trend_and_prior_dispersion(shared_states)
Fit the dispersion trend, and compute the dispersion prior.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
dict
|
Shared states from the local step with the following keys: - genewise_dispersions: np.ndarray of shape (n_genes,) - n_params: int - non_zero: np.ndarray of shape (n_genes,) - mean_normed_counts: np.ndarray of shape (n_genes,) - n_obs: int |
required |
Returns:
Type | Description |
---|---|
dict
|
dict with the following keys: - prior_disp_var: float The prior dispersion variance. - _squared_logres: float The squared log-residuals. - trend_coeffs: np.ndarray of shape (2,) The coefficients of the parametric dispersion trend. - fitted_dispersions: np.ndarray of shape (n_genes,) The fitted dispersions, computed from the dispersion trend. - disp_function_type: str The type of dispersion function (parametric or mean). - mean_disp: float, optional The mean dispersion (if "mean" fit type). |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/substeps.py
89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 |
|
LocGetMeanDispersionAndMean
Mixin to get the local mean and dispersion.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/substeps.py
get_local_mean_and_dispersion(data_from_opener, shared_state)
Return local gene means and dispersion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
AnnData returned by the opener. Not used. |
required |
shared_state
|
dict
|
Shared state returned by the last step of gene-wise dispersion computation. Contains a "genewise_dispersions" key with the gene-wise dispersions. |
required |
Returns:
Type | Description |
---|---|
dict
|
Local results to be shared via shared_state to the aggregation node. dict with the following keys: - mean_normed_counts: np.ndarray[float] of shape (n_genes,) The mean normed counts. - n_obs: int The number of observations. - non_zero: np.ndarray[bool] of shape (n_genes,) Mask of the genes with non zero counts. - genewise_dispersions: np.ndarray[float] of shape (n_genes,) The genewise dispersions. - num_vars: int The number of variables. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/substeps.py
LocUpdateFittedDispersions
Mixin to update the fitted dispersions after replacing outliers.
To use in refit mode only
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/substeps.py
loc_update_fitted_dispersions(data_from_opener, shared_state)
Update the fitted dispersions after replacing outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
AnnData returned by the opener. Not used. |
required |
shared_state
|
dict
|
A dictionary with a "fitted_dispersions" key, containing the dispersions fitted before replacing the outliers. |
required |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/substeps.py
utils
disp_function(x, disp_function_type, coeffs=None, mean_disp=None)
Return the dispersion trend function at x.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_dispersion_prior/utils.py
compute_genewise_dispersions
Module containing the mixin class to compute genewise dispersions.
compute_MoM_dispersions
Module to implement the computation of MoM dispersions.
compute_MoM_dispersions
Main module to compute method of moments (MoM) dispersions.
ComputeMoMDispersions
Bases: ComputeRoughDispersions
, LocInvSizeMean
, AggMomentsDispersion
Mixin class to implement the computation of MoM dispersions.
Relies on the ComputeRoughDispersions class, in addition to substeps.
Methods:
Name | Description |
---|---|
compute_MoM_dispersions |
The method to compute the MoM dispersions, that must be used in the main pipeline. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/compute_MoM_dispersions.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 |
|
compute_MoM_dispersions(train_data_nodes, aggregation_node, local_states, gram_features_shared_states, round_idx, clean_models, refit_mode=False)
Compute method of moments dispersions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
List of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
gram_features_shared_states
|
The list of shared states outputed by the compute_size_factors step. They contain a "local_gram_matrix" and a "local_features" fields. |
required | |
round_idx
|
The current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required | |
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states. Required to propagate intermediate results. |
mom_dispersions_shared_state |
dict
|
Shared states containing MoM dispersions. |
round_idx |
int
|
The updated round number. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/compute_MoM_dispersions.py
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 |
|
compute_rough_dispersions
Module to compute rough dispersions.
ComputeRoughDispersions
Bases: AggRoughDispersion
, LocRoughDispersion
, AggCreateRoughDispersionsSystem
Mixin class to implement the computation of rough dispersions.
Methods:
Name | Description |
---|---|
compute_rough_dispersions |
The method to compute the rough dispersions, that must be used in the main pipeline. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/compute_rough_dispersions.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 |
|
compute_rough_dispersions(train_data_nodes, aggregation_node, local_states, gram_features_shared_states, round_idx, clean_models, refit_mode=False)
Compute rough dispersions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
List of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
gram_features_shared_states
|
The list of shared states outputed by the compute_size_factors step. They contain a "local_gram_matrix" and a "local_features" fields. |
required | |
round_idx
|
The current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required | |
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states. Required to propagate intermediate results. |
rough_dispersion_shared_state |
dict
|
Shared states containing rough dispersions. |
round_idx |
int
|
The updated round number. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/compute_rough_dispersions.py
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 |
|
substeps
Module to implement the substeps for the rough dispersions step.
This module contains all these substeps as mixin classes.
AggCreateRoughDispersionsSystem
Mixin to solve the linear system for rough dispersions.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
create_rough_dispersions_system(shared_states, refit_mode=False)
Solve the linear system in for rough dispersions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
list
|
List of results (local_gram_matrix, local_features) from training nodes. |
required |
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were replaced. If True, there is no need to compute the Gram matrix which was already computed in the compute_size_factors step (default: False). |
False
|
Returns:
Type | Description |
---|---|
dict
|
The global feature vector and the global hat matrix if refit_mode is
|
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
AggMomentsDispersion
Mixin to compute MoM dispersions.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 |
|
aggregate_moments_dispersions(shared_states)
Compute global moments dispersions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
list
|
List of results (local_inverse_size_mean, local_counts_mean, local_squared_squared_mean, local_n_obs, rough_dispersions) from training nodes. |
required |
Returns:
Type | Description |
---|---|
dict
|
Global moments dispersions, the mask of all zero genes, the total number of samples (used to set max_disp and lr), and the total normed counts mean (used in the independent filtering step). |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 |
|
AggRoughDispersion
Mixin to aggregate local rough dispersions.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
aggregate_rough_dispersions(shared_states)
Aggregate local rough dispersions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
list
|
List of results (rough_dispersions, n_obs, n_params) from training nodes. |
required |
Returns:
Type | Description |
---|---|
dict
|
Global rough dispersions. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
LocInvSizeMean
Mixin to compute local means of inverse size factors.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
local_inverse_size_mean(data_from_opener, shared_state=None, refit_mode=False)
Compute local means of inverse size factors, counts, and squared counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
AnnData returned by the opener. Not used. |
required |
shared_state
|
dict
|
Shared state containing rough dispersions from aggregator. |
None
|
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Type | Description |
---|---|
dict
|
dictionary containing all quantities required to compute MoM dispersions: local inverse size factor means, counts means, squared counts means, rough dispersions and number of samples. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
LocRoughDispersion
Mixin to compute local rough dispersions.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
local_rough_dispersions(data_from_opener, shared_state, refit_mode=False)
Compute local rough dispersions, and save the global gram matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
AnnData returned by the opener. Not used. |
required |
shared_state
|
dict
|
Shared state containing
- the gram matrix, if refit_mode is |
required |
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary containing local rough dispersions, number of samples and number of parameters (i.e. number of columns in the design matrix). |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_MoM_dispersions/substeps.py
compute_genewise_dispersions
Main module to compute genewise dispersions.
ComputeGenewiseDispersions
Bases: ComputeDispersionsGridSearch
, ComputeMoMDispersions
, LocLinMu
, GetNumReplicates
, ComputeLFC
, LocSetMuHat
Mixin class to implement the computation of both genewise and MAP dispersions.
The switch between genewise and MAP dispersions is done by setting the fit_mode
argument in the fit_dispersions
to either "MLE" or "MAP".
Methods:
Name | Description |
---|---|
fit_gene_wise_dispersions |
A method to fit gene-wise dispersions using a grid search. Performs four steps: 1. Compute the first dispersions estimates using a method of moments (MoM) approach. 2. Compute the number of replicates for each combination of factors. This step is necessary to compute the mean estimate in one case, and in downstream steps (cooks distance, etc). 3. Compute an estimate of the mean from these dispersions. 4. Fit the dispersions using a grid search. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_genewise_dispersions.py
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 |
|
fit_genewise_dispersions(train_data_nodes, aggregation_node, local_states, gram_features_shared_states, round_idx, clean_models, refit_mode=False)
Fit the gene-wise dispersions.
Performs four steps: 1. Compute the first dispersions estimates using a method of moments (MoM) approach. 2. Compute the number of replicates for each combination of factors. This step is necessary to compute the mean estimate in one case, and in downstream steps (cooks distance, etc). 3. Compute an estimate of the mean from these dispersions. 4. Fit the dispersions using a grid search.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
List of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
gram_features_shared_states
|
The list of shared states outputed by the compute_size_factors step. They contain a "local_gram_matrix" and a "local_features" fields. |
required | |
round_idx
|
The current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required | |
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states. Required to propagate intermediate results. |
shared_state |
dict or list[dict]
|
A dictionary containing: - "genewise_dispersions": The MLE dispersions, to be stored locally at - "lower_log_bounds": log lower bounds for the grid search (only used in internal loop), - "upper_log_bounds": log upper bounds for the grid search (only used in internal loop). |
round_idx |
int
|
The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/compute_genewise_dispersions.py
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 |
|
get_num_replicates
Module containing the mixin class to compute the number of replicates.
get_num_replicates
GetNumReplicates
Bases: LocGetDesignMatrixLevels
, AggGetCountsLvlForCells
, LocFinalizeCellCounts
Mixin class to get the number of replicates for each combination of factors.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/get_num_replicates.py
get_num_replicates(train_data_nodes, aggregation_node, local_states, round_idx, clean_models)
Compute the number of replicates for each combination of factors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
List of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
round_idx
|
Index of the current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required |
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states, to store the number of replicates and cell level codes. |
round_idx |
int
|
The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/get_num_replicates.py
substeps
AggGetCountsLvlForCells
Mixin that aggregate the counts of the design matrix values.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/substeps.py
agg_get_counts_lvl_for_cells(shared_states)
Aggregate the counts of the design matrix values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
list(dict)
|
List of shared states with the following key: - unique_counts: unique values and counts of the local design matrix |
required |
Returns:
Type | Description |
---|---|
dict
|
Dictionary with keys labeling the different values taken by the overall design matrix. Each values of the dictionary contains the sum of the counts of the corresponding design matrix value and the level. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/substeps.py
LocFinalizeCellCounts
Mixin that finalize the cell counts.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/substeps.py
loc_finalize_cell_counts(data_from_opener, shared_state=dict)
Finalize the cell counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
AnnData returned by the opener. Not used. |
required |
shared_state
|
dict
|
Dictionary with keys labeling the different values taken by the overall design matrix. Each values of the dictionary contains the sum of the counts of the corresponding design matrix value and the level. |
dict
|
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/substeps.py
LocGetDesignMatrixLevels
Mixin to get the unique values of the local design matrix.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/substeps.py
loc_get_design_matrix_levels(data_from_opener, shared_state=dict)
Get the values of the local design matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
AnnData returned by the opener. Not used. |
required |
shared_state
|
dict
|
Not used. |
dict
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary with the following key: - unique_counts: unique values and counts of the local design matrix |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/get_num_replicates/substeps.py
substeps
LocLinMu
Mixin to fit linear mu estimates locally.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/substeps.py
fit_lin_mu(data_from_opener, shared_state, min_mu=0.5, refit_mode=False)
Fit linear mu estimates and store them locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
Not used. |
required |
shared_state
|
dict
|
Contains values to be saved in local adata: - "MoM_dispersions": MoM dispersions, - "nom_zero": Mask of all zero genes, - "tot_num_samples": Total number of samples. |
required |
min_mu
|
float
|
Lower threshold for fitted means, for numerical stability.
(default: |
0.5
|
refit_mode
|
bool
|
Whether to run the pipeline in refit mode. If True, the pipeline will be run
on |
False
|
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/substeps.py
LocSetMuHat
Mixin to set mu estimates locally.
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/substeps.py
set_mu_hat(data_from_opener, shared_state, refit_mode=False)
Pick between linear and IRLS mu estimates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
Not used. |
required |
shared_state
|
dict
|
Not used. |
required |
refit_mode
|
bool
|
Whether to run on |
False
|
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_genewise_dispersions/substeps.py
compute_lfc
Module which contains the Mixin in charge of fitting log fold changes.
compute_lfc
Module containing the ComputeLFC method.
ComputeLFC
Bases: LocGetGramMatrixAndLogFeatures
, AggCreateBetaInit
, LocSaveLFC
, FedProxQuasiNewton
, FedIRLS
Mixin class to implement the LFC computation algorithm.
The goal of this class is to implement the IRLS algorithm specifically applied to the negative binomial distribution, with fixed dispersion parameter, and in the case where it fails, to catch it with the FedProxQuasiNewton algorithm.
This class also initializes the beta parameters and computes the final hat matrix.
Methods:
Name | Description |
---|---|
compute_lfc |
The main method to compute the log fold changes by running the IRLS algorithm and catching it with the FedProxQuasiNewton algorithm. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/compute_lfc.py
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
|
compute_lfc(train_data_nodes, aggregation_node, local_states, round_idx, clean_models=True, lfc_mode='lfc', refit_mode=False)
Compute the log fold changes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
list
|
List of TrainDataNode. |
required |
aggregation_node
|
AggregationNode
|
The aggregation node. |
required |
local_states
|
dict
|
Local states. Required to propagate intermediate results. |
required |
round_idx
|
int
|
The current round. |
required |
clean_models
|
bool
|
If True, the models are cleaned. |
True
|
lfc_mode
|
Literal['lfc', 'mu_init']
|
The mode of the IRLS algorithm ("lfc" or "mu_init"). |
'lfc'
|
refit_mode
|
bool
|
Whether to run the pipeline in refit mode, after cooks outliers were
replaced. If True, the pipeline will be run on |
False
|
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states. Required to propagate intermediate results. |
round_idx |
int
|
The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/compute_lfc.py
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
|
substeps
Module to implement the substeps for the fitting of log fold changes.
This module contains all these substeps as mixin classes.
AggCreateBetaInit
Mixin to create the beta init.
Methods:
Name | Description |
---|---|
create_beta_init |
A remote method. Creates the beta init (initialization value for the ComputeLFC algorithm) and returns the initialization state for the IRLS algorithm containing this initialization value and other necessary quantities. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/substeps.py
175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 |
|
create_beta_init(shared_states)
Create the beta init.
It does so either by solving the least squares regression system if the gram matrix is full rank, or by aggregating the log means if the gram matrix is not full rank.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states
|
list[dict]
|
A list of dictionaries containing the following keys: - gram_full_rank: bool Whether the gram matrix is full rank. - n_non_zero_genes: int The number of non zero genes. - n_params: int The number of parameters. If the gram matrix is full rank, the state contains: - local_log_features: ndarray The local log features, only if the gram matrix is full rank. - global_gram_matrix: ndarray The global gram matrix, only if the gram matrix is full rank. If the gram matrix is not full rank, the state contains: - normed_log_means: ndarray The normed log means, only if the gram matrix is not full rank. - n_obs: int The number of observations, only if the gram matrix is not full rank. |
required |
Returns:
Type | Description |
---|---|
dict[str, Any]
|
A dictionary containing all the necessary info to run IRLS. It contains the following fields: - beta: ndarray The initial beta, of shape (n_non_zero_genes, n_params). - irls_diverged_mask: ndarray A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). Is set to False initially, and will be set to True if the gene has diverged. - irls_mask: ndarray A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). Is set to True initially, and will be set to False if the gene has converged or diverged. - global_nll: ndarray The global_nll of the current beta from the previous beta, of shape (n_non_zero_genes,). - round_number_irls: int The current round number of the IRLS algorithm. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/substeps.py
187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 |
|
LocGetGramMatrixAndLogFeatures
Mixin accessing the quantities to compute the initial beta of ComputeLFC.
Attributes:
Name | Type | Description |
---|---|---|
local_adata |
AnnData
|
The local AnnData object. |
Methods:
Name | Description |
---|---|
get_gram_matrix_and_log_features |
A remote_data method. Creates the local quantities necessary to compute the initial beta. If the gram matrix is full rank, it shares the features vector and the gram matrix. If the gram matrix is not full rank, it shares the normed log means and the number of observations. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/substeps.py
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
|
get_gram_matrix_and_log_features(data_from_opener, shared_state, lfc_mode, refit_mode=False)
Create the local quantities necessary to compute the initial beta.
To do so, we assume that the local_adata.uns contains the following fields: - n_params: int The number of parameters. - _global_gram_matrix: ndarray The global gram matrix.
From the IRLS mode, we will set the following fields: - _irls_mu_param_name: str The name of the mu parameter, to save at the end of the IRLS run This is None if we do not want to save the mu parameter. - _irls_beta_param_name: str The name of the beta parameter, to save as a varm at the end of the fed irls run This is None if we do not want to save the beta parameter. - _irls_disp_param_name: str The name of the dispersion parameter. - _lfc_mode: str The mode of the IRLS algorithm. This is used to set the previous fields.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
Not used. |
required |
shared_state
|
dict
|
Not used, all the necessary info is stored in the local adata. |
required |
lfc_mode
|
Literal['lfc', 'mu_init']
|
The mode of the IRLS algorithm ("lfc", or "mu_init"). |
required |
refit_mode
|
bool
|
Whether to run the pipeline on |
False
|
Returns:
Type | Description |
---|---|
dict
|
The state to share to the server. It always contains the following fields: - gram_full_rank: bool Whether the gram matrix is full rank. - n_non_zero_genes: int The number of non zero genes. - n_params: int The number of parameters. - If the gram matrix is full rank, the state contains: - local_log_features: ndarray The local log features. - global_gram_matrix: ndarray The global gram matrix. - If the gram matrix is not full rank, the state contains: - normed_log_means: ndarray The normed log means. - n_obs: int The number of observations. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/substeps.py
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
|
LocSaveLFC
Mixin to create the local quantities to compute the final hat matrix.
Attributes:
Name | Type | Description |
---|---|---|
local_adata |
AnnData
|
The local AnnData object. |
num_jobs |
int
|
The number of cpus to use. |
joblib_verbosity |
int
|
The verbosity of the joblib backend. |
joblib_backend |
str
|
The backend to use for the joblib parallelization. |
irls_batch_size |
int
|
The batch size to use for the IRLS algorithm. |
min_mu |
float
|
The minimum value for the mu parameter. |
Methods:
Name | Description |
---|---|
make_local_final_hat_matrix_summands |
A remote_data method. Creates the local quantities to compute the final hat matrix, which must be computed on all genes. This step is expected to be applied after catching the IRLS method with the fed prox quasi newton method, and takes as an input a shared state from the last iteration of that method. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/substeps.py
287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 |
|
save_lfc_to_local(data_from_opener, shared_state, refit_mode=False)
Create the local quantities to compute the final hat matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener
|
AnnData
|
Not used. |
required |
shared_state
|
dict
|
The shared state. The shared state is a dictionary containing the following keys: - beta: ndarray The current beta, of shape (n_non_zero_genes, n_params). - irls_diverged_mask: ndarray A boolean mask indicating if the irsl method has diverged. In that case, these genes are caught with the fed prox newton method. (shape: (n_non_zero_genes,)). - PQN_diverged_mask: ndarray A boolean mask indicating if the fed prox newton method has diverged. These genes are not caught by any method, and the returned beta value is the output of the PQN method, even though it has not converged. |
required |
refit_mode
|
bool
|
Whether to run the pipeline on |
False
|
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/substeps.py
325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 |
|
utils
Module to implement the utilities of the IRLS algorithm.
Most of these functions have the _batch suffix, which means that they are vectorized to work over batches of genes in the parralel_backend file in the same module.
make_irls_nll_batch(beta, design_matrix, size_factors, dispersions, counts, min_mu=0.5)
Compute the negative binomial log likelihood from LFC estimates.
Used in ComputeLFC to compute the deviance score. This function is vectorized to work over batches of genes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
beta
|
ndarray
|
Current LFC estimate, of shape (batch_size, n_params). |
required |
design_matrix
|
ndarray
|
The design matrix, of shape (n_obs, n_params). |
required |
size_factors
|
ndarray
|
The size factors, of shape (n_obs). |
required |
dispersions
|
ndarray
|
The dispersions, of shape (batch_size). |
required |
counts
|
ndarray
|
The counts, of shape (n_obs,batch_size). |
required |
min_mu
|
float
|
Lower bound on estimated means, to ensure numerical stability.
(default: |
0.5
|
Returns:
Type | Description |
---|---|
ndarray
|
Local negative binomial log-likelihoods, of shape (batch_size). |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/compute_lfc/utils.py
deseq2_lfc_dispersions
DESeq2LFCDispersions
Bases: ComputeGenewiseDispersions
, ComputeDispersionPrior
, ComputeMAPDispersions
, ComputeLFC
Mixin class to compute the log fold change and the dispersions with DESeq2.
This class encapsulates the steps to compute the log fold change and the dispersions from a given count matrix and a design matrix.
Methods:
Name | Description |
---|---|
run_deseq2_lfc_dispersions |
The method to compute the log fold change and the dispersions. It starts from the design matrix and the count matrix. It returns the shared states by the local nodes after the computation of Cook's distances. It is meant to be run two times in the main pipeline if Cook's refitting is applied/ |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/deseq2_lfc_dispersions.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 |
|
run_deseq2_lfc_dispersions(train_data_nodes, aggregation_node, local_states, gram_features_shared_states, round_idx, clean_models, refit_mode=False)
Run the DESeq2 pipeline to compute the log fold change and the dispersions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes
|
List of TrainDataNode. |
required | |
aggregation_node
|
The aggregation node. |
required | |
local_states
|
Local states. Required to propagate intermediate results. |
required | |
gram_features_shared_states
|
Output of the "compute_size_factor step" if refit_mode is False. Output of the "replace_outliers" step if refit_mode is True. In both cases, contains a "local_features" key with the features vector to input to compute_genewise_dispersion. In the non refit mode case, it also contains a "local_gram_matrix" key with the local gram matrix. |
required | |
round_idx
|
Index of the current round. |
required | |
clean_models
|
Whether to clean the models after the computation. |
required | |
refit_mode
|
Whether we are refittinh Cooks outliers or not. |
False
|
Returns:
Name | Type | Description |
---|---|---|
local_states |
dict
|
Local states updated with the results of the DESeq2 pipeline. |
round_idx |
int
|
The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_lfc_dispersions/deseq2_lfc_dispersions.py
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 |
|