Federated IRLS
Module which contains the Mixin in charge of performing FedIRLS.
fed_irls
Module containing the ComputeLFC method.
FedIRLS
Bases: LocMakeIRLSSummands, AggMakeIRLSUpdate
Mixin class to implement the LFC computation algorithm.
The goal of this class is to implement the IRLS algorithm specifically applied to the negative binomial distribution, with a fixed dispersion parameter (only the mean parameter, expressed as the exponential of the design matrix times the log fold changes, is estimated). Genes on which this algorithm fails are then handled by another method.
To the best of our knowledge, there is no explicit implementation of IRLS for the negative binomial in a federated setting. However, the steps of IRLS are akin to the ones of a Newton-Raphson algorithm, with the difference that the Hessian matrix is replaced by the Fisher information matrix.
Let us recall the steps of the IRLS algorithm for one gene (this method then implements these iterations for all genes in parallel).
We want to estimate the log fold changes :math:`\beta` from the counts :math:`y` and the design matrix :math:`X`. The negative binomial log-likelihood is given by:
.. math:: \mathcal{L}(\beta) = \sum_{i=1}^n \left( y_i \log(\mu_i) - (y_i + \alpha^{-1}) \log(\mu_i + \alpha^{-1}) \right) + \text{const}(y, \alpha)
where :math:`\mu_i = \gamma_i\exp(X_i \cdot \beta)`, :math:`\gamma_i` is the size factor of sample :math:`i`, and :math:`\alpha` is the dispersion parameter.
Given an iterate :math:`\beta_k`, the IRLS algorithm computes the next iterate :math:`\beta_{k+1}` as follows.
First, we compute the mean parameter :math:`\mu_k` from the current iterate, using the expression of the mean in terms of the log fold changes:
.. math:: (\mu_{k})_i = \gamma_i \exp(X_i \cdot \beta_k)
In practice, we trim the values of :math:`\mu_k` to a minimum value to ensure numerical stability.
Then, we compute the weight matrix :math:`W_k` from the current iterate :math:`\beta_k`, which is a diagonal matrix with diagonal elements:
.. math:: (W_k)_{ii} = \frac{\mu_{k,i}}{1 + \mu_{k,i} \alpha}
where :math:`\alpha` is the dispersion parameter.
This weight matrix is used to compute both the estimated variance (or hat matrix) and the feature vector :math:`z_k`:
.. math:: z_k = \log\left(\frac{\mu_k}{\gamma}\right) + \frac{y - \mu_k}{\mu_k}
The estimated variance is given by:
.. math:: H_k = X^T W_k X
The update step is then given by:
.. math:: \beta_{k+1} = (H_k)^{-1} X^T W_k z_k
This is akin to the Newton-Raphson algorithm, with the Hessian matrix replaced by the Fisher information, and the gradient replaced by the feature vector.
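The iteration described above can be sketched for a single gene with NumPy. This is a minimal illustration under stated assumptions, not the code of this module: the function name `irls_step`, the default `min_mu=0.5`, the simulated data and the starting point are all hypothetical choices made for the example.

```python
import numpy as np


def irls_step(X, y, size_factors, beta, alpha, min_mu=0.5):
    """One IRLS iteration for a single gene (illustrative sketch only)."""
    # Mean parameter from the current iterate, trimmed for numerical stability.
    mu = np.maximum(size_factors * np.exp(X @ beta), min_mu)
    # Diagonal of the weight matrix W_k.
    w = mu / (1.0 + mu * alpha)
    # Feature vector z_k.
    z = np.log(mu / size_factors) + (y - mu) / mu
    # H_k = X^T W_k X and the right-hand side X^T W_k z_k.
    H = X.T @ (w[:, None] * X)
    rhs = X.T @ (w * z)
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(H, rhs)


# Simulated data (hypothetical): intercept plus one binary covariate.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.integers(0, 2, size=n)])
size_factors = rng.uniform(0.5, 2.0, size=n)
true_beta = np.array([2.0, 0.7])
alpha = 0.1
mu_true = size_factors * np.exp(X @ true_beta)
# Negative binomial with mean mu and variance mu + alpha * mu**2.
r = 1.0 / alpha
y = rng.negative_binomial(r, r / (r + mu_true))

# Start from an intercept-only guess and iterate.
beta = np.array([np.log(np.mean(y / size_factors)), 0.0])
for _ in range(25):
    beta = irls_step(X, y, size_factors, beta, alpha)
```

Solving the linear system with `np.linalg.solve` avoids computing the inverse of the hat matrix, which is usually preferable numerically.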
Methods:
Name | Description |
---|---|
run_fed_irls | Run the IRLS algorithm. |
Source code in fedpydeseq2/core/fed_algorithms/fed_irls/fed_irls.py
run_fed_irls(train_data_nodes, aggregation_node, local_states, input_shared_state, round_idx, clean_models=True, refit_mode=False)
Run the IRLS algorithm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes | list | List of TrainDataNode. | required |
aggregation_node | AggregationNode | The aggregation node. | required |
local_states | dict | Local states. Required to propagate intermediate results. | required |
input_shared_state | dict | Shared state with the following keys: - beta (ndarray): The current beta, of shape (n_non_zero_genes, n_params). - irls_diverged_mask (ndarray): A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). - irls_mask (ndarray): A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). - global_nll (ndarray): The global_nll of the current beta from the previous beta, of shape (n_non_zero_genes,). - round_number_irls (int): The current round number of the IRLS algorithm. | required |
round_idx | int | The current round. | required |
clean_models | bool | If True, the models are cleaned. | True |
refit_mode | bool | Whether to run on | False |
Returns:
Name | Type | Description |
---|---|---|
local_states | dict | Local states. Required to propagate intermediate results. |
global_irls_summands_nlls_shared_state | dict | Shared states containing the final IRLS results. It contains nothing for now. - beta (ndarray): The current beta, of shape (n_non_zero_genes, n_params). - irls_diverged_mask (ndarray): A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). - irls_mask (ndarray): A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). - global_nll (ndarray): The global_nll of the current beta from the previous beta, of shape (n_non_zero_genes,). - round_number_irls (int): The current round number of the IRLS algorithm. |
round_idx | int | The updated round index. |
Source code in fedpydeseq2/core/fed_algorithms/fed_irls/fed_irls.py
substeps
Module to implement the substeps for the fitting of log fold changes.
This module contains all these substeps as mixin classes.
AggMakeIRLSUpdate
Mixin class to aggregate IRLS summands.
Please refer to the method make_local_irls_summands_and_nlls for more details.
Attributes:
Name | Type | Description |
---|---|---|
num_jobs | int | The number of cpus to use. |
joblib_verbosity | int | The verbosity of the joblib backend. |
joblib_backend | str | The backend to use for the joblib parallelization. |
irls_batch_size | int | The batch size to use for the IRLS algorithm. |
max_beta | float | The maximum value for the beta parameter. |
beta_tol | float | The tolerance for the beta parameter. |
irls_num_iter | int | The number of iterations for the IRLS algorithm. |
Methods:
Name | Description |
---|---|
make_global_irls_update | A remote method. Aggregates the local quantities to create the global IRLS update. It also updates the masks indicating which genes have diverged or converged according to the deviance. |
Source code in fedpydeseq2/core/fed_algorithms/fed_irls/substeps.py
make_global_irls_update(shared_states)
Make the summands for the IRLS algorithm.
The role of this function is twofold.
1) It computes the global_nll and updates the masks according to the deviance, for the beta values that have been computed in the previous round.
2) It aggregates the local hat matrix and features to solve the linear system and get the new beta values.
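The aggregation in step 2 works because the hat matrix and the features are sums over samples, so summing the per-center quantities gives the same linear system as pooled data would. The sketch below illustrates this for a single gene. It is an assumption-laden illustration rather than the library code: the helper names `local_summands` and `aggregate_and_solve`, the default `min_mu=0.5` and the simulated centers are hypothetical, and the mask and global_nll updates of step 1 are left out. Only the dictionary keys `local_hat_matrix` and `local_features` are taken from the documentation.

```python
import numpy as np


def local_summands(X, y, size_factors, beta, alpha, min_mu=0.5):
    """Hypothetical per-center helper: X^T W X and X^T W z for one gene."""
    mu = np.maximum(size_factors * np.exp(X @ beta), min_mu)
    w = mu / (1.0 + mu * alpha)
    z = np.log(mu / size_factors) + (y - mu) / mu
    return {
        "local_hat_matrix": X.T @ (w[:, None] * X),
        "local_features": X.T @ (w * z),
    }


def aggregate_and_solve(shared_states):
    """Sum the local summands over centers and solve for the new beta."""
    hat = sum(s["local_hat_matrix"] for s in shared_states)
    feat = sum(s["local_features"] for s in shared_states)
    return np.linalg.solve(hat, feat)


# Three simulated centers of different sizes (hypothetical data).
rng = np.random.default_rng(1)
beta, alpha = np.array([1.5, 0.3]), 0.2
centers = []
for n_obs in (30, 50, 40):
    X = np.column_stack([np.ones(n_obs), rng.integers(0, 2, size=n_obs)])
    sf = rng.uniform(0.5, 2.0, size=n_obs)
    y = rng.poisson(sf * np.exp(X @ beta)).astype(float)
    centers.append((X, y, sf))

federated_beta = aggregate_and_solve(
    [local_summands(X, y, sf, beta, alpha) for X, y, sf in centers]
)

# Reference: the same update computed on the pooled data.
X_all = np.vstack([c[0] for c in centers])
y_all = np.concatenate([c[1] for c in centers])
sf_all = np.concatenate([c[2] for c in centers])
pooled = local_summands(X_all, y_all, sf_all, beta, alpha)
pooled_beta = np.linalg.solve(pooled["local_hat_matrix"], pooled["local_features"])
```

In this sketch only the small (n_params, n_params) and (n_params,) summaries leave each center, not the counts or the design matrix.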
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states | list[dict] | A list of dictionaries containing the following keys: - local_hat_matrix (ndarray): The local hat matrix, of shape (n_irls_genes, n_params, n_params). n_irls_genes is the number of genes that are still active (non zero gene names on the irls_mask). - local_features (ndarray): The local features, of shape (n_irls_genes, n_params). - irls_diverged_mask (ndarray): A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). - irls_mask (ndarray): A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). - global_nll (ndarray): The global_nll of the current beta from the previous beta, of shape (n_non_zero_genes,). - round_number_irls (int): The current round number of the IRLS algorithm. | required |
Returns:
Type | Description |
---|---|
dict[str, Any] | A dictionary containing all the necessary info to run IRLS. It contains the following fields: - beta (ndarray): The log fold changes, of shape (n_non_zero_genes, n_params). - irls_diverged_mask (ndarray): A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). - irls_mask (ndarray): A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). - global_nll (ndarray): The global_nll of the current beta from the previous beta, of shape (n_non_zero_genes,). - round_number_irls (int): The current round number of the IRLS algorithm. |
Source code in fedpydeseq2/core/fed_algorithms/fed_irls/substeps.py
LocMakeIRLSSummands
Mixin to make the summands for the IRLS algorithm.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | The local AnnData object. |
num_jobs | int | The number of cpus to use. |
joblib_verbosity | int | The verbosity of the joblib backend. |
joblib_backend | str | The backend to use for the joblib parallelization. |
irls_batch_size | int | The batch size to use for the IRLS algorithm. |
min_mu | float | The minimum value for the mu parameter. |
irls_num_iter | int | The number of iterations for the IRLS algorithm. |
Methods:
Name | Description |
---|---|
make_local_irls_summands_and_nlls | A remote_data method. Makes the summands for the IRLS algorithm. It also passes on the necessary global quantities. |
Source code in fedpydeseq2/core/fed_algorithms/fed_irls/substeps.py
make_local_irls_summands_and_nlls(data_from_opener, shared_state, refit_mode=False)
Make the summands for the IRLS algorithm.
This function does two main operations:
1) It computes the summands for the beta update.
2) It computes the local quantities to compute the global_nll of the current beta.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | A dictionary containing the following keys: - beta (ndarray): The current beta, of shape (n_non_zero_genes, n_params). - irls_diverged_mask (ndarray): A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). - irls_mask (ndarray): A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). - global_nll (ndarray): The global_nll of the current beta from the previous beta, of shape (n_non_zero_genes,). - round_number_irls (int): The current round number of the IRLS algorithm. | required |
refit_mode | bool | Whether to run on | False |
Returns:
Type | Description |
---|---|
dict | The state to share to the server. It contains the following fields: - beta (ndarray): The current beta, of shape (n_non_zero_genes, n_params). - local_nll (ndarray): The local nll of the current beta, of shape (n_irls_genes,). - local_hat_matrix (ndarray): The local hat matrix, of shape (n_irls_genes, n_params, n_params). n_irls_genes is the number of genes that are still active (non zero gene names on the irls_mask). - local_features (ndarray): The local features, of shape (n_irls_genes, n_params). - irls_diverged_mask (ndarray): A boolean mask indicating if fed avg should be used for a given gene (shape: (n_non_zero_genes,)). - irls_mask (ndarray): A boolean mask indicating if IRLS should be used for a given gene (shape: (n_non_zero_genes,)). - global_nll (ndarray): The global_nll of the current beta, of shape (n_non_zero_genes,). This parameter is simply passed to the next shared state. - round_number_irls (int): The current round number of the IRLS algorithm. This round number is not updated here. |
Source code in fedpydeseq2/core/fed_algorithms/fed_irls/substeps.py
utils
Module to implement the utilities of the IRLS algorithm.
Most of these functions have the _batch suffix, which means that they are vectorized to work over batches of genes in the parralel_backend file in the same module.
make_irls_update_summands_and_nll_batch(design_matrix, size_factors, beta, dispersions, counts, min_mu)
Make the summands for the IRLS algorithm for a given set of genes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
design_matrix | ndarray | The design matrix, of shape (n_obs, n_params). | required |
size_factors | ndarray | The size factors, of shape (n_obs,). | required |
beta | ndarray | The log fold change matrix, of shape (batch_size, n_params). | required |
dispersions | ndarray | The dispersions, of shape (batch_size,). | required |
counts | ndarray | The counts, of shape (n_obs, batch_size). | required |
min_mu | float | Lower bound on estimated means, to ensure numerical stability. | required |
Returns:
Name | Type | Description |
---|---|---|
H | ndarray | The H matrix, of shape (batch_size, n_params, n_params). |
y | ndarray | The y vector, of shape (batch_size, n_params). |
nll | ndarray | The negative binomial negative log-likelihood, of shape (batch_size,). |
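To make the documented shapes concrete, here is one way such batched summands could be computed with `np.einsum`. This is a sketch under assumptions, not the implementation of make_irls_update_summands_and_nll_batch: the function name `irls_summands_and_nll_sketch` and the example data are hypothetical, and the negative log-likelihood is computed only up to the additive constant const(y, alpha) from the class description.

```python
import numpy as np


def irls_summands_and_nll_sketch(
    design_matrix, size_factors, beta, dispersions, counts, min_mu
):
    """Illustrative batched summands; not the library implementation."""
    # Trimmed means, of shape (n_obs, batch_size).
    mu = np.maximum(
        size_factors[:, None] * np.exp(design_matrix @ beta.T), min_mu
    )
    # Diagonal weights and feature vectors, both of shape (n_obs, batch_size).
    w = mu / (1.0 + mu * dispersions[None, :])
    z = np.log(mu / size_factors[:, None]) + (counts - mu) / mu
    # X^T W X per gene: (batch_size, n_params, n_params).
    H = np.einsum("ip,ig,iq->gpq", design_matrix, w, design_matrix)
    # X^T W z per gene: (batch_size, n_params).
    y = np.einsum("ip,ig->gp", design_matrix, w * z)
    # Negative log-likelihood, omitting the constant term const(y, alpha).
    inv_alpha = 1.0 / dispersions[None, :]
    nll = -np.sum(
        counts * np.log(mu) - (counts + inv_alpha) * np.log(mu + inv_alpha),
        axis=0,
    )
    return H, y, nll


# Small hypothetical example: 12 samples, 2 parameters, 3 genes.
rng = np.random.default_rng(2)
n_obs, n_params, batch_size = 12, 2, 3
design_matrix = np.column_stack(
    [np.ones(n_obs), np.tile([0.0, 1.0], n_obs // 2)]
)
size_factors = rng.uniform(0.5, 2.0, size=n_obs)
beta = rng.normal(size=(batch_size, n_params))
dispersions = np.array([0.05, 0.1, 0.5])
counts = rng.poisson(5.0, size=(n_obs, batch_size)).astype(float)

H, y, nll = irls_summands_and_nll_sketch(
    design_matrix, size_factors, beta, dispersions, counts, min_mu=0.5
)
```

Dropping the constant term does not affect comparisons of the nll between iterates of the same gene, since the counts and the dispersion are fixed across iterations.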