Computing statistics and p-values
Module containing all the necessary steps to perform statistical analysis.
compute_padj
Module containing the Mixin to compute adjusted p-values.
compute_padj
ComputeAdjustedPValues
Bases: IndependentFiltering, PValueAdjustment
Mixin class to implement the computation of adjusted p-values.
Attributes:
Name | Type | Description |
---|---|---|
independent_filter | bool | A boolean flag to indicate whether to use independent filtering or not. |
Methods:
Name | Description |
---|---|
compute_adjusted_p_values | A method to compute adjusted p-values. Runs independent filtering if self.independent_filter is True, and the Benjamini-Hochberg (BH) method otherwise. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/compute_padj/compute_padj.py
compute_adjusted_p_values(train_data_nodes, aggregation_node, local_states, wald_test_shared_state, round_idx, clean_models)
Compute adjusted p-values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes | | List of TrainDataNode. | required |
aggregation_node | | The aggregation node. | required |
local_states | | Local states. Required to propagate intermediate results. | required |
wald_test_shared_state | | Shared states containing the Wald test results. | required |
round_idx | | The current round. | required |
clean_models | | If True, the models are cleaned. | required |
Returns:
Name | Type | Description |
---|---|---|
local_states | dict | Local states. Required to propagate intermediate results. |
round_idx | int | The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/compute_padj/compute_padj.py
substeps
IndependentFiltering
Mixin class implementing independent filtering.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | Local AnnData object. |
alpha | float | Significance level. |
Methods:
Name | Description |
---|---|
run_independent_filtering | Run independent filtering on the p-values trend. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/compute_padj/substeps.py
run_independent_filtering(data_from_opener, shared_state)
Run independent filtering on the p-values trend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | Shared state containing the results of the Wald tests, namely "p_values" (p-values), "wald_statistics" (Wald statistics), and "wald_se" (Wald standard errors). | required |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/compute_padj/substeps.py
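As a rough illustration of the idea behind independent filtering, the sketch below picks, among candidate cutoffs on a filter statistic, the one that yields the most Benjamini-Hochberg rejections at level alpha. It is a simplified variant, not the library's implementation: the names `bh_rejections`, `pick_filter_cutoff`, `base_means`, and `cutoffs` are hypothetical, the filter statistic is assumed to be a per-gene mean of normalized counts, and DESeq2-style procedures additionally smooth the rejection curve before choosing the cutoff.

```python
import math


def bh_rejections(p_values, alpha):
    """Count Benjamini-Hochberg rejections at level alpha."""
    m = len(p_values)
    ordered = sorted(p_values)
    # BH step-up rule: reject the k smallest p-values, where k is the
    # largest rank such that p_(k) <= alpha * k / m.
    passing = [k for k, p in enumerate(ordered, start=1) if p <= alpha * k / m]
    return max(passing, default=0)


def pick_filter_cutoff(base_means, p_values, alpha, cutoffs):
    """Return (cutoff, rejections) for the cutoff with the most rejections."""
    best_cutoff, best_count = cutoffs[0], -1
    for cutoff in cutoffs:
        # Keep genes whose filter statistic reaches the cutoff; NaN p-values
        # (for example Cook's outliers) are left out of the correction.
        kept = [
            p
            for mean, p in zip(base_means, p_values)
            if mean >= cutoff and not math.isnan(p)
        ]
        count = bh_rejections(kept, alpha) if kept else 0
        if count > best_count:
            best_cutoff, best_count = cutoff, count
    return best_cutoff, best_count


base_means = [1, 2, 50, 60, 70, 80]
p_values = [0.9, 0.8, 0.001, 0.02, 0.03, 0.04]
best = pick_filter_cutoff(base_means, p_values, alpha=0.05, cutoffs=[0, 10])
```

Dropping the two low-count genes shrinks the multiple-testing burden, so more of the remaining genes pass the BH threshold.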
PValueAdjustment
Mixin class implementing p-value adjustment.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | Local AnnData object. |
Methods:
Name | Description |
---|---|
run_p_value_adjustment | Run p-value adjustment on the p-values trend using the Benjamini-Hochberg method. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/compute_padj/substeps.py
run_p_value_adjustment(data_from_opener, shared_state)
Run p-value adjustment on the p-values trend using the BH method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | Shared state containing the results of the Wald tests, namely "p_values" (p-values, as a numpy array), "wald_statistics" (Wald statistics), and "wald_se" (Wald standard errors). | required |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/compute_padj/substeps.py
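For reference, the Benjamini-Hochberg adjustment itself can be sketched in a few lines. This is a dependency-free illustration rather than the library's code: `benjamini_hochberg` is a hypothetical name, plain lists stand in for numpy arrays, and NaN p-values are assumed to be excluded from the correction and left as NaN.

```python
import math


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values; NaN entries are skipped and stay NaN."""
    # Indices of the p-values that are actually tested (non-NaN).
    tested = [i for i, p in enumerate(p_values) if not math.isnan(p)]
    m = len(tested)
    adjusted = [math.nan] * len(p_values)
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so that adjusted values are monotone in the raw p-values.
    by_decreasing_p = sorted(tested, key=lambda i: p_values[i], reverse=True)
    running_min = 1.0
    for rank, i in zip(range(m, 0, -1), by_decreasing_p):
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = benjamini_hochberg([0.01, 0.04, math.nan, 0.03, 0.20])
```

Here the second and fourth genes end up with the same adjusted value, because the running minimum caps the fourth gene's raw adjustment at the value obtained for the next larger p-value.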
cooks_filtering
Substep to perform cooks filtering.
cooks_filtering
Module to implement the base Mixin class for Cooks filtering.
CooksFiltering
Bases: LocFindCooksOutliers, AggregateCooksOutliers, LocGetMaxCooks, AggMaxCooks, LocGetMaxCooksCounts, AggMaxCooksCounts, LocCountNumberSamplesAbove, AggCooksFiltering
A class to perform Cooks filtering of p-values.
Methods:
Name | Description |
---|---|
cooks_filtering | The method to find Cooks outliers. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/cooks_filtering.py
cooks_filtering(train_data_nodes, aggregation_node, local_states, wald_test_shared_state, round_idx, clean_models)
Perform Cooks filtering.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes | | List of TrainDataNode. | required |
aggregation_node | | The aggregation node. | required |
local_states | | Local states. Required to propagate intermediate results. | required |
wald_test_shared_state | dict | A shared state containing the Wald test results, with the following fields: "p_values" (p-values of the Wald test), "wald_statistics" (Wald statistics), and "wald_se" (Wald standard errors). | required |
round_idx | | Index of the current round. | required |
clean_models | | Whether to clean the models after the computation. | required |
Returns:
Name | Type | Description |
---|---|---|
local_states | dict | Local states. The new local state contains Cook's distances. |
shared_state | dict | A new shared state containing the following fields: "p_values" (p-values of the Wald test, updated to be nan for Cook's outliers), "wald_statistics" (Wald statistics, for compatibility), and "wald_se" (Wald standard errors, for compatibility). |
round_idx | int | The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/cooks_filtering.py
substeps
AggCooksFiltering
Mixin class to aggregate the cooks filtering.
Methods:
Name | Description |
---|---|
agg_cooks_filtering | Aggregate the local number of samples above. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
agg_cooks_filtering(shared_states)
Aggregate the local number of samples above to get cooks filtered genes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states | list[dict] | List of shared states from the local step with the following keys: local_num_samples_above (np.ndarray of shape (n_genes,)), the local number of samples above the max cooks gene counts; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier; p_values (np.ndarray of shape (n_genes,)), the p-values from the Wald test; wald_statistics (np.ndarray of shape (n_genes,)), the Wald statistics from the Wald test; wald_se (np.ndarray of shape (n_genes,)), the Wald standard errors from the Wald test. | required |
Returns:
Type | Description |
---|---|
dict | A shared state with the following fields: p_values (np.ndarray of shape (n_genes,)), the p-values from the Wald test with nan for the cooks outliers; wald_statistics (np.ndarray of shape (n_genes,)), the Wald statistics; wald_se (np.ndarray of shape (n_genes,)), the Wald standard errors. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
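The aggregation described above can be pictured as follows. This is a hedged sketch, not the library's code: `filter_p_values` is a hypothetical name, plain lists stand in for numpy arrays, and the rule that a gene stops being treated as an outlier once at least three samples lie above the reference count is an assumption borrowed from the usual DESeq2 convention.

```python
import math

# Assumed threshold (DESeq2 convention), not confirmed by this page.
MIN_SAMPLES_ABOVE = 3


def filter_p_values(shared_states):
    """Sum per-center counts and set p-values of remaining outliers to NaN."""
    first = shared_states[0]
    outliers = list(first["cooks_outliers"])
    p_values = list(first["p_values"])
    # Sum the local counts gene by gene across all centers.
    per_center = [s["local_num_samples_above"] for s in shared_states]
    total_above = [sum(column) for column in zip(*per_center)]
    for gene, n_above in enumerate(total_above):
        # Enough samples above the reference count: do not filter the gene.
        if outliers[gene] and n_above >= MIN_SAMPLES_ABOVE:
            outliers[gene] = False
        if outliers[gene]:
            p_values[gene] = math.nan
    return {
        "p_values": p_values,
        "wald_statistics": first["wald_statistics"],
        "wald_se": first["wald_se"],
    }


common = {
    "cooks_outliers": [True, True, False],
    "p_values": [0.01, 0.02, 0.5],
    "wald_statistics": [2.6, 2.3, 0.7],
    "wald_se": [0.4, 0.4, 0.5],
}
states = [
    {**common, "local_num_samples_above": [2, 0, 0]},
    {**common, "local_num_samples_above": [1, 1, 0]},
]
result = filter_p_values(states)
```

In this toy input the first gene is rescued (three samples above in total), while the second keeps its outlier status and gets a NaN p-value.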
AggMaxCooks
Mixin class to aggregate the max cooks distances.
Methods:
Name | Description |
---|---|
agg_max_cooks | Aggregate the local max cooks distances. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
agg_max_cooks(shared_states)
Aggregate the local max cooks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states | list[dict] | List of shared states from the local step with the following keys: local_max_cooks (np.ndarray of shape (n_genes,)), the local maximum cooks distance for the outliers; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | Aggregated max cooks. It is a dictionary with the following fields: max_cooks (np.ndarray of shape (n_cooks_genes,)), the maximum cooks distance for the outliers in the aggregated dataset; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
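Conceptually, this aggregation is an element-wise maximum over the centers. A minimal sketch, assuming every center reports its local maxima in the same gene order (`aggregate_max_cooks` is a hypothetical name and lists stand in for numpy arrays):

```python
def aggregate_max_cooks(shared_states):
    """Element-wise maximum of the local maximum Cook's distances."""
    local_maxima = [s["local_max_cooks"] for s in shared_states]
    return {
        # zip(*...) groups the values reported for one gene by all centers.
        "max_cooks": [max(values) for values in zip(*local_maxima)],
        "cooks_outliers": shared_states[0]["cooks_outliers"],
    }


outliers = [True, False, True]
states = [
    {"local_max_cooks": [0.5, 3.0], "cooks_outliers": outliers},
    {"local_max_cooks": [1.5, 2.0], "cooks_outliers": outliers},
]
aggregated = aggregate_max_cooks(states)
```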
AggMaxCooksCounts
Mixin class to aggregate the max cooks gene counts.
Methods:
Name | Description |
---|---|
agg_max_cooks_gene_counts | Aggregate the local max cooks gene counts. The goal is to have the gene counts corresponding to the maximum cooks distance for each gene across all datasets. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
agg_max_cooks_gene_counts(shared_states)
Aggregate the local max cooks gene counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states | list[dict] | List of shared states from the local step with the following keys: local_max_cooks_gene_counts (np.ndarray of shape (n_genes,)), the local maximum cooks gene counts for the outliers; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | A shared state with the following fields: max_cooks_gene_counts (np.ndarray of shape (n_cooks_genes,)), which contains, for each gene, the gene counts corresponding to the maximum cooks distance for that gene across all datasets; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
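Since a center reports nan for a gene whenever it does not hold the global maximum Cook's distance, the aggregation amounts to a NaN-aware reduction. The sketch below is illustrative only: `aggregate_max_cooks_gene_counts` is a hypothetical name, lists stand in for numpy arrays, and keeping the largest reported count when several centers tie is an assumption.

```python
import math


def aggregate_max_cooks_gene_counts(shared_states):
    """Keep, for each gene, the count reported by the center(s) holding the max."""
    per_center = [s["local_max_cooks_gene_counts"] for s in shared_states]
    aggregated = []
    for values in zip(*per_center):
        reported = [v for v in values if not math.isnan(v)]
        # Assumption: on ties between centers, keep the largest reported count.
        aggregated.append(max(reported) if reported else math.nan)
    return {
        "max_cooks_gene_counts": aggregated,
        "cooks_outliers": shared_states[0]["cooks_outliers"],
    }


outliers = [True, False, True]
states = [
    {"local_max_cooks_gene_counts": [200.0, math.nan], "cooks_outliers": outliers},
    {"local_max_cooks_gene_counts": [math.nan, 35.0], "cooks_outliers": outliers},
]
aggregated = aggregate_max_cooks_gene_counts(states)
```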
AggregateCooksOutliers
Mixin class to aggregate the cooks outliers.
Methods:
Name | Description |
---|---|
agg_cooks_outliers | Aggregate the local cooks outliers. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
agg_cooks_outliers(shared_states)
Aggregate the local cooks outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states | list[dict] | List of shared states from the local step with the following keys: local_cooks_outliers (np.ndarray of shape (n_genes,)); cooks_cutoff (float). | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | Aggregated cooks outliers. It is a dictionary with the following fields: cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier in any of the local datasets; cooks_cutoff (float), the cutoff used to define the fact that a gene is a cooks outlier. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
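Because a gene counts as an outlier if it is one in any local dataset, the aggregation is a logical OR across centers. A minimal sketch, assuming all centers share the same cutoff (`aggregate_outliers` is a hypothetical name, and lists stand in for numpy arrays):

```python
def aggregate_outliers(shared_states):
    """Flag a gene as an outlier if any center flags it."""
    flags = [s["local_cooks_outliers"] for s in shared_states]
    return {
        "cooks_outliers": [any(column) for column in zip(*flags)],
        # Assumption: the cutoff is identical in every center.
        "cooks_cutoff": shared_states[0]["cooks_cutoff"],
    }


states = [
    {"local_cooks_outliers": [True, False, False], "cooks_cutoff": 4.2},
    {"local_cooks_outliers": [False, False, True], "cooks_cutoff": 4.2},
]
aggregated = aggregate_outliers(states)
```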
LocCountNumberSamplesAbove
Mixin class to count the number of samples above the max cooks gene counts.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | Local AnnData object. |
Methods:
Name | Description |
---|---|
count_local_number_samples_above | Count the number of samples above the max cooks gene counts. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
count_local_number_samples_above(data_from_opener, shared_state)
Count the number of samples above the max cooks gene counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | Shared state from the previous step with the following keys: cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier; max_cooks_gene_counts (np.ndarray of shape (n_genes,)), which contains, for each gene, the gene counts corresponding to the maximum cooks distance for that gene across all datasets. | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | A shared state with the following fields: local_num_samples_above (np.ndarray of shape (n_cooks_genes,)), which contains, for each gene, the number of samples above the maximum cooks gene counts; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier; p_values (np.ndarray of shape (n_genes,)), the p-values from the Wald test; wald_statistics (np.ndarray of shape (n_genes,)), the Wald statistics from the Wald test; wald_se (np.ndarray of shape (n_genes,)), the Wald standard errors from the Wald test. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
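The local count can be illustrated with a small sketch. It assumes a local count matrix whose rows are samples and whose columns are the outlier genes, and a strict "greater than" comparison; `count_samples_above` and `local_counts` are hypothetical names, and lists stand in for numpy arrays.

```python
def count_samples_above(local_counts, max_cooks_gene_counts):
    """For each gene, count local samples whose count exceeds the reference."""
    return [
        sum(1 for row in local_counts if row[gene] > reference)
        for gene, reference in enumerate(max_cooks_gene_counts)
    ]


# Three local samples (rows) and two outlier genes (columns).
local_counts = [
    [5, 100],
    [50, 10],
    [70, 300],
]
num_above = count_samples_above(local_counts, [40, 100])
```

For the first gene two samples exceed 40; for the second only the count of 300 exceeds 100, since a count equal to the reference is not counted.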
LocFindCooksOutliers
Mixin class to find the local cooks outliers.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | Local AnnData object. Is expected to have a "tot_num_samples" key in uns. |
refit_cooks | bool | Whether to refit the cooks outliers. |
Methods:
Name | Description |
---|---|
find_local_cooks_outliers | Find the local cooks outliers. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
find_local_cooks_outliers(data_from_opener, shared_state)
Find the local cooks outliers.
This method is expected to run on the results of the Wald tests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | Shared state from the previous step with the following keys: p_values (np.ndarray of shape (n_genes,)); wald_statistics (np.ndarray of shape (n_genes,)); wald_se (np.ndarray of shape (n_genes,)). | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | A shared state with the following fields: local_cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier; cooks_cutoff (float), the cutoff used to define the fact that a gene is a cooks outlier. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
LocGetMaxCooks
Mixin class to get the maximum cooks distance for the outliers.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | Local AnnData object. |
Methods:
Name | Description |
---|---|
get_max_local_cooks | Get the maximum cooks distance for the outliers. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
get_max_local_cooks(data_from_opener, shared_state)
Get the maximum cooks distance for the outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | Shared state from the previous step with the following keys: cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier; cooks_cutoff (float). | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | A shared state with the following fields: local_max_cooks (np.ndarray of shape (n_cooks_genes,)), the maximum cooks distance for the outliers in the local dataset; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
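As an illustration of the shapes involved, the sketch below takes a local matrix of Cook's distances (rows are samples, columns are all genes) and returns one maximum per outlier gene, so the result has length n_cooks_genes. The name `local_max_cooks` and the layout of `cooks` are assumptions, and lists stand in for numpy arrays.

```python
def local_max_cooks(cooks, cooks_outliers):
    """Maximum Cook's distance over local samples, for outlier genes only."""
    return [
        max(row[gene] for row in cooks)
        for gene, is_outlier in enumerate(cooks_outliers)
        if is_outlier
    ]


# Two local samples (rows) and three genes (columns).
cooks = [
    [0.1, 5.0, 0.2],
    [0.3, 2.0, 9.0],
]
maxima = local_max_cooks(cooks, [False, True, True])
```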
LocGetMaxCooksCounts
Mixin class to get the maximum cooks counts for the outliers.
Attributes:
Name | Type | Description |
---|---|---|
local_adata | AnnData | Local AnnData object. |
Methods:
Name | Description |
---|---|
get_max_local_cooks_gene_counts | Get the maximum cooks counts for the outliers. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
get_max_local_cooks_gene_counts(data_from_opener, shared_state)
Get the maximum cooks counts for the outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | Not used. | required |
shared_state | dict | Shared state from the previous step with the following keys: max_cooks (np.ndarray of shape (n_cooks_genes,)), the maximum cooks distance for the outliers; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. | required |
Returns:
Name | Type | Description |
---|---|---|
shared_state | dict | A shared state with the following fields: local_max_cooks_gene_counts (np.ndarray of shape (n_cooks_genes,)), which contains, for each gene, the gene counts corresponding to the maximum cooks distance for that gene if the maximum cooks distance in the local dataset is equal to the maximum cooks distance in the aggregated dataset, and nan otherwise; cooks_outliers (np.ndarray of shape (n_genes,)), a boolean array indicating whether a gene is a cooks outlier. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/cooks_filtering/substeps.py
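The "count if this center holds the global maximum, nan otherwise" rule can be sketched as follows. The function name, the `cooks` and `counts` matrices (rows are samples, columns are all genes), and the exact equality test are assumptions made for illustration; lists stand in for numpy arrays.

```python
import math


def local_max_cooks_gene_counts(cooks, counts, cooks_outliers, max_cooks):
    """Count at the sample with the max Cook's distance, or NaN if not global."""
    outlier_genes = [g for g, flag in enumerate(cooks_outliers) if flag]
    result = []
    # max_cooks holds one aggregated maximum per outlier gene.
    for gene, global_max in zip(outlier_genes, max_cooks):
        column = [row[gene] for row in cooks]
        local_max = max(column)
        if local_max == global_max:
            # This center holds the global maximum: report the matching count.
            result.append(counts[column.index(local_max)][gene])
        else:
            result.append(math.nan)
    return result


cooks = [[0.1, 5.0, 0.2], [0.3, 2.0, 9.0]]
counts = [[10, 200, 7], [12, 30, 400]]
local = local_max_cooks_gene_counts(
    cooks, counts, [False, True, True], max_cooks=[5.0, 12.0]
)
```

The first outlier gene reaches the global maximum locally, so its count (200) is reported; the second does not, so the center reports nan for it.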
deseq2_stats
DESeq2Stats
Bases: RunWaldTests, CooksFiltering, ComputeAdjustedPValues
Mixin class to compute statistics with DESeq2.
This class encapsulates the Wald tests, the Cooks filtering and the computation of adjusted p-values.
Methods:
Name | Description |
---|---|
run_deseq2_stats | Run the DESeq2 statistics pipeline. Performs Wald tests, Cook's filtering and computes adjusted p-values. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/deseq2_stats.py
run_deseq2_stats(train_data_nodes, aggregation_node, local_states, round_idx, clean_models)
Run the DESeq2 statistics pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes | | List of TrainDataNode. | required |
aggregation_node | | The aggregation node. | required |
local_states | | Local states. Required to propagate intermediate results. | required |
round_idx | | Index of the current round. | required |
clean_models | | Whether to clean the models after the computation. | required |
Returns:
Name | Type | Description |
---|---|---|
local_states | dict | Local states. |
round_idx | int | The updated round index. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/deseq2_stats.py
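The way the three stages hand over their results can be pictured with stand-in functions. Everything below is hypothetical: the stubs only mimic the documented return values (the return value assumed for the Wald test stage is not documented on this page), the node arguments are omitted, and the round increments are arbitrary.

```python
import math


def run_wald_tests(local_states, round_idx):
    # Hypothetical stub: pretend the Wald test rounds have run.
    shared = {
        "p_values": [0.01, 0.2],
        "wald_statistics": [2.6, 1.3],
        "wald_se": [0.4, 0.5],
    }
    return local_states, shared, round_idx + 2


def cooks_filtering(local_states, shared, round_idx):
    # Hypothetical stub: mark the second gene as a Cook's outlier.
    filtered = dict(shared, p_values=[shared["p_values"][0], math.nan])
    return local_states, filtered, round_idx + 1


def compute_adjusted_p_values(local_states, shared, round_idx):
    # Hypothetical stub: store the shared results in the local states.
    return {**local_states, "results": shared}, round_idx + 1


def run_deseq2_stats(local_states, round_idx):
    """Chain the stages, threading the states and the round index through."""
    local_states, shared, round_idx = run_wald_tests(local_states, round_idx)
    local_states, shared, round_idx = cooks_filtering(
        local_states, shared, round_idx
    )
    return compute_adjusted_p_values(local_states, shared, round_idx)


final_states, final_round = run_deseq2_stats({}, 0)
```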
wald_tests
substeps
AggRunWaldTests
Mixin to run Wald tests.
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/wald_tests/substeps.py
agg_run_wald_tests(shared_states)
Run the Wald tests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shared_states | list | List of shared states containing: local_H_matrix (np.ndarray), the local H matrix; LFC (np.ndarray), the log fold changes, in natural log scale; contrast_vector (np.ndarray), the contrast vector. | required |
Returns:
Type | Description |
---|---|
dict | Contains: p_values (np.ndarray), the (unadjusted) p-values (n_genes,); wald_statistics (np.ndarray), the Wald statistics (n_genes,); wald_se (np.ndarray), the standard errors of the Wald statistics (n_genes,). |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/wald_tests/substeps.py
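For a single gene, the last step of a Wald test can be sketched as below: the contrast is applied to the log fold changes, the result is divided by its standard error, and a two-sided p-value is read off the standard normal distribution. The standard error is taken as given here; how it is derived from the aggregated H matrices is not covered by this sketch, and `wald_test` is a hypothetical name.

```python
import math


def wald_test(lfc, contrast, se):
    """Return (Wald statistic, two-sided p-value) for one gene."""
    # Contrast-weighted combination of the log fold changes.
    effect = sum(c * beta for c, beta in zip(contrast, lfc))
    statistic = effect / se
    # Two-sided tail probability of a standard normal:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value


statistic, p_value = wald_test(lfc=[1.0, 0.98], contrast=[0.0, 1.0], se=0.5)
```

With a statistic of 1.96, the p-value lands just under 0.05, as expected for the familiar two-sided 5% threshold.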
LocBuildContrastVectorHMatrix
Mixin to compute contrast vectors and local H matrices.
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/wald_tests/substeps.py
compute_contrast_vector_and_H_matrix(data_from_opener, shared_state)
Build the contrast vector and the local H matrices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_from_opener | AnnData | AnnData returned by the opener. Not used. | required |
shared_state | dict | Not used. | required |
Returns:
Type | Description |
---|---|
dict | Contains: local_H_matrix (np.ndarray), the local H matrix; LFC (np.ndarray), the log fold changes, in natural log scale; contrast_vector (np.ndarray), the contrast vector. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/wald_tests/substeps.py
wald_tests
RunWaldTests
Bases: LocBuildContrastVectorHMatrix, AggRunWaldTests
Mixin class to implement the computation of the Wald tests.
Methods:
Name | Description |
---|---|
run_wald_tests | The method to compute the Wald tests. |
Source code in fedpydeseq2/core/deseq2_core/deseq2_stats/wald_tests/wald_tests.py
run_wald_tests(train_data_nodes, aggregation_node, local_states, round_idx, clean_models)
Compute the Wald tests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_data_nodes | | List of TrainDataNode. | required |
aggregation_node | | The aggregation node. | required |
local_states | | Local states. Required to propagate intermediate results. | required |
round_idx | | Index of the current round. | required |
clean_models | | Whether to clean the models after the computation. | required |