OliverHennhoefer/awesome-multiple-hypothesis-testing

Awesome Multiple Hypothesis Testing

An extensive collection of resources on the topic of multiple hypothesis testing.


1 Publications

1.1 Philosophical

1.2 Introductory

1.3 Seminal: Static

Bonferroni Correction [Dunn1961]

Details

Algorithm for controlling the FWER in (static) hypothesis testing. The adjusted threshold $\alpha_i$ for $k$ tested hypotheses is calculated as:

$$\alpha_i = \frac{\alpha}{k}$$
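As a minimal sketch (function name illustrative), the correction reduces to testing every p-value against the adjusted threshold $\alpha / k$:

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i whenever p_i <= alpha / k; this controls the FWER at level alpha."""
    k = len(pvals)
    threshold = alpha / k  # adjusted per-test level alpha_i = alpha / k
    return [p <= threshold for p in pvals]

# k = 5 hypotheses at alpha = 0.05 give a per-test threshold of 0.01
print(bonferroni([0.001, 0.01, 0.02, 0.04, 0.2]))  # [True, True, False, False, False]
```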

Benjamini-Hochberg Procedure [BenjaminiHochberg1995]

Details

Algorithm for controlling the FDR in (static) hypothesis testing for p-values that are independent or satisfy positive regression dependency on subsets (PRDS):

  • Given $\alpha$, sort the $m$ p-values in ascending order as $P_1 \leq P_2 \leq \ldots \leq P_m$ and find the largest $k$ such that $P_k \leq \frac{k}{m} \alpha$.
  • Reject $\mathcal{H}_0$ for all $H_i$ with $i = 1, 2, \ldots, k$.
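The two steps above translate directly into a step-up procedure; a minimal sketch in plain Python (function name illustrative):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: return a boolean rejection decision per hypothesis."""
    m = len(pvals)
    # sort indices by p-value; rank is the 1-indexed position in sorted order
    order = sorted(range(m), key=lambda i: pvals[i])
    # find the largest rank k with P_(k) <= (k / m) * alpha
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # reject every hypothesis whose rank is at most k_max
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005, 0.5]))  # [True, True, True, True, False]
```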

Benjamini-Yekutieli Procedure [BenjaminiYekutieli2001]

Details

Algorithm for controlling the FDR in (static) hypothesis testing for p-values under arbitrary dependence. It modifies the rejection threshold of the Benjamini-Hochberg Procedure as follows:

$$P_k \leq \frac{k}{m \, c(m)} \alpha$$

  • The standard Benjamini-Hochberg Procedure is recovered by setting $c(m) = 1$, which is valid for independent or positively correlated p-values.
  • Under arbitrary dependence, $c(m)$ is defined as the harmonic number $c(m) = \sum_{i=1}^{m} \frac{1}{i}$.
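A sketch of the same step-up with the harmonic correction factor applied (function name illustrative); the extra factor $c(m)$ makes the procedure strictly more conservative than Benjamini-Hochberg:

```python
def benjamini_yekutieli(pvals, alpha=0.05):
    """BH step-up with the harmonic correction c(m) for arbitrary dependence."""
    m = len(pvals)
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic number c(m)
    order = sorted(range(m), key=lambda i: pvals[i])
    # largest rank k with P_(k) <= k / (m * c(m)) * alpha
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / (m * c_m) * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

# for m = 5, c(m) ≈ 2.283, so the rank-1 threshold drops from 0.01 to ≈ 0.0044
print(benjamini_yekutieli([0.001, 0.01, 0.03, 0.04, 0.5]))  # [True, False, False, False, False]
```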

1.4 Seminal: Sequential

SAFFRON: Serial estimate of the Alpha Fraction that is Futilely Rationed On true Null hypotheses. [RamdasZrnic2018]

Details

Algorithm for controlling FDR in sequential (online) hypothesis testing for independent p-values that was proposed by [RamdasZrnic2018].

SAFFRON estimates the proportion of true $\mathcal{H}_0$, i.e. it adjusts the test levels $\alpha_i$ based on an estimate of the amount of alpha wealth that is allocated to testing true $\mathcal{H}_0$. SAFFRON depends on the constants $w_0$ and $\lambda$, with $w_0$ denoting the initial alpha wealth and satisfying $0 \leq w_0 \leq \alpha$. The parameter $\lambda \in (0,1)$ defines the threshold for candidacy, as SAFFRON never rejects p-values $\geq \lambda$. Candidates are hypotheses that are more likely to be discoveries:

  • At each time $t$, define the number of candidates after the j-th rejection as

$C_{j+} = C_{j+}(t) = \sum_{i = \tau_j + 1}^{t-1} C_i$

with $C_t = \mathbf{1}\{p_t \leq \lambda\}$ as the indicator for candidacy.

  • Subsequent test levels are chosen as $\alpha_t = \min\{\lambda, \tilde{\alpha}_t\}$ with the exception

$\alpha_1 = \min\{(1 - \lambda)\gamma_1 w_0, \lambda\}$

and subsequent

$\tilde{\alpha}_t = (1 - \lambda) [w_0 \gamma_{t-C_{0+}} + (\alpha - w_0)\gamma_{t-\tau_1-C_{1+}} + \alpha \sum_{j \geq 2} \gamma_{t - \tau_j- C_{j+}}]$

Typically, $\gamma_j \propto j^{-1.6}$ is used as the $\gamma$ sequence.
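The update rules above can be sketched as follows. This is a simplified illustration under the stated choice $\gamma_j \propto j^{-1.6}$ and illustrative default constants, not a reference implementation (for that, see the onlineFDR package):

```python
def saffron_levels(pvals, alpha=0.05, w0=0.025, lam=0.5):
    """Sketch of the SAFFRON test-level updates for a stream of p-values."""
    n = len(pvals)
    # gamma_j proportional to j^(-1.6) for j = 1, 2, ..., normalized to sum to one
    raw = [j ** -1.6 for j in range(1, n + 2)]
    total = sum(raw)
    gamma = [g / total for g in raw]  # gamma[j - 1] = gamma_j

    cand = []    # candidacy indicators C_i = 1{p_i <= lam}
    taus = []    # rejection times tau_j (1-indexed)
    levels = []  # test levels alpha_t
    for t in range(1, n + 1):
        if t == 1:
            a = min((1 - lam) * gamma[0] * w0, lam)
        else:
            C0 = sum(cand)  # candidates among times 1 .. t-1
            term = w0 * gamma[t - C0 - 1]
            for j, tau in enumerate(taus, start=1):
                Cj = sum(cand[tau:])  # C_{j+}: candidates at times tau_j+1 .. t-1
                weight = (alpha - w0) if j == 1 else alpha
                term += weight * gamma[t - tau - Cj - 1]
            a = min(lam, (1 - lam) * term)
        levels.append(a)
        cand.append(1 if pvals[t - 1] <= lam else 0)
        if pvals[t - 1] <= a:  # rejection at level alpha_t
            taus.append(t)
    return levels

levels = saffron_levels([0.001, 0.8, 0.9, 0.01, 0.7])
```

Note that a rejected p-value is always also a candidate, since $\alpha_t \leq \lambda$ by construction.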

ADDIS: An ADaptive algorithm that DIScards conservative nulls. [TianRamdas2019]

Details

Algorithm for controlling FDR in sequential (online) hypothesis testing for independent p-values that was proposed by [TianRamdas2019]. ADDIS iterates on SAFFRON by extending SAFFRON's adaptivity in the fraction of true $\mathcal{H}_0$ with adaptivity in the conservativeness of $\mathcal{H}_0$. ADDIS depends on the constants $w_0$, $\lambda$ and $\tau$, with $w_0$ as the initial alpha wealth, satisfying $0 \leq w_0 \leq \alpha$. The new parameter $\tau \in (0,1]$ defines the threshold for discarding (conservative) p-values: p-values $\geq \tau$ are discarded, i.e. not considered for testing, with no wealth invested. As in SAFFRON, the parameter $\lambda \in [0,\tau)$ defines the threshold for candidates, as ADDIS never rejects p-values $\geq \lambda$.

$\alpha_t = \min\{\lambda, \tilde{\alpha}_t\}$

$\tilde{\alpha}_t = (\tau - \lambda)[w_0 \gamma_{S^t-C_{0+}} + (\alpha - w_0)\gamma_{S^t - \kappa_1^*-C_{1+}} + \alpha \sum_{j \geq 2} \gamma_{S^t - \kappa_j^* - C_{j+}}]$

$\kappa_j = \min\{i \in [t-1] : \sum_{k \leq i} \mathbf{1}\{p_k \leq \alpha_k\} \geq j\}, \quad \kappa_j^* = \sum_{i \leq \kappa_j} \mathbf{1}\{p_i \leq \tau\}, \quad S^t = \sum_{i < t} \mathbf{1}\{p_i \leq \tau\}, \quad C_{j+} = \sum_{i = \kappa_j + 1}^{t-1} \mathbf{1}\{p_i \leq \lambda\}$

Typically, $\gamma_j \propto j^{-1.6}$ is used as the $\gamma$ sequence.
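As with SAFFRON, the update rules can be sketched in a few lines. The $\gamma$ indexing from $j = 0$ and the default constants are assumptions of this sketch, not prescriptions of the paper:

```python
def addis_levels(pvals, alpha=0.05, w0=0.025, lam=0.25, tau=0.5):
    """Sketch of the ADDIS test-level updates for a stream of p-values."""
    n = len(pvals)
    # gamma_j proportional to (j + 1)^(-1.6) for j >= 0 (indexing convention assumed)
    raw = [(j + 1) ** -1.6 for j in range(n + 1)]
    total = sum(raw)
    gamma = [g / total for g in raw]  # gamma[j] = gamma_j

    kept = []    # 1{p_i <= tau}: p-values that are not discarded
    cand = []    # 1{p_i <= lam}: candidacy indicators
    kappas = []  # rejection times kappa_j (1-indexed)
    levels = []  # test levels alpha_t
    for t in range(1, n + 1):
        S_t = sum(kept)  # S^t = #{i < t : p_i <= tau}
        C0 = sum(cand)   # candidates among times 1 .. t-1
        term = w0 * gamma[S_t - C0]
        for j, kappa in enumerate(kappas, start=1):
            kappa_star = sum(kept[:kappa])  # kappa_j^* = #{i <= kappa_j : p_i <= tau}
            Cj = sum(cand[kappa:])          # C_{j+}: candidates at times kappa_j+1 .. t-1
            weight = (alpha - w0) if j == 1 else alpha
            term += weight * gamma[S_t - kappa_star - Cj]
        a = min(lam, (tau - lam) * term)
        levels.append(a)
        p = pvals[t - 1]
        kept.append(1 if p <= tau else 0)
        cand.append(1 if p <= lam else 0)
        if p <= a:  # rejection at level alpha_t
            kappas.append(t)
    return levels

levels = addis_levels([0.001, 0.9, 0.3, 0.01, 0.8])
```

Because $\lambda < \tau$, every candidate is also kept, so the $\gamma$ indices are always non-negative.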

1.5 Seminal: Batching

Batch$_{\text{BH}}$ and Batch$_{\text{Storey-BH}}$

Interpolation algorithms between existing purely sequential (online) and static (offline) methods, providing a trade-off between statistical power and the timeliness of decisions. [Zrnic2020]

1.6 Applications and Modifications

1.6.1 Anomaly Detection


2 Presentations


3 Software Packages

  • R: onlineFDR [Robertson2019]
  • Python: multipy [Puoliväli2020]
  • Python: statsmodels [Seabold2010]

3.1 Repositories

3.2 Miscellaneous


4 References

[Dunn1961] Dunn, O. J. (1961). Multiple Comparisons Among Means. Journal of the American Statistical Association, 56(293), 52–64.

[Tukey1991] Tukey, J. W. (1991). The Philosophy of Multiple Comparisons. Statistical Science, 6(1), 100–116.

[BenjaminiHochberg1995] Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.

[BenjaminiYekutieli2001] Benjamini, Y., & Yekutieli, D. (2001). The Control of the False Discovery Rate in Multiple Testing under Dependency. The Annals of Statistics, 29(4), 1165–1188.

[Benjamini2002] Benjamini, Y., & Braun, H. (2002). John W. Tukey's contributions to multiple comparisons. The Annals of Statistics, 30(6), 1576-1594.

[Lee2018] Lee, S., & Lee, D. K. (2018). What is the proper way to apply the multiple comparison test?. Korean journal of anesthesiology, 71(5), 353–360.

[Robertson2023] Robertson, D. S., Wason, J. M. S., & Ramdas, A. (2023). Online multiple hypothesis testing. Statistical science : a review journal of the Institute of Mathematical Statistics, 38(4), 557–575.

[Robertson2019] Robertson DS, Liou L, Ramdas A, Karp NA (2022). onlineFDR: Online error control. R package 2.12.0.

[Puoliväli2020] Puoliväli T, Palva S, Palva JM (2020): Influence of multiple hypothesis testing on reproducibility in neuroimaging research: A simulation study and Python-based software. Journal of Neuroscience Methods 337:108654.

[Seabold2010] Seabold, Skipper, and Josef Perktold. “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.

[RamdasZrnic2018] Ramdas, A., Zrnic, T., Wainwright, M.J., & Jordan, M.I. (2018). SAFFRON: an adaptive algorithm for online control of the false discovery rate. International Conference on Machine Learning.

[TianRamdas2019] Tian, J., & Ramdas, A. (2019). ADDIS: An adaptive discarding algorithm for online FDR control with conservative nulls. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 32). Curran Associates, Inc.

[Zrnic2020] Zrnic, T., Jiang, D., Ramdas, A., & Jordan, M.I. (2020). The Power of Batching in Multiple Hypothesis Testing. International Conference on Artificial Intelligence and Statistics.