Testing Millions of Hypotheses: FDR

October 5, 2012
By

(This article was originally published at Normal Deviate, and syndicated at StatsBlogs.)

In the olden days, multiple testing meant testing 8 or 9 hypotheses. Today, multiple testing can involve testing thousands or even million of hypotheses.

A revolution occurred with the publication of Benjamini and Hochberg (1995). The method introduced in that paper has made it feasible to test huge numbers of hypotheses with high power. The Benjamini and Hochberg method is now standard in areas like genomics.

1. Multiple Testing

We want to test a large number of null hypotheses {H_1,\ldots, H_N}. Let {H_j =0} if the {j^{\rm th}} null hypothesis is true and let {H_j =1} if the {j^{\rm th}} null hypothesis is false. For example, {H_j} might be the hypothesis that there is no difference in mean gene expression level between healthy and diseased tissue, for the {j^{\rm th}} gene.

For each hypothesis {H_j} we have a test statistic {Z_j} and a p-value {P_j} computed from the test statistic. If {H_j} is true (no difference) then {P_j} has a uniform distribution {U} on {(0,1)}. If {H_j} is false (there is a difference) then {P_j} has some other distribution, typically more concentrated towards 0.

If we were testing one hypotheses, we would reject the null hypothesis if the p-value is less than {\alpha}. The type I error — the probability of a false rejection — is then {\alpha}. But in multiple testing we can’t simply reject all hypotheses for which {P_j \leq \alpha}. When {N} is large, we will make many type I errors.

A common and very simple way to fix the problem is the Bonferroni method: reject when {P_j\leq \alpha/N}. The set of rejected hypotheses is

\displaystyle  R = \Bigl\{j:\ P_j \leq \frac{\alpha}{N}\Bigr\}.

It follows from the union bound that

\displaystyle  {\rm Probability\ of\ any\ false\ rejections\ }= P(R \cap {\cal N}) \leq \alpha

where {{\cal N} = \{j:\ H_j =0\}} is the set of true null hypotheses.

The problem with the Bonferroni method is that the power — the probability of rejecting {H_j} when {H_j=1} — goes to 0 as {N} increases.

2. FDR

Instead of controlling the probability of any false rejections, the Benjamini-Hochberg (BH) method controls the false discovery rate (FDR) defined to be

\displaystyle  {\rm FDR} = \mathbb{E}({\rm FDP})

where

\displaystyle  {\rm FDP} = \frac{F}{R},

{F} is the number of false rejections and {R} is the number of rejections. Here, FDP is the false discovery proportion.

The BH method works as follows. Let

\displaystyle  P_{(1)}\leq P_{(2)}\leq \cdots \leq P_{(N)}

be the ordered p-values. The rejection set is

\displaystyle  R = \Bigl\{j:\ P_j \leq T\Bigr\}

where {T=P_{(j)}} and

\displaystyle  j = \max \left\{ s:\ P_{(s)} \leq \frac{\alpha s}{N} \right\}.

(If the p-values are not independent, an adjustment may be required.) Benjamini and Hochberg proved that, if this method is used then {{\rm FDR} \leq \alpha}.

3. Why Does It Work?

The original proof that {{\rm FDR} \leq \alpha} is a bit complicated. A slick martingale proof can be found in Storey, Taylor and Siegmund (2003). Here, I’ll give a less than rigorous but very simple proof.

Suppose the fraction of true nulls is {\pi}. The distribution of the p-values can be written as

\displaystyle  G = \pi U + (1-\pi)F

where {U(t)=t} is the uniform (0,1) distribution (the nulls) and {F} is some other distribution on (0,1) (the alternatives). Let

\displaystyle  \hat G(t) = \frac{1}{N}\sum_{j=1}^N I(P_j \leq t)

be the empirical distribution of the p-values. Suppose we reject all p-values less than {t}. Now

\displaystyle  F = \sum_{j=1}^n I(P_j\leq t, H_j=0)

and

\displaystyle  R = \sum_{j=1}^n I(P_j\leq t) = N \hat{G}(t).

Thus,

\displaystyle  \begin{array}{rcl}  \mathbb{E}\left( \frac{F}{R}\right) &=& \mathbb{E}\left(\frac{\sum_{j=1}^n I(P_j\leq t, H_j=0)}{\sum_{j=1}^n I(P_j\leq t)}\right)\\ &=& \frac{\mathbb{E}\left(\sum_{j=1}^n I(P_j\leq t, H_j=0)\right)} {\mathbb{E}\left(\sum_{j=1}^n I(P_j\leq t)\right)} + O\left(\sqrt{\frac{1}{N}}\right)\\ &=& \frac{N P(P_j \leq t|H_j=0) P(H_j=0)}{N G(t)} + O\left(\sqrt{\frac{1}{N}}\right)\\ &=& \frac{t\pi}{G(t)} + O\left(\sqrt{\frac{1}{N}}\right)= \frac{t\pi}{\hat G(t)} + O_P\left(\sqrt{\frac{1}{N}}\right) \end{array}

and so

\displaystyle  {\rm FDR} \approx \frac{t\pi}{\hat G(t)}.

Now let {t} be equal to one of the ordered p-values, say {P_{(j)}}. Thus {t = P_{(j)}}, {\hat G(t) = j/N} and

\displaystyle  {\rm FDR} \approx \frac{\pi P_{(j)}N}{j} \leq \frac{P_{(j)}N}{j}.

Setting the right hand side to be less than or equal to {\alpha} yields

\displaystyle  \frac{N P_{(j)}}{j} \leq \alpha

or in other words, choose {t=P_{(j)}} to satisfy

\displaystyle  P_{(j)} \leq \frac{\alpha j}{N}

which is exactly the BH method.

To summarize: we reject all p-values less than {T=P_{(j)}} where

\displaystyle  j = \max \left\{ s:\ P_{(s)} \leq \frac{\alpha s}{N} \right\}.

We then have the guarantee that {FDR \leq \alpha}.

The method is simple and, unlike Bonferroni, the power does not go to 0 as {N\rightarrow\infty}.

There are now many modifications to the BH method. For example, instead of controlling the mean of the FDP you can choose {T} so that

\displaystyle  P({\rm FDP} > \alpha) \leq \beta

which is called FDP control. (Genovese and Wasserman 2006). One can also weight the p-values (Genovese, Roeder, and Wasserman 2006).

4. Limitations

FDR methods control the error rate while maintaining high power. But it is important to realize that these methods give weaker control than Bonferroni. FDR controls the (expected) fraction of false rejections. Bonferroni protects you from making any false rejections. Which you should use is very problem dependent.

5. References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological). 289–300.

Genovese, C.R. and Roeder, K. and Wasserman, L. (2006). False discovery control with p-value weighting. Biometrika, 93, 509-524.

Genovese, C.R. and Wasserman, L. (2006). Exceedance control of the false discovery proportion. Journal of the American Statistical Association, 101, 1408-1417.

Storey, J.D. and Taylor, J.E. and Siegmund, D. (2003). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 187–205.




Please comment on the article here: Normal Deviate

Tags:

Subscribe

Email:

  Subscribe