How Close Is The Normal Distribution?

February 4, 2013

(This article was originally published at Normal Deviate, and syndicated at StatsBlogs.)


One of the first things you learn in probability is that the average {\overline{X}_n} has a distribution that is approximately Normal. More precisely, if {X_1,\ldots, X_n} are iid with mean {\mu} and variance {\sigma^2} then

\displaystyle  Z_n \rightsquigarrow N(0,1)

where

\displaystyle  Z_n = \frac{\sqrt{n}(\overline{X}_n - \mu)}{\sigma}

and {\rightsquigarrow} means “convergence in distribution.”

1. How Close?

But how close is the distribution of {Z_n} to the Normal? The usual answer is given by the Berry-Esseen theorem which says that

\displaystyle  \sup_t |P(Z_n \leq t) - \Phi(t)| \leq \frac{0.4784 \,\beta_3}{\sigma^3 \sqrt{n}}

where {\Phi} is the cdf of a Normal(0,1) and {\beta_3 = \mathbb{E}(|X_i - \mu|^3)} is the third absolute central moment. This is good news: the error shrinks at rate {1/\sqrt{n}}, so, for example, confidence intervals based on the Normal approximation can be expected to be accurate too.
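To get a feel for how tight the bound is, here is a minimal sketch that computes the left-hand side exactly for iid Bernoulli({p}) data and compares it with the right-hand side. The Bernoulli choice and the function names are my own assumptions for illustration, and {\beta_3} is taken to be the third absolute central moment {\mathbb{E}|X_i - p|^3}.

```python
import math

def normal_cdf(t):
    """Standard Normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def binomial_pmf(n, k, p):
    """P(S = k) for S ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def bernoulli_sup_distance(n, p):
    """Exact sup_t |P(Z_n <= t) - Phi(t)| for iid Bernoulli(p) data.

    The cdf of Z_n is a step function, so the supremum is attained
    either just before or exactly at one of its jump points.
    """
    sigma = math.sqrt(p * (1.0 - p))
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        t = (k - n * p) / (sigma * math.sqrt(n))  # jump location of Z_n
        phi = normal_cdf(t)
        worst = max(worst, abs(cdf - phi))        # left limit at the jump
        cdf += binomial_pmf(n, k, p)
        worst = max(worst, abs(cdf - phi))        # value at the jump
    return worst

def berry_esseen_bound(n, p, constant=0.4784):
    """Right-hand side of the bound for Bernoulli(p) data."""
    sigma = math.sqrt(p * (1.0 - p))
    beta3 = p * (1.0 - p) * ((1.0 - p) ** 2 + p ** 2)  # E|X - p|^3
    return constant * beta3 / (sigma ** 3 * math.sqrt(n))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        for p in (0.1, 0.5):
            print(n, p, bernoulli_sup_distance(n, p), berry_esseen_bound(n, p))
```

Both columns shrink like {1/\sqrt{n}}, and the exact distance should sit below the bound in every row.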

But these days we are often interested in high dimensional problems. In that case, we might be interested, not in one mean, but in many means. Is there still a good guarantee for closeness to the Normal limit?

Consider iid random vectors {X_1,\ldots, X_n\in \mathbb{R}^d} with mean vector {\mu} and covariance matrix {\Sigma}. We’d like to say that {\mathbb{P}(Z_n \in A)} is close to {\mathbb{P}(Z \in A)} where {Z_n = \sqrt{n}\,\Sigma^{-1/2}(\overline{X}_n - \mu)} and {Z\sim N(0,I)}. We allow the dimension {d=d_n} to grow with {n}.

One of the best results I know of is due to Bentkus (2003) who proved that

\displaystyle  \sup_{A\in {\cal A}} | \mathbb{P}(Z_n \in A) - \mathbb{P}(Z \in A) | \leq \frac{400\, d^{1/4} \beta}{\sqrt{n}}

where {{\cal A}} is the class of convex sets and {\beta = \mathbb{E} ||\Sigma^{-1/2}(X - \mu)||^3} is the third moment of the norm of the standardized vector. We expect that {\beta = C d^{3/2}} for some constant {C}, so the error is of order {O(d^{7/4}/\sqrt{n})}. This means that we must have {d = o(n^{2/7})} to make the error go to 0 as {n\rightarrow\infty}.
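
To spell out the arithmetic behind that rate (a sketch, taking {\beta = C d^{3/2}} as assumed above):

\displaystyle  \frac{400\, d^{1/4} \beta}{\sqrt{n}} = \frac{400\, C\, d^{1/4}\, d^{3/2}}{\sqrt{n}} = 400\, C \left(\frac{d^{7/2}}{n}\right)^{1/2}

so the bound tends to 0 exactly when {d^{7/2}/n \rightarrow 0}, that is, when {d = o(n^{2/7})}.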

2. Ramping Up The Dimension

So far we need {d^{7/2}/n \rightarrow 0} to justify the Normal approximation, which is a serious restriction. Most of the current results in high dimensional inference, such as those for the lasso, do not place such a severe restriction on the dimension. Can we do better than this?

Yes. Right now we are witnessing a revolution in Normal approximations thanks to Stein’s method, a technique for bounding the distance to the Normal distribution invented by Charles Stein in 1972.

Although the method is 40 years old, interest in it has exploded recently. Two excellent references are the book by Chen, Goldstein and Shao (2010) and the review article by Nathan Ross (2011).

An example of the power of this method is the very recent paper by Victor Chernozhukov, Denis Chetverikov and Kengo Kato. They showed that, if we restrict {{\cal A}} to rectangles rather than convex sets, then

\displaystyle  \sup_{A\in {\cal A}} | \mathbb{P}(Z_n \in A) - \mathbb{P}(Z \in A) | \rightarrow 0

as long as {(\log d)^7/n \rightarrow 0}. (In fact, they use a lot of tricks besides Stein’s method, but Stein’s method plays a key role.)

This is an astounding improvement. We only need {d} to be smaller than {e^{n^{1/7}}} instead of {n^{2/7}}.
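
To see how different these two growth rates are, here is a small sketch that evaluates both scales for a few sample sizes. These are rates only, with all constants ignored, so the numbers are not literal thresholds on {d}; the function names are mine.

```python
import math

def convex_set_scale(n):
    """n^(2/7): growth scale for d under the convex-set bound."""
    return n ** (2.0 / 7.0)

def rectangle_scale(n):
    """exp(n^(1/7)): growth scale for d under the rectangle result."""
    return math.exp(n ** (1.0 / 7.0))

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6, 10**8):
        print(f"n = {n:>9}: n^(2/7) = {convex_set_scale(n):10.1f}, "
              f"exp(n^(1/7)) = {rectangle_scale(n):12.1f}")
```

The first scale is polynomial in {n} with a small exponent, while the second is exponential in a power of {n}, so the gap widens quickly as {n} grows.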

The restriction to rectangles is not so bad; it leads immediately to a confidence rectangle for the mean, for example. The authors show that their results can be used to derive further results for bootstrapping, for high-dimensional regression and for hypothesis testing.
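
As a rough illustration of the confidence rectangle, here is a minimal sketch for the simplest case. It assumes the coordinates are independent with known standard deviations (a diagonal {\Sigma}), so the Normal probability of a rectangle factorizes and the critical value has a closed form. This is my simplification, not the construction from the paper, and the function names are hypothetical.

```python
from statistics import NormalDist

def rectangle_critical_value(d, alpha=0.05):
    """c such that P(max_j |Z_j| <= c) = 1 - alpha for Z ~ N(0, I_d).

    With independent coordinates, P(max_j |Z_j| <= c) = (2*Phi(c) - 1)^d,
    which can be solved for c directly.
    """
    per_coordinate = (1.0 - alpha) ** (1.0 / d)
    return NormalDist().inv_cdf((1.0 + per_coordinate) / 2.0)

def confidence_rectangle(sample_means, sds, n, alpha=0.05):
    """Simultaneous intervals mean_j +/- c * sd_j / sqrt(n), one per coordinate.

    Assumes independent coordinates with known standard deviations sds.
    """
    c = rectangle_critical_value(len(sample_means), alpha)
    half_widths = [c * s / n ** 0.5 for s in sds]
    return [(m - h, m + h) for m, h in zip(sample_means, half_widths)]

if __name__ == "__main__":
    for d in (1, 10, 1000, 100000):
        print(d, rectangle_critical_value(d))
    print(confidence_rectangle([0.2, -1.0, 3.5], [1.0, 2.0, 0.5], n=400))
```

With {d = 1} this reduces to the usual two-sided Normal interval, and the critical value grows slowly with {d}, roughly like {\sqrt{2 \log d}}.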

I think we are seeing the beginning of a new wave of results on high dimensional Berry-Esseen theorems. I will do a post in the future on Stein’s method.


Bentkus, Vidmantas. (2003). On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference, 113, 385-402.

Chen, Louis, Goldstein, Larry and Shao, Qi-Man. (2010). Normal approximation by Stein’s method. Springer.

Chernozhukov, Victor, Chetverikov, Denis and Kato, Kengo. (2012). Central Limit Theorems and Multiplier Bootstrap when p is much larger than n.

Ross, Nathan. (2011). Fundamentals of Stein’s method. Probability Surveys, 8, 210-293.

Stein, Charles. (1986). Approximate computation of expectations. Lecture Notes-Monograph Series 7.
