# Screening: Everything Old is New Again

September 22, 2012

(This article was originally published at Normal Deviate, and syndicated at StatsBlogs.)

Screening is one of the oldest methods for variable selection. It refers to doing a bunch of marginal (single covariate) regressions instead of one multiple regression. When I was in school, we were taught that it was a bad thing to do.

Now, screening is back in fashion. It’s a whole industry. And before I throw stones, let me admit my own guilt: see Wasserman and Roeder (2009).

1. What Is It?

Suppose that the data are ${(X_1,Y_1),\ldots, (X_n,Y_n)}$ with

$\displaystyle Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_d X_{id} + \epsilon_i.$

To simplify matters, assume that ${\beta_0=0}$, ${\mathbb{E}(X_{ij})=0}$ and ${{\rm Var}(X_{ij})=1}$. Let us assume that we are in the high dimensional case where ${n < d}$. To perform variable selection, we might use something like the lasso.

But if we use screening, we instead do the following. We regress ${Y}$ on ${X_1}$, then we regress ${Y}$ on ${X_2}$, then we regress ${Y}$ on ${X_3}$, and so on. In other words, we do ${d}$ one-dimensional regressions. Denote the regression coefficients by ${\hat\alpha_1,\hat\alpha_2,\ldots}$. We keep the covariates associated with the largest values of ${|\hat\alpha_j|}$. We might then do a second step, such as running the lasso on the covariates that we kept.
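The procedure is simple enough to sketch in a few lines of numpy. This is a minimal illustration, not code from the post: the sample sizes, the signal strength, and the choice to keep `k = 10` covariates are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional setting with n < d and standardized covariates
# (illustrative choices; not from the original post).
n, d = 200, 500
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:5] = 3.0                      # only the first 5 covariates matter
y = X @ beta + rng.standard_normal(n)

# Screening: d one-dimensional regressions.  With a single standardized
# covariate, the least-squares slope is just (X_j . y) / (X_j . X_j).
alpha_hat = (X.T @ y) / np.sum(X**2, axis=0)

# Keep the k covariates with the largest |alpha_hat|; a second-stage
# method (e.g. the lasso) would then run on this reduced set.
k = 10
kept = np.argsort(-np.abs(alpha_hat))[:k]
print(sorted(kept.tolist()))
```

Because the covariates here are independent, the marginal slopes track the true ${\beta_j}$'s and the important covariates survive the screen; the sections below are about when that stops being true.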

What are we actually estimating when we regress ${Y}$ on the ${j^{\rm th}}$ covariate? It is easy to see that

$\displaystyle \mathbb{E}(\hat\alpha_j) = \alpha_j$

where

$\displaystyle \alpha_j = \beta_j + \sum_{s\neq j} \beta_s \rho_{sj}$

and ${\rho_{sj}}$ is the correlation between ${X_j}$ and ${X_s}$.
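This identity is easy to check numerically. The sketch below uses two correlated covariates and a noiseless response for clarity (all specific values, such as ${\rho = 0.6}$, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two standardized covariates with correlation rho; verify that the
# marginal slope on X_1 converges to alpha_1 = beta_1 + beta_2 * rho.
rho = 0.6
cov = np.array([[1.0, rho],
                [rho, 1.0]])
n = 200_000
X = rng.multivariate_normal(np.zeros(2), cov, size=n)
beta = np.array([2.0, -1.0])
y = X @ beta                        # noiseless, to isolate the identity

alpha1_hat = (X[:, 0] @ y) / (X[:, 0] @ X[:, 0])
alpha1_theory = beta[0] + beta[1] * rho      # 2.0 + (-1.0)(0.6) = 1.4
print(alpha1_hat, alpha1_theory)
```

With ${n}$ this large, the estimated marginal slope agrees with ${\beta_1 + \beta_2\rho_{21}}$ to a couple of decimal places.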

2. Arguments in Favor of Screening

If you miss an important variable during the screening phase, you are in trouble. This will happen if ${|\beta_j|}$ is big but ${|\alpha_j|}$ is small. Can this happen?

Sure. You can certainly find values of the ${\beta_j}$'s and the ${\rho_{sj}}$'s to make ${|\beta_j|}$ big and make ${|\alpha_j|}$ small. In fact, you can make ${|\beta_j|}$ huge while making ${\alpha_j=0}$. This is sometimes called unfaithfulness in the literature on graphical models.

However, the set of ${\beta}$ vectors that are unfaithful has Lebesgue measure 0. Thus, in some sense, unfaithfulness is “unlikely” and so screening is safe.
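An unfaithful case is easy to build from the identity ${\alpha_j = \beta_j + \sum_{s\neq j}\beta_s\rho_{sj}}$: with two covariates, choose ${\beta_2 = -\beta_1/\rho}$ so that ${\alpha_1 = 0}$ exactly. A sketch (the particular values ${\beta_1 = 10}$, ${\rho = 0.5}$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Unfaithful example: beta_1 is huge, yet the marginal coefficient
# alpha_1 = beta_1 + beta_2 * rho is exactly zero by construction.
rho = 0.5
beta = np.array([10.0, -20.0])      # 10 + (-20)(0.5) = 0
cov = np.array([[1.0, rho],
                [rho, 1.0]])
n = 100_000
X = rng.multivariate_normal(np.zeros(2), cov, size=n)
y = X @ beta + rng.standard_normal(n)

alpha1_hat = (X[:, 0] @ y) / (X[:, 0] @ X[:, 0])
print(alpha1_hat)   # near 0: screening would drop X_1 despite beta_1 = 10
```

Here a marginal regression sees essentially nothing on ${X_1}$, so screening discards the single most important covariate.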

3. Arguments Against Screening

Not so fast. In order to screw up, it is not necessary to have exact unfaithfulness. All we need is approximate unfaithfulness. And the set of approximately unfaithful ${\beta}$'s is a non-trivial subset of ${\mathbb{R}^d}$.

But it’s worse than that. Cautious statisticians want procedures that have properties that hold uniformly over the parameter space. Screening cannot be successful in any uniform sense because of the unfaithful (and nearly unfaithful) distributions.

And if we admit that the linear model is surely wrong, then things get even worse.

4. Conclusion

Screening is appealing because it is fast, easy and scalable. But it makes a strong (and unverifiable) assumption that you are not unlucky and have not encountered a case where ${\alpha_j}$ is small but ${\beta_j}$ is big.

Sometimes I find the arguments in favor of screening to be appealing but when I’m in a more skeptical (sane?) frame of mind, I find screening to be quite unreasonable.

What do you think?

Wasserman, L. and Roeder, K. (2009). High dimensional variable selection. Annals of Statistics, 37, 2178.
