(This article was originally published at Econometrics Beat: Dave Giles' Blog, and syndicated at StatsBlogs.)

In an earlier post I discussed Shirley Almon's contribution to the estimation of Distributed Lag (DL) models, with her seminal paper in 1965.

That post drew quite a number of email requests for more information about the Almon estimator, and how it fits into the overall scheme of things. In addition, Almon's approach to modelling distributed lags has more recently been used very effectively in the estimation of the so-called MIDAS model. The MIDAS model (developed by Eric Ghysels and his colleagues; *e.g.*, see Ghysels *et al.*, 2004) is designed to handle regression analysis using data with different observation frequencies. The acronym "MIDAS" stands for "Mixed-Data Sampling". The MIDAS model can be implemented in R (*e.g.*, see here), as well as in EViews. (I discussed this in this earlier post.)

For these reasons I thought I'd put together this follow-up post by way of an introduction to the Almon DL model, and some of the advantages and pitfalls associated with using it.

Let's take a look.

Suppose that we want to estimate the coefficients of the following DL model:

$$y_t = \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_n x_{t-n} + u_t; \qquad t = 1, 2, \dots, T. \qquad (1)$$

This is called a "finite" DL model if the value of n is finite.

We could add an intercept into the model, and/or add other regressors, but that won't alter the basic ideas in the following discussion. So let's keep the model as simple as possible. We'll presume that the error term, $u_t$, satisfies all of the usual assumptions, but that can be relaxed too.

If the maximum lag length in the model, n, is much less than T, then we could just apply OLS to estimate the regression coefficients. However, even if this is feasible, in the sense that there are positive degrees of freedom, this may not be the smartest way in which to proceed. For most economic time-series, x, the successive lags of the variable are likely to be highly correlated with each other. Inevitably, this will result in quite severe multicollinearity.
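A quick simulation illustrates the point about correlated lags. This is a minimal sketch, assuming a hypothetical AR(1) process, $x_t = 0.95 x_{t-1} + e_t$, as a stand-in for a persistent economic time-series, and comparing it with white noise; the function names are illustrative.

```python
import random

def ar1_series(T, phi, seed):
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1), starting from x_0 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(T - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def corr(a, b):
    """Sample (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

def lag_corr(x, k):
    """Correlation between x_t and x_{t-k} over the overlapping sample."""
    return corr(x[k:], x[:-k])

persistent = ar1_series(2000, 0.95, seed=42)   # hypothetical persistent series
white = ar1_series(2000, 0.0, seed=7)          # white noise, for comparison
for k in (1, 2, 3):
    print(k, round(lag_corr(persistent, k), 3), round(lag_corr(white, k), 3))
```

In population, the lag-k autocorrelation of this AR(1) process is $0.95^k$, so the regressors $x_t, x_{t-1}, x_{t-2}, \dots$ in (1) are nearly collinear, whereas the white-noise lags are essentially uncorrelated.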

How can we deal with this?

In response, Shirley Almon (1965) suggested a pretty neat way of re-formulating the model prior to its estimation. She made use of Weierstrass's Approximation Theorem, which tells us (roughly) that: "Every continuous function defined on a closed interval [a, b] can be uniformly approximated, arbitrarily closely, by a polynomial function of finite degree, P."
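To get a feel for the theorem in this setting, here is a minimal sketch that fits least-squares polynomials of increasing degree P to a hypothetical smooth lag shape (geometrically declining weights $0.7^i$ over $i = 0, \dots, 8$) and reports the residual sum of squares. The weights, the lag length and the helper names are assumptions for illustration, and least squares is used here in place of Almon's Lagrangian interpolation.

```python
def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][m] / M[r][r] for r in range(m)]

def polyfit_rss(w, P):
    """Least-squares fit of w_i by a degree-P polynomial in i; returns (coefficients, RSS)."""
    X = [[i ** j for j in range(P + 1)] for i in range(len(w))]
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(P + 1)] for p in range(P + 1)]
    Xtw = [sum(row[p] * wi for row, wi in zip(X, w)) for p in range(P + 1)]
    c = solve(XtX, Xtw)
    rss = sum((wi - sum(cj * xj for cj, xj in zip(c, row))) ** 2 for row, wi in zip(X, w))
    return c, rss

# Hypothetical smooth lag shape: geometrically declining weights 0.7**i, i = 0, ..., 8.
w = [0.7 ** i for i in range(9)]
rss_by_P = [polyfit_rss(w, P)[1] for P in range(5)]
for P, rss in enumerate(rss_by_P):
    print(f"P = {P}: RSS = {rss:.2e}")
```

The fit improves as P rises, but nothing in the output says which P is "right" for a given application.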

Notice that the theorem *doesn't tell us* what the value of P will be. This presents a type of model-selection problem that we have to solve. The flip-side of this is that if we *select* a value for P, and get it wrong, then there will be model mis-specification issues that we have to face. In fact, we can re-cast these issues in terms of those associated with the incorrect imposition of linear restrictions on the parameters of our model.

(Almon actually used Lagrangian interpolation in her application of Weierstrass's Theorem to this problem, but there's a simpler (and numerically equivalent) way of describing her idea.)

Let's look into this model in more detail.

Here's equation (1) again:

$$y_t = \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_n x_{t-n} + u_t; \qquad t = 1, 2, \dots, T. \qquad (1)$$

What we're going to do is to treat the values of the regression coefficients, $\beta_i$, as unknown functions of "i". That is, we'll set $\beta_i = g(i)$. Then we'll approximate g(i) using a polynomial, f(i), of order P. Typically, P will take a small value, such as 2, 3, or 4.

That is, we'll write:

$$\beta_i = a_0 + a_1 i + a_2 i^2 + \dots + a_P i^P; \qquad i = 0, 1, \dots, n \qquad (2)$$

If we set P = 3, here's an example of what we're imposing on the problem:

Substituting (2) into (1), we get:

$$
\begin{aligned}
y_t = {} & a_0 x_t + (a_0 + a_1 + a_2 + \dots + a_P)\, x_{t-1} + (a_0 + 2a_1 + 4a_2 + \dots + 2^P a_P)\, x_{t-2} + \dots \\
& + (a_0 + n a_1 + n^2 a_2 + \dots + n^P a_P)\, x_{t-n} + u_t; \qquad t = 1, 2, \dots, T. \qquad (3)
\end{aligned}
$$

Re-arranging the right-hand side of (3), and gathering up terms, we get:

$$
\begin{aligned}
y_t = {} & a_0 (x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n}) + a_1 (x_{t-1} + 2x_{t-2} + \dots + n x_{t-n}) \\
& + a_2 (x_{t-1} + 4x_{t-2} + 9x_{t-3} + \dots + n^2 x_{t-n}) + \dots \\
& + a_P (x_{t-1} + 2^P x_{t-2} + \dots + n^P x_{t-n}) + u_t; \qquad t = 1, 2, \dots, T. \qquad (4)
\end{aligned}
$$

If we've decided on a maximum lag-length (n), and we have chosen a degree (P) for the approximating polynomial, f(.), then we can re-write (4) as:

$$y_t = a_0 z_{0t} + a_1 z_{1t} + a_2 z_{2t} + \dots + a_P z_{Pt} + u_t; \qquad t = 1, 2, \dots, T, \qquad (5)$$

where:

$$
\begin{aligned}
z_{0t} &= x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n} \\
z_{1t} &= x_{t-1} + 2x_{t-2} + \dots + n x_{t-n} \\
z_{2t} &= x_{t-1} + 4x_{t-2} + 9x_{t-3} + \dots + n^2 x_{t-n} \\
&\;\;\vdots \\
z_{Pt} &= x_{t-1} + 2^P x_{t-2} + \dots + n^P x_{t-n}.
\end{aligned}
$$
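The z variables are easy to build directly. Compactly, $z_{jt} = \sum_{i=0}^{n} i^j x_{t-i}$ (with $0^0 = 1$). Here is a minimal pure-Python sketch, assuming x is stored as a list indexed by time and that only periods with a full set of n lags are used (equation (1) implicitly assumes pre-sample values of x exist). The helper names `almon_z` and `lag_weights` and the toy numbers are illustrative. The final loop checks that the systematic part of (1), with the $\beta_i$'s given by (2), equals that of (5).

```python
def almon_z(x, n, P):
    """Almon regressors z_{jt} = sum_{i=0}^{n} i**j * x_{t-i}, j = 0, ..., P.

    Row k corresponds to t = n + k (0-based positions in x), so every row
    has all n lags available. Python evaluates 0**0 as 1, so z_{0t}
    includes x_t, while for j >= 1 the i = 0 term vanishes.
    """
    return [[sum((i ** j) * x[t - i] for i in range(n + 1)) for j in range(P + 1)]
            for t in range(n, len(x))]

def lag_weights(a, n):
    """beta_i = a_0 + a_1 i + ... + a_P i^P for i = 0, ..., n, as in (2)."""
    return [sum(aj * i ** j for j, aj in enumerate(a)) for i in range(n + 1)]

# Toy series and hypothetical polynomial coefficients (P = 2, n = 3).
x = [1.0, 3.0, -2.0, 0.5, 4.0, 2.0, -1.0]
n, a = 3, [0.4, 0.3, -0.1]
beta = lag_weights(a, n)
Z = almon_z(x, n, P=len(a) - 1)

for k, t in enumerate(range(n, len(x))):
    lhs = sum(beta[i] * x[t - i] for i in range(n + 1))   # systematic part of (1)
    rhs = sum(aj * zj for aj, zj in zip(a, Z[k]))         # systematic part of (5)
    assert abs(lhs - rhs) < 1e-12
```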

Notice that if P is much smaller than n in value, then the number of regression coefficients that have to be estimated in (5), namely P + 1, is much less than the n + 1 in (1). We have effectively imposed (n - P) exact linear restrictions on the original coefficient vector, so what we have here is a particular application of *restricted least squares*. If these restrictions are incorrect, then there will be serious implications for the properties of our final estimator. On the positive side, however, the z variables are likely to exhibit far less multicollinearity than do the successive lags of x itself in model (1).

For a given n and P, we can construct the z variables, then estimate $a_0, a_1, \dots, a_P$ by applying OLS to (5), and finally "recover" the estimates of the $\beta_i$'s using (2):

$$\beta_i^* = a_0^* + a_1^* i + a_2^* i^2 + \dots + a_P^* i^P; \qquad i = 0, 1, \dots, n \qquad (6)$$

where a * superscript denotes an OLS estimate.
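Putting these steps together, here is a minimal sketch of the Almon estimator. It assumes simulated data from a quadratic lag distribution (n = 6, P = 2, hypothetical coefficients, small normal errors) and uses plain normal-equation OLS written out in pure Python so that the block has no dependencies. It constructs the z variables, applies OLS to (5), and recovers the $\beta_i^*$'s via (6); all function names are illustrative.

```python
import random

def almon_z(x, n, P):
    """Almon regressors z_{jt} = sum_{i=0}^{n} i**j * x_{t-i}, for t = n, ..., len(x)-1."""
    return [[sum((i ** j) * x[t - i] for i in range(n + 1)) for j in range(P + 1)]
            for t in range(n, len(x))]

def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][m] / M[r][r] for r in range(m)]

def ols(Z, y):
    """OLS coefficients from the normal equations (Z'Z) a = Z'y."""
    k = len(Z[0])
    ZtZ = [[sum(z[p] * z[q] for z in Z) for q in range(k)] for p in range(k)]
    Zty = [sum(z[p] * yi for z, yi in zip(Z, y)) for p in range(k)]
    return solve(ZtZ, Zty)

def almon_fit(x, y, n, P):
    """Estimate (5) by OLS, then recover beta_i* = sum_j a_j* i**j, i = 0..n, as in (6)."""
    a_hat = ols(almon_z(x, n, P), y[n:])   # first n periods lack a full set of lags
    beta_hat = [sum(aj * i ** j for j, aj in enumerate(a_hat)) for i in range(n + 1)]
    return a_hat, beta_hat

# Hypothetical data: quadratic lag distribution, n = 6, errors ~ N(0, 0.1^2).
rng = random.Random(1)
n, a_true = 6, [0.2, 0.5, -0.08]
beta_true = [sum(aj * i ** j for j, aj in enumerate(a_true)) for i in range(n + 1)]
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [0.0] * n + [sum(beta_true[i] * x[t - i] for i in range(n + 1)) + rng.gauss(0.0, 0.1)
                 for t in range(n, len(x))]

a_hat, beta_hat = almon_fit(x, y, n, P=2)
print([round(v, 3) for v in a_hat])
print([round(v, 3) for v in beta_hat])
```

The normal equations are used here only for transparency; with real data a regression routine from a statistics package would be the natural choice, and the normal equations can become numerically fragile for larger P because the columns of z grow like $n^P$.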

Because the relationship between the $\beta_i^*$'s and the $a_j^*$'s in (6) is a linear one, it is trivial to "recover" the standard errors for the former estimates from the covariance matrix associated with the latter estimates.
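Concretely, stack (6) as $\beta^* = H a^*$, where $H$ is the $(n+1) \times (P+1)$ matrix with elements $H_{ij} = i^j$; the estimated covariance matrix of $\beta^*$ is then $H \hat{V}(a^*) H'$. Here is a minimal sketch, with a hypothetical diagonal $\hat{V}(a^*)$ standing in for the $s^2 (Z'Z)^{-1}$ that the OLS fit of (5) would deliver; the function names are illustrative.

```python
def almon_H(n, P):
    """(n+1) x (P+1) matrix with H[i][j] = i**j, so that beta* = H a*, as in (6)."""
    return [[i ** j for j in range(P + 1)] for i in range(n + 1)]

def beta_cov(V_a, n):
    """Covariance matrix of beta* = H a*, i.e. H V_a H'."""
    k = len(V_a)                                   # k = P + 1
    H = almon_H(n, k - 1)
    HV = [[sum(H[i][p] * V_a[p][q] for p in range(k)) for q in range(k)]
          for i in range(n + 1)]
    return [[sum(HV[i][q] * H[m][q] for q in range(k)) for m in range(n + 1)]
            for i in range(n + 1)]

def beta_se(V_a, n):
    """Standard errors of beta_0*, ..., beta_n*: square roots of the diagonal of H V_a H'."""
    C = beta_cov(V_a, n)
    return [C[i][i] ** 0.5 for i in range(n + 1)]

# Hypothetical covariance matrix of (a_0*, a_1*, a_2*).
V_a = [[0.04, 0.0, 0.0],
       [0.0, 0.01, 0.0],
       [0.0, 0.0, 0.0001]]
print([round(s, 4) for s in beta_se(V_a, n=3)])
```

In practice $\hat{V}(a^*)$ is not diagonal, and its off-diagonal covariances matter; that is why the sketch forms the full product $H \hat{V}(a^*) H'$ rather than adding variances term by term.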

All of this is described in some detail in an old discussion paper by Smith and Giles (1976), referenced below.

Now let's consider a specific example, to make all of this more "concrete".

**An Example**

Here's the original model, again:

$$y_t = \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_n x_{t-n} + u_t; \qquad t = 1, 2, \dots, T. \qquad (7)$$

Let's choose P = 2. This is very restrictive indeed, in terms of the "shapes" that the lag distribution can take. However, it will simplify the discussion here. More complex (and realistic) cases are discussed in detail by Smith and Giles (1976).
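Choosing P = 2 amounts to the n - P exact linear restrictions mentioned earlier. One convenient way to see them (a standard property of polynomials, not necessarily the parameterisation used by Smith and Giles, 1976) is that a quadratic lag distribution has zero third differences, $\beta_{s+3} - 3\beta_{s+2} + 3\beta_{s+1} - \beta_s = 0$ for $s = 0, \dots, n-3$, i.e. n - 2 restrictions of the form $R\beta = 0$. The sketch below builds R for general n and P and checks it on hypothetical lag weights; the function names are illustrative.

```python
from math import comb

def polynomial_restrictions(n, P):
    """Rows R with R beta = 0 whenever beta_i is a degree-P polynomial in i.

    Row s is the (P+1)-th forward difference starting at beta_s:
        sum_k (-1)**(P+1-k) * C(P+1, k) * beta_{s+k},  k = 0, ..., P+1,
    for s = 0, ..., n-P-1, giving (n + 1) - (P + 1) = n - P rows.
    """
    d = P + 1
    rows = []
    for s in range(n - P):
        row = [0] * (n + 1)
        for k in range(d + 1):
            row[s + k] = (-1) ** (d - k) * comb(d, k)
        rows.append(row)
    return rows

def restriction_values(R, beta):
    """Evaluate R beta; all zeros means the restrictions hold."""
    return [sum(r * b for r, b in zip(row, beta)) for row in R]

n, P = 6, 2
R = polynomial_restrictions(n, P)
beta_quadratic = [0.2 + 0.5 * i - 0.08 * i ** 2 for i in range(n + 1)]  # hypothetical a's
beta_cubic = [i ** 3 for i in range(n + 1)]                              # violates P = 2
print(len(R), restriction_values(R, beta_quadratic), restriction_values(R, beta_cubic))
```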

So, we have:

$$\beta_i = a_0 + a_1 i + a_2 i^2; \qquad i = 0, 1, \dots, n \qquad (8)$$

which implies that

$$y_t = a_0 x_t + (a_0 + a_1 + a_2)\, x_{t-1} + (a_0 + 2a_1 + 4a_2)\, x_{t-2} + \dots + (a_0 + n a_1 + n^2 a_2)\, x_{t-n} + u_t; \qquad t = 1, 2, \dots, T$$

or,

$$y_t = a_0 (x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n}) + a_1 (x_{t-1} + 2x_{t-2} + \dots + n x_{t-n}) + a_2 (x_{t-1} + 4x_{t-2} + \dots + n^2 x_{t-n}) + u_t; \qquad t = 1, 2, \dots, T$$

or,

$$y_t = a_0 z_{0t} + a_1 z_{1t} + a_2 z_{2t} + u_t; \qquad t = 1, 2, \dots, T \qquad (9)$$

Here,

$$
\begin{aligned}
z_{0t} &= x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n} \\
z_{1t} &= x_{t-1} + 2x_{t-2} + \dots + n x_{t-n} \\
z_{2t} &= x_{t-1} + 4x_{t-2} + \dots + n^2 x_{t-n}; \qquad t = 1, 2, \dots, T
\end{aligned}
$$

We construct the z variables; estimate the coefficients in (9) by OLS; and then create estimates of the original $\beta_i$'s using (8). Effectively, we now have (particular) restricted least squares estimates of the original coefficients in (7). Everything that you know about restricted least squares applies here!

**An Extension**

Often, we'll have economic information that suggests something further about the pattern (lag distribution) that the values of the $\beta_i$'s should follow. For instance, we may know that it makes sense for the lag weights to "die out" to zero when i = n + 1. Or we may want the *slope* of the lag distribution to be zero when i = n. There are lots of such pieces of prior information that we may want to impose on the problem, and some of these are discussed by Smith and Giles (1976), together with graphs and details of the associated formulae.

Of course, these shape restrictions *add to* those already in play as a result of choosing a value for P. They further increase the chance that we may be imposing false restrictions on the parameter space, and this would lead our OLS estimates to be both biased and inconsistent. So, extreme care should be taken, and there are some important model-selection issues to be taken into account here.

Let's illustrate this by extending the previous example. We'll stick with the choice of P = 2, but we'll add the restriction that the derivative of f(i) should be zero when i = n. This is Case 5 in the paper by Smith and Giles (1976).

Noting that $\beta_i = f(i) = a_0 + a_1 i + a_2 i^2$, it follows that $f'(i) = a_1 + 2a_2 i$, and we're going to set $f'(n) = a_1 + 2n a_2 = 0$, implying that $a_1 = -2n a_2$. Now we can eliminate $a_1$ from the problem (that's another linear restriction that we're imposing, right there). Looking back at equation (9), we can see that the resulting equation that we'll be estimating is now of the form:

$$y_t = a_0 z_{0t} + a_2 (z_{2t} - 2n z_{1t}) + u_t; \qquad t = 1, 2, \dots, T \qquad (10)$$

Then, the estimates of the original regression coefficients are

$$\beta_i^* = a_0^* + a_1^* i + a_2^* i^2; \qquad i = 0, 1, \dots, n,$$

where $a_0^*$ and $a_2^*$ are the OLS estimates from (10), and $a_1^* = -2n a_2^*$.

So, what are the take-away messages here? They can be summarized pretty simply:

- The Almon estimator provides a rather neat way of circumventing the multicollinearity problems that would arise if we simply estimated a DL model, with lots of lags, directly by OLS.
- It does this by approximating the "shape" of the distribution of the lag coefficients through time by a polynomial of order P.
- The value of P has to be chosen by the user, and this leads to a model-selection problem.
- The choice of P also affects the form of certain exact linear restrictions that are effectively being placed on the regression coefficients.
- This leads to the possibility that false restrictions are imposed, and this would lead to the resulting estimator being both biased and inconsistent.
- Additional restrictions can be placed on the lag distribution, based on our knowledge of the underlying economics of the relationship we're estimating.
- Applying such restrictions should also be undertaken with care, again to avoid adversely affecting the properties of our estimator.
- The validity of some of the restrictions can be tested in the usual way. For instance, when we have fixed P and then we add end-point restrictions, the latter restrictions are "nested", so we can use a Wald test (for instance).
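As a sketch of that last point for the example above (P = 2, with the end-point restriction $f'(n) = a_1 + 2n a_2 = 0$), the Wald statistic is $W = (r'a^*)^2 / (r'\hat{V}(a^*) r)$ with $r = (0, 1, 2n)'$, computed from the unrestricted OLS fit of (9); under the null it is asymptotically $\chi^2(1)$, with a 5% critical value of about 3.84. The data are simulated, with hypothetical coefficients chosen so that the restriction is false; the helper names are illustrative; and the block also shows the restricted estimator obtained from (10).

```python
import random

def almon_z(x, n, P):
    """Almon regressors z_{jt} = sum_{i=0}^{n} i**j * x_{t-i}, for t = n, ..., len(x)-1."""
    return [[sum((i ** j) * x[t - i] for i in range(n + 1)) for j in range(P + 1)]
            for t in range(n, len(x))]

def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][m] / M[r][r] for r in range(m)]

def ols_with_cov(Z, y):
    """OLS coefficients and their estimated covariance matrix s^2 (Z'Z)^{-1}."""
    k = len(Z[0])
    ZtZ = [[sum(z[p] * z[q] for z in Z) for q in range(k)] for p in range(k)]
    Zty = [sum(z[p] * yi for z, yi in zip(Z, y)) for p in range(k)]
    a = solve(ZtZ, Zty)
    # Columns of (Z'Z)^{-1}, obtained by solving against unit vectors
    inv_cols = [solve(ZtZ, [1.0 if r == c else 0.0 for r in range(k)]) for c in range(k)]
    rss = sum((yi - sum(ap * zp for ap, zp in zip(a, z))) ** 2 for z, yi in zip(Z, y))
    s2 = rss / (len(y) - k)
    return a, [[s2 * inv_cols[c][r] for c in range(k)] for r in range(k)]

def wald_endpoint_slope(a, V, n):
    """Wald statistic for H0: f'(n) = a_1 + 2n a_2 = 0, given P = 2."""
    r = [0.0, 1.0, 2.0 * n]
    ra = sum(ri * ai for ri, ai in zip(r, a))
    rVr = sum(r[p] * V[p][q] * r[q] for p in range(3) for q in range(3))
    return ra ** 2 / rVr

def endpoint_fit(x, y, n):
    """Estimate (10) by OLS, set a_1* = -2n a_2*, and recover beta_i*, i = 0..n."""
    W = [[z0, z2 - 2 * n * z1] for z0, z1, z2 in almon_z(x, n, 2)]
    (a0, a2), _ = ols_with_cov(W, y[n:])
    a1 = -2 * n * a2
    return [a0, a1, a2], [a0 + a1 * i + a2 * i ** 2 for i in range(n + 1)]

# Hypothetical data in which the end-point restriction is FALSE: f'(6) = 0.4 - 0.12 = 0.28.
rng = random.Random(3)
n, a_true = 6, [1.0, 0.4, -0.01]
beta_true = [a_true[0] + a_true[1] * i + a_true[2] * i ** 2 for i in range(n + 1)]
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] * n + [sum(beta_true[i] * x[t - i] for i in range(n + 1)) + rng.gauss(0.0, 0.1)
                 for t in range(n, len(x))]

a_hat, V_hat = ols_with_cov(almon_z(x, n, 2), y[n:])
W_stat = wald_endpoint_slope(a_hat, V_hat, n)
print(round(W_stat, 1), "vs. 5% chi-square(1) critical value 3.84")
a_r, beta_r = endpoint_fit(x, y, n)   # what imposing the restriction would give
```

With these settings W should come out far above 3.84, so the (false) end-point restriction would be rejected. Imposing it anyway via (10) still produces lag weights whose slope at i = n is exactly zero, which is precisely how a false restriction feeds through into biased estimates.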

It's worth commenting that the choice of P, and some of the potential model-selection and mis-specification issues that can plague the Almon DL estimator, can also be resolved in a straightforward manner if one takes a Bayesian approach to the problem. For more details, and an empirical application, see Giles (1977).

**References**

Almon, S., 1965. The distributed lag between capital appropriations and net expenditures. *Econometrica*, 33, 178-196.

Ghysels, E., P. Santa-Clara, & R. Valkanov, 2004. The MIDAS touch: Mixed data sampling regression models. Mimeo.

Giles, D.E.A., 1977. Current payments for New Zealand's imports: A Bayesian analysis. *Applied Economics*, 9, 185-201.

Weierstrass, K., 1885. Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen Veränderlichen. *Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin*, (II).
