The Bandwidth for the KPSS Test

July 12, 2017

(This article was originally published at Econometrics Beat: Dave Giles' Blog, and syndicated at StatsBlogs.)

Recently, I received an email from a follower of this blog, who asked:
"May I know what is the difference between the bandwidth of Newey-West and Andrews for the KPSS test. It is because when I test the variable with Newey-West, it is I(2), but then I switch the bandwidth to Andrews, it becomes I(1)."
First of all, it's worth noting that the unit root and stationarity tests that we commonly use can be very sensitive to the way in which they're constructed and applied. An obvious example arises with the choice of the maximum lag length when we're using the Augmented Dickey-Fuller test. Another example would be the treatment of the drift and trend components when using that test. So, the situation that's mentioned in the email above is not unusual, in general terms.

Now, let's look at the specific question that's been raised here.
When the KPSS test (Kwiatkowski et al., 1992) is used, there are basically three choices that need to be made in the construction of the test statistic. The first of these is associated with the formulation of the null and alternative hypotheses. Is the null going to be that the time-series is "level stationary", or is it going to be "stationary about a deterministic trend"? Let's suppose that we've decided on this (and hence on the form of the alternative hypothesis).

The remaining two decisions relate to the construction of the denominator in the formula for the KPSS test statistic itself. This denominator provides a consistent estimate of the long-run variance of the time-series. Lots of different estimators for this variance can be used, and studies such as those of Den Haan and Levin (1997) indicate that the behaviour/performance of the KPSS test can depend critically on the choice of this estimator. For a really good discussion of this, see Hobijn et al. (2004).

Whatever variance estimator is used, it's generally based on the empirical autocorrelation function for the time-series. If a non-parametric estimator is adopted (which is frequently the case), this is where the remaining two decisions arise. 

Depending on the form of the null hypothesis for the KPSS test, we regress the time-series on either a constant, or a constant and linear trend variable, using OLS. Let $e_t$ denote the resulting t'th residual (t = 1, 2, ..., T). Then the estimator of the long-run variance will be of the form:

          $$s^2 = \gamma_0 + 2 \sum_{j=1}^{T-1} k_m(j) \, \gamma_j$$

where $k_m(j)$ is discussed below, and

          $$\gamma_j = T^{-1} \sum_{t=j+1}^{T} e_t \, e_{t-j}$$

is the j'th empirical autocovariance.
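The estimator above can be sketched in a few lines of Python. This is only an illustration, not the code used by any particular package: the function names (`ols_residuals`, `autocovariance`, `long_run_variance`) are my own, and the kernel is passed in as an arbitrary function of j so that any choice of weights can be plugged in.

```python
from typing import Callable, List, Sequence


def ols_residuals(y: Sequence[float], trend: bool = False) -> List[float]:
    """Residuals from an OLS regression of y on a constant (trend=False)
    or on a constant and a linear trend t = 1, ..., T (trend=True)."""
    T = len(y)
    y_bar = sum(y) / T
    if not trend:
        return [v - y_bar for v in y]
    t_bar = (T + 1) / 2.0  # mean of t = 1, ..., T
    times = range(1, T + 1)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    slope = sxy / sxx
    return [v - y_bar - slope * (t - t_bar) for t, v in zip(times, y)]


def autocovariance(e: Sequence[float], j: int) -> float:
    """gamma_j = T^{-1} * sum over t = j+1, ..., T of e_t * e_{t-j}.
    The list is 0-indexed, so t runs from j to T-1 here."""
    T = len(e)
    return sum(e[t] * e[t - j] for t in range(j, T)) / T


def long_run_variance(e: Sequence[float],
                      kernel: Callable[[int], float]) -> float:
    """s^2 = gamma_0 + 2 * sum over j = 1, ..., T-1 of k(j) * gamma_j."""
    T = len(e)
    s2 = autocovariance(e, 0)
    for j in range(1, T):
        weight = kernel(j)
        if weight != 0.0:  # skip lags that the kernel switches off
            s2 += 2.0 * weight * autocovariance(e, j)
    return s2
```

Calling `long_run_variance(ols_residuals(y), kernel)` gives the "level stationary" version, and passing `trend=True` to `ols_residuals` gives the version for the trend-stationary null.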

Two quantities have to be chosen in order to construct the non-parametric estimator, $s^2$: (i) the so-called "kernel" function, $k_m(j)$; and (ii) the "bandwidth", or "lag order", m. The two most common choices for the kernel are the Bartlett kernel, which was employed in the original KPSS paper; or the Quadratic Spectral kernel, as used by Andrews (1991) and Newey and West (1994). Both of the latter studies suggest that this choice of kernel leads to more accurate estimates of the variance of the time series, in finite samples, than do other kernels. Some results reported by Hobijn et al. (2004) support this.

So, this leaves us with the final choice that has to be made - that of the "bandwidth". What exactly is this?

To make matters more specific, and for simplicity, consider the Bartlett kernel, which is of the form:

                       $$k_m(j) = 1 - \frac{j}{m + 1} \; ; \quad j \le m \qquad \text{(and zero, otherwise)}$$

The "bandwidth" is just the quantity, m. In the original KPSS paper, the value of m was set to be a non-stochastic function of the sample size, T. In practice, if this is done, it's common to put m equal to the integer part of $[4 (T / 100)]^{2/9}$. For example, if T = 200, then the bandwidth would be 1 (i.e., the integer part of 1.5874).
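To see the Bartlett weights and the fixed bandwidth rule in action, here's a small Python sketch. The function names are hypothetical, and the rule is coded exactly as it's written in the text, with the factor 4 inside the bracket. Some references state the rule with the exponent applied to T/100 only, which gives a larger value, so it's worth checking which form your software uses.

```python
import math


def bartlett_weight(j: int, m: int) -> float:
    """Bartlett kernel: 1 - j / (m + 1) for j <= m, and zero otherwise."""
    return 1.0 - j / (m + 1.0) if j <= m else 0.0


def fixed_bandwidth(T: int) -> int:
    """Integer part of [4 * (T / 100)]^(2/9), the form given in the text."""
    return int(math.floor((4.0 * (T / 100.0)) ** (2.0 / 9.0)))


if __name__ == "__main__":
    T = 200
    m = fixed_bandwidth(T)
    print("bandwidth for T = 200:", m)
    # Weights decline linearly in j and are zero beyond lag m.
    print("weights for j = 1..3:", [bartlett_weight(j, m) for j in range(1, 4)])
```

With m = 1, only the first autocovariance enters the variance estimate, and it gets a weight of one half. A larger m brings in more lags, each with a linearly declining weight.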

In contrast, both Andrews (1991) and Newey and West (1994) suggested specific data-based/automated ways of estimating the optimal value of m.

How do their approaches to this differ? That's the question that was asked in the email that I received. A proper answer to this question requires a lot of detail. You'll find a full discussion on the "Help" site for the EViews econometrics package: here and here. (Note that you don't have to be an EViews user to access this information, but why wouldn't you be?)

The take-away message here is simple enough. When we are testing for unit roots, all of the tests that we use require us to make choices about various quantities or "settings". In many cases, these choices are really important, and they can affect the results that we obtain quite dramatically.
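To illustrate how much the bandwidth can matter, here's a rough Python sketch that computes the level-stationarity KPSS statistic for several values of m on a simulated series. Some assumptions: the numerator (the sum of squared partial sums of the residuals, divided by T squared) isn't spelled out in this post, so I've taken the standard form from the KPSS paper; the Bartlett kernel is used throughout; and the AR(1) data, the seed, and the function name are purely illustrative.

```python
import random
from typing import Sequence


def kpss_level_stat(y: Sequence[float], m: int) -> float:
    """KPSS statistic for the 'level stationary' null, with a Bartlett
    kernel and bandwidth m in the long-run variance estimate."""
    T = len(y)
    y_bar = sum(y) / T
    e = [v - y_bar for v in y]  # residuals from a regression on a constant

    # Numerator: T^{-2} times the sum of squared partial sums of residuals.
    partial_sum, numerator = 0.0, 0.0
    for resid in e:
        partial_sum += resid
        numerator += partial_sum ** 2
    numerator /= T ** 2

    # Denominator: s^2 = gamma_0 + 2 * sum_{j=1}^{m} (1 - j/(m+1)) * gamma_j.
    s2 = sum(r * r for r in e) / T
    for j in range(1, min(m, T - 1) + 1):
        gamma_j = sum(e[t] * e[t - j] for t in range(j, T)) / T
        s2 += 2.0 * (1.0 - j / (m + 1.0)) * gamma_j
    return numerator / s2


if __name__ == "__main__":
    # Simulated stationary AR(1) series; coefficient and seed are arbitrary.
    random.seed(0)
    y, prev = [], 0.0
    for _ in range(200):
        prev = 0.8 * prev + random.gauss(0.0, 1.0)
        y.append(prev)
    for m in (0, 1, 4, 8, 12):
        print("m =", m, " KPSS statistic =", round(kpss_level_stat(y, m), 4))
```

The numerator doesn't depend on m at all, so every change in the statistic comes from the variance estimate in the denominator. Because the statistic is compared with a fixed critical value, a different bandwidth can be enough to flip the test's conclusion.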

Tread carefully, and when you're reporting your empirical results, make sure that you state what choices you've made.


Andrews, D. W. K., 1991.  Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817-858.

Den Haan, W. J. and A. Levin, 1997. A practitioner's guide to robust covariance matrix estimation. In Handbook of Statistics 15: Robust Inference, G. S. Maddala and C. R. Rao (eds.), Chapter 12, 291-341. Amsterdam, North-Holland. (Free download.)

Hobijn, B., Franses, P. H., and M. Ooms, 2004. Generalizations of the KPSS-test for stationarity. Statistica Neerlandica, 58, 483-502. (Working Paper version, 1998.)

Kwiatkowski, D., Phillips, P. C. B., Schmidt, P. and Y. Shin, 1992. Testing the null hypothesis of stationarity against the alternative of a unit root:  How sure are we that economic time series have a unit root? Journal of Econometrics, 54, 159-178.

Newey, W. K. and K. D. West, 1994. Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61, 631-653.
© 2017, David E. Giles
