A few days ago I released version 4.0 of the forecast package for R. There were quite a few changes and new features, so I thought it deserved a new version number. I keep a list of changes in the Changelog for the package, but I doubt that many people look at it. So for the record, here are the most important changes to the forecast package made since v3.0 was released.
ets() function handles exponential smoothing in the ETS framework, but for teaching purposes it is sometimes useful to discuss a more traditional implementation that does not involve optimizing the initial values. The
HoltWinters() function in the
stats package attempts to do this, although the initialization used for the Holt and Holt-Winters methods is non-standard, making it unsuitable for teaching.
Consequently, I’ve now modified the
hw() functions to allow both traditional heuristic initialization and ETS-style optimization of initial states. These functions also now allow damped trend and multiplicative trend.
Multiple seasonality and BATS and TBATS models
Time series with multiple seasonal patterns (e.g., hourly data that contains a daily pattern, weekly pattern and an annual pattern) now have their own model class
msts. This is useful when plotting the data, and in using the
dshw() double seasonal Holt-Winters function for example.
The BATS and TBATS models (fitted using
tbats()) also handle multiple seasonalities as well as complex seasonalities (such as non-integer seasonality, non-nested seasonality and large period seasonality). The models are in the exponential-smoothing framework and were proposed by De Livera, Hyndman and Snyder (JASA 2011).
BATS and TBATS models allow a form of time series decomposition, and so the fitted objects can now plotted in the same way as other decomposition objects (e.g.,
These models are quite computationally intensive, and so the functions will use parallel processing when it is available.
This is probably the most popular function in the package, and I am always looking at ways to speed it up and improve the results.
- Default model selection has been switched from AIC to AICc (the bias-corrected form of AIC) which may affect model selection for very short time series.
- The maximum orders are now restricted to less than 1/3 of the length of the data.
- Parallel processing is used where possible when
accuracy function now works with both time series forecasts and cross-sectional forecasts.
The MASE for seasonal time series is now defined using seasonal naive forecasts for the in-sample scaling. The MASE for cross-sectional forecasts is defined using the mean forecasts for in-sample scaling. This is consistent with my new book.
Neural network AR models
A new experimental function
nnetar has been introduced to fit a neural network with lagged values of the time series as inputs. It is a feed-forward network with one hidden layer, specified using the notation NNAR() to indicate there are lagged inputs and nodes in the hidden layer. For example, a NNAR(9,5) model is a neural network with the last nine observations ) used as inputs to forecast the output , and with five neurons in the hidden layer. A NNAR() model is equivalent to an ARIMA() model but without the restrictions on the parameters to ensure stationarity.
With seasonal data, it is useful to also add the last observed values from the same season as inputs. For example, an NNAR(3,1,2) model has inputs , , and , and two neurons in the hidden layer. More generally, an NNAR() model has inputs and neurons in the hidden layer. A NNAR() model is equivalent to an ARIMA()() model but without the restrictions on the parameters to ensure stationarity.
nnetar() function fits an NNAR() model. If the values of and are not specified, they are automatically selected. For non-seasonal time series, the default is the optimal number of lags (according to the AIC) for a linear AR() model. For seasonal time series, the default values are and is chosen from the optimal linear model fitted to the seasonally adjusted data. If is not specified, it is set to (rounded to the nearest integer).
By default, 25 networks with random starting values are trained and their predictions averaged.
Like many of the functions in the forecast package, the default arguments usually give reasonably good forecasts without the user needing to know much. For example:
fit <- nnetar(lynx) plot(forecast(fit,h=20))
Because random starting values are used, the forecasts will be slightly different each time the function is called.
It is possibly to compute prediction intervals from NNAR models via simulation, but I have not yet implemented this. The function will only give point forecasts for now.
New default colors for forecast plots
The old yellow and orange colors have been replaced by shades of blue-grey which are a little easier on the eye. The darker the color, the higher the density. Here is an example with 80% and 95% prediction intervals.
library(fpp) fit <- Arima(euretail, order=c(0,1,3), seasonal=c(0,1,1)) plot(forecast(fit,h=12))
Earlier this year I started using the package for an undergraduate forecasting class which gave it a good work-out, and I’m also using it as the basis of my new book with George Athanasopoulos. As a result, a lot of new functionality has been added, and lots of bugs have been discovered and fixed.
The process is continuing and I have lots of ideas I want to implement, and no doubt more bugs will be discovered. The development of the forecast package is now maintained on github where you can see what is in development and report any new problems. Please let me know if you find anything you think is an error, or if you have a suggestion.
Please comment on the article here: Hyndsight