Batch forecasting in R

January 7, 2013

(This article was originally published at Hyndsight, and syndicated at StatsBlogs.)

I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:

I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!

My response

Loops are fine in R. They are frowned upon because people use them inappropriately when there are often much more efficient vectorized versions available. But for this task, a loop is the only approach.

Reading data and exporting forecasts is standard R and does not require any additional packages. To generate the forecasts, use the forecast package: either the ets() function or the auto.arima() function, depending on what type of data you are modelling. If it’s high-frequency data (frequency greater than 24), then you would need the tbats() function, but that is very slow.
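As a rough illustration of the three functions mentioned above, here is a minimal sketch. It assumes the forecast package is installed; AirPassengers is a built-in monthly dataset, and the weekly series is simulated purely for illustration, so neither comes from the questioner's data.

```r
library(forecast)

# Built-in monthly series (frequency 12)
y <- AirPassengers

fit_ets   <- ets(y)          # exponential smoothing state space model
fit_arima <- auto.arima(y)   # automatically selected ARIMA model

fc_ets   <- forecast(fit_ets, h = 12)
fc_arima <- forecast(fit_arima, h = 12)

# Simulated weekly series (frequency 52 > 24), an assumption for illustration only
set.seed(1)
weekly <- ts(100 + 10 * sin(2 * pi * (1:156) / 52) + rnorm(156), frequency = 52)

# tbats() handles the long seasonal period but is noticeably slower
fit_tbats <- tbats(weekly, use.parallel = FALSE)
fc_tbats  <- forecast(fit_tbats, h = 26)
```

Each fc_* object holds the point forecasts in its mean component, so the three approaches can be swapped without changing any downstream code.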

Some sample code

In the following example, there are many columns of monthly data in a csv file with the first column containing the month of observation (beginning with April 1982). Forecasts have been generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code, I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).

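library(forecast)
 
# Read the monthly retail data; the first column holds the month of observation
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv", header=FALSE)
# Drop the month column and build a monthly multivariate series starting April 1982
retail <- ts(retail[,-1], frequency=12, start=c(1982,4))
 
ns <- ncol(retail)   # number of series
h <- 24              # forecast horizon in months
fcast <- matrix(NA, nrow=h, ncol=ns)
# Forecast each series in turn and keep only the point forecasts
for(i in seq_len(ns))
  fcast[,i] <- forecast(retail[,i], h=h)$mean
 
# One column per series, one row per forecast month
write(t(fcast), file="retailfcasts.csv", sep=",", ncolumns=ncol(fcast))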

Note that the transpose of the fcast matrix is used in write() because the file is written row-by-row rather than column-by-column.
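The row-by-row behaviour of write() can be checked on a small matrix. The sketch below uses a made-up 3 x 2 matrix m and temporary files, both of which are assumptions for illustration. It also shows write.table() as one alternative that keeps the matrix layout without a transpose.

```r
# Toy matrix standing in for fcast: 3 forecast steps (rows) for 2 series (columns)
m <- matrix(1:6, nrow = 3, ncol = 2)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
f3 <- tempfile(fileext = ".csv")

# Without the transpose, values are written in column-major order
write(m, file = f1, sep = ",", ncolumns = ncol(m))
readLines(f1)   # "1,2" "3,4" "5,6"

# With the transpose, each line of the file is one row of m
write(t(m), file = f2, sep = ",", ncolumns = ncol(m))
readLines(f2)   # "1,4" "2,5" "3,6"

# write.table() writes the matrix as laid out, so no transpose is needed
write.table(m, file = f3, sep = ",", row.names = FALSE, col.names = FALSE)
readLines(f3)   # "1,4" "2,5" "3,6"
```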

This code does not actually do what the questioner asked as I am writing all forecasts at once rather than exporting them at each iteration. The latter is much less efficient.
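For completeness, exporting at each iteration might look something like the following sketch. It uses two built-in monthly series (mdeaths and fdeaths) in place of the retail file, and the output directory and file names are hypothetical. Because as.data.frame() is applied to the whole forecast object, the prediction intervals are written alongside the point forecasts.

```r
library(forecast)

# Stand-in data: two built-in monthly series instead of the retail file
y <- cbind(mdeaths, fdeaths)
h <- 24
outdir <- tempdir()   # hypothetical output location

for (i in seq_len(ncol(y))) {
  fc <- forecast(y[, i], h = h)
  # One file per series, named after the column (hypothetical naming scheme)
  outfile <- file.path(outdir, paste0("fcast_", colnames(y)[i], ".csv"))
  # as.data.frame() keeps the point forecasts and the prediction intervals
  write.csv(as.data.frame(fc), file = outfile)
}
```

With a few thousand series this produces a few thousand small files, which is part of why writing everything once at the end is usually preferable.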

If ns is large, this could probably be more efficiently coded using the parallel package.
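One possible way to do that is sketched below with parLapply() from the parallel package. The three built-in monthly series and the choice of two worker processes are assumptions for illustration; the same pattern should apply to the columns of a larger matrix of series.

```r
library(forecast)
library(parallel)

# Stand-in data: three built-in monthly series
y <- cbind(mdeaths, fdeaths, ldeaths)
h <- 24

# Split the multivariate series into a list of univariate series
series <- lapply(seq_len(ncol(y)), function(i) y[, i])

cl <- makeCluster(2)                  # two workers, an arbitrary choice
clusterEvalQ(cl, library(forecast))   # load the package on each worker

# Each worker forecasts its share of the series and returns the point forecasts
res <- parLapply(cl, series,
                 function(x, h) as.numeric(forecast(x, h = h)$mean),
                 h = h)
stopCluster(cl)

# Reassemble into an h x ns matrix, one column per series
fcast <- do.call(cbind, res)
colnames(fcast) <- colnames(y)
```

Starting the workers has a fixed cost, so any gain is likely to show only when there are many series or the models are slow to fit.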


