I've added a couple of new functions to the forecast package for R which implement two types of cross-validation for time series.
K-fold cross-validation for autoregression
The first is regular k-fold cross-validation for autoregressive models. Although cross-validation is sometimes not valid for time series models, it does work for autoregressions, which include many machine learning
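To make the k-fold idea for autoregressions concrete, here is a minimal Python sketch. It is not the forecast package's implementation (which is in R); the helper names `fit_ar1` and `kfold_cv_mse` are hypothetical, and the AR(1) model with intercept and the simulated series are assumptions chosen for illustration. The series is turned into (lagged value, next value) pairs, and the pairs are then split into folds as in an ordinary regression.

```python
import random


def fit_ar1(pairs):
    """Least-squares fit of y = c + phi * x on (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def kfold_cv_mse(series, k=5, seed=0):
    """Mean squared one-step error from k-fold CV on the lagged pairs."""
    pairs = list(zip(series[:-1], series[1:]))  # (y[t-1], y[t])
    idx = list(range(len(pairs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [pairs[i] for i in idx if i not in held_out]
        c, phi = fit_ar1(train)
        sq_err += sum((pairs[i][1] - (c + phi * pairs[i][0])) ** 2 for i in fold)
    return sq_err / len(pairs)


# Assumed example data: a simulated AR(1) series with phi = 0.6, unit noise.
rng = random.Random(42)
series = [0.0]
for _ in range(500):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))

mse = kfold_cv_mse(series, k=5)
print(mse)
```

With unit-variance noise, the cross-validated error should land close to 1 when the model order is adequate.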

A recent issue of Astronomy magazine mentioned Kepler's third law of planetary motion, which states "the square of a planet's orbital period is proportional to the cube of its average distance from the Sun" (Astronomy, Dec 2016, p. 17). The article included a graph (shown at the right) that shows
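The quoted law can be checked with a few lines of Python. This is a small sketch under stated assumptions: distances are in astronomical units and periods in Earth years, so the constant of proportionality is 1, and the value 1.524 AU for Mars is a standard reference figure rather than a number taken from the article. The function name `orbital_period_years` is hypothetical.

```python
def orbital_period_years(distance_au):
    """Kepler's third law in AU/year units: T**2 = a**3, so T = a**1.5."""
    return distance_au ** 1.5


earth = orbital_period_years(1.0)    # Earth defines the units
mars = orbital_period_years(1.524)   # assumed mean distance of Mars in AU
print(earth, round(mars, 3))
```

On a log-log plot of period against distance, the same relationship appears as a straight line with slope 3/2.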

Eric Loken writes: Do you by any chance remember the bogus survey that Augusta National carried out in 2002 to deflect criticism about not having any female members? I even remember this survey being ridiculed by ESPN who said their polls showed much more support for a boycott and sympathy with Martha Burke. Anyway, sure that's

Thomas Heister writes: Your recent post about Per Pettersson-Lidbom's frustrations in reproducing study results reminded me of our own recent experience in replicating a paper in PLOS ONE. We found numerous substantial errors but eventually gave up as, frustratingly, the time and effort didn't seem to change anything and the journal's editors quite

Paul Alper points to this excellent news article by Aaron Carroll, who tells us how little information is available in studies of diet and public health. Here's Carroll: Just a few weeks ago, a study was published in the Journal of Nutrition that many reports in the news media said proved that honey was no

One thing I teach is: when evaluating the performance of regression models you should not use correlation as your score. This is because correlation tells you if a re-scaling of your result is useful, but you want to know if the result in your hand is in fact useful. For example: the Mars Climate Orbiter
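The point about re-scaling can be illustrated with a short Python sketch. The numbers are made up for illustration, and the helper names `pearson` and `rmse` are hypothetical: predictions reported in the wrong units (pound-force instead of newtons, an assumed stand-in for a units mix-up) correlate perfectly with the truth while still being badly wrong.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(pred, actual):
    """Root mean squared error of predictions against actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


NEWTONS_PER_LBF = 4.448222  # one pound-force in newtons

actual_newtons = [1.0, 2.0, 3.0, 4.0, 5.0]              # illustrative truth
pred_wrong_units = [a / NEWTONS_PER_LBF for a in actual_newtons]

corr = pearson(pred_wrong_units, actual_newtons)
err = rmse(pred_wrong_units, actual_newtons)
print(corr, err)
```

Correlation is unchanged by the unit error because it is invariant to positive re-scaling; an error measure such as RMSE scores the prediction as delivered.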

Ari Lamstein writes: I chuckled when I read your recent "R Sucks" post. Some of the comments were a bit … heated … so I thought to send you an email instead. I agree with your point that some of the datasets in R are not particularly relevant. The way that I've addressed that is

Nate Silver agrees with me that much of that shocking 2% swing can be explained by systematic differences between sample and population: survey respondents included too many Clinton supporters, even after corrections from existing survey adjustments. In Nate's words, "Pollsters Probably Didn't Talk To Enough White Voters Without College Degrees." Last time we looked carefully

Possibly the last post on random number generation by Kinderman and Monahan's (1977) ratio-of-uniforms method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set
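For readers unfamiliar with the basic method, here is a minimal Python sketch of the plain ratio-of-uniforms sampler. It does not reproduce the construction described in this post: the target is assumed to be a standard normal, where the acceptance region {(u, v): 0 < u ≤ √f(v/u)} with f(x) = exp(−x²/2) is already bounded, and the function name `rou_standard_normal` is hypothetical.

```python
import math
import random


def rou_standard_normal(rng):
    """Draw one N(0, 1) variate by ratio-of-uniforms with rejection.

    (u, v) is drawn uniformly on the bounding box (0, 1] x [-b, b] with
    b = sqrt(2 / e), and accepted when u <= sqrt(f(v / u)) for
    f(x) = exp(-x**2 / 2), i.e. when v**2 <= -4 * u**2 * log(u).
    The ratio v / u of an accepted pair has density proportional to f.
    """
    b = math.sqrt(2.0 / math.e)
    while True:
        u = 1.0 - rng.random()      # uniform on (0, 1], avoids log(0)
        v = rng.uniform(-b, b)
        if v * v <= -4.0 * u * u * math.log(u):
            return v / u


rng = random.Random(2017)
draws = [rou_standard_normal(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

The difficulty with targets such as Gamma(a,1) for a<1 is that the density is unbounded at zero, so the corresponding region is unbounded and no such bounding box exists without a transformation.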

Shea Levy writes: You ended a post from last month [i.e., Feb.] with the injunction to not take the fact of a paper's publication or citation status as meaning anything, and instead that we should "read each paper on its own." Unfortunately, while I can usually follow e.g. the criticisms of a paper you might

[Following my post of last Tuesday, Matt Graham commented on the paper in considerable detail. Here are those comments. A nicer HTML version of the Markdown reply below is also available on Github.] Thanks for the comments on the paper! A few additional replies to augment what Amos wrote: This however sounds somewhat intense in that

Jeff Lax points to this post from Matt Novak linking to a post by Matt Taibbi that shares the above graph from newspaper columnist / rich guy Thomas Friedman. I'm not one to spend precious blog space mocking bad graphs, so I'll refer you to Novak and Taibbi for the details. One thing I do

Presentations can be dreadful. Badly thought-out slides, boring structure, poorly delivered. I once told a colleague after a practice talk to please shoot me before she'd ever make me sit through such a talk again (to be fair, she had called the talk boring herself before she even began). Instead of suffering through more bad presentations, Jon

Visual storytelling
Visualising data helps in understanding facts. Sometimes it's very easy to understand a graph; sometimes it's necessary to read it and to study it to discover unknown territory. Such graphs are little masterpieces. Here's one of these and I am sure the authors had more than one iteration and discussion while creating it. The

Jon Zelner writes: Just thought I'd send along this paper by Justin Lessler et al. Thought it was both clever & useful and a nice ad for using Stan for epidemiological work. Basically, what this paper is about is estimating the true prevalence and case fatality ratio of MERS-CoV [Middle East Respiratory Syndrome Coronavirus Infection]