Posts Tagged ‘R’

Cross-validation for time series

December 5, 2016

I’ve added a couple of new functions to the forecast package for R which implement two types of cross-validation for time series. K-fold cross-validation for autoregression: The first is regular k-fold cross-validation for autoregressive models. Although cross-validation is sometimes not valid for time series models, it does work for autoregressions, which include many machine learning […]


Be careful evaluating model predictions

December 3, 2016

One thing I teach is: when evaluating the performance of regression models, you should not use correlation as your score. This is because correlation tells you whether a re-scaling of your result is useful, but you want to know whether the result in your hand is in fact useful. For example: the Mars Climate Orbiter …
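A minimal sketch of the point about correlation, in plain Python and using made-up numbers (the data and the helper names `pearson` and `rmse` are illustrative assumptions, not taken from the post): a prediction and a badly re-scaled copy of it have the same correlation with the truth, while an error measure such as RMSE tells them apart.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(pred, actual):
    """Root mean squared error of the predictions as given."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# Hypothetical data, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
good = [1.1, 1.9, 3.2, 3.8, 5.1]
# Same predictions in the wrong units: useless as-is, fine after re-scaling.
rescaled = [10 * p + 100 for p in good]

print(pearson(good, actual), pearson(rescaled, actual))  # essentially equal
print(rmse(good, actual), rmse(rescaled, actual))        # very different
```

Correlation is unchanged by any positive affine re-scaling, so it cannot flag the second set of predictions as unusable in its current form.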


ratio-of-uniforms [#4]

December 1, 2016

Possibly the last post on random number generation by Kinderman and Monahan’s (1977) ratio-of-uniforms method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set […]
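For readers unfamiliar with the basic ratio-of-uniforms idea, here is a minimal sketch for the standard normal target rather than the Gamma(a,1) case discussed in the post (the function name `rou_standard_normal` is a hypothetical choice). If (u, v) is uniform on the set {(u, v): 0 < u ≤ √f(v/u)}, then v/u has density proportional to f; with f(x) = exp(−x²/2) the set fits in the box (0, 1] × [−√(2/e), √(2/e)], so simple rejection from that box works.

```python
import math
import random

def rou_standard_normal(rng):
    """Draw one N(0,1) variate by ratio-of-uniforms with rejection from a box."""
    bound = math.sqrt(2.0 / math.e)  # sup of |x| * sqrt(f(x)), attained at x = sqrt(2)
    while True:
        u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
        v = rng.uniform(-bound, bound)
        # Accept when u <= sqrt(f(v/u)), i.e. v^2 <= -4 u^2 log(u).
        if v * v <= -4.0 * u * u * math.log(u):
            return v / u

rng = random.Random(1)
sample = [rou_standard_normal(rng) for _ in range(20000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
print(mean, var)  # should be close to 0 and 1
```

The rejection step is only possible because the acceptance set is bounded, which is why finding a bounded set matters for other targets.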


Efficiently Saving and Sharing Data in R

December 1, 2016

I spent a day the other week struggling to make sense of a federal data set shared in an archaic format (ASCII fixed format dat file). It is essential for the effective distribution and sharing of data that it use the minimum amount of disk spac...


asymptotically exact inference in likelihood-free models [a reply from the authors]

November 30, 2016

[Following my post of last Tuesday, Matt Graham commented on the paper in great detail. Here are those comments. A nicer HTML version of the Markdown reply below is also available on Github.] Thanks for the comments on the paper! A few additional replies to augment what Amos wrote: This however sounds somewhat intense in that […]


BERT: a newcomer in the R Excel connection

November 30, 2016

A few months ago a reader pointed out to me this new way of connecting R and Excel. I don’t know how long this has been around, but I never came across it and I’ve never seen any blog post or article about it. So I decided to write a post as the tool is really […]


vtreat data cleaning and preparation article now available on arXiv

November 30, 2016

Nina Zumel and I are happy to announce that a formal article discussing data preparation and cleaning using the vtreat methodology is now available from arXiv.org as citation arXiv:1611.09477 [stat.AP]. vtreat is an R data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. It prepares variables so that data has fewer …


sampling by exhaustion

November 24, 2016

Last week’s riddle from The Riddler sums up as follows: Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individuals are left. What is the probability that there is zero […]
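The process described in the riddle can be sketched as a small Monte Carlo simulation. This is an illustration under an assumption the teaser does not spell out, namely that each individual picks uniformly among the others; the function names are hypothetical.

```python
import random

def survivors_after_round(n, rng):
    """One round: each of n individuals picks another one uniformly at random
    (assumed); everyone picked at least once is removed."""
    selected = set()
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:          # skip over i so nobody selects themselves
            j += 1
        selected.add(j)
    return n - len(selected)

def ends_with_zero(n, rng):
    """Iterate rounds until one or zero individuals are left."""
    while n > 1:
        n = survivors_after_round(n, rng)
    return n == 0

def estimate_zero_probability(n, trials, seed=0):
    """Monte Carlo estimate of the probability that nobody is left."""
    rng = random.Random(seed)
    return sum(ends_with_zero(n, rng) for _ in range(trials)) / trials

print(estimate_zero_probability(3, 20000))   # exact value for N = 3 is 1/4
print(estimate_zero_probability(100, 2000))
```

For N = 3 the exact answer under this assumption is 2/8 = 1/4, since only the two cyclic selection patterns remove everyone in one round, which gives a quick sanity check on the simulation.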


Monty Python generator

November 22, 2016

By some piece of luck I came across a paper by the late George Marsaglia, genial contributor to the field of simulation, and Wai Wan Tang, entitled The Monty Python method for generating random variables. As shown by the below illustration, the concept is to flip the piece H outside the rectangle back inside the […]


postdoc on missing data at École Polytechnique

November 17, 2016

Julie Josse contacted me to advertise a postdoc position at École Polytechnique, in Palaiseau, south of Paris. “The fellowship is focusing on missing data. Interested graduates should apply as early as possible since the position will be filled when a suitable candidate is found. The Centre for Applied Mathematics (CMAP) is looking for highly motivated […]


