“If you torture the data long enough, it will confess.” This aphorism, attributed to Ronald Coase, has sometimes been used disrespectfully, as if it were wrong to do creative data analysis. This view is obviously misleading. In contra...

Sanjay Srivastava blogged some interesting thoughts about the process of post-publication peer review (PPPR), reflecting on his own comment on a PLOS ONE publication. I agree that open peer commentaries after publication are one important part of th...

Update Feb 17, 2014: WRS has moved to GitHub; this installation procedure has been updated and is still valid. Some users had trouble installing the WRS package from R-Forge. Here’s a method that should work automatically and fail-safe: ...

Maybe you have encountered this situation: you run a large-scale study over the internet and, out of curiosity, you frequently check the correlation between two variables. My experience with this practice is usually frustrating, as in small sample sizes (a...
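
A small simulation shows why peeking at a correlation during data collection can be frustrating. The sketch below, written in Python, assumes a bivariate normal population with a hypothetical true correlation of 0.3 and recomputes Pearson's r after each new participant arrives; `running_correlation` and its parameters are illustrative names, not taken from the original post.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def running_correlation(rho, n_max, n_min=10, seed=1):
    """Simulate participants arriving one by one from a bivariate normal
    population with true correlation rho; return (n, r) after each arrival
    once at least n_min participants are in."""
    rng = random.Random(seed)
    xs, ys, trajectory = [], [], []
    for n in range(1, n_max + 1):
        x = rng.gauss(0, 1)
        # y = rho*x + sqrt(1 - rho^2)*noise has correlation rho with x
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        if n >= n_min:
            trajectory.append((n, pearson(xs, ys)))
    return trajectory


# Hypothetical true correlation of 0.3; print a few checkpoints.
for n, r in running_correlation(rho=0.3, n_max=500):
    if n in (10, 20, 50, 100, 250, 500):
        print(f"n = {n:3d}: r = {r:+.3f}")
```

Rerunning with different seeds makes the point: the early checkpoints jump around from run to run, while the later ones settle near the true value.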

[Update June 12: The data.table functions have been improved (thanks to a comment by Matthew Dowle); for a similar approach, see also Tal Galili's post.] The guys from RStudio now provide CRAN download logs (see also this blog post). Great work! I always as...
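
To give a rough idea of the kind of aggregation involved, here is a sketch in Python rather than with data.table. It assumes the log files are CSV with a header that includes `package` and `ip_id` columns. That is an assumption about the file layout, so check it against the actual files. The sample rows are made up for illustration and are not real log data.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_downloads(lines):
    """Return (downloads, unique_ips) per package from CRAN log lines.

    Assumes CSV input whose header contains 'package' and 'ip_id' columns."""
    downloads = Counter()
    ips = defaultdict(set)
    for row in csv.DictReader(lines):
        downloads[row["package"]] += 1
        ips[row["package"]].add(row["ip_id"])
    unique_ips = {pkg: len(s) for pkg, s in ips.items()}
    return downloads, unique_ips


# Hypothetical rows in the assumed format (not real log data).
SAMPLE = """date,time,size,r_version,r_arch,r_os,package,version,country,ip_id
2013-06-10,00:01:02,1024,3.0.1,x86_64,mingw32,ggplot2,0.9.3.1,DE,1
2013-06-10,00:02:10,2048,3.0.1,x86_64,linux-gnu,data.table,1.8.8,US,2
2013-06-10,00:03:45,1024,3.0.0,x86_64,darwin10.8.0,ggplot2,0.9.3.1,AT,1
"""

downloads, unique_ips = summarize_downloads(io.StringIO(SAMPLE))
print(downloads.most_common())
print(unique_ips)

# For a gzipped daily file (the file name here is a placeholder):
# import gzip
# with gzip.open("2013-06-10.csv.gz", "rt", newline="") as fh:
#     downloads, unique_ips = summarize_downloads(fh)
```

Counting distinct `ip_id` values next to raw rows gives a rough sense of how many downloads are repeats from the same address.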

One critique frequently heard about Bayesian statistics is the subjectivity of the assumed prior distribution. If one cherry-picks a prior, the posterior can of course be tweaked, especially when only a few data points are at hand. For example, see ...
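
To see why a small sample lets the prior dominate, consider a minimal sketch in Python. It assumes a binomial outcome with conjugate Beta priors. The three priors and both data sets are hypothetical choices made for illustration.

```python
def beta_binomial_update(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior plus binomial data gives
    a Beta(a + successes, b + failures) posterior."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical priors ranging from skeptical to optimistic.
PRIORS = {
    "flat Beta(1, 1)": (1, 1),
    "skeptical Beta(2, 20)": (2, 20),
    "optimistic Beta(20, 2)": (20, 2),
}


def posterior_means(successes, trials):
    """Posterior mean under each prior for the same observed data."""
    return {
        name: beta_mean(*beta_binomial_update(a, b, successes, trials))
        for name, (a, b) in PRIORS.items()
    }


for successes, trials in [(3, 5), (300, 500)]:
    means = posterior_means(successes, trials)
    spread = max(means.values()) - min(means.values())
    print(f"{successes}/{trials}: spread of posterior means = {spread:.3f}")
    for name, m in means.items():
        print(f"  {name:24s} {m:.3f}")
```

With 3 successes in 5 trials, the posterior means range from about 0.19 to 0.85, depending on the prior. With 300 in 500 they all fall between about 0.58 and 0.61.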

Today a new version (0.23.1) of the WRS package (Wilcox’ Robust Statistics) has been released. This package is the companion to his rather exhaustive book on robust statistics, “Introduction to Robust Estimation and Hypothesis Testing”...

Probably the most frequent criticism of Bayesian statistics sounds something like “It’s all subjective – with the ‘right’ prior, you can get any result you want.” To address this criticism, it has been sugg...

My last lesson introduced the matched pairs experimental design, which is a special type of randomized block design. Let’s now talk about how to analyze the data from such a design. Since the experimental units are organized in pairs, the units between pairs (blocks) are not independently assigned. (The units within each pair are […]
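
One standard analysis for matched pairs is the paired t-test, which works on within-pair differences. The lesson's own treatment may differ. The Python sketch below uses made-up data for five pairs that differ a lot from one another. It computes the paired t statistic and, for contrast, a Welch two-sample statistic that ignores the pairing. The function names are illustrative.

```python
import math
import statistics


def paired_t(treated, control):
    """Paired t statistic and degrees of freedom for matched pairs:
    work with within-pair differences so block-to-block variation cancels."""
    if len(treated) != len(control):
        raise ValueError("each pair needs one treated and one control unit")
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def unpaired_t(treated, control):
    """Welch two-sample t statistic that (wrongly, here) ignores the pairing."""
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return (statistics.mean(treated) - statistics.mean(control)) / se


# Hypothetical data: 5 pairs with large pair-to-pair differences.
control = [10.1, 12.3, 9.8, 11.5, 10.9]
treated = [10.9, 12.8, 10.5, 12.4, 11.2]

t, df = paired_t(treated, control)
print(f"paired t = {t:.2f} on {df} df")
print(f"unpaired t = {unpaired_t(treated, control):.2f}")
```

Both statistics use the same mean difference of 0.64. The paired t is about 5.94 on 4 degrees of freedom, while the unpaired one is only about 1.01. Taking differences within each pair removes the pair-to-pair variation that would otherwise inflate the standard error.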

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating […]

Interview with Nick Chamandy, statistician at Google; You and Your Research + video; Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained; A Survival Guide to Starting and Finishing a PhD; Six Rules For Wearing Suits For Beginners; Why I Created C++; More advice to scientists on blogging; Software engineering practices for graduate students; Statistics Matter […]

Someone who wants to remain anonymous writes: I am working to create a more accurate in-game win probability model for basketball games. My idea is, for each timestep in a game (a second, 5 seconds, etc.), to use the Vegas line, the current score differential, who has the ball, and the number of possessions played already […] The post “Basketball Stats: Don’t model the probability of win, model the expected score differential” …
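
As the post's title suggests, one way to act on this is to model the final score differential and derive a win probability from it. A common device is a normal approximation. The Python sketch below assumes the home team's remaining margin is normal, with mean and variance proportional to the share of possessions left. The parameters `total_possessions` and `sd_full_game` are hypothetical placeholders rather than estimates, and the model ignores who has the ball.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(current_margin, pregame_margin, possessions_left,
                    total_possessions=100, sd_full_game=12.0):
    """P(home team wins), derived from a model of the final score differential.

    Hypothetical assumptions (placeholders, not estimates):
    - pregame_margin is the Vegas-implied expected home margin for the game;
    - the remaining margin is normal, with mean and variance proportional
      to the share of possessions left (a random-walk style model);
    - possession of the ball is ignored.
    """
    if possessions_left <= 0:
        if current_margin > 0:
            return 1.0
        if current_margin < 0:
            return 0.0
        return 0.5  # tied: overtime, treated here as a coin flip
    share = possessions_left / total_possessions
    expected_final = current_margin + pregame_margin * share
    sd_remaining = sd_full_game * math.sqrt(share)
    return normal_cdf(expected_final / sd_remaining)


print(round(win_probability(0, 0, 100), 3))   # even game at tip-off
print(round(win_probability(5, -3, 20), 3))   # up 5 late as a 3-point underdog
```

Modeling the margin keeps the information about how far ahead a team is. The win probability then follows from the margin's distribution, rather than being fitted directly as a 0/1 outcome.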

Finally, here it is: check out the video below, in which Stephen Wolfram showcases the Wolfram Language. In my previous post, I said that I had used Wolfram Mathematica for about a year before I embraced R. And frankly, I've been in love with Mathematica; it nev...

Many functions in the forecast package for R will allow a Box-Cox transformation. The models are fitted to the transformed data and the forecasts and prediction intervals are back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the median of the forecast densities (assuming the forecast densities on the transformed scale are symmetric). For many purposes, this is acceptable, but occasionally the…
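
A quick simulation illustrates the median claim. The sketch below is in Python and uses the standard Box-Cox definition: log(y) for λ = 0, and (y^λ − 1)/λ otherwise. It assumes a normal, hence symmetric, forecast density on the transformed scale with hypothetical parameters. It draws from that density, back-transforms the draws, and compares the back-transformed point forecast with their median and mean.

```python
import math
import random
import statistics


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inv_box_cox(w, lam):
    """Inverse Box-Cox transform (assumes lam * w + 1 > 0 when lam != 0)."""
    return math.exp(w) if lam == 0 else (lam * w + 1) ** (1 / lam)


def back_transformed_summary(mu, sigma, lam, draws=100_000, seed=42):
    """Draw from a normal forecast density on the transformed scale,
    back-transform, and return (point forecast, median, mean)."""
    rng = random.Random(seed)
    samples = [inv_box_cox(rng.gauss(mu, sigma), lam) for _ in range(draws)]
    return (inv_box_cox(mu, lam),
            statistics.median(samples),
            math.fsum(samples) / draws)


# Hypothetical forecast: log scale (lambda = 0), centred on log(100), sd 0.3.
point, median, mean = back_transformed_summary(box_cox(100, 0), 0.3, lam=0)
print(f"back-transformed point forecast = {point:.1f}")
print(f"median of back-transformed draws = {median:.1f}")
print(f"mean of back-transformed draws = {mean:.1f}")
```

The median of the draws lands close to the back-transformed point forecast of 100. The mean sits noticeably higher: in this log case its theoretical value is 100 · exp(0.3²/2), about 104.6.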