## Lexicographic combinations in SAS

July 28, 2014
In a previous blog post, I described how to generate combinations in SAS by using the ALLCOMB function in SAS/IML software. The ALLCOMB function in Base SAS is the equivalent function for DATA step programmers. Recall that a combination is a unique arrangement of k elements chosen from a set […]
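As a language-neutral illustration of the idea in this excerpt (this is not the SAS code from the post, and it makes no claim about the order in which ALLCOMB emits its results), here is a minimal Python sketch that lists all k-element combinations in lexicographic order. The helper name `lexicographic_combinations` is hypothetical.

```python
from itertools import combinations


def lexicographic_combinations(items, k):
    """Return all k-element combinations of `items` in lexicographic order."""
    # itertools.combinations emits tuples in lexicographic order of the
    # input positions, so sorting the input first yields combinations that
    # are lexicographically ordered by value.
    return list(combinations(sorted(items), k))


combos = lexicographic_combinations([3, 1, 2, 4], 2)
for combo in combos:
    print(combo)
```

Sorting first is the only design choice here: without it, the output order follows the order of the input rather than the values.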

## Cigarette and life expectancy

July 28, 2014
Yesterday evening, I uploaded a graph showing labor productivity as a function of coffee consumption. Of course, it was for fun! With this kind of regression, based on aggregated data, we can say almost anything, since most such variables are correlated through some (hidden) common factor, such as the wealth of the country. For instance, with a similar approach, we can see that there is an increasing…

## Coffee and Productivity

July 27, 2014
On Twitter, I was asked if there were serious research papers published on coffee consumption and labour productivity. There are some papers on coffee breaks and productivity, e.g. Productivity Through Coffee Breaks, but I could not find anything on coffee consumption. Since I could not find any dataset with personal consumption (maybe I should start keeping track of my own consumption to run a study), I tried to find data for national…

## Stan 2.4, New and Improved

July 27, 2014
We’re happy to announce that all three interfaces (CmdStan, PyStan, and RStan) are up and ready to go for Stan 2.4. As usual, you can find full instructions for installation on the Stan Home Page. Here are the release notes with a list of what’s new and improved. New Features: L-BFGS optimization (now […] The post Stan 2.4, New and Improved appeared first on Statistical Modeling, Causal Inference,…

## Stan found using directed search

July 27, 2014
X and I did some “Sampling Through Adaptive Neighborhoods” ourselves the other day and checked out the nearby grave of Stanislaw Ulam, who is buried with his wife, Françoise Aron, and others of her family. The above image of Stanislaw and Françoise Ulam comes from this charming mini-biography from Roland Brasseur, which I found here. […]

## NYC workshop 22 Aug on open source machine learning systems

July 26, 2014
The workshop is organized by John Langford (Microsoft Research NYC), along with Alekh Agarwal and Alina Beygelzimer, and it features Liblinear, Vowpal Wabbit, Torch, Theano, and . . . you guessed it . . . Stan! Here’s the current program: 8:55am: Introduction 9:00am: Liblinear by CJ Lin. 9:30am: Vowpal Wabbit and Learning to Search (John […]

## Statistics, and the Goldilocks Principle

July 26, 2014
$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big)$

At the end of May, in Toronto, we had that great talk at the SSC by Jeff Rosenthal, on Monte Carlo techniques, and Jeff mentioned “the Goldilocks principle” (it was in the context of MCMC, and I did mention it in my talk in London on MCMC, when I discussed the value of the rejection rate of the Metropolis-Hastings algorithm, which should be not too large,…
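The kernel density estimator in the formula at the top of this entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian kernel, the sample values, and the bandwidths are chosen here and do not come from the post, and the function names `gaussian_kernel` and `kde` are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Evaluate f_hat_h(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


# Illustrative data and bandwidths (not from the post).
sample = [-1.0, 0.0, 0.5, 2.0]
for h in (0.05, 0.5, 5.0):
    print(h, kde(0.0, sample, h))
```

The loop evaluates the same estimator at the same point for a small, a moderate, and a large bandwidth `h`, which makes the dependence of the estimate on `h` easy to inspect.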

## S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)

July 26, 2014
Stephen Senn, Head, Methodology and Statistics Group, Competence Center for Methodology and Statistics (CCMS), Luxembourg. Responder despondency: myths of personalized medicine. The road to drug development destruction is paved with good intentions. The 2013 FDA report, Paving the Way for Personalized Medicine, has an encouraging and enthusiastic foreword from Commissioner Hamburg and plenty of extremely […]

## Guns are Cool – States

July 26, 2014
Last week I looked at time effects in the shootingtracker database. This week I will look at the states. Some (smaller) states never made it into the database. Other states appear far too frequently. The worst of these is California. After correcting for populati...

## Student forecasting awards from the IIF

July 26, 2014
At the IIF annual board meeting last month in Rotterdam, I suggested that we provide awards to the top students studying forecasting at university level around the world, to the tune of \$100 plus IIF membership for a year. I’m delighted that the idea...

## A Few Notes on UseR! 2014

July 26, 2014
It has been a month since the UseR! 2014 conference, and I'm probably the last one who writes about it. UseR! is my favorite conference because it is technical and not too big. I have completely lost interest in big and broad conferences like JSM (to m...

## library() vs require() in R

July 26, 2014
While I was sitting in a conference room at UseR! 2014, I started counting the number of times that require() was used in the presentations, and would rant about it after I counted to ten. With drums rolling, David won this little award (sorry, I did n...

## Academic statisticians: there is no shame in developing statistical solutions that solve just one problem

July 25, 2014
I think that the main distinction between academic statisticians and those calling themselves data scientists is that the latter are very much willing to invest most of their time and energy into solving specific problems by analyzing specific data sets. …

## “An Experience with a Registered Replication Project”

July 25, 2014
Anne Pier Salverda writes: I came across this blog entry, “An Experience with a Registered Replication Project,” and thought that you would find this interesting. It’s written by Simone Schnall, a social psychologist who is the first author of an oft-cited Psych Science(!) paper (“Cleanliness reduces the severity of moral judgments”) that a group of […]

## The top dog among jealous dogs

July 25, 2014
Is data visualization worth paying for? In some quarters, this may be a controversial question. If you are having doubts, just look at some examples of great visualization. This week, the NYT team brings us a wonderful example. The story...

## Pat pat

July 25, 2014
This is probably akin to an exercise in self-pleasing, but I'll indulge in this anyway to celebrate the fact that our paper on the Bias in the Eurovision song contest voting (the last in a relatively long series of posts on this is here) has now over 4...

## Interactive visualization of non-linear logistic regression decision boundaries with Shiny

July 24, 2014
Model building is very often an iterative process that involves multiple steps: choosing an algorithm and hyperparameters, evaluating that model (for example by cross-validation), and optimizing the hyperparameters. I find a great aid in this process, for classification tasks, is not only to keep track of the accuracy across models,…

## If it was good enough for Martin Luther King and Laurence Tribe . . .

July 24, 2014
People keep pointing me to this. P.S. I miss the old days when people would point me to bad graphs.

## NFL players keep getting bigger and bigger

July 24, 2014
Aleks points us to this beautiful dynamic graph by Noah Veltman showing the heights and weights of NFL players over time. The color is pretty but I think I’d prefer something simpler, just one dot per player (with some jittering to handle the discrete reporting of heights and weights). In any case, it’s a great […]

## Putting Data Into Context

July 24, 2014
Raw numbers are easy to report and analyze, but without the proper context, they can be misleading. Is the effect you’re seeing real, or a simple result of the underlying, obvious distribution? Too many analyses and news stories end up reporting things we already know. This is a particular issue with data that has a […]

## Coherent population forecasting using R

July 24, 2014
This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years. In demography, “coherent” forecasts are where males and females (or other sub-groups) do not diverge over…

## Making random draws from an arbitrarily defined pdf

July 23, 2014
I recently found myself in need of a function to sample randomly from an arbitrarily defined probability density function. An excellent post by Quantitations shows how to accomplish this using some of R's fairly sophisticated functional approximation to...
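The R approach referred to in this excerpt is not shown, so the following is not a reconstruction of it. It is a minimal Python sketch of one common way to draw from an arbitrary density: tabulate the density on a grid, integrate it numerically into a CDF, and invert that CDF by interpolation (inverse transform sampling). The name `make_sampler`, the grid size, and the example density `2*x` on `[0, 1]` are all assumptions made for illustration.

```python
import bisect
import random


def make_sampler(pdf, lo, hi, n_grid=2001):
    """Build a sampler for an (unnormalized) density `pdf` on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    ys = [pdf(x) for x in xs]

    # Cumulative trapezoidal integral of the density over the grid.
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1]))
    total = cdf[-1]
    if total <= 0:
        raise ValueError("pdf must have positive mass on [lo, hi]")
    cdf = [c / total for c in cdf]  # normalize so the last value is 1.0

    def sample(rng=random):
        u = rng.random()  # uniform draw in [0, 1)
        # First grid index whose CDF value exceeds u.
        j = bisect.bisect_right(cdf, u)
        c0, c1 = cdf[j - 1], cdf[j]
        # Linear interpolation of the inverse CDF within the grid cell.
        t = (u - c0) / (c1 - c0)
        return xs[j - 1] + t * (xs[j] - xs[j - 1])

    return sample


# Example: density proportional to 2*x on [0, 1] (illustrative choice).
draw = make_sampler(lambda x: 2.0 * x, 0.0, 1.0)
rng = random.Random(0)
values = [draw(rng) for _ in range(20000)]
print(sum(values) / len(values))
```

Because the tabulated CDF is normalized inside `make_sampler`, the density passed in does not need to integrate to one. The accuracy of the draws depends on the grid resolution `n_grid`.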

## Continued: “P-values overstate the evidence against the null”: legit or fallacious?

July 23, 2014
Since the comments to my previous post are getting too long, I’m reblogging it here to make more room. I say that the issue raised by J. Berger and Sellke (1987) and Casella and R. Berger (1987) concerns evaluating the evidence in relation to a given hypothesis (using error probabilities). Given the information that this hypothesis H* was randomly […]