Fitting hierarchical GLMs in package X is like driving car Y

April 17, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Given that Andrew started the Gremlin theme, I thought it would only be fitting to link to the following amusing blog post:

It’s exactly what it says on the tin. I won’t spoil the punchline, but will tell you the packages considered are: lme4, JAGS, RStan(Arm), and INLA.

What do you think?

Anyway, don’t take my word for it—read the original post. I’m curious about others’ take on systems for fitting GLMs and how they compare (to cars or otherwise).

You might also like…

My favorite automative analogy was made in the following essay, from way back in the first dot-com boom:

Although it’s about operating systems, the satirical take on closed- vs. open-source is universal.

(Some of) what I thought

Chris Brown reports in the post,

I simulated a simple hierarchical data-set to test each of the models. The script is available here. The data-set has 100 binary measurements. There is one fixed covariate (continuous) and one random effect with five groups. The linear predictor was transformed to binomial probabilities using the logit function. For the Bayesian approaches, slightly different priors were used for each package, depending on what was available. See the script for more details on priors.

Apples and oranges. This doesn’t make a whole lot of sense, given that lme4 is giving you max marginal likelihood, whereas JAGS and Stan give you full Bayes. And if you use different priors in Stan and JAGS, you’re not even fitting the same posterior. I’m afraid I’ve never understood INLA (lucky for me Dan Simpson’s visiting us this week, so there’s no time like the present to learn it). You’ll also find that relative performance of Stan and JAGS will vary dramatically based on the shape of the posterior and scale of the data.

It’s all about effective sample size. The author doesn’ tmention the subtlety of choosing a way to estimate effective sample size (RStan’s is more conservative than the Coda package, using a variance approach like that of the split R-hat we use to detect convergence problems in RStan).

Random processes are hard to compare. You’ll find a lot of variation across runs with different random inits. You really want to start JAGS and Stan at the same initial points and run to the same effective sample size over multiple runs and compare averages and variation.

RStanArm, not RStan. I looked at the script, and it turns out the post is comparing RStanArm, not coding a model in Stan itself and running it in RStan. Here’s the code.

t_prior <- student_t(df = 4, location = 0, scale = 2.5)
mb.stanarm <- microbenchmark(mod.stanarm <- stan_glmer(y ~ x + (1|grps),
                                                       data = dat,
                                                       family = binomial(link = 'logit'),
                                                       prior = t_prior,
                                                       prior_intercept = t_prior,
                                                       chains = 3, cores = 1, seed = 10),
                             times = 1L)

Parallelization reduces wall time. This script runs RStanArm three Markov chains on a single core, meaning they have to run one after the other. This can obviously be sped up by the up to the number of cores you have and letting them all run at the same time. Presuambly JAGS could be sped up the same way. The multiple chains are embarassingly parallelizable, after all.

It's hard to be fair! There's a reason we don't do a lot of these comparisons ourselves!

The post Fitting hierarchical GLMs in package X is like driving car Y appeared first on Statistical Modeling, Causal Inference, and Social Science.

Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science

Tags: ,