Category: Statistical computing

Read this: it’s about importance sampling!

Importance sampling plays an odd role in statistical computing. It’s an old-fashioned idea and can behave just horribly if applied straight-up—but it keeps arising in different statistics problems. Aki came up with Pareto-smoothed importance sampling (PSIS) for leave-one-out cross-validation. We recently revised the PSIS article and Dan Simpson wrote a useful blog post about it […]
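
To make the "behaves horribly" point concrete, here is a minimal sketch of plain self-normalized importance sampling. It is not PSIS and not code from the article. The target (a standard normal), the two proposal widths, the test quantity E[x^2] = 1, and the function names are all assumptions chosen for illustration. The effective sample size, (sum w)^2 / sum w^2, is a common rough diagnostic for how unevenly the weights are spread.

```python
import math
import random


def log_normal_pdf(x, sd):
    """Log density of a zero-mean normal with standard deviation sd."""
    return -0.5 * (x / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def importance_estimate(proposal_sd, n=20000, seed=1):
    """Estimate E[x^2] under N(0, 1) using draws from N(0, proposal_sd).

    Returns the self-normalized estimate and the effective sample size.
    The target and proposal are illustrative assumptions only.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, proposal_sd) for _ in range(n)]
    # Log importance ratios: log target density minus log proposal density.
    log_w = [log_normal_pdf(x, 1.0) - log_normal_pdf(x, proposal_sd) for x in draws]
    # Subtract the max before exponentiating for numerical stability;
    # the constant cancels in the self-normalized estimate.
    max_log_w = max(log_w)
    w = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * x * x for wi, x in zip(w, draws)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimate, ess


# A proposal wider than the target gives bounded weights.
wide_estimate, wide_ess = importance_estimate(proposal_sd=2.0)
# A proposal narrower than the target gives weights with infinite variance.
narrow_estimate, narrow_ess = importance_estimate(proposal_sd=0.5)

print("wide proposal:   estimate %.3f, ESS %.0f" % (wide_estimate, wide_ess))
print("narrow proposal: estimate %.3f, ESS %.0f" % (narrow_estimate, narrow_ess))
```

With the narrow proposal, a handful of draws in the tails carry most of the weight, so the effective sample size drops and the estimate becomes unreliable. That heavy right tail of the weights is the behavior that smoothing methods such as PSIS are designed to tame.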

How does Stan work? A reading list.

Bob writes, to someone who is doing work on the Stan language: The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in […]

AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting: Aggregating and analysing crowdsourced annotations for NLP EMNLP Workshop. November 3–4, 2019. Hong Kong. Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend. I really like the problem […]

How to simulate an instrumental variables problem?

Edward Hearn writes: In an effort to buttress my own understanding of multi-level methods, especially pertaining to those involving instrumental variables, I have been working the examples and the exercises in Jennifer Hill’s and your book. I can find general answers at the Github repo for ARM examples, but for Chapter 10, Exercise 3 (simulating […]
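
As a rough illustration of what simulating an instrumental variables problem can look like, here is a minimal sketch. It is not the solution to the exercise mentioned above. The data-generating process, the coefficients, and the function names are assumptions made up for this example: an instrument z affects the treatment x, an unobserved confounder u affects both x and the outcome y, and z affects y only through x.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_iv(n=20000, true_effect=2.0, seed=42):
    """Simulate a confounded treatment with a valid instrument.

    Returns the naive regression slope of y on x and the
    instrumental variables (Wald) estimate cov(z, y) / cov(z, x).
    All coefficients below are hypothetical.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    # Treatment depends on the instrument and on the confounder.
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # Outcome depends on the treatment and the confounder, not directly on z.
    y = [true_effect * xi + 1.5 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    naive_slope = covariance(x, y) / covariance(x, x)
    iv_estimate = covariance(z, y) / covariance(z, x)
    return naive_slope, iv_estimate


naive_slope, iv_estimate = simulate_iv()
print("naive slope: %.3f, IV estimate: %.3f" % (naive_slope, iv_estimate))
```

Because u pushes x and y in the same direction, the naive slope overstates the effect, while the ratio of covariances recovers something close to the assumed true effect. Changing the 0.8 coefficient toward zero is a quick way to see what a weak instrument does to the IV estimate.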

Neural nets vs. regression models

Eliot Johnson writes: I have a question concerning papers comparing two broad domains of modeling: neural nets and statistical models. Both terms are catch-alls, within each of which there are, quite obviously, multiple subdomains. For instance, NNs could include ML, DL, AI, and so on. While statistical models should include panel data, time series, hierarchical […]

Maintenance cost is quadratic in the number of features

Bob Carpenter shares this story illustrating the challenges of software maintenance. Here’s Bob: This started with the maintenance of upgrading to the new Boost version 1.69, which is this pull request: for this issue: The issue happens first, then the pull request, then the fun of debugging starts. Today’s story starts with an issue […]

Several post-doc positions in probabilistic programming etc. in Finland

There are several open post-doc positions at Aalto University and the University of Helsinki in 1. probabilistic programming, 2. simulator-based inference, 3. data-efficient deep learning, 4. privacy-preserving and secure methods, 5. interactive AI. All these research programs are connected and collaborating. I (Aki) am the coordinator for project 1 and a contributor to the others. Overall […]

Markov chain Monte Carlo doesn’t “explore the posterior”

First some background, then the bad news, and finally the good news. Spoiler alert: The bad news is that exploring the posterior is intractable; the good news is that we don’t need to. Sampling to characterize the posterior: There’s a misconception among Markov chain Monte Carlo (MCMC) practitioners that the purpose of sampling is to […]

My two talks in Montreal this Friday, 22 Mar

McGill University Biostatistics seminar, Purvis Hall, 102 Pine Ave. West, Room 25, 1-2pm Fri 22 Mar: Resolving the Replication Crisis Using Multilevel Modeling. In recent years we have come to learn that many prominent studies in social science and medicine, conducted at leading research institutions, published in top journals, and publicized in respected news outlets, […]

stanc3: rewriting the Stan compiler

I’d like to introduce the stanc3 project, a complete rewrite of the Stan 2 compiler in OCaml. Join us! With this rewrite and migration to OCaml, there’s a great opportunity to join us on the ground floor of a new era. Your enthusiasm for or expertise in programming language theory and compiler development can help […]

HMC step size: How does it scale with dimension?

A bunch of us were arguing about how the Hamiltonian Monte Carlo step size should scale with dimension, and so Bob did the Bob thing and just ran an experiment on the computer to figure it out. Bob writes: This is for standard normal independent in all dimensions. Note the log scale on the x […]

R fixed its default histogram bin width!

I remember hist() in R as having horrible defaults, with the histogram bars way too wide. (See this discussion: A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram […]

Should he go to grad school in statistics or computer science?

Someone named Nathan writes: I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several […]