Andrew suggested I cross-post these from the Stan forums to his blog, so here goes.
- Maximum marginal likelihood and posterior approximations with Monte Carlo expectation maximization: I unpack the goal of maximum marginal likelihood (MML) and approximate Bayes with MMAP and Laplace approximations. I then go through the basic EM algorithm (with a traditional analytic example in the appendix). Only then do I get to the (Markov chain) Monte Carlo approach to the marginalization, stochastic approximation EM (SAEM), generalized EM, computing gradients of expectations with Monte Carlo (the trick used in Stan’s variational inference algorithm ADVI), and then I conclude with Andrew’s new algorithm, gradient-based marginal optimization (GMO). My goal is to define the algorithms well enough that they can be implemented. I was originally just trying to understand MML and the SAEM algorithm (from Monolix) so I could talk to folks like Julie Bertrand and France Mentré here at Paris-Diderot. Eventually, it led me to a much better understanding of GMO and why Andrew thinks of MMAP not as a Bayesian-motivated estimator but as the basis of a posterior approximation.
- C++ parameter packs for (de)serialization: On a completely different note, one that gets down to actual C++ code, I show how to use C++ parameter packs to implement variadic functions and then apply them to serialization and deserialization (packing structured data into simple arrays and unpacking it again). This will be the basis of a Stan feature that should be very helpful for marshaling and unmarshaling arguments to our functionals like the ODE solvers, integrators, and algebraic solvers. I did this one so I could understand Ben Bales’s groovy new work on the variadic adjoint-Jacobian product implementation of reverse mode. This is also the key that’s going to unlock our ability to test and get out a reliable higher-order autodiff implementation, which in turn is the gateway to releasing Riemannian HMC.
- A new continuation-based autodiff by refactoring: I walk through four stages of developing a new autodiff system in C++. I explain how reverse-mode autodiff can be viewed as continuations. These continuations can be implemented cleanly with C++ lambdas and std::function types, but that approach is not very efficient. So I develop custom closures and then show how it can all be put together for matrices without the need to hold matrices of autodiff variables.
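
To give a flavor of the continuation idea in the last item, here is a minimal sketch of reverse-mode autodiff in which every operation pushes a lambda, stored as a std::function, onto a stack; running the stack backward propagates adjoints. The names `Tape`, `Var`, `make_var`, and `grad` are hypothetical and chosen only for illustration. This is not the code from the forum post and not Stan's API, just a toy version of the first, simple-but-slow stage under those assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical minimal tape: values, adjoints, and a stack of continuations.
struct Tape {
  std::vector<double> val;
  std::vector<double> adj;
  std::vector<std::function<void()>> chain;  // run in reverse order by grad()

  std::size_t push(double v) {
    val.push_back(v);
    adj.push_back(0.0);
    return val.size() - 1;
  }
};

// An autodiff variable is just an index into a tape.
struct Var {
  Tape* tape;
  std::size_t id;
  double val() const { return tape->val[id]; }
  double adj() const { return tape->adj[id]; }
};

Var make_var(Tape& t, double v) { return Var{&t, t.push(v)}; }

// Each operation computes its value eagerly and records a continuation
// that adds its contribution to the operands' adjoints.
Var operator+(const Var& a, const Var& b) {
  assert(a.tape == b.tape);
  Tape* t = a.tape;
  std::size_t ia = a.id, ib = b.id;
  std::size_t y = t->push(t->val[ia] + t->val[ib]);
  t->chain.push_back([t, y, ia, ib]() {
    t->adj[ia] += t->adj[y];
    t->adj[ib] += t->adj[y];
  });
  return Var{t, y};
}

Var operator*(const Var& a, const Var& b) {
  assert(a.tape == b.tape);
  Tape* t = a.tape;
  std::size_t ia = a.id, ib = b.id;
  std::size_t y = t->push(t->val[ia] * t->val[ib]);
  t->chain.push_back([t, y, ia, ib]() {
    t->adj[ia] += t->adj[y] * t->val[ib];
    t->adj[ib] += t->adj[y] * t->val[ia];
  });
  return Var{t, y};
}

Var sin(const Var& a) {
  Tape* t = a.tape;
  std::size_t ia = a.id;
  std::size_t y = t->push(std::sin(t->val[ia]));
  t->chain.push_back([t, y, ia]() {
    t->adj[ia] += t->adj[y] * std::cos(t->val[ia]);
  });
  return Var{t, y};
}

// Reverse pass: seed the output adjoint with 1, then run the
// continuations from last recorded to first.
void grad(const Var& y) {
  Tape* t = y.tape;
  t->adj[y.id] = 1.0;
  for (auto it = t->chain.rbegin(); it != t->chain.rend(); ++it) {
    (*it)();
  }
}
```

For f(x1, x2) = x1 * x2 + sin(x1), building `y = x1 * x2 + sin(x1)` from two `make_var` inputs and calling `grad(y)` leaves x2 + cos(x1) in `x1.adj()` and x1 in `x2.adj()`. The type erasure in std::function (and any allocation it triggers) is the kind of overhead that motivates replacing it with custom closures.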
Comments welcome, of course, either here or, even better, on the linked forum discussions.
P.S. I figured out how to install the old WordPress editor without sysadmin help. The new one’s horrible!