Matthijs Vákár writes:
Re the popularity of functional programming and Church-style languages in the programming languages community: there is a strong sentiment in that community that functional programming provides important high-level primitives that make it easy to write correct programs. This is because functional code tends to be very short and easy to reason about because of referential transparency, making it quick to review and maintain. This becomes even more important in domains where correctness is non-trivial, like in the presence of probabilistic choice and conditioning (or concurrency).
The sense – I think – is that for the vast majority of your code, correctness and programmer-time are more important considerations than run-time efficiency. A typical workflow should consist of quickly writing an obviously correct but probably inefficient program, profiling it to locate the bottlenecks and finally – but only then – thinking hard about how to use computational effects like mutable state and concurrency to optimise those bottlenecks without sacrificing correctness.
There is also a sense that compilers will get smarter with time, which should result in pretty fast functional code, allowing the programmer to think more about what she wants to do rather than about the how. I think you could also see the lack of concern for performance of inference algorithms in this light. This community is primarily concerned with correctness over efficiency. I’m sure they will get to more efficient inference algorithms eventually. (Don’t forget that they are not statisticians by training so it may take some time for knowledge about inference algorithms to percolate into the PL community.)
Justified or not, there is a real conviction in the programming languages community that functional ideas will become more and more important in mainstream programming. People will point to the increase of functional features in Scala, Java, C# and even C++. Also, people note that OCaml and Haskell these days are getting to the point where they are really quite fast. Jane Street would not be using OCaml if it weren’t competitive. In fact, one of the reasons they use it – as I understand it – is that their code involves a lot of maths which functional languages lend themselves well to writing. Presumably, this is also part of the motivation of using functional programming for prob prog? In a way, functional programming feels closer to maths.
Another reason I think people care about language designs in the style of Church is that they make it easy to write certain programs that are hard in Stan. For instance, perhaps you care about some likelihood-free model, like a model where some probabilistic choices are made, then a complicated deterministic program is run and only then the results are observed (conditioned on). An example that we have here in Oxford is a simulation of the standard model of particle physics. This effectively involves pushing forward your priors through some complicated simulator and then observing. It would be difficult to write down a likelihood for this model. Relatedly, when I was talking to Erik Meijer, I got the sense that Facebook wants to integrate prob prog into existing large software systems. The resulting models would not necessarily have a smooth likelihood wrt the Lebesgue measure. I am told that in some cases inference in these models is quite possible with rejection sampling or importance sampling or some methods that you might consider savage. The models might not necessarily be very hard statistically, but they are hard computationally in the sense that they do involve running a non-trivial program. (Note though that this has less to do with functional programming than it does with combining probabilistic constructs with more general programming constructs!)
A different example that people often bring up is given by models where observations might be of variable length, like in language models. Of course, you could do it in Stan with padding, but some people don’t seem to like this.
Of course, there is the question whether all of this effort is wasted if eventually inference is completely intractable in basically all but the simplest models. What I would hope to eventually see – and many people in the PL community with me – is a high-level language in which you can do serious programming (like in OCaml) and have access to serious inference capabilities (like in Stan). Ideally, the compiler should decide/give hints as to which inference algorithms to try — use NUTS when you can, but otherwise back-off and try something else. And there should be plenty of diagnostics to figure out when to distrust your results.
Finally, though, something I have to note is that programming language people are the folks writing compilers. And compilers are basically the one thing that functional languages do best because of their good support for user-defined data structures like trees and recursing over such data structures using pattern matching. Obviously, therefore, programming language folks are going to favour functional languages. Similarly, because of their rich type systems and high-level abstractions like higher-order functions, polymorphism and abstract data types, functional languages serve as great hosts for DSLs like PPLs. They make it super easy for the implementer to write a PPL even if they are a single academic and do not have the team required to write a C++ project.
The question now is whether they also genuinely make things easier for the end user. I believe they ultimately have the potential to do so, provided that you have a good optimising compiler, especially if you are trying to write a complicated mathematical program.
On one hand, people are setting up models in Church etc. that they may never be able to fit—it’s not just that Church etc are too slow to fit these models, it’s that they’re unidentified or just don’t make sense or have never really been thought out. (My problem here is not with the programming language called Church; I’m just here concerned about the difficulty with fitting any model correctly the first time, hence the need for flexible tools for model fitting and exploration.)
But, on the other hand, people are doing inference and making decisions all the time, using crude regressions or t-tests or whatever.
To me, it’s close to meaningless for someone to say they can write “an obviously correct but probably inefficient program” if the program is fitting a model that you can’t really fit!
When I say “write an obviously correct but probably inefficient program”, I am talking about usual functional programming in a deterministic setting. I think many programming language (PL) folks do not fully appreciate what correctness for a probabilistic program means. I think they are happy when they can prove that their program asymptotically does the right thing. I do not think they quite realise yet that that gives no guarantees about what to think about your results when you run your program for a finite amount of time. It sounds silly, but I think at least part of the community is still to realise that. Personally, I expect that people will wake up eventually and will realise that correctness in a probabilistic setting is a much more hairy beast. In particular, I expect that the PL community will realise that run-time diagnostics and model checking are the things that they should have been doing all along. I am hopeful though that at some point enough knowledge of statistics will trickle through to them such that genuinely useful collaboration is possible. I strongly feel that there is mostly just a lot of confusion rather than wilful disregard.