(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

A month ago I (Aki) started a series of tweets about “scientific books which have had big influence on me…”. They are partially in time order, but I can’t remember the exact order. I may have forgotten some, and some stretched the original idea, but I can recommend all of them.

I have collected all those book tweets below and fixed only some typos. These are my personal favorites, and there are certainly many great books I haven’t listed. Please share your own favorite books, with a short description of why you like them, in the comments.

I start to tweet about scientific books which have had big influence on me…

- Bishop, Neural Networks for Pattern Recognition, 1995. The first book where I read about Bayes. I learned a lot about probabilities, inference, model complexity, GLMs, NNs, gradients, Hessian, chain rule, optimization, integration, etc. I used it a lot for many years.
Looking again at the contents, it is still a great book, although naturally some parts are a bit outdated.

- Bishop (1995) referred to Neal, Bayesian Learning for Neural Networks, 1996, from which I learned about sampling in high dimensions, HMC, prior predictive analysis, evaluation of methods and models. Neal’s free FBM code made it easy to test everything in practice.

- Jaynes, Probability Theory: The Logic of Science, 1996: I read this because it was freely available online. There is not much for practical work, but plenty of argumentation for why using Bayesian inference makes sense, which I did find useful when I was just learning Bayes.
15 years later I participated in a reading circle with mathematicians and statisticians going through the book in detail. The book was still interesting, but not that spectacular anymore. The discussion in the reading circle was worth it.

- Gilks, Richardson & Spiegelhalter (eds), Markov Chain Monte Carlo in Practice (1996). Very useful introductions to different MCMC topics by Gilks, Richardson & Spiegelhalter Ch1, Roberts Ch3, Tierney Ch4, Gilks Ch5, Gilks & Roberts Ch6, Raftery & Lewis Ch7.
And with special mentions to Gelman on monitoring convergence Ch8, Gelfand on importance-sampling leave-one-out cross-validation Ch9, and Gelman & Meng on posterior predictive checking Ch11. My copy is worn out from heavy use.

- Gelman, Carlin, Stern, and Rubin, Bayesian Data Analysis, 1995. I just loved the writing style, and it had so many insights and plenty of useful material. During my doctoral studies I also did about 90% of the exercises as self-study.
I considered using the first edition when I started teaching Bayesian data analysis, but I thought it was maybe too much for an introductory course, and it didn’t have model assessment and selection, which is important for me.

This book (and its later editions) is the one I have re-read most, and when re-reading I keep finding things I didn’t remember being there (I guess I have a bad memory). I still use the latest edition regularly, and I’ll get back to these later editions below.

- Bernardo and Smith, Bayesian Theory, 1994. Great coverage (although not complete) of the foundations and axioms of Bayesian theory, with emphasis that actions and utilities are an inseparable part of the theory.
They admit problems of theory in continuous space (which seem to not have a solution that would please everyone, even if it works in practice) and review general probability theory. They derive basic models from simple exchangeability and invariance assumptions.

They review utility and discrepancy based model comparison and rejection with definitions of M-open, -complete, and -closed. This and Bernardo’s many papers had a strong influence on how I think about model assessment and selection (see, e.g., http://dx.doi.org/10.1214/12-SS102).

- Box and Tiao, Bayesian Inference in Statistical Analysis, 1973. Wonderful book, if you want to see how difficult inference was before MCMC and probabilistic programming. Includes some useful models, and we used one of them as a prior in a neuromagnetic inverse problem http://becs.aalto.fi/en/research/bayes/brain/lpnorm.pdf

- Jeffreys, Theory of Probability, 3rd ed, 1961. Another book of historical interest. The intro and estimation parts are sensible. I was very surprised to learn that he wrote about all the problems of the Bayes factor, which was not evident from the later literature on BF.

- Jensen, An Introduction to Bayesian Networks, 1996. I’m travelling to Denmark, which reminded me about this nice book on Bayesian networks. It’s out of print, but Jensen & Nielsen, Bayesian Networks and Decision Graphs, 2007, seems to be a good substitute.

- Dale, A History of Inverse Probability: From Thomas Bayes to Karl Pearson, 1991. Back to historically interesting books. Dale has done a lot of great research on the history of statistics. This one helps to understand the Bayesian-Frequentist conflict in the 20th century.
The conflict can be seen in, e.g., Lindley writing in 1968: “The approach throughout is Bayesian: there is no discussion of this point, I merely ask the non-Bayesian reader to examine the results and consider whether they provide sensible and practical answers”.

McGrayne, The Theory That Would Not Die: How Bayes’ Rule Cracked The Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy, 2011, is more recent and entertaining, and is also based on much of Dale’s research.

- Laplace, Philosophical Essay on Probabilities, 1825. English translation with notes by Dale, 1995. Excellent book. I enjoyed how Laplace justified the models and priors he used. Considering the clarity of the book, it’s strange how little these ideas were used before the 20th century.

- Press & Tanur, The Subjectivity of Scientists and the Bayesian Approach, 2001. Many interesting and fun stories about the progress of science made by scientists being very subjective. Argues that the Bayesian approach at least tries to be more explicit about assumptions.

- Spirer, Spirer & Jaffe, Misused Statistics, 1998. Examples of common misuses of statistics (deliberate or inadvertent) in graphs, methodology, data collection, interpretation, etc. Great and fun (or scary) way to teach common pitfalls and how to do things better.

- Howson & Urbach, Scientific Reasoning: The Bayesian Approach, 2nd ed, 1999. Nice book on Bayesianism and philosophy of science: induction, confirmation, falsificationism, axioms, Popper, Lakatos, Kuhn, Cox, Good, and contrast to Fisherian & Neyman-Pearson significance tests.
There are also 1st ed 1993 and 3rd ed 2005.

- Gentle, Random Number Generation and Monte Carlo Methods, 1998, 2.ed 2003. Great if you want to understand or implement: pseudo rng’s, checking quality, quasirandom, transformations from uniform, methods for specific distributions, permutations, dependent samples & sequences.

- Sivia, Data Analysis: A Bayesian Tutorial, 1996. I started teaching a Bayesian analysis course in 2002 using this thin, very Jaynesian book, as it had many good things. Afterward I realized that it was missing too much of the workflow for students to be able to do their own projects.

- Gelman, Carlin, Stern, & Rubin, BDA2, 2003. This hit the spot. Improved model checking, new model comparison, more on MCMC, and new decision analysis made it at that time the best book for the whole workflow. I started using it in my teaching the same year it was published.
Of course it still had some problems, like using DIC instead of cross-validation, effective sample size estimate without autocorrelation analysis, etc., but the additional material I needed to introduce in my course was minimal compared to what any other book would have required.

My course included chapters 1-11 and 22 (with varying emphasis), and I recommended that students read the other chapters.

- MacKay, Information Theory, Inference, and Learning Algorithms, 2003. Super clear introduction to information theory and codes. Has also excellent chapters on probabilities, Monte Carlo, Laplace approximation, inference methods, Bayes, and ends up with neural nets and GPs.
The book is missing the workflow part, but it has many great insights clearly explained. For example, in the Monte Carlo chapter, I love how MacKay tells when the algorithms fail and what happens in high dimensions.

Before the 2003 version, I had also been reading drafts, which had been available since 1997.

- O’Hagan and Forster, Bayesian Inference, 2nd ed, vol 2B of Kendall’s Advanced Theory of Statistics, 2004. A great reference on all the important concepts in Bayesian inference. Fits well between BDA and Bayesian Theory, and is one of my all-time favorite books on Bayes.
Covers, e.g., inference, utilities, decisions, value of information, estimation, likelihood principle, sufficiency, ancillarity, nuisance, non-identifiability, asymptotics, Lindley’s paradox, conflicting information, probability as a degree of belief, axiomatic formulation, finite additivity, comparability of events, weak prior information, exchangeability, non-subjective theories, specifying probabilities, calibration, elicitation, model comparison (a bit outdated), model criticism, computation (MCMC part is a bit outdated), and some models…

- Rasmussen and Williams, Gaussian Processes for Machine Learning, 2006. I was already familiar with GPs through many articles, but this became a much-used handbook and course book for us. The book is exceptional in that it also explains how to implement stable computation.
It has a nice chapter on Laplace approximation and expectation propagation conditional on hyperparameters, but has only a Type II MAP estimate for hyperparameters. It has an ML flavor overall, and I know statisticians who have difficulties following the story.

The book was very useful when writing GPstuff. It’s also available free online.

- Gelman & Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, 2006. I was already familiar with the models and methods in the book, but I loved how it focused on how to think about models, modeling and inference, using many examples to illustrate concepts.
The book starts from simple linear models and has the patience to progress slowly, without going too early into the details of computation, and it works surprisingly well even though Bayesian inference comes only after 340 pages.

Gaussian linear model, logistic regression, generalized linear models, simulation, model checking, causal inference, multilevel models, Bugs, Bayesian inference, sample size and power calculations, summarizing models, ANOVA, model comparison, missing data.

I recommended the book to my students after BDA2 and O’Hagan & Forster, as it seemed to be a good and quick read for someone who knows how to do the computation already, but I couldn’t see how I would use it in teaching as Bayesian inference comes late and it was based on BUGS!

More recently, re-reading the book, I still loved the good bits, but was also shocked to see how much it encouraged wandering around in a garden of forking paths. AFAIK there is a new edition in progress which updates it to use more modern computation and model comparison.

- Harville, Matrix Algebra From a Statistician’s Perspective, 1997. 600 pages of matrix algebra with focus on that part of matrix algebra commonly used in statistics. Great book for people implementing computational methods for GPs and multivariate linear models.
Nowadays, with the Matrix Cookbook online, I use it less often to check simpler matrix algebra tricks, but my students still find it useful as it goes deeper and has more derivations on many topics.

- Gelman and Nolan, Teaching Statistics: A Bag of Tricks, 2002 (2.ed 2017). A large number of examples, in-class activities, and projects to be used in teaching concepts in an intro stats course. I’ve used ideas from different parts and especially from the decision analysis part.

- Spiegelhalter, Abrams & Myles, Bayesian Approaches to Clinical Trials and Health-Care Evaluation, 2004. This was a helpful book for learning the basic statistical issues in clinical trials and health-care evaluation, and how to replace “classic” methods with Bayesian ones.
Medical trials, sequential analysis, randomised controlled trials, ethics of randomization, sample-size assessment, subset and multi-center analysis, multiple endpoints and treatments, observational studies, meta-analysis, cost-effectiveness, policy-making, regulation, …

- Ibrahim, Chen & Sinha, Bayesian Survival Analysis, 2001. The book goes quickly to the details of models and inference and thus is not an easy one. There has been a lot of progress in models and inference since then, but it’s still a very valuable reference on survival analysis.

- O’Hagan et al, Uncertain Judgments: Eliciting Experts’ Probabilities, 2006. A great book on the very important but too often ignored topic of eliciting prior information. A must read for anyone considering using (weakly) informative priors.
The book reviews psychological research that shows, e.g., how the form of the questions affects the experts’ answers. The book also provides recommendations on how to do elicitation better and how to validate the results of elicitation.

Uncertainty & the interpretation of probability, aleatory & epistemic, what is an expert?, elicitation process, the psychology of judgment under uncertainty, biases, anchoring, calibration, representations, debiasing, elicitation, evaluating elicitation, multiple experts, …

- Bishop, Pattern Recognition and Machine Learning, 2006. It’s quite different from the 1995 book, although it covers mostly the same models. For me there was not much new to learn, but my students have used it a lot as a reference, and I also enjoyed the explanations of VI and EP.
Based on the contents and the point of view, the name of the book could also be “Probabilistic Machine Learning”.

Due to the theme “influence on me”, it happened that all the books I listed were published in 2006 or earlier. After that I’ve seen great books, but those have had less influence on me. I may later make a longer list of more recent books I can recommend, but here are some as a bonus:

- McGrayne, The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, 2012. Entertaining book about the history of Bayesian theory.

- Gelman, Carlin, Stern, Dunson, Vehtari & Rubin, Bayesian Data Analysis, 3rd ed, 2013. Obviously a great update of the classic book.

- Särkkä, Bayesian Filtering and Smoothing, 2013. A concise introduction to non-linear Kalman filtering and smoothing, particle filtering and smoothing, and to the related parameter estimation methods from the Bayesian point of view.

- McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, 2015. Easier than BDA3 and well written. I don’t like how the model comparison is presented, but after reading this book, just check my related articles which were mostly published after this book.

- Goodfellow, Bengio & Courville, Deep Learning, 2016. This deep learning introduction has enough of a probabilistic view that I can also recommend it.

- Stan Development Team, Stan Modeling Language: User’s Guide and Reference Manual, 2017. It’s not just a Stan language manual, it’s also full of well-written text about Bayesian inference and models. There is a plan to divide this into parts, and one part would make a great textbook.

I’ve read more than these, and the list was just the ones I enjoyed most. I think people, myself included, read fewer books now that it’s easier to find articles, case studies, and blog posts on the internet. Someday I’ll make a similar list of the top papers I’ve enjoyed.

The post Aki’s favorite scientific books (so far) appeared first on Statistical Modeling, Causal Inference, and Social Science.
