(This article was originally published at Statistics – John D. Cook, and syndicated at StatsBlogs.)

The beta-binomial model is the “hello world” example of Bayesian statistics. I would call it a toy model, except it is actually useful. It’s not nearly as complicated as most models used in application, but it illustrates the basics of Bayesian inference. Because it’s a conjugate model, the calculations work out trivially.
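To make "work out trivially" concrete, here is a minimal sketch of the conjugate update. The function name `update` and its arguments are made up for illustration; the only fact it relies on is that a beta(*a*, *b*) prior combined with binomial data gives a beta posterior whose parameters are the prior's plus the observed counts.

```python
def update(prior_a, prior_b, successes, failures):
    """Return the (a, b) parameters of the beta posterior.

    Conjugacy reduces the Bayesian update to addition: each observed
    success adds 1 to a, each observed failure adds 1 to b.
    (Hypothetical helper, named here only for illustration.)
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)


# Start from a beta(3, 7) prior and observe a single success.
post_a, post_b = update(3, 7, successes=1, failures=0)
print(post_a, post_b)
print(beta_mean(3, 7), beta_mean(post_a, post_b))
```

No integration is needed to get the posterior, which is why this model makes a convenient first example.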

For more on the beta-binomial model itself, see “A Bayesian view of Amazon Resellers” and “Functional Folds and Conjugate Models.”

I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.
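For reference, the quantity can be written out as follows. This is the standard definition of the divergence; the symbols are labels chosen here, with $p$ the posterior density and $q$ the prior density on $[0, 1]$, and the base-2 logarithm gives an answer in bits.

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \int_0^1 p(x) \, \log_2 \frac{p(x)}{q(x)} \, dx
```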

Here’s a little Python code for computing this. Enter the *a* and *b* parameters of the prior and the posterior to compute how much information was gained.

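    from scipy.integrate import quad
    from scipy.stats import beta
    # log2 lives in NumPy; the old top-level scipy alias is deprecated
    from numpy import log2

    def infogain(post_a, post_b, prior_a, prior_b):
        # Density functions of the posterior (p) and the prior (q)
        p = beta(post_a, post_b).pdf
        q = beta(prior_a, prior_b).pdf
        # Integrate p * log2(p / q) over [0, 1]; the result is in bits
        (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
        return info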

This code works well for medium-sized inputs. It has problems with large inputs because the generic integration routine `quad` needs some help when the beta distributions become more concentrated.
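One way around the integration problem, offered as a sketch and not as the method used above, is the closed-form expression for the Kullback-Leibler divergence between two beta distributions, which needs only the log beta function and the digamma function. The names `log_beta`, `digamma`, and `infogain_closed` are hypothetical, and the digamma routine is a simple hand-rolled approximation using only the standard library; `scipy.special.betaln` and `scipy.special.digamma` would serve the same purpose.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def digamma(x):
    """Approximate digamma for x > 0.

    Shifts x upward with psi(x) = psi(x + 1) - 1/x until x >= 6,
    then applies the standard asymptotic series.
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def infogain_closed(post_a, post_b, prior_a, prior_b):
    """KL divergence of the posterior from the prior, in bits."""
    nats = (
        log_beta(prior_a, prior_b) - log_beta(post_a, post_b)
        + (post_a - prior_a) * digamma(post_a)
        + (post_b - prior_b) * digamma(post_b)
        + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b)
    )
    return nats / math.log(2)  # convert nats to bits


# Highly concentrated distributions, where numerical integration may struggle
print(infogain_closed(4000, 7000, 3000, 7000))
```

Because nothing is integrated numerically, the cost and accuracy do not depend on how sharply peaked the densities are.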

You can see that surprising input carries more information. For example, suppose your prior is beta(3, 7). This distribution has a mean of 0.3, so you’re expecting more failures than successes. With such a prior, a success changes your mind more than a failure does. You can quantify this by running these two calculations.

    print(infogain(4, 7, 3, 7))
    print(infogain(3, 8, 3, 7))

The first line shows that a success would change your information by 0.1563 bits, while the second shows that a failure would change it by 0.0297 bits.
