Final post on the Wang-Landau and the Flat Histogram criterion

December 15, 2012

(This article was originally published at Statisfaction » Statistics, and syndicated at StatsBlogs.)

Gaussian density biased such that 75% of the mass is in the negative values and 25% in the positive values


With Robin Ryder we wrote a paper titled The Wang-Landau Algorithm Reaches the Flat Histogram in Finite Time, which has been accepted in the Annals of Applied Probability (arXiv preprint here). I’m especially happy about it since it was the last remaining unpublished chapter of my PhD thesis. In this post I’ll try to explain what we proved, on a simple example.

Suppose you are given a target density \pi, and you wish to sample from it BUT with the additional constraint that some parts of the state space should be favoured and others penalized. For instance, you are given a univariate Gaussian density. It is symmetric, so if you sample directly from it, on average you will get half of your samples in the negative values and half in the positive values. Suppose that you would like to have 75% of your samples in the negative values instead. This kind of constraint can be interesting if you want to explore multimodal distributions, in which case you might want to penalize already-visited regions of the state space, in order to explore yet-unvisited regions (see this old post for instance, which was about a related paper now accepted in JCGS).

In the case of a Gaussian distribution, this is easy because you can compute the required quantities exactly. For instance, partition the state space into two regions A_1 = ]-\infty,0] and A_2 = ]0,+\infty[, and suppose you want a proportion \phi_i of samples in each bin A_i (say \phi_1 = 75\% and \phi_2 = 25\%). You can compute for each bin A_i

\psi(A_i) = \frac{1}{\phi_i} \int_{A_i} \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-\frac{1}{2\sigma^2}(x-\mu)^2) dx

and then if you define the “biased” density \tilde\pi as follows (up to a multiplicative constant):

\tilde{\pi} (x) \propto (\frac{1}{\psi(A_1)}1_{x\in A_1} + \frac{1}{\psi(A_2)}1_{x\in A_2}) \exp(-\frac{1}{2\sigma^2}(x-\mu)^2)

then you can check that the biased density has a mass of \phi_1 on bin A_1 and \phi_2 on bin A_2, i.e. \int_{A_i} \tilde{\pi}(x) dx = \phi_i. Hence drawing from this biased density yields the desired samples. However, in general it is not possible to compute \psi(A_i), so you need an algorithm to estimate these quantities. The Wang-Landau algorithm does this in an online manner: it simultaneously produces the desired draws X_1, X_2, \ldots from \tilde{\pi} and estimates the penalties \psi(A_i). It’s an iterative algorithm that you can run for as long as you want. We say that the Flat Histogram criterion is reached when we obtain (approximately) the desired frequencies \phi_i in each bin, i.e. when the empirical proportions of samples in each bin are close to the desired ones. What we proved with Robin is that the Flat Histogram criterion is reached at a random time \tau that has finite expectation; in other words, you’re not going to wait forever in front of your computer for the Flat Histogram criterion to be met.
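For the Gaussian example, these quantities can be checked numerically. The following sketch (plain R, assuming a standard Gaussian, \mu = 0 and \sigma = 1) computes \psi(A_1) and \psi(A_2) with pnorm, then verifies that the biased density puts mass \phi_i on each bin:

```r
# Exact computation of psi(A_i) for a standard Gaussian target
phi  <- c(0.75, 0.25)              # desired mass on A_1 = ]-Inf,0] and A_2 = ]0,+Inf[
mass <- c(pnorm(0), 1 - pnorm(0))  # unbiased Gaussian mass on each bin: 0.5 and 0.5
psi  <- mass / phi                 # psi(A_i) = pi(A_i) / phi_i

# mass of the biased density on each bin: pi(A_i) / psi(A_i), then normalize
biased <- mass / psi
biased <- biased / sum(biased)

print(psi)     # 2/3 and 2
print(biased)  # 0.75 and 0.25, as desired
```

So here the penalties are known in closed form; the point of Wang-Landau is to estimate them when no such closed form is available.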

The following plot shows the empirical proportions converging to the desired ones.

The empirical proportions of samples in each bin (red lines, one for each bin) converge to the desired ones (dotted lines), as the iterations go along.
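To see the mechanics before turning to the package, here is a minimal self-contained Wang-Landau sketch in plain R. It is a simplified illustration, not the PAWL implementation: it assumes a standard Gaussian target, a constant learning rate \gamma (the full algorithm decreases \gamma each time the Flat Histogram criterion is met), and the multiplicative penalty update \theta(i) \leftarrow \theta(i)(1 + \gamma(1_{x \in A_i} - \phi_i)).

```r
# Minimal Wang-Landau sketch on the two-bin standard Gaussian example.
set.seed(1)
phi      <- c(0.75, 0.25)  # desired frequencies on A_1 = ]-Inf,0] and A_2 = ]0,+Inf[
logtheta <- c(0, 0)        # running log-estimates of the penalties log psi(A_i)
gamma    <- 0.1            # learning rate (kept constant here for simplicity)
niter    <- 50000
binof    <- function(x) if (x <= 0) 1L else 2L
x        <- 0
counts   <- c(0, 0)
for (t in seq_len(niter)) {
  # Metropolis step targeting the biased density pi(x) / theta(bin(x))
  y <- x + rnorm(1)
  logratio <- dnorm(y, log = TRUE) - dnorm(x, log = TRUE) +
              logtheta[binof(x)] - logtheta[binof(y)]
  if (log(runif(1)) < logratio) x <- y
  b <- binof(x)
  counts[b] <- counts[b] + 1
  # online penalty update: theta(i) <- theta(i) * (1 + gamma * (1{x in A_i} - phi_i))
  logtheta <- logtheta + log(1 + gamma * ((1:2 == b) - phi))
}
print(counts / niter)  # empirical proportions, close to 0.75 and 0.25
```

The visited bin gets its penalty increased and the other bin gets its penalty decreased, so the chain is pushed towards under-visited bins until the empirical proportions hover around \phi.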

To produce these figures you can use the PAWL package that is on CRAN. Here’s a bit of code producing both figures of this post.

library(PAWL)      # Wang-Landau implementation
library(ggplot2)   # plotting
library(reshape2)  # melt()

rinit <- function(size) rep(0, size)
parameters <- list(mean = 0, sd = 1)
logdensity <- function(x, parameters){
  # log-density of a Gaussian, truncated to ]-10, 10[ (-Inf outside)
  ifelse(x > -10 & x < 10,
         dnorm(x, parameters$mean, parameters$sd, log = TRUE),
         -Inf)
}
gaussiantarget <- target(name = "gaussian", dimension = 1,
                         rinit = rinit, logdensity = logdensity,
                         parameters = parameters)
Nchains <- 1; Niterations <- 10000
proposal <- createAdaptiveRandomWalkProposal(Nchains, gaussiantarget@dimension,
                                             adaptiveproposal = FALSE)
pawlparameters <- tuningparameters(nchains = Nchains, niterations = Niterations, storeall = TRUE)
getPos <- function(points, logdensity) points
desfreq <- c(0.75, 0.25)
positionbinning <- binning(position = getPos,
                           name = "position",
                           bins = c(-Inf, 0),
                           desiredfreq = desfreq,
                           useLearningRate = FALSE,
                           autobinning = FALSE)
pawlresults <- pawl(gaussiantarget, binning = positionbinning, AP = pawlparameters,
                    proposal = proposal)
chains <- ConvertResults(pawlresults)
alllocations <- positionbinning@getLocations(positionbinning@bins, pawlresults$allreaction)
locationsdf <- data.frame(cbind(cumsum(alllocations == 1) / 1:(Niterations+1),
                                cumsum(alllocations == 2) / 1:(Niterations+1),
                                1:(Niterations+1)))
names(locationsdf) <- c("proportion1", "proportion2", "iterations")
meltedlocations <- melt(data = locationsdf, id = c("iterations"))

g <- ggplot(data = meltedlocations, aes(x = iterations, y = value, colour = variable))
g <- g + geom_line() + theme(legend.position = "none")
g <- g + scale_colour_manual(values= c("red", "red"))
g <- g + geom_hline(yintercept = positionbinning@desiredfreq, col = "black", linetype = 3)
g <- g + xlab("iterations") + ylab("Proportions of visits to each bin")

ghist <- qplot(chains[,1], geom = "histogram", binwidth = 0.3) + geom_vline(xintercept = 0, colour = "red", size = 3)
ghist <- ghist + xlab("X")
