Sub-Gaussian property for the Beta distribution (part 3, final)

December 26, 2017
(This article was originally published at R – Statisfaction, and syndicated at StatsBlogs.)

Figure (beta_to_bernoulli): When a Beta random variable wants to act like a Bernoulli: convergence of optimal proxy variance.

In this third and last post about the Sub-Gaussian property for the Beta distribution [1] (post 1 and post 2), I would like to show the interplay with the Bernoulli distribution as well as some connexions with optimal transport (OT is a hot topic in general, and also on this blog with Pierre’s posts on Wasserstein ABC).

Let us see how sub-Gaussian proxy variances can be derived from transport inequalities. To this end, we first need to introduce the Wasserstein distance (of order 1) between two probability measures P and Q on a space \mathcal{X}. It is defined with respect to a distance d on \mathcal{X} by

W(P,Q)=\inf_{\pi\in\Pi(P,Q)}\int_{\mathcal{X}\times\mathcal{X}}d(x,y)\pi(\text{d}x,\text{d}y),

where \Pi(P,Q) is the set of probability measures on \mathcal{X}\times \mathcal{X} whose marginal distributions are P and Q, respectively. Then, a probability measure P is said to satisfy a transport inequality with positive constant \sigma if, for any probability measure Q dominated by P,

W(P,Q) \leq\sigma\sqrt{2 D(Q||P)},

where D(Q||P) is the relative entropy, or Kullback–Leibler divergence, of Q with respect to P. The nice result proven by Bobkov and Götze (1999) [2] is that the constant \sigma^2 is then a sub-Gaussian proxy variance for P.
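To make the two sides of the transport inequality concrete, here is a minimal Python sketch for distributions on a finite set of real points. It assumes the distance d(x,y)=|x-y|, for which the Wasserstein distance of order 1 has the well-known closed form as the integral of the absolute difference of the two CDFs; the function names are hypothetical and chosen for this illustration only.

```python
import math

def wasserstein1_on_line(points, p, q):
    """W(P, Q) for discrete P, Q on sorted real points, with d(x, y) = |x - y|.

    Assumption: uses the one-dimensional closed form, the integral of |F_P - F_Q|.
    """
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_p += p[i]
        cdf_q += q[i]
        total += abs(cdf_p - cdf_q) * (points[i + 1] - points[i])
    return total

def kl_divergence(q, p):
    """D(Q||P) = sum_x q(x) ln(q(x) / p(x)), assuming Q is dominated by P."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def satisfies_transport_inequality(points, p, q, sigma):
    """Check W(P, Q) <= sigma * sqrt(2 D(Q||P)) for one given pair (P, Q)."""
    lhs = wasserstein1_on_line(points, p, q)
    rhs = sigma * math.sqrt(2 * kl_divergence(q, p))
    return lhs <= rhs

# P = Bernoulli(0.3) and Q = Bernoulli(0.5) on the points {0, 1}
points, p, q = [0.0, 1.0], [0.7, 0.3], [0.5, 0.5]
print(wasserstein1_on_line(points, p, q))
print(satisfies_transport_inequality(points, p, q, sigma=0.5))
```

Note that a transport inequality requires the bound to hold for every Q dominated by P, so a check on a single pair like this one can only refute a candidate constant \sigma, never confirm it.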

For a discrete space \mathcal{X} equipped with the Hamming metric, d(x,y) = \mathbf{1}_{\{x\neq y\}}, the induced Wasserstein distance reduces to the total variation distance, W(P,Q) = \Vert P-Q\Vert_{\text{TV}}. In that setting, Ordentlich and Weinberger (2005) [3] proved the distribution-sensitive transport inequality:

\Vert P-Q\Vert_{\text{TV}} \leq \sqrt{\frac{1}{g(\mu_P)}D(Q||P)},

where the function g is defined by g(\mu)=\frac{1}{1-2\mu}\ln\frac{1-\mu}{\mu}, and the coefficient \mu_P, called the balance coefficient of P, is defined by \mu_P=\max_{A\subset \mathcal{X}}\min\{P(A),1-P(A)\}. In particular, the balance coefficient of a Bernoulli distribution with mean \mu is easily shown to be \min\{\mu,1-\mu\}, which coincides with the mean when \mu\leq 1/2; since g(\mu)=g(1-\mu), we can write g(\mu) in both cases. Hence, applying the result of Bobkov and Götze (1999) [2] to the above transport inequality yields a distribution-sensitive proxy variance of \frac{1}{2g(\mu)}=\frac{1-2\mu}{2\ln((1-\mu)/\mu)} for the Bernoulli with mean \mu (with value 1/4 at \mu=1/2 by continuity), as plotted in blue above.
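The Bernoulli proxy variance above is easy to check numerically. Recall that \sigma^2 is a sub-Gaussian proxy variance for X when E[\exp(\lambda(X-E[X]))]\leq\exp(\lambda^2\sigma^2/2) for all real \lambda. The following Python sketch evaluates \frac{1}{2g(\mu)} and tests this bound on a finite grid of \lambda values; the function names, the grid and the tolerance are assumptions made for this illustration, and a grid check is of course not a proof.

```python
import math

def g(mu):
    """g(mu) = ln((1 - mu) / mu) / (1 - 2 mu) for 0 < mu < 1, with g(1/2) = 2 by continuity."""
    if abs(mu - 0.5) < 1e-6:
        return 2.0
    return math.log((1 - mu) / mu) / (1 - 2 * mu)

def bernoulli_proxy_variance(mu):
    """Distribution-sensitive proxy variance 1 / (2 g(mu)) of a Bernoulli with mean mu."""
    return 1.0 / (2.0 * g(mu))

def log_mgf_centered_bernoulli(mu, lam):
    """ln E[exp(lam (X - mu))] for X ~ Bernoulli(mu)."""
    return math.log((1 - mu) * math.exp(-lam * mu) + mu * math.exp(lam * (1 - mu)))

def is_sub_gaussian_on_grid(mu, sigma2, lams, tol=1e-9):
    """Check ln E[exp(lam (X - mu))] <= sigma2 * lam^2 / 2 for every lam in the grid."""
    return all(
        log_mgf_centered_bernoulli(mu, lam) <= sigma2 * lam ** 2 / 2 + tol
        for lam in lams
    )

mu = 0.2
sigma2 = bernoulli_proxy_variance(mu)
lams = [k / 10 for k in range(-100, 101)]  # grid on [-10, 10]
print(sigma2, mu * (1 - mu), 0.25)
print(is_sub_gaussian_on_grid(mu, sigma2, lams))        # bound holds on the grid
print(is_sub_gaussian_on_grid(mu, 0.9 * sigma2, lams))  # a smaller constant fails
```

For \mu=0.2 the proxy variance lies strictly between the variance \mu(1-\mu) and the distribution-free value 1/4, and shrinking it by 10% already breaks the bound on the grid, which is consistent with this constant being optimal for the Bernoulli.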

In the Beta distribution case, we have not been able to extend this transport inequality methodology since the support is not discrete. However, a nice limiting argument holds. Consider a sequence of Beta(\alpha,\beta) random variables with fixed mean \mu=\frac{\alpha}{\alpha+\beta} and with a sum \alpha+\beta going to zero. Such a sequence converges in distribution to a Bernoulli random variable with mean \mu, and we have shown that the limiting optimal proxy variance of such a sequence of Beta random variables with decreasing sum \alpha+\beta is that of the Bernoulli.
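One simple way to see the convergence to the Bernoulli is through moments: a Beta(\alpha,\beta) random variable has E[X^k]=\prod_{j=0}^{k-1}\frac{\alpha+j}{\alpha+\beta+j}, while every moment of order k\geq 1 of a Bernoulli with mean \mu equals \mu. The short Python sketch below, with hypothetical function names and an arbitrary choice of \mu and of values for \alpha+\beta, shows the Beta moments approaching \mu as the sum shrinks.

```python
def beta_raw_moment(alpha, beta, k):
    """E[X^k] for X ~ Beta(alpha, beta), via prod_{j=0}^{k-1} (alpha + j) / (alpha + beta + j)."""
    moment = 1.0
    for j in range(k):
        moment *= (alpha + j) / (alpha + beta + j)
    return moment

def beta_moments_with_fixed_mean(mu, total, orders=(1, 2, 3, 4)):
    """Raw moments of Beta(mu * total, (1 - mu) * total), whose mean is mu for any total > 0."""
    alpha, beta = mu * total, (1 - mu) * total
    return [beta_raw_moment(alpha, beta, k) for k in orders]

mu = 0.3
for total in (10.0, 1.0, 0.1, 0.001):
    # As total = alpha + beta decreases, all listed moments get closer to mu.
    print(total, beta_moments_with_fixed_mean(mu, total))
```

Since all these distributions live on [0,1], convergence of every moment is enough to conclude convergence in distribution, so the printout is a small numerical illustration of the limit used above.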

References

[1] Marchal, O. and Arbel, J. (2017). On the sub-Gaussianity of the Beta and Dirichlet distributions. Electronic Communications in Probability, 22:1–14. Code on GitHub.
[2] Bobkov, S. G. and Götze, F. (1999). Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. Journal of Functional Analysis, 163(1):1–28.
[3] Ordentlich, E. and Weinberger, M. J. (2005). A distribution dependent refinement of Pinsker’s inequality. IEEE Transactions on Information Theory, 51(5):1836–1840.


