Posts Tagged ‘Probability and Statistics’

How can a statistician help a lawyer?

December 9, 2017

I’ll be presenting at a webinar on Wednesday, December 13 at 1:00 PM Eastern. The title of the presentation is “Seven questions a statistician can answer for an attorney.” I will discuss, among other things, when common sense applies and when correct analysis can be counter-intuitive. There will be ample time at the end of […]

Read more »

Pareto distribution and Benford’s law

November 16, 2017

The Pareto probability distribution has density f(x) = a / x^(a+1) for x ≥ 1, where a > 0 is a shape parameter. The Pareto distribution and the Pareto principle (i.e. the “80-20” rule) are named after the same person, the Italian economist Vilfredo Pareto. Samples from a Pareto distribution obey Benford’s law in the limit as the parameter a goes to […]

Read more »

Random number generation posts

November 15, 2017

Random number generation is typically a two-step process: first generate a uniformly distributed value, then transform that value to have the desired distribution. The former is the hard part, but also the part more likely to have been done for you in a library. The latter is relatively easy in principle, though some distributions […]

Read more »
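The two-step process described in this excerpt can be sketched with inverse transform sampling. This is a minimal illustration, not code from the post: the exponential distribution is chosen only because its inverse CDF has a closed form, and the function name `sample_exponential` and the seed are assumptions made for the example.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one exponential(rate) sample by inverting the CDF (illustrative helper)."""
    u = rng.random()                   # step 1: uniform value on [0, 1)
    return -math.log(1.0 - u) / rate   # step 2: transform via the inverse CDF


rng = random.Random(42)                # fixed seed so the run is repeatable
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)     # should be close to 1 / rate = 0.5
print(round(mean, 3))
```

The library supplies the hard part (`rng.random()`), and the transformation is a single line. Distributions without a closed-form inverse CDF need other techniques.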

Quantifying information gain in beta-binomial Bayesian model

November 13, 2017

The beta-binomial model is the “hello world” example of Bayesian statistics. I would call it a toy model, except it is actually useful. It’s not nearly as complicated as most models used in application, but it illustrates the basics of Bayesian inference. Because it’s a conjugate model, the calculations work out trivially. For more on […]

Read more »
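To see why the calculations “work out trivially,” here is a small sketch of the conjugate update, assuming a Beta(a, b) prior on a success probability and binomial data. The helper names and the example counts are hypothetical and are not taken from the post.

```python
def update_beta(a, b, successes, failures):
    """Posterior Beta parameters after observing binomial data (conjugate update)."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: uniform prior Beta(1, 1), then 7 successes and 3 failures.
a_post, b_post = update_beta(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(a_post, b_post)
print(a_post, b_post, posterior_mean)
```

The whole update is two additions: the posterior is again a beta distribution, so no numerical integration is needed.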

Wheels about to be reinvented

November 8, 2017

As companies get into data analysis for the first time, many of them are going to start by making the same mistakes that were common a century ago, then gradually recapitulate the development of modern statistics.  

Read more »

Database anonymization for testing

November 3, 2017

How do you create a database for testing that is like your production database? It depends on how you want the test database to be “like” the production one. Replacing sensitive data: Companies often use an old version of their production database for testing. But what if the production database has sensitive information […]

Read more »

Quantifying the information content of personal data

September 12, 2017

It can be surprisingly easy to identify someone from data that’s not directly identifiable. One commonly cited result is that the combination of birth date, zip code, and sex is enough to identify 87% of Americans. This post will look at how to quantify the amount of information contained in such data. If the answer […]

Read more »
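One standard way to put a number on the information in an attribute is its surprisal in bits, the negative base-2 logarithm of its probability. The sketch below is an assumption about how such a calculation might look, not the post's own method; it treats the sexes as equally likely and birthdays as uniform over 365 days, both of which are simplifications.

```python
import math


def surprisal_bits(probability):
    """Information, in bits, conveyed by an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Simplifying assumptions: two equally likely sexes, birthdays uniform over 365 days.
bits_sex = surprisal_bits(1 / 2)
bits_birthday = surprisal_bits(1 / 365)
print(bits_sex, round(bits_birthday, 2))
```

Under an independence assumption the bits from separate attributes add, which is why a handful of seemingly harmless fields can narrow a population down quickly.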

Negative correlation introduced by success

September 10, 2017

Suppose you measure people on two independent attributes, X and Y, and take those for whom X+Y is above some threshold. Then even though X and Y are uncorrelated in the full population, they will be negatively correlated in your sample. This article gives the following example. Suppose beauty and acting ability were uncorrelated. Knowing how […]

Read more »
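The selection effect in this excerpt is easy to check by simulation. The following is a rough sketch under assumed conditions: X and Y are independent standard normal draws, the threshold of 1.0 is arbitrary, and the `pearson` helper is written out by hand to keep the example dependency-free.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50_000)]

# Keep only the "successful" cases: those whose total exceeds the threshold.
selected = [(x, y) for x, y in pairs if x + y > 1.0]

r_all = pearson(*zip(*pairs))          # near zero: the attributes are independent
r_selected = pearson(*zip(*selected))  # clearly negative after selection
print(round(r_all, 3), round(r_selected, 3))
```

Within the selected group, a low value of X has to be offset by a high value of Y to clear the threshold, and that trade-off is what produces the negative correlation.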

Bayesian methods at Bletchley Park

July 25, 2017

From Nick Patterson’s interview on Talking Machines: GCHQ in the ’70s, we thought of ourselves as completely Bayesian statisticians. All our data analysis was completely Bayesian, and that was a direct inheritance from Alan Turing. I’m not sure this has ever really been published, but Turing, almost as a sideline during his cryptoanalytic work, reinvented […]

Read more »

Testing the PCG random number generator

July 7, 2017

M. E. O’Neill’s PCG family of random number generators looks very promising. It appears to have excellent statistical and cryptographic properties. And it takes remarkably little code to implement. (PCG stands for Permuted Congruential Generator.) The journal article announcing PCG gives the results of testing it with the TestU01 test suite. I wanted to try it out […]

Read more »

