Category: Probability and Statistics

Entropy extractor used in μRNG

Yesterday I mentioned μRNG, a true random number generator (TRNG) that takes physical sources of randomness as input. These sources are independent but non-uniform. This post will present the entropy extractor μRNG uses to take non-uniform bits as input and produce uniform bits as output. We will present Python code for playing with the entropy extractor. (μRNG […]

Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set A of real numbers and then considers the sizes of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we […]
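To make the definitions of A+A and A*A concrete, here is a minimal sketch that counts distinct sums and products by brute force. The function names are hypothetical, and the pairs include a + a and a * a, as in the usual definition of the sumset and product set. The two example sets are illustrative choices, not taken from the post: an arithmetic progression has few distinct sums, while a geometric progression has few distinct products.

```python
from itertools import product

def sumset_size(A):
    """Count distinct sums a + b with a, b in A (a may equal b)."""
    return len({a + b for a, b in product(A, repeat=2)})

def productset_size(A):
    """Count distinct products a * b with a, b in A (a may equal b)."""
    return len({a * b for a, b in product(A, repeat=2)})

arithmetic = set(range(1, 11))             # {1, 2, ..., 10}
geometric = {2**k for k in range(10)}      # {1, 2, 4, ..., 512}

print(len(arithmetic), sumset_size(arithmetic), productset_size(arithmetic))  # 10 19 42
print(len(geometric), sumset_size(geometric), productset_size(geometric))     # 10 55 19
```

Neither set keeps both counts small, which is the tension the sum-product problem is about.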

Normal approximation to Laplace distribution?

I heard the phrase “normal approximation to the Laplace distribution” recently and did a double take. The normal distribution does not approximate the Laplace! Normal and Laplace distributions A normal distribution has the familiar bell curve shape. A Laplace distribution, also known as a double exponential distribution, is pointed in the middle, like a pole […]
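A quick numerical comparison shows how different the two shapes are. This sketch assumes we match variances: a Laplace distribution with scale b has variance 2b², so it is compared with a normal distribution with σ = b√2. The helper names are hypothetical. With b = 1, the Laplace density is much higher at the center and also heavier in the tails.

```python
import math

def normal_pdf(x, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def laplace_pdf(x, b):
    """Density of a Laplace distribution with location 0 and scale b."""
    return math.exp(-abs(x) / b) / (2 * b)

b = 1.0
sigma = b * math.sqrt(2)  # assumption: match the Laplace variance 2 b^2

for x in (0.0, 5.0):
    print(x, round(laplace_pdf(x, b), 4), round(normal_pdf(x, sigma), 4))
# 0.0 0.5 0.2821
# 5.0 0.0034 0.0005
```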

Probabilistic Identifiers in CCPA

The CCPA, the California Consumer Privacy Act, was passed last year and goes into effect at the beginning of next year. And just as the GDPR impacts businesses outside Europe, the CCPA will impact businesses outside California. The law specifically mentions probabilistic identifiers. “Probabilistic identifier” means the identification of a consumer or a device to a […]

Varsity versus junior varsity sports

Last night my wife and I watched our daughter’s junior varsity soccer game. Several statistical questions came to mind. Larger schools tend to have better sports teams. If the talent distributions of a large school and a small school are the same, the larger school will have a better team because its players are the […]
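The claim that a larger school fields a better team even with the same talent distribution can be checked with a small simulation. This is a hypothetical model, not one from the post: each student's talent is drawn from a standard normal distribution, and team strength is the mean talent of the top 11 students (the number of players on the field in soccer). The school sizes and function names are illustrative assumptions.

```python
import heapq
import random
import statistics

def team_strength(school_size, rng, team_size=11):
    """Hypothetical model: mean talent of the best `team_size` students."""
    talents = [rng.gauss(0, 1) for _ in range(school_size)]
    return statistics.mean(heapq.nlargest(team_size, talents))

def average_strength(school_size, trials=200, seed=0):
    """Average team strength over many simulated schools of the same size."""
    rng = random.Random(seed)
    return statistics.mean(team_strength(school_size, rng) for _ in range(trials))

small = average_strength(200)
large = average_strength(2000)
print(f"small school: {small:.2f}, large school: {large:.2f}")
```

Both schools draw from the same distribution; the larger school simply selects its team from a larger pool, so its best players sit further out in the tail.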

Unstructured data is an oxymoron

Strictly speaking, “unstructured data” is a contradiction in terms. Data must have structure to be comprehensible. By “unstructured data” people usually mean data with a non-tabular structure. Tabular data is data that comes in tables. Each row corresponds to a subject, and each column corresponds to a kind of measurement. This is the easiest data to […]

Can I have the last four digits of your social?

Imagine this conversation. “Could you tell me your social security number?” “Absolutely not! That’s private.” “OK, how about just the last four digits?” “Oh, OK. That’s fine.” When I was in college, professors would post grades by the last four digits of student social security numbers. Now that seems incredibly naive, but no one objected […]