The California Consumer Privacy Act, or CCPA, takes effect January 1, 2020, less than six months from now. What does the act say about using deidentified data? First of all, I am not a lawyer; I work for lawyers, advising them on matters where law touches statistics. This post is not legal advice, but my […]

## a non-riddle

Unless I missed a point in the last riddle from the Riddler, there is very little to say about it: Given N ochre balls, N aquamarine balls, and two urns, what is the optimal way to allocate the balls to the urns to maximize the probability of drawing an ochre ball, with no urn left empty? Both my reasoning […]
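For small N the riddle can be checked by brute force. The sketch below is only an illustration and rests on assumptions the excerpt does not state: an urn is picked uniformly at random, then a ball is drawn uniformly from that urn, and all 2N balls must be placed. The function names `p_ochre` and `best_allocation` are hypothetical.

```python
from fractions import Fraction

def p_ochre(o1, a1, n):
    """P(ochre) when urn 1 holds o1 ochre and a1 aquamarine balls and
    urn 2 holds the remaining n - o1 ochre and n - a1 aquamarine balls.
    Assumes each urn is chosen with probability 1/2 (an assumption).
    Returns None if either urn would be empty."""
    o2, a2 = n - o1, n - a1
    if o1 + a1 == 0 or o2 + a2 == 0:
        return None
    half = Fraction(1, 2)
    return half * Fraction(o1, o1 + a1) + half * Fraction(o2, o2 + a2)

def best_allocation(n):
    """Enumerate every split of the balls and return (probability, o1, a1)
    for an allocation with the highest probability of drawing ochre."""
    candidates = (
        (p_ochre(o1, a1, n), o1, a1)
        for o1 in range(n + 1)
        for a1 in range(n + 1)
    )
    return max(c for c in candidates if c[0] is not None)

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        prob, o1, a1 = best_allocation(n)
        print(n, prob, float(prob), (o1, a1))
```

Exact fractions are used so that ties between mirrored allocations are not decided by floating-point rounding.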

## Common Ensemble Models can be Biased

In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions on individuals, a biased model may be preferable; biased models may …

## Serious applications of a party trick

In a group of 30 people, it’s likely that two people have the same birthday. For a group of 23 the probability is about 1/2, and it goes up as the group gets larger. In a group of a few dozen people, it’s unlikely that anyone will have a particular birthday, but it’s likely that […]
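The two probabilities contrasted here can be computed directly. This is a minimal sketch, assuming 365 equally likely and independent birthdays (leap years ignored); the function names are hypothetical.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming birthdays are independent and uniform over `days` days."""
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

def p_someone_has_date(n, days=365):
    """Probability that at least one of n people has one particular,
    fixed birthday, under the same assumptions."""
    return 1.0 - ((days - 1) / days) ** n

if __name__ == "__main__":
    for n in (23, 30, 36):
        print(n, round(p_shared_birthday(n), 3), round(p_someone_has_date(n), 3))
```

The first function grows quickly because it counts every pair of people, while the second only compares each person against a single fixed date.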

## Causal inference using repeated cross sections

Sadish Dhakal writes: I am struggling with the problem of conditioning on post-treatment variables. I was hoping you could provide some guidance. Note that I have repeated cross sections, not panel data. Here is the problem simplified: There are two programs. A policy introduced some changes in one of the programs, which I call the […]

## the most probable cluster

In the last issue of Bayesian Analysis, Lukasz Rajkowski studies the most likely (MAP) cluster associated with the Dirichlet process mixture model. This reminds me that most Bayesian estimates of the number of clusters are not consistent (when the sample size grows to infinity). I am always puzzled by this problem, as estimating the number of […]

## Channel quantity and quality

Years ago, when there were a couple dozen television stations, someone [1] speculated that when we got more channels we’d also get better content. The argument was that people are more similar in their base interests than in their more refined interests. Therefore if there are only a few channels, they will all appeal to […]

## “Widely cited study of fake news retracted by researchers”

Chuck Jackson forwards this amusing story: Last year, a study was published in the Journal of Human Behavior, explaining why fake news goes viral on social media. The study itself went viral, being covered by dozens of news outlets. But now, it turns out there was an error in the researchers’ analysis that invalidates their […]

## CRAN does not validate R packages!

A friend called me the other day for advice on how to submit an R package to CRAN along with a proof his method was mathematically sound. I replied with some items of advice taken from my (limited) experience with submitting packages. And with the remark that CRAN would not validate the mathematical contents of […]

## Symmetry in exponential sums

Today’s exponential sum is highly symmetric: These sums are often symmetric, but not always. For example, here’s the sum from a couple days ago: It’s not obvious from looking at the parameters whether a sum will be symmetric or not. Maybe someone could find and prove criteria for a sum to have certain symmetries. For […]

## “Did Austerity Cause Brexit?”

Carsten Allefeld writes: Do you have an opinion on the soundness of this study by Thiemo Fetzer, Did Austerity Cause Brexit?. The author claims to show that support for Brexit in the referendum is correlated with the individual-level impact of austerity measures, and therefore possibly caused by them. Here’s the abstract of Fetzer’s paper: Did […]

## and it only gets worse [verbatim]

“Increasing export capacity from the Freeport LNG project is critical to spreading freedom gas throughout the world by giving America’s allies a diverse and affordable source of clean energy” M. Menezes, US Secretary of Energy “NASA should NOT be talking about going to the Moon – We did that 50 years ago. They should be […]

## Inshallah

This came up in comments the other day:

I kinda like the idea of researchers inserting the word “Inshallah” at appropriate points throughout their text. “Our results will replicate, inshallah. . . . Our code has no more bugs, inshallah,” etc.

Related:

…

## Le Monde puzzle [#1107]

A light birthday problem as Le Monde mathematical puzzle: Each member of a group of 35 persons writes down the number of those who share the same birth-month and the number of those who share the same birth-date [with them]. It happens that these 70 numbers include all integers from 0 to 10. Show that […]

## Le Monde puzzle [#1105]

Another token game as Le Monde mathematical puzzle: Archibald and Beatrix play with a pile of n>100 tokens, sequentially picking m tokens from the pile with m being a prime number [including m=1] or a multiple of 6, the winner taking the last tokens. If Beatrix knows n and proposes to Archibald to start, what […]

## Link Functions versus Data Transforms

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income …

## Collinearity in Bayesian models

Dirk Nachbar writes: We were having a debate about how much of a problem collinearity is in Bayesian models. I was arguing that it is not much of a problem. Imagine we have this model Y ~ N(a + bX1 + cX2, sigma) where X1 and X2 have some positive correlation (r > .5), they […]

## good omens and bad jokes

Following the news that members of a religious sect had petitioned Netflix not to show Good Omens as they deemed the story blasphemous (mistaking Netflix for Amazon Prime!), I could not resist watching this show, despite having skipped the original book, as I am fairly tone-deaf when it comes to Terry […]