What does CCPA say about de-identified data?

The California Consumer Privacy Act, or CCPA, takes effect January 1, 2020, less than six months from now. What does the act say about using deidentified data? First of all, I am not a lawyer; I work for lawyers, advising them on matters where law touches statistics. This post is not legal advice, but my […]

Common Ensemble Models can be Biased

In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions on individuals, a biased model may be preferable; biased models may … Continue reading Common Ensemble Models can be Biased

Causal inference using repeated cross sections

Sadish Dhakal writes: I am struggling with the problem of conditioning on post-treatment variables. I was hoping you could provide some guidance. Note that I have repeated cross sections, not panel data. Here is the problem simplified: There are two programs. A policy introduced some changes in one of the programs, which I call the […]

the most probable cluster

In the last issue of Bayesian Analysis, Lukasz Rajkowski studies the most likely (MAP) cluster associated with the Dirichlet process mixture model. Reminding me that most Bayesian estimates of the number of clusters are not consistent (when the sample size grows to infinity). I am always puzzled by this problem, as estimating the number of […]

Channel quantity and quality

Years ago, when there were a couple dozen television stations, someone [1] speculated that when we got more channels we’d also get better content. The argument was that people are more similar in the base interests than in their more refined interests. Therefore if there are only a few channels, they will all appeal to […]

CRAN does not validate R packages!

A friend called me the other day for advice on how to submit an R package to CRAN along with a proof his method was mathematically sound. I replied with some items of advice taken from my (limited) experience with submitting packages. And with the remark that CRAN would not validate the mathematical contents of […]

Symmetry in exponential sums

Today’s exponential sum is highly symmetric: These sums are often symmetric, but not always. For example, here’s the sum from a couple days ago: It’s not obvious from looking at the parameters whether a sum will be symmetric or not. Maybe someone could find a prove criteria for a sum to have certain symmetries. For […]

and it only gets worse [verbatim]

“Increasing export capacity from the Freeport LNG project is critical to spreading freedom gas throughout the world by giving America’s allies a diverse and affordable source of clean energy” M. Menezes, US Secretary of Energy “NASA should NOT be talking about going to the Moon – We did that 50 years ago. They should be […]

Inshallah

This came up in comments the other day:
I kinda like the idea of researchers inserting the word “Inshallah” at appropriate points throughout their text. “Our results will replicate, inshallah. . . . Our code has no more bugs, inshallah,” etc.
Related:

Link Functions versus Data Transforms

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income … Continue reading Link Functions versus Data Transforms

Link Functions versus Data Transforms

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income … Continue reading Link Functions versus Data Transforms