## Why I use Panel/Multilevel Methods

July 24, 2015
I don’t understand why any researcher would choose not to use panel/multilevel methods on panel/hierarchical data. Let’s take the following linear regression as an example: , where is a random effect for the i-th group. A pooled OLS regression model for the above is unbiased and consistent. However, it will be inefficient, unless for all […]
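The efficiency claim can be illustrated with a small Monte Carlo sketch. The regression equation did not survive in this excerpt, so the model below is an assumption: a random-intercept model y_ij = beta * x_ij + a_i + e_ij with no overall intercept, a balanced design, and variance components treated as known. All function names are hypothetical, and this is not necessarily the post's own setup.

```python
import random
import statistics


def simulate_panel(n_groups, n_per_group, beta, sigma_a, sigma_e, rng):
    """Draw y_ij = beta * x_ij + a_i + e_ij with a group-level random effect a_i."""
    groups = []
    for _ in range(n_groups):
        a_i = rng.gauss(0.0, sigma_a)
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        ys = [beta * x + a_i + rng.gauss(0.0, sigma_e) for x in xs]
        groups.append((xs, ys))
    return groups


def slope_through_origin(xs, ys):
    """Least-squares slope for a regression without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def pooled_ols(groups):
    """Ignore the grouping and regress all y on all x."""
    xs = [x for gx, _ in groups for x in gx]
    ys = [y for _, gy in groups for y in gy]
    return slope_through_origin(xs, ys)


def random_effects_gls(groups, sigma_a, sigma_e):
    """GLS via quasi-demeaning, assuming the variance components are known."""
    xs, ys = [], []
    for gx, gy in groups:
        t = len(gx)
        lam = 1.0 - (sigma_e ** 2 / (sigma_e ** 2 + t * sigma_a ** 2)) ** 0.5
        mx, my = statistics.fmean(gx), statistics.fmean(gy)
        xs.extend(x - lam * mx for x in gx)
        ys.extend(y - lam * my for y in gy)
    return slope_through_origin(xs, ys)


rng = random.Random(2015)
ols_draws, gls_draws = [], []
for _ in range(500):
    panel = simulate_panel(30, 5, 1.0, 2.0, 1.0, rng)
    ols_draws.append(pooled_ols(panel))
    gls_draws.append(random_effects_gls(panel, 2.0, 1.0))

ols_mean, gls_mean = statistics.fmean(ols_draws), statistics.fmean(gls_draws)
ols_var, gls_var = statistics.variance(ols_draws), statistics.variance(gls_draws)
print("mean of estimates (OLS, GLS):", ols_mean, gls_mean)
print("variance of estimates (OLS, GLS):", ols_var, gls_var)
```

Under these assumptions both estimators should center on the true slope, while the pooled OLS estimates should be more dispersed across replications than the GLS ones. In practice the variance components are unknown and would have to be estimated.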

## The relationship between toothlessness and income

July 24, 2015
My colleague Robert Allison finds the most interesting data sets to visualize! Yesterday he posted a visualization of toothless seniors in the US. More precisely, he created graphs that show the estimated prevalence of adults (65 years or older) who have had all their natural teeth extracted. The dental profession […] The post The relationship between toothlessness and income appeared first on The DO Loop.

## 45 years ago in the sister blog

July 24, 2015
The post 45 years ago in the sister blog appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Deja vu! Doping accusations at Tour de France

July 24, 2015
Gabe Murray wrote to Andrew Gelman, asking for comments about the accusations hurled at the current Tour de France front-runner Chris Froome. He said: This post by VeloClinic has been getting a lot of media attention in the past few days, within the context of Chris Froome's dominant performance in the Tour de France: http://veloclinic.com/estimating-the-probability-of-doping-as-a-function-of-power/ The assumptions seem very dubious to me, and I would love to see a critique…

## PLS think twice about partial least squares

July 23, 2015
One of the great things about writing a statistics book was finding an excuse to read about dozens of topics that I knew a little about but hadn't got around to studying in depth. Even so, there were a number of topics I ended up missing out on complet...

## I try hard to not hate all hover-overs. Here is one I love

July 23, 2015
One of the smart things Noah (at WNYC) showed to my class was his NFL fan map, based on Facebook data. This is the "home" of the visualization: The fun starts by clicking around. Here are the Green Bay fans...

## More gremlins: “Instead, he simply pretended the other two estimates did not exist. That is inexcusable.”

July 23, 2015
Brandon Shollenberger writes: I’ve spent some time examining the work done by Richard Tol which was used in the latest IPCC report.  I was troubled enough by his work I even submitted a formal complaint with the IPCC nearly two months ago (I’ve not heard back from them thus far).  It expressed some of the same concerns […] The post More gremlins: “Instead, he simply pretended the other two estimates did not…

## Call for participation: AusDM 2015, Sydney, 8-9 August

July 23, 2015
The 13th Australasian Data Mining Conference (AusDM 2015), Sydney, Australia, 8–9 August 2015. URL: http://ausdm15.ausdm.org/ The Australasian Data Mining Conference is devoted to the art and science of intelligent data mining: the meaningful analysis of (usually large) data …

## 3 YEARS AGO (JULY 2012): MEMORY LANE

July 23, 2015
3 years ago… MONTHLY MEMORY LANE: 3 years ago: July 2012. I mark in red three posts that seem most apt for general background on key issues in this blog.[1]  This new feature, appearing the last week of each month, began at the blog’s 3-year anniversary in Sept, 2014. (Once again it was tough to pick just 3; please check out others which might […]

## Stan 2.7 (CRAN, variational inference, and much much more)

July 22, 2015
Stan 2.7 is now available for all interfaces. As usual, everything you need can be found starting from the Stan home page: http://mc-stan.org/

Highlights:

- RStan is on CRAN! (1)
- Variational Inference in CmdStan!! (2)
- Two new Stan developers!!!
- A whole new logo!!!!
- Math library with autodiff now available in its own repo!!!!!

(1) Just doing install.packages("rstan") isn’t […] The post Stan 2.7 (CRAN, variational inference, and much much more) appeared first on…

## Le Monde puzzle [#920]

July 22, 2015
A puzzling Le Monde mathematical puzzle (or blame the heat wave): A pocket calculator with ten keys (0,1,…,9) starts with a random digit n between 0 and 9. A number on the screen can then be modified into another number by two rules: 1. pressing k changes the k-th digit v whenever it exists into […]

## Statistically significant. What does it mean?

July 22, 2015
Andrew Gelman has a great post about the concept of statistical significance, starting with a published definition by the Department of Health that is technically wrong on many levels. Statistical significance is one of the most important concepts in statistics. In recent years, a vocal group has claimed that this idea is misguided and/or useless. But what they are angry about is the use (and frequently, misuse) of…

## BREAKING . . . Kit Harrington’s height

July 22, 2015
Rasmus “ticket to” Bååth writes: I heeded your call to construct a Stan model of the height of Kit “Snow” Harrington. The response on Gawker has been poor, unfortunately, but here it is, anyway. Yeah, I think the people at Gawker have bigger things to worry about this week. . . . Here’s Rasmus’s inference […] The post BREAKING . . . Kit Harrington’s height appeared first on Statistical Modeling,…

## A new method to simulate the triangular distribution

July 22, 2015
The triangular distribution has applications in risk analysis and reliability analysis. It is also a useful theoretical tool because of its simplicity. Its density function is piecewise linear. The standardized distribution is defined on [0,1] and has one parameter, 0 ≤ c ≤ 1, which determines the peak of the […] The post A new method to simulate the triangular distribution appeared first on The DO Loop.
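For reference, here is a minimal sketch of the textbook inverse-CDF way to sample the standardized triangular distribution described above. This is not the "new method" the post announces, which is not visible in this excerpt. The function names are hypothetical.

```python
import math
import random


def triangular_inverse_cdf(u, c):
    """Map a uniform draw u in [0, 1) to the triangular distribution on [0, 1] with peak c."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("peak c must lie in [0, 1]")
    if u < c:
        # Rising branch: F(x) = x^2 / c for 0 <= x <= c
        return math.sqrt(c * u)
    # Falling branch: F(x) = 1 - (1 - x)^2 / (1 - c) for c < x <= 1
    return 1.0 - math.sqrt((1.0 - c) * (1.0 - u))


def sample_triangular(n, c, rng):
    """Draw n independent values by inverting the CDF at uniform random numbers."""
    return [triangular_inverse_cdf(rng.random(), c) for _ in range(n)]


rng = random.Random(1)
peak = 0.3
draws = sample_triangular(100_000, peak, rng)
sample_mean = sum(draws) / len(draws)
theoretical_mean = (0.0 + 1.0 + peak) / 3.0
print("sample mean:", sample_mean, "theoretical mean:", theoretical_mean)
```

The sample mean should land close to (1 + c) / 3, the mean of the standardized distribution. Python's standard library also provides `random.triangular(low, high, mode)` for the same purpose.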

## Temperature Variation

July 22, 2015
Yesterday, I came across (via limportant.fr/) an interesting documentary, available online at francetvinfo.fr/monde/environnement/. But the opening passage (transcribed on the site) left me with a very strange impression: In Greenland, the ice is melting visibly. This year, the thermometer climbed to 25 degrees above 0. Eight years ago, over the same period, a blizzard was blowing and the scientists had to face temperatures of −35…

## Mathematical Statistics Lesson of the Day – Basu’s Theorem

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics.  As before, I will begin with the following set-up. Suppose that you collected data in order to estimate a parameter .  Let be the probability density function (PDF) or probability […]

## United Nations gets dataviz

July 21, 2015
The UN, as I noted before, is getting into the dataviz game. Here is an announcement about a Data Viz Challenge that has just started. Flood them with ideas! *** I am writing to invite you and your network of...

## "Models, Models Everywhere!" Brought to You by R

July 21, 2015
Statistical software packages sell solutions. If you go to the home page for SAS, they will tell you upfront that they sell products and solutions. They link both together under the first tab just below "The Power to Know" mantra. SPSS separates produc...

## A bad definition of statistical significance from the U.S. Department of Health and Human Services, Effective Health Care Program

July 21, 2015
As D.M.C. would say, bad meaning bad not bad meaning good. Deborah Mayo points to this terrible, terrible definition of statistical significance from the Agency for Healthcare Research and Quality: Statistical Significance Definition: A mathematical technique to measure whether the results of a study are likely to be true. Statistical significance is calculated as the […] The post A bad definition of statistical significance from the U.S. Department of Health…

## Choosing a Classifier

July 21, 2015
In order to illustrate the problem of choosing a classification model, consider some simulated data:

    n = 500
    set.seed(1)
    X = rnorm(n)
    ma = 10-(X+1.5)^2*2
    mb = -10+(X-1.5)^2*2
    M = cbind(ma,mb)
    set.seed(1)
    Z = sample(1:2,size=n,replace=TRUE)
    Y = ma*(Z==1)+mb*(Z==2)+rnorm(n)*5
    df = data.frame(Z=as.factor(Z),X,Y)

A first strategy is to split the dataset into two parts, a training dataset and a testing dataset. …
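The splitting step is cut off in this excerpt, so the following is only a sketch of one common way to do it, written in Python rather than R. The 50/50 split fraction and the helper name `train_test_split` are assumptions, and Python's generator will not reproduce the R draws obtained with `set.seed(1)`.

```python
import random


def train_test_split(rows, train_fraction, rng):
    """Shuffle row indices and cut them into a training part and a testing part."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


rng = random.Random(1)
rows = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    z = rng.choice([1, 2])
    # Same two mean curves as ma and mb in the excerpt
    mean = 10 - 2 * (x + 1.5) ** 2 if z == 1 else -10 + 2 * (x - 1.5) ** 2
    y = mean + 5 * rng.gauss(0.0, 1.0)
    rows.append((z, x, y))

# Assumed 50/50 split; the proportion used in the post is not visible here
train, test = train_test_split(rows, 0.5, rng)
print(len(train), len(test))
```

A classifier would then be fitted on `train` only and its error measured on `test`, so that the evaluation uses observations the model has not seen.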

## MacBook Air battery replacement

July 21, 2015
After four years of daily use our MacBook Air informed us that it needed a battery replacement. That's kind of nice to know, in particular as it still feels speedy and otherwise just works. A new battery isn't that expensive and according to iFixit it ...

## Parametric Inference: Karlin-Rubin Theorem

July 20, 2015
A family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ for a univariate random variable $T$ with real-valued parameter $\theta$ has a monotone likelihood ratio (MLR) if, for every $\theta_2>\theta_1$, $g(t|\theta_2)/g(t|\theta_1)$ is a monotone ...
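As a quick numerical illustration of the definition, the sketch below checks the monotone likelihood ratio property on a grid for one familiar family. The choice of family (normal with mean $\theta$ and unit variance), the grid, and the function names are assumptions made for this example. A grid check is only a sanity check, not a proof.

```python
import math


def normal_pdf(t, theta):
    """Density of a normal distribution with mean theta and unit variance."""
    return math.exp(-0.5 * (t - theta) ** 2) / math.sqrt(2.0 * math.pi)


def likelihood_ratio(t, theta1, theta2):
    """g(t | theta2) / g(t | theta1) for the assumed normal family."""
    return normal_pdf(t, theta2) / normal_pdf(t, theta1)


def is_nondecreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))


grid = [i / 10.0 for i in range(-50, 51)]  # t from -5 to 5 in steps of 0.1
ratios = [likelihood_ratio(t, 0.0, 1.0) for t in grid]  # theta2 = 1 > theta1 = 0
mlr_holds_on_grid = is_nondecreasing(ratios)
print("ratio is nondecreasing in t on the grid:", mlr_holds_on_grid)
```

For this family the ratio simplifies to $\exp\{(\theta_2-\theta_1)t - (\theta_2^2-\theta_1^2)/2\}$, which is increasing in $t$ whenever $\theta_2>\theta_1$, so the grid check agrees with the algebra.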