## another riddle with a stopping rule

May 26, 2016

A puzzle on The Riddler last week is rather similar to an earlier one. Given the probability (1/2,1/3,1/6) on {1,2,3}, what is the mean of the number N of draws to see all possible outcomes and what is the average number of 1's in those draws? The second question is straightforward, as the proportions
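Both quantities in the riddle can be checked exactly. The sketch below is not necessarily the post's own derivation: it assumes independent draws, uses the standard inclusion-exclusion formula for the expected waiting time until every outcome has appeared, and then applies Wald's identity (N is a stopping time) to get the expected number of 1's. The function name `expected_draws` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations


def expected_draws(probs):
    """Expected number of i.i.d. draws until every outcome has appeared.

    Inclusion-exclusion over non-empty subsets S of outcomes:
    E[N] = sum over S of (-1)^(|S|+1) / P(S).
    """
    total = Fraction(0)
    for k in range(1, len(probs) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(probs, k):
            total += sign / sum(subset)
    return total


# Probabilities of outcomes 1, 2, 3 as given in the riddle.
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))

mean_n = expected_draws(probs)   # E[N]
mean_ones = probs[0] * mean_n    # Wald's identity: E[# of 1's] = p_1 * E[N]

print(mean_n, float(mean_n))        # 73/10 7.3
print(mean_ones, float(mean_ones))  # 73/20 3.65
```

Exact fractions avoid any rounding question; under these assumptions the mean number of draws is 7.3, and half of those draws are 1's on average.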

## When doing causal inference, define your treatment decision and then consider the consequences that flow from it

May 26, 2016
Danielle Fumia writes: I am a researcher at the Washington State Institute for Public Policy, and I work on research estimating the effect of college attendance on earnings. Many studies that examine the effect of attending college on earnings control for college degree receipt and work experience. These models seem to violate the practice you

## “99.60% for women and 99.58% for men, P < 0.05.”

May 26, 2016
Gur Huberman pointed me to this paper by Tamar Kricheli-Katz and Tali Regev, "How many cents on the dollar? Women and men in product markets." It appeared in something called ScienceAdvances, which seems to be some extension of the Science brand, i.e., it's in the tabloids! I'll leave the critical analysis of this paper to

## Using Support Vector Machines as Flower Finders: Name that Iris!

May 25, 2016
Nature field guides are filled with pictures of plants and animals that teach us what to look for and how to name what we see. For example, a flower finder might display pictures of different iris species, such as the illustrations in the plot below.

## The Scrollytelling Scourge

May 25, 2016
Scrollytelling is a common way of interacting with stories these days. Scroll down and the story unfolds! Except it's often awkward, brittle, and gets in the way. The Age of Scrollytelling: Scrolling is a funny thing. It was long considered something people rarely did, and many news organizations will still talk about stories being "above the

## The difference between “significant” and “not significant” is not itself statistically significant: Education edition

May 25, 2016
In a news article entitled "Why smart kids shouldn't use laptops in class," Jeff Guo writes: For the past 15 years, educators have debated, exhaustively, the perils of laptops in the lecture hall. . . . Now there is an answer, thanks to a big, new experiment from economists at West Point, who randomly banned

## Compute the square root matrix

May 25, 2016
Children in primary school learn that every positive number has a real square root. The number x is a square root of s, if x² = s. Did you know that matrices can also have square roots? For certain matrices S, you can find another matrix X such that X*X
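As a small illustration of the idea (not the method used in the post), here is a sketch of the closed-form square root of a 2x2 matrix: X = (S + τI) / sqrt(tr(S) + 2τ) with τ = sqrt(det S). It assumes det S ≥ 0 and tr(S) + 2τ > 0; the helper names `sqrt_2x2` and `matmul_2x2` are hypothetical.

```python
import math


def sqrt_2x2(s):
    """One square root of a 2x2 matrix via X = (S + tau*I) / sqrt(tr(S) + 2*tau).

    Assumes det(S) >= 0 and tr(S) + 2*tau > 0, where tau = sqrt(det(S)).
    """
    (a, b), (c, d) = s
    tau = math.sqrt(a * d - b * c)
    denom = math.sqrt(a + d + 2 * tau)
    return [[(a + tau) / denom, b / denom],
            [c / denom, (d + tau) / denom]]


def matmul_2x2(x, y):
    """Plain 2x2 matrix product, used only to verify that X*X recovers S."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


S = [[5.0, 4.0], [4.0, 5.0]]
X = sqrt_2x2(S)

print(X)                 # [[2.0, 1.0], [1.0, 2.0]]
print(matmul_2x2(X, X))  # [[5.0, 4.0], [4.0, 5.0]]
```

The formula follows from the Cayley-Hamilton theorem and only covers the 2x2 case; larger matrices need an eigendecomposition or an iterative method.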

## Annals of really pitiful spammers

May 25, 2016
Here it is: On May 18, 2016, at 8:38 AM, ** wrote: Dr. Gelman, I hope all is well. I looked at your paper on [COMPANY] and would be very interested in talking about having a short followup or a review article about this published in the next issue of the Medical Research Archives. It

## Here’s something I know nothing about

May 24, 2016
Paul Campos writes: Does it seem at all plausible that, as per the CDC, rates of smoking among people with GED certificates are double those among high school dropouts and high school graduates? My reply: It does seem a bit odd, but I don't know who gets GED's. There could be correlations with age and

## A multidimensional graphic that holds a number of surprises, via NYT

May 24, 2016
The New York Times has an eye-catching graphic illustrating the Amtrak crash last year near Philadelphia. The article is here. The various images associated with this article vary in the amount of contextual details offered to readers. This graphic provides...

## Early bird registration for R in Insurance closes 30 May

May 24, 2016
Hurry! The early bird registration offer for the 4th R in Insurance conference, 11 July 2016, at Cass Business School closes 30 May. This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua f...

## The Kernel Trick in Support Vector Machines: Seeing Similarity in More Intricate Dimensions

May 24, 2016
The "kernel" is the seed or the essence at the heart or the core, and the kernel function measures distance from that center. In the following example from Wikipedia, the kernel is at the origin and the different curves illustrate alternative depiction...

## Albedo-boy is back!

May 24, 2016
New story here. Background here and here.

## Sometimes there’s friction for a reason

May 24, 2016
Thinking about my post on Theranos yesterday it occurred to me that one thing that's great about all of the innovation and technology coming out of places like Silicon Valley is the tremendous reduction of friction in our lives. With Uber it's much...

## “Lots of hype around pea milk, with little actual scrutiny”

May 23, 2016
Paul Alper writes: Had no idea that "Pea Milk" existed, let alone controversial. Learn something new every day. Indeed, I'd never heard of it either. I guess "milk" is now a generic word for any white sugary drink? Sort of like "tea" is a generic word for any drink made from a powder steeped in

## Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016
In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components

## Listening to Your Sentences, II

May 23, 2016
Here's a continuation of this recent post (for students) on listening to writing. OK, you say, Martin Amis interviews are entertaining, but it's not obvious what Martin Amis' writing has to do with mere mortals, so what's the practical advice for t...

## Splitsville for Thiel and Kasparov?

May 23, 2016
The tech zillionaire and the chess champion were always a bit of an odd couple, and I've felt for a while that it was just as well that they never finished that book they were talking about. But given that each of them has taken a second career in political activism, I can't imagine that they're

## On deck this week

May 23, 2016
Mon: Splitsville for Thiel and Kasparov? Tues: Here's something I know nothing about Wed: The "power pose" of the 6th century B.C. Thurs: "99.60% for women and 99.58% for men, P < 0.05." Fri: Stan on the beach Sat: Michael Lacour vs John Bargh and Amy Cuddy Sun: Should he major in political science and

## Tip of the day: don’t be Theranosed

May 23, 2016
Theranos (v): to spin stories that appeal to data while not presenting any data. To be Theranosed is to fall for scammers who tell stories appealing to data but do not present any actual data. This is worse than story time, in which the storyteller starts out with real data but veers off mid-stream into unsubstantiated froth, hoping you and…

## How to fit a variety of logistic regression models in SAS

May 23, 2016
SAS software can fit many different kinds of regression models. In fact a common question on the SAS Support Communities is "how do I fit a <name> regression model in SAS?" And within that category, the most frequent questions involve how to fit various logistic regression models in SAS. There

## Bayes 2016

May 23, 2016
Earlier this week I was at the Bayes 2016 meeting, in lovely Leuven. Although I've been to Belgium quite a few times before, this was my first trip to Leuven - somebody who used to work at UCL once told me that they didn't really like the place,...

## Row-Level Thinking vs. Cube Thinking

May 23, 2016
Our mental model of a dataset changes the way we ask questions. One aspect of that is the shape of the data (long or wide); an equally important issue is whether we think of the data as a collection of rows of numbers that we can aggregate bottom-up, or as a complete dataset that we can slice top-down to ask