## The difference between data hype and data hope

June 23, 2014

I was reading one of my favorite stats blogs, StatsChat, where Thomas points to this article in the Atlantic and highlights this quote: Dassault Systèmes is focusing on that level of granularity now, trying to simulate propagation of cholesterol in human …

## Smullyan and the Randomistas

June 23, 2014

Steve Ziliak wrote in: I thought you might be interested in the following exchanges on randomized trials: Here are a few exchanges on the economics and ethics of randomized controlled trials, reacting to my [Ziliak's] study with Edward R. Teather-Posadas, “The Unprincipled Randomization Principle in Economics and Medicine”. Our study is forthcoming in the Oxford […] The post Smullyan and the Randomistas appeared first on Statistical Modeling, Causal Inference, and…

## Mayo’s Error Statistics as a case study in the inevitability of Bayes

June 23, 2014

Cox’s Theorem implies that we either use Bayes or our methods will violate some simple but desirable properties. This has two consequences: (1) Frequentist methods such as p-values, which aren’t equivalent to posteriors, are guaranteed to b...

## On deck this week

June 23, 2014

- Mon: Smullyan and the Randomistas
- Tues: Too Linear To Be True: The curious case of Jens Forster
- Wed: More on those randomistas
- Thurs: Estimating a customer satisfaction regression, asking only a subset of predictors for each person
- Fri: Quantifying luck vs. skill in sports
- Sat, Sun: Hey, it’s summer—time to take the weekends off. Have […]

The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

## Getting the basics right is half the battle

June 23, 2014

I was traveling quite a lot recently, and last week I read the Wall Street Journal cover to cover for the first time in a while. I am happy to report that there are many more data graphics than I remember...

June 23, 2014

As others binge watch Netflix TV, I binge read Gelman posts, while riding a train with no wifi and a dying laptop battery. (This entry was written two weeks ago.) Andrew Gelman is statistics’ most prolific blogger. Gelman-binging has become a necessity since I have not managed to keep up with his accelerated posting schedule. Earlier this year, he began publishing previews of future posts, one week in advance, and…

## Creating ODS graphics from the SAS/IML language

June 23, 2014

As you develop a program in the SAS/IML language, it is often useful to create graphs to visualize intermediate results. I do this all the time in my preferred development environment, which is SAS/IML Studio. In SAS/IML Studio, you can write a single statement to create a scatter plot, bar […]

## Ma conférence 11 h, lundi 23 juin à l’Université Paris Dauphine

June 22, 2014

Les coalitions, le pouvoir des électeurs, et l’instabilité politique: Coalitions are central to politics, at all levels. We discuss some mathematical results relating to the stability of coalitions and the probability of a decisive vote, with connections to the prisoner’s dilemma, agent-based modeling, and probability distributions on trees. Our empirical analysis suggests that the votes […] The post Ma conférence 11 h, lundi 23 juin à l’Université Paris Dauphine appeared…

## It’s not matching or regression, it’s matching and regression.

June 22, 2014

A colleague writes: Why do people keep praising matching over regression for being nonparametric? Isn’t it f’ing parametric in the matching stage, in effect, given how many types of matching there are… you’re making structural assumptions about how to deal with similarities and differences…. the likelihood two observations are similar based on something quite […] The post It’s not matching or regression, it’s matching and regression. appeared first on…

## Making Statistical Data Meaningful

June 22, 2014

Part 4 of UNECE’s series “Making Data Meaningful” is about to be published in 2014. Its title: ‘A Guide to …

## stone flakes III

June 22, 2014

Stone flakes are waste products from the toolmaking process in the Stone Age. This is the third post; the first was on clustering, the second on linking to hominid type. The data also contains a more or less continuous age variable, which gives possibili...

## Big Bayes Stories? (draft ii)

June 21, 2014

“Wonderful examples, but let’s not close our eyes,” is David J. Hand’s apt title for his discussion of the recent special issue (Feb 2014) of Statistical Science called “Big Bayes Stories” (edited by Sharon McGrayne, Kerrie Mengersen and Christian Robert). For your Saturday night/weekend reading, here are excerpts from Hand, another discussant (Welsh), scattered remarks of mine, along […]

## Separating Statistical Models of "What Is Learned" from "How It Is Learned"

June 21, 2014

Something triggers our interest. Possibly it's an ad, a review or just word of mouth. We want to know more about the movie, the device, the software, or the service. Because we come with different preferences and needs, our searches vary in intensity. ...

## Kristof/Brooks update: NYT columnists correct their mistakes!

June 21, 2014

Who will issue a correction first? Nicholas Kristof, who uncritically cited the hurricane/himmicane paper which appeared in the prestigious Proceedings of the National Academy of Sciences but then was debunked in a stunning round of post-publication review? David Brooks, who botched some historical economic statistics and, in an unrelated incident, uncritically cited some education statistics […] The post Kristof/Brooks update: NYT columnists correct their mistakes! appeared first on Statistical Modeling,…

## An open challenge to Frequentists regarding that disastrous application of Classical Statistics.

June 21, 2014

Last week I posted an old example showing how everything in the Frequentist arsenal points to while an elementary deduction from the data implies . The predictable response (an example can be found here) was to claim that no Frequentist would be that d...

## Stan hands-on introduction in NYC Tues 24 Jun 7pm

June 20, 2014

Ben Goodrich, one of the Stan developers, will be leading the session. Bring a laptop, if that’s what you’re working on. We’ll cover:

- installation of CmdStan, RStan, and possibly PyStan (if we can find an expert)
- work through parts of the Stan language through a few models

Signup information is here. Anyone who’s […] The post Stan hands-on introduction in NYC Tues 24 Jun 7pm appeared first on…

## Principles Bayesians should adopt to deal with Frequentists

June 20, 2014

Like kudzu and people who wear socks with sandals, Frequentists are going to be an unsightly part of the landscape for some time to come. Therefore I propose five principles for dealing with them. Absolute Freedom of Research: Seemingly everyone in Sta...

## Avoiding false parallelism in a graph

June 20, 2014

“False parallelism”—feel free to come up with a better term here—is when a graph has repeating elements that do not correspond to repeating structure in the underlying topic being graphed. An example appears in the above graphs from Dan Kahan. The content of the graphs is fine (and, more generally, I think he’s making an […] The post Avoiding false parallelism in a graph appeared first on Statistical Modeling, Causal…

## Mathematical and Applied Statistics Lesson of the Day – Don’t Use the Terms “Independent Variable” and “Dependent Variable” in Regression


In math and science, we learn the equation of a line as $y = mx + b$, with $y$ being called the dependent variable and $x$ being called the independent variable. This terminology holds true for more complicated functions with multiple variables, such as in polynomial regression. I highly discourage the use of “independent” and “dependent” in the context of statistics […]

## Applied Statistics Lesson of the Day – Polynomial Regression is Actually Just Linear Regression


Continuing from my previous Statistics Lesson of the Day on what “linear” really means in “linear regression”, I want to highlight a common example involving this nomenclature that can mislead non-statisticians. Polynomial regression is a commonly used multiple regression technique; it models the systematic component of the regression model as a $k$th-order polynomial relationship between the […]
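To make the headline claim concrete, here is a minimal sketch (not taken from the original post) of why a polynomial fit is still a linear model: the powers of the predictor simply become extra columns of the design matrix, and the coefficients are then estimated by ordinary least squares. The data, the function names, and the use of the normal equations are all illustrative assumptions; a production fit would typically rely on a numerically more stable routine from a statistics library.

```python
def design_matrix(xs, degree):
    """Build rows [1, x, x**2, ..., x**degree]; each power is just another predictor."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_linear_model(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from y = 1 + 2x - 0.5x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]

# The model is quadratic in x but linear in the coefficients beta.
beta = fit_linear_model(design_matrix(xs, 2), ys)
print([round(b, 6) for b in beta])
```

The same `fit_linear_model` function would fit a straight line or a cubic without modification; only the design matrix changes, which is the sense in which the regression is "linear".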

## DataKind Opportunity Analyst Job Opening

June 19, 2014

Jake Porway writes: DataKind is looking for a brilliant part-time Opportunity Analyst to find data-informed solutions to the world’s most pressing problems with our NYC team! We’re a fast growing non-profit that tackles humanity’s biggest problems through data science. . . . We’ve helped the World Bank estimate poverty from satellite imagery, teamed with the […] The post DataKind Opportunity Analyst Job Opening appeared first on Statistical Modeling, Causal Inference,…

## A handy guide to Subjectivity and Objectivity in Statistics.

June 19, 2014

Objectivity and Subjectivity in statistics are often hard to spot, especially for students. Luckily, however, I’ve got some examples that’ll illuminate the concepts. Example 1: If a Frequentist has a fixed parameter in their model they may conduct a ...

## The Oracle (4)

June 19, 2014

As promised, some consideration of our model performance, so far. I've produced the graph below, which for each of the first 16 games (i.e., the games it took for all 32 teams to be involved once) shows the predictive distribution of the results. The ...