## Internet use and religion, part four

November 24, 2015
[If you are jumping into the middle of this series, you might want to start with this article, which explains the methodological approach I am taking.] In the previous article, I presented preliminary results from a study of relationships between I...

## Statistical Models That Support Design Thinking: Driver Analysis vs. Partial Correlation Networks

November 24, 2015
We have been talking about design thinking in marketing since Tim Brown's Harvard Business Review article in 2008. It might be easy for the data scientist to dismiss the approach as merely a type of brainstorming for new products or services. Yet, desi...

## Fitting linear mixed models for QTL mapping

November 24, 2015
Linear mixed models (LMMs) have become widely used for dealing with population structure in human GWAS, and they’re becoming increasingly important for QTL mapping in model organisms, particularly for the analysis of advanced intercross lines (AIL), which often exhibit variation in the relationships among individuals. In my efforts on R/qtl2, a reimplementation of R/qtl to better […]

## 20 years of Data Science: from Music to Genomics

November 24, 2015
I finally got around to reading David Donoho's 50 Years of Data Science paper.  I highly recommend it. The following quote seems to summarize the sentiment that motivated the paper, as well as why it has resonated among academic statisticians: The statistics profession is caught at a confusing moment: the activities which preoccupied it over centuries are now

## Beyond the median split: Splitting a predictor into 3 parts

November 24, 2015
Carol Nickerson pointed me to a series of papers in the journal Consumer Psychology, first one by Dawn Iacobucci et al. arguing in favor of the “median split” (replacing a continuous variable by a 0/1 variable split at the median) “to facilitate analytic ease and communication clarity,” then a response by Gary McClelland et al. […] The post Beyond the median split: Splitting a predictor into 3 parts appeared first…

## Estimating the exponent of discrete power law data

November 24, 2015
Suppose you have data from a discrete power law with exponent α. That is, the probability of an outcome n is proportional to n^(-α). How can you recover α? A naive approach would be to gloss over the fact that you have discrete data and use the MLE (maximum likelihood estimator) for continuous data. That […]
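
The excerpt is cut off before the post's own answer, so the following is only a minimal sketch of one way to respect the discreteness, not necessarily the method the post goes on to describe. It assumes outcomes start at n = 1, so the normalizing constant is the Riemann zeta function ζ(α) and the log-likelihood is -α Σ log n_i - N log ζ(α). The function names, the search interval, and the truncated-sum approximation of ζ are all illustrative choices.

```python
import math


def zeta(alpha, cutoff=1000):
    """Approximate the Riemann zeta function for alpha > 1.

    Sums the first cutoff - 1 terms exactly and adds an Euler-Maclaurin
    correction (integral plus half of the first omitted term) for the tail.
    """
    head = sum(n ** -alpha for n in range(1, cutoff))
    tail = cutoff ** (1.0 - alpha) / (alpha - 1.0) + 0.5 * cutoff ** -alpha
    return head + tail


def discrete_power_law_mle(samples, lo=1.01, hi=10.0, iterations=100):
    """Estimate alpha for P(n) = n^(-alpha) / zeta(alpha), n = 1, 2, ...

    The log-likelihood is concave in alpha, so a ternary search over the
    assumed interval [lo, hi] converges to its maximizer.
    """
    n_obs = len(samples)
    sum_log = sum(math.log(n) for n in samples)

    def log_likelihood(alpha):
        return -alpha * sum_log - n_obs * math.log(zeta(alpha))

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1) < log_likelihood(m2):
            lo = m1  # maximizer lies to the right of m1
        else:
            hi = m2  # maximizer lies to the left of m2
    return (lo + hi) / 2
```

Calling `discrete_power_law_mle(data)` on a list of positive integers returns the estimate. Only the sum of the logs and the sample size enter the likelihood, so the data are scanned once regardless of how many search iterations are used.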

## Statbusters: please back up an extreme claim with numbers

November 23, 2015
In this week's Statbusters, my column with Andrew Gelman in the Daily Beast, we take note of Slate's recent rant about "wasteful" anti-smoking advertising, and demonstrate how to think about cost-benefit analysis. The key point is: if you are going to make an extreme claim, you better have some numbers to back it up. These numbers can be approximate, and based on (potentially dubious) Googled data. Not every analysis needs…

## I already know who will be president in 2016 but I’m not telling

November 23, 2015
Nadia Hassan writes: One debate in political science right now concerns how the economy influences voters. Larry Bartels argues that Q14 and Q15 impact election outcomes the most. Doug Hibbs argues that all 4 years matter, with later growth being more important. Chris Wlezien claims that the first two years don’t influence elections but the […] The post I already know who will be president in 2016 but I’m not…

## Efficiency in space usage leads to efficiency in comprehension

November 23, 2015
Consider the following two charts that illustrate the same data. (I deliberately took out the header text to make a point. The original chart came from the Wall Street Journal.) To me, the line chart gets to the point more...

## On Bayesian DSGE Modeling with Hard and Soft Restrictions

November 23, 2015
A theory is essentially a restriction on a reduced form. It can be imposed directly (hard restrictions) or used as a prior mean in a more flexible Bayesian analysis (soft restrictions). The soft restriction approach -- "theory as a shrinkage directi...

## Determine whether a SAS product is licensed

November 23, 2015
Sometimes you are writing a program that needs to find out whether a particular SAS product (like SAS/ETS, SAS/QC, or SAS/OR) is licensed. I was reminded of this fact when I wrote last week's blog post about how to create a map with PROC SGPLOT. Although the SGPLOT procedure is […] The post Determine whether a SAS product is licensed appeared first on The DO Loop.

## Paper: The Connected Scatterplot for Presenting Paired Time Series

November 23, 2015
I’m very happy to finally be able to announce our paper on the connected scatterplot technique. It describes the technique, provides some historical perspective, and most of all looks into how easy to understand and engaging the technique actually is. The connected scatterplot isn’t really known in visualization, but has gotten some interest in journalism. … Continue reading Paper: The Connected Scatterplot for Presenting Paired Time Series

## Top 9 questions to ask a statistician

November 23, 2015
Someone writes in: I am a student at . . . We have been given an assignment that requires us to interview a professional in the criminal justice field who performs, or has performed, statistical analyses on social science related data. . . . We are supposed to collect information pertaining to job description, job […] The post Top 9 questions to ask a statistician appeared first on Statistical Modeling,…

## If a study is worth a mention, it’s worth a link

November 22, 2015
Gur Huberman points to this op-ed entitled “Are Good Doctors Bad for Your Health?” and writes: Can’t the NYT provide a link or an explicit reference to the JAMA Internal Medicine article underlying this OpEd? A reader could then access the original piece and judge its credibility for himself. I replied: Yes, very tacky of […] The post If a study is worth a mention, it’s worth a link appeared…

## Flatten your abs with this new statistical approach to quadrature

November 22, 2015
Philipp Hennig, Michael Osborne, and Mark Girolami write: We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. . . . We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. […] The post Flatten your abs with this new statistical approach to quadrature appeared first…

## Sunday morning puzzle

November 21, 2015
A question from X validated that took me quite a while to fathom and then the solution suddenly became quite obvious: If a sample taken from an arbitrary distribution on {0,1}⁶ is censored from its (0,0,0,0,0,0) elements, and if the marginal probabilities are known for all six components of the random vector, what is an […]

## Free gradient boosting lecture

November 21, 2015
We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out … Continue reading Free gradient boosting lecture

## Benford lays down the Law

November 21, 2015
A few months ago I received in the mail a book called An Introduction to Benford’s Law by Arno Berger and Theodore Hill. I eagerly opened it but I lost interest once I realized it was essentially a pure math book. Not that there’s anything wrong with math, it just wasn’t what I wanted to […] The post Benford lays down the Law appeared first on Statistical Modeling, Causal Inference,…

## Mathematics Departments and the Talented Mr. Teacher

November 21, 2015
Today we have a guest post from a colleague named Mathprof. The pseudonym perhaps is needed as Mathprof's colleagues might not be pleased to read all of Mathprof's comments. I did some very minor editing, but otherwise the content is Mathprof's. I asked ...

## 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences

November 21, 2015
This is really cool. The announcement comes from Joe Cummins: The University of California at Riverside is hiring 4 open rank positions in Design-Based Statistical Inference in the Social Sciences. I [Cummins] think this is a really exciting opportunity for researchers doing all kinds of applied social science statistical work, especially work that cuts across […] The post 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences…

## Erich Lehmann: Neyman-Pearson & Fisher on P-values

November 20, 2015
Today is Erich Lehmann’s birthday (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D. 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them. I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received […]

## Stan Puzzle 2: Distance Matrix Parameters

November 20, 2015
This puzzle comes in three parts. There are some hints at the end. Part I: Constrained Parameter Definition Define a Stan program with a transformed matrix parameter d that is constrained to be a K by K distance matrix. Recall that a distance matrix must satisfy the definition of a metric for all i, j: […] The post Stan Puzzle 2: Distance Matrix Parameters appeared first on Statistical Modeling, Causal…

## Countries of refugees to the US in 2014 and their destinations

November 20, 2015
A tweet from Kyle Walker introduced me to data from the Office of Refugee Resettlement from the US Department of Health and Human Services. Using multiple R packages such as shiny, rCharts, rcdimple, leaflet, and d3heatmap, this post looks at the count...