## Estimating the exponent of discrete power law data

November 24, 2015

Suppose you have data from a discrete power law with exponent α. That is, the probability of an outcome n is proportional to n^(-α). How can you recover α? A naive approach would be to gloss over the fact that you have discrete data and use the MLE (maximum likelihood estimator) for continuous data. That […]
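Because the excerpt breaks off before the post's own answer, the following is only a rough sketch of the two estimators the question contrasts, not the author's method. It assumes the outcomes are n = 1, 2, 3, …, so the normalizing constant is the Riemann zeta function and P(n) = n^(-α)/ζ(α). The function names `zeta`, `discrete_mle`, and `continuous_mle` are hypothetical, and the zeta routine is a plain partial sum with a tail correction rather than a library call.

```python
import math


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for s > 1.

    Sums the first terms - 1 values directly and adds an
    Euler-Maclaurin estimate of the remaining tail.
    """
    n = terms
    partial = sum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return partial + tail


def discrete_mle(data, lo=1.01, hi=20.0, iters=60):
    """MLE of alpha assuming P(n) = n**-alpha / zeta(alpha), n >= 1.

    The per-observation log-likelihood is
        -alpha * mean(log n) - log(zeta(alpha)),
    which is concave in alpha, so a ternary search on [lo, hi] finds
    the maximum. The bracket [lo, hi] is an assumption of this sketch.
    """
    mean_log = sum(math.log(n) for n in data) / len(data)

    def loglik(a):
        return -a * mean_log - math.log(zeta(a))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def continuous_mle(data):
    """Naive estimate: the continuous power-law MLE with x_min = 1.

    Undefined (division by zero) if every observation equals 1.
    """
    return 1 + len(data) / sum(math.log(n) for n in data)
```

Calling `discrete_mle(data)` and `continuous_mle(data)` on the same list of positive integers gives a direct comparison of the two approaches. Under the stated assumption the discrete version maximizes the correct likelihood, while the continuous formula treats the integers as draws from a density on [1, ∞).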

## Statbusters: please back up an extreme claim with numbers

November 23, 2015

In this week's Statbusters, my column with Andrew Gelman in the Daily Beast, we take note of Slate's recent rant about "wasteful" anti-smoking advertising, and demonstrate how to think about cost-benefit analysis. The key point is: if you are going to make an extreme claim, you had better have some numbers to back it up. These numbers can be approximate, and based on (potentially dubious) Googled data. Not every analysis needs…

## I already know who will be president in 2016 but I’m not telling

November 23, 2015

Nadia Hassan writes: One debate in political science right now concerns how the economy influences voters. Larry Bartels argues that Q14 and Q15 impact election outcomes the most. Doug Hibbs argues that all 4 years matter, with later growth being more important. Chris Wlezien claims that the first two years don’t influence elections but the […] The post I already know who will be president in 2016 but I’m not…

## Efficiency in space usage leads to efficiency in comprehension

November 23, 2015

Consider the following two charts that illustrate the same data. (I deliberately took out the header text to make a point. The original chart came from the Wall Street Journal.) To me, the line chart gets to the point more...

## On Bayesian DSGE Modeling with Hard and Soft Restrictions

November 23, 2015

A theory is essentially a restriction on a reduced form. It can be imposed directly (hard restrictions) or used as a prior mean in a more flexible Bayesian analysis (soft restrictions). The soft restriction approach -- "theory as a shrinkage directi...

## Determine whether a SAS product is licensed

November 23, 2015

Sometimes you are writing a program that needs to find out whether a particular SAS product (like SAS/ETS, SAS/QC, or SAS/OR) is licensed. I was reminded of this fact when I wrote last week's blog post about how to create a map with PROC SGPLOT. Although the SGPLOT procedure is […] The post Determine whether a SAS product is licensed appeared first on The DO Loop.

## Paper: The Connected Scatterplot for Presenting Paired Time Series

November 23, 2015

I’m very happy to finally be able to announce our paper on the connected scatterplot technique. It describes the technique, provides some historical perspective, and most of all looks into how easy to understand and engaging the technique actually is. The connected scatterplot isn’t really known in visualization, but has gotten some interest in journalism. … Continue reading Paper: The Connected Scatterplot for Presenting Paired Time Series

## Top 9 questions to ask a statistician

November 23, 2015

Someone writes in: I am a student at . . . We have been given an assignment that requires us to interview a professional in the criminal justice field who performs, or has performed, statistical analyses on social science related data. . . . We are supposed to collect information pertaining to job description, job […] The post Top 9 questions to ask a statistician appeared first on Statistical Modeling,…

## If a study is worth a mention, it’s worth a link

November 22, 2015

Gur Huberman points to this op-ed entitled “Are Good Doctors Bad for Your Health?” and writes: Can’t the NYT provide a link or an explicit reference to the JAMA Internal Medicine article underlying this OpEd? A reader could then access the original piece and judge its credibility for himself I replied: Yes, very tacky of […] The post If a study is worth a mention, it’s worth a link appeared…

## Flatten your abs with this new statistical approach to quadrature

November 22, 2015

Philipp Hennig, Michael Osborne, and Mark Girolami write: We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. . . . We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. […] The post Flatten your abs with this new statistical approach to quadrature appeared first…

## Sunday morning puzzle

November 21, 2015

A question from X validated that took me quite a while to fathom and then the solution suddenly became quite obvious: If a sample taken from an arbitrary distribution on {0,1}⁶ is censored from its (0,0,0,0,0,0) elements, and if the marginal probabilities are known for all six components of the random vector, what is an […]

## Free gradient boosting lecture

November 21, 2015

We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out … Continue reading Free gradient boosting lecture

## Benford lays down the Law

November 21, 2015

A few months ago I received in the mail a book called An Introduction to Benford’s Law by Arno Berger and Theodore Hill. I eagerly opened it but I lost interest once I realized it was essentially a pure math book. Not that there’s anything wrong with math, it just wasn’t what I wanted to […] The post Benford lays down the Law appeared first on Statistical Modeling, Causal Inference,…

## Mathematics Departments and the Talented Mr. Teacher

November 21, 2015

Today we have a guest post from a colleague named Mathprof. The pseudonym perhaps is needed as Mathprof's colleagues might not be pleased to read all Mathprof's comments. I did some very minor editing, but otherwise the content is Mathprof's. I asked ...

## 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences

November 21, 2015

This is really cool. The announcement comes from Joe Cummins: The University of California at Riverside is hiring 4 open rank positions in Design-Based Statistical Inference in the Social Sciences. I [Cummins] think this is a really exciting opportunity for researchers doing all kinds of applied social science statistical work, especially work that cuts across […] The post 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences…

## Erich Lehmann: Neyman-Pearson & Fisher on P-values

November 20, 2015

Today is Erich Lehmann’s birthday (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them. I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received […]

## Stan Puzzle 2: Distance Matrix Parameters

November 20, 2015

This puzzle comes in three parts. There are some hints at the end. Part I: Constrained Parameter Definition Define a Stan program with a transformed matrix parameter d that is constrained to be a K by K distance matrix. Recall that a distance matrix must satisfy the definition of a metric for all i, j: […] The post Stan Puzzle 2: Distance Matrix Parameters appeared first on Statistical Modeling, Causal…

## Countries of refugees to the US in 2014 and their destinations

November 20, 2015

A tweet from Kyle Walker introduced me to data from the Office of Refugee Resettlement from the US Department of Health and Human Services. Using multiple R packages such as shiny, rCharts, rcdimple, leaflet, and d3heatmap, this post looks at the count...

## Tip o’ the iceberg to ya

November 20, 2015

Paul Alper writes: The Washington Post ran this article by Fred Barbas with an interesting quotation: “Every day, on average, a scientific paper is retracted because of misconduct,” Ivan Oransky and Adam Marcus, who run Retraction Watch, wrote in a New York Times op-ed in May. But, can that possibly be true, just for misconduct […] The post Tip o’ the iceberg to ya appeared first on Statistical Modeling, Causal…

## Internet use and religion, part three

November 19, 2015

This article reports preliminary results from an exploration of the relationship between religion and Internet use in Europe, using data from the European Social Survey (ESS). I describe the data processing pipeline and models in this previous article. ...

## Some Links Related to Randomized Controlled Trials for Policymaking

November 19, 2015

In response to my previous post, Avi Feller sent me these links related to efforts promoting the use of RCTs and evidence-based approaches for policymaking: The theme of this year's just-concluded APPAM conference (the national public policy research organization) was "evidence-based policymaking," with a headline panel on using experiments in policy (see here and here). Jeff Liebman

## Fluid use of data

November 19, 2015

Nina Zumel and I recently wrote a few articles and series on best practices in testing models and data: Random Test/Train Split is not Always Enough How Do You Know if Your Data Has Signal? How do you know if your model is going to work? A Simpler Explanation of Differential Privacy (explaining the reusable … Continue reading Fluid use of data

## Habits and open data: Helping students develop a theory of scientific mind

November 19, 2015

This post is related to my open science talk with Candice Morey at Psychonomics 2015 in Chicago; also read Candice's new post on the pragmatics: "A visit from the Ghost of Research Past". In this post, we suggest three ideas that can be implemented in ...