## Sunday morning puzzle

November 21, 2015
A question from X validated took me quite a while to fathom, and then the solution suddenly became quite obvious: If a sample taken from an arbitrary distribution on {0,1}⁶ is censored from its (0,0,0,0,0,0) elements, and if the marginal probabilities are known for all six components of the random vector, what is an […]

## Free gradient boosting lecture

November 21, 2015
We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out … Continue reading Free gradient boosting lecture

## Benford lays down the Law

November 21, 2015
A few months ago I received in the mail a book called An Introduction to Benford’s Law by Arno Berger and Theodore Hill. I eagerly opened it but I lost interest once I realized it was essentially a pure math book. Not that there’s anything wrong with math, it just wasn’t what I wanted to […] The post Benford lays down the Law appeared first on Statistical Modeling, Causal Inference,…

## Mathematics Departments and the Talented Mr. Teacher

November 21, 2015
Today we have a guest post from a colleague named Mathprof. The pseudonym is perhaps needed, as Mathprof's colleagues might not be pleased to read all of Mathprof's comments. I did some very minor editing, but otherwise the content is Mathprof's. I asked ...

## 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences

November 21, 2015
This is really cool. The announcement comes from Joe Cummins: The University of California at Riverside is hiring 4 open rank positions in Design-Based Statistical Inference in the Social Sciences. I [Cummins] think this is a really exciting opportunity for researchers doing all kinds of applied social science statistical work, especially work that cuts across […] The post 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences…

## Erich Lehmann: Neyman-Pearson & Fisher on P-values

November 20, 2015
Today is Erich Lehmann’s birthday (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D. 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them. I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received […]

## Stan Puzzle 2: Distance Matrix Parameters

November 20, 2015
This puzzle comes in three parts. There are some hints at the end. Part I: Constrained Parameter Definition Define a Stan program with a transformed matrix parameter d that is constrained to be a K by K distance matrix. Recall that a distance matrix must satisfy the definition of a metric for all i, j: […] The post Stan Puzzle 2: Distance Matrix Parameters appeared first on Statistical Modeling, Causal…
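
The excerpt is cut off before it lists the conditions, so the following is only a minimal sketch of what "K by K distance matrix" usually means, written in Python rather than Stan. It assumes the standard metric axioms (zero diagonal, strictly positive off-diagonal entries, symmetry, triangle inequality); the function name `is_distance_matrix` and the tolerance `tol` are hypothetical and do not come from the original post.

```python
import numpy as np

def is_distance_matrix(d, tol=1e-9):
    """Check the standard metric axioms on a square matrix (assumed definition)."""
    d = np.asarray(d, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        return False
    n = d.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # d[i, i] == 0 and d[i, j] > 0 for i != j
    zero_diagonal = np.all(np.abs(np.diag(d)) <= tol)
    positive_off_diag = np.all(d[off_diag] > 0)

    # d[i, j] == d[j, i]
    symmetric = np.allclose(d, d.T, atol=tol)

    # Triangle inequality: d[i, j] <= d[i, m] + d[m, j] for every m.
    # sums[i, m, j] = d[i, m] + d[m, j]; take the minimum over m.
    sums = d[:, :, None] + d[None, :, :]
    triangle = np.all(d <= sums.min(axis=1) + tol)

    return bool(zero_diagonal and positive_off_diag and symmetric and triangle)
```

A check like this can validate draws from a candidate parameterization, but it does not by itself show how to build the constrained parameter that the puzzle asks for.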

## Countries of refugees to the US in 2014 and their destinations

November 20, 2015
A tweet from Kyle Walker introduced me to data from the Office of Refugee Resettlement from the US Department of Health and Human Services. Using multiple R packages such as shiny, rCharts, rcdimple, leaflet, and d3heatmap, this post looks at the count...

## Tip o’ the iceberg to ya

November 20, 2015
Paul Alper writes: The Washington Post ran this article by Fred Barbash with an interesting quotation: “Every day, on average, a scientific paper is retracted because of misconduct,” Ivan Oransky and Adam Marcus, who run Retraction Watch, wrote in a New York Times op-ed in May. But, can that possibly be true, just for misconduct […] The post Tip o’ the iceberg to ya appeared first on Statistical Modeling, Causal…

## Internet use and religion, part three

November 19, 2015
This article reports preliminary results from an exploration of the relationship between religion and Internet use in Europe, using data from the European Social Survey (ESS). I describe the data processing pipeline and models in this previous article. ...

## Some Links Related to Randomized Controlled Trials for Policymaking

November 19, 2015
In response to my previous post, Avi Feller sent me these links related to efforts promoting the use of RCTs and evidence-based approaches for policymaking: The theme of this year's just-concluded APPAM conference (the national public policy research organization) was "evidence-based policymaking," with a headline panel on using experiments in policy (see here and here). Jeff Liebman

## Fluid use of data

November 19, 2015
Nina Zumel and I recently wrote a few articles and series on best practices in testing models and data: Random Test/Train Split is not Always Enough How Do You Know if Your Data Has Signal? How do you know if your model is going to work? A Simpler Explanation of Differential Privacy (explaining the reusable … Continue reading Fluid use of data

## Habits and open data: Helping students develop a theory of scientific mind

November 19, 2015
This post is related to my open science talk with Candice Morey at Psychonomics 2015 in Chicago; also read Candice's new post on the pragmatics: "A visit from the Ghost of Research Past". In this post, we suggest three ideas that can be implemented in ...

## I like the Monkey Cage

November 19, 2015
The sister blog is a good place to reach a wider audience, also our co-bloggers and guests have interesting posts on important topics, but what I really like about our blog at the Washington Post is its seriousness and its political science perspective. For better or worse, political science does not have a high profile […] The post I like the Monkey Cage appeared first on Statistical Modeling, Causal Inference,…

## Egregious chart brings back bad memories

November 19, 2015
My friend Alberto Cairo said it best: if you see bullshit, say "bullshit!" He was very incensed by this egregious "infographic": (link to his post) Emily Schuch provided a re-visualization: The new version provides a much richer story of how...

## The Unreported War On America’s Poor

November 18, 2015
The Democratic firebrand Bernie Sanders keeps harping on this point about income inequality in the United States, yet I have to wonder, how bad is it really and do we care? First off, there is a legitimate reason to ask if we should care. After all, t...

## First, second, and third order bias corrections (also, my ugly R code for the mortality-rate graphs!)

November 18, 2015
As an applied statistician, I don’t do a lot of heavy math. I did prove a true theorem once (with the help of some collaborators), but that was nearly twenty years ago. Most of the time I walk along pretty familiar paths, just hoping that other people will do the mathematical work necessary for me […] The post First, second, and third order bias corrections (also, my ugly R code…

## Pareto smoothed importance sampling and infinite variance (2nd ed)

November 18, 2015
This post is by Aki. Last week Xi’an blogged about an arXiv paper by Chatterjee and Diaconis which considers the proper sample size in an importance sampling setting with infinite variance. I commented on Xi’an’s posting and the end result was my guest blog posting in Xi’an’s Og. I made an additional figure below to summarise […] The post Pareto smoothed importance sampling and infinite variance (2nd ed) appeared first on…

## Create a map with PROC SGPLOT

November 18, 2015
Did you know that you can use the POLYGON statement in PROC SGPLOT to draw a map? The graph at the left shows the 48 contiguous states of the US, overlaid with markers that indicate the locations of major cities. The plot was created by using the POLYGON statement, which […] The post Create a map with PROC SGPLOT appeared first on The DO Loop.

## Link: The NIPS Experiment

November 18, 2015
The conference on Neural Information Processing Systems (NIPS) has conducted a fascinating experiment: split the program committee into two and get 10% of submissions reviewed by both. The article I’m linking to above has a great analysis of what they found (and it’s not encouraging). This would be a great experiment to run at VIS. Anybody who has spent any … Continue reading Link: The NIPS Experiment

## Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry

In this second article of a 2-part series on the official JMP blog, I use analysis of variance (ANOVA) to assess a sample-preparation scheme for quantifying sodium in potato chips. I illustrate the use of the “Fit Y by X” platform in JMP to implement ANOVA, and I propose an alternative sample-preparation scheme to obtain […]

## Given the history of medicine, why are randomized trials not used for social policy?

November 17, 2015
Policy changes can have substantial societal effects. For example, clean water and hygiene policies have saved millions, if not billions, of lives. But effects are not always positive. For example, prohibition, or the "noble experiment", boosted organized crime, slowed economic growth and increased deaths caused by tainted liquor. Good intentions do not guarantee desirable outcomes. The medical establishment is well

## Internet use and religion, part two

November 17, 2015
In the previous article, I posted a preliminary exploration of the relationship between Internet use and religious affiliation in Europe. In this article I clean up some data issues and present results broken down by country. Cleaning and resampling Her...