Wanna learn Stan? Everybody’s talking bout it. Here’s a way to jump in: Stan Case Studies. Find one you like and try it out. P.S. I blogged this last month but it’s so great I’m blogging it again. For this post, the target ...

I have come several times upon cases of scientists [I mean, real, recognised, publishing, senior scientists!] from other fields blindly copying MCMC code from a paper or website, and expecting the program to operate on their own problem… One illustration is from last week, when I read a X Validated question [from 2013] about an […]

Deborah Mayo asked me some questions about that paper (“Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors”), and here’s how I responded: I am not happy with the concepts of “power,” “type 1 error,” and “type 2 error,” because all these are defined in terms of statistical significance, which I am […] The post More on my paper with John Carlin on Type M and Type…

A rather obscure question on Metropolis-Hastings algorithms on X Validated ended up being about our first illustration in Introducing Monte Carlo methods with R. And exposing some inconsistencies in the following example… Example 7.2 is based on a [toy] joint Beta x Binomial target, which leads to a basic Gibbs sampler. We thought this was […]
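
The Gibbs sampler that example calls for is short enough to sketch. Here is a minimal Python version (the book's code is in R, and the parameter values below are made up for illustration, not Example 7.2's actual numbers), alternating the two full conditionals of the Beta × Binomial joint target:

```python
import random

# Joint target: X | theta ~ Binomial(n, theta), theta ~ Beta(a, b).
# The Gibbs sampler alternates the full conditionals:
#   theta | x ~ Beta(a + x, b + n - x)
#   x | theta ~ Binomial(n, theta)

def gibbs_beta_binomial(n=15, a=3.0, b=7.0, iters=20_000, seed=1):
    rng = random.Random(seed)
    theta, x = 0.5, n // 2          # arbitrary starting values
    draws = []
    for _ in range(iters):
        theta = rng.betavariate(a + x, b + n - x)
        x = sum(rng.random() < theta for _ in range(n))  # Binomial(n, theta) draw
        draws.append((theta, x))
    return draws

draws = gibbs_beta_binomial()
mean_theta = sum(t for t, _ in draws) / len(draws)
# Marginally theta ~ Beta(a, b), so the chain's theta-average should sit
# near a / (a + b) = 0.3 — a cheap sanity check on the conditionals.
```

Checking the marginal mean against the known Beta(a, b) mean is exactly the kind of consistency check that exposes a mismatched conditional.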

Bill Kelleher writes: I recently posted a review of A Model Discipline, by Clarke and Primo on Amazon.com. My review is entitled “Why Physics Envy will Persist,” at http://www.amazon.com/gp/review/R3I8GC5V1ZSYVI/ref=cm_cr_pr_rvw_ttl?ASIN=019538220X As you likely know, they are critical of the widespread belief among political scientists in the hypothetical-deductive method. As part of my review of the book, […] The post The role of models and empirical work in political science appeared first…

Nina Zumel and I have been doing a lot of writing on the (important) details of re-encoding high cardinality categorical variables for predictive modeling. These are variables that essentially take on string-values (also called levels or factors) and vary through many such levels. Typical examples include zip-codes, vendor IDs, and product codes. In a sort … Continue reading You should re-encode high cardinality categorical variables
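
One common re-encoding is impact (target) coding with shrinkage toward the global mean. The sketch below is illustrative Python, not Zumel and Mount's actual implementation (their posts use R, and the `prior_weight` smoothing parameter here is an assumption); note their caveat that designing the codes and fitting the downstream model on the same data invites overfit:

```python
from collections import defaultdict

def impact_code(levels, y, prior_weight=10.0):
    """Replace each categorical level with a smoothed mean of the target.

    Levels with few observations are shrunk toward the global mean,
    the usual guard against noisy rare levels (e.g. zip codes).
    """
    global_mean = sum(y) / len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lev, yi in zip(levels, y):
        sums[lev] += yi
        counts[lev] += 1
    code = {
        lev: (sums[lev] + prior_weight * global_mean) / (counts[lev] + prior_weight)
        for lev in counts
    }
    # Unseen levels at prediction time fall back to the global mean.
    return lambda lev: code.get(lev, global_mean)

enc = impact_code(["94110", "94110", "10001"], [1.0, 0.0, 1.0])
```

The encoder returns a single numeric column, so a model downstream never sees the raw string levels.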

Background: Hillary Clinton was given a 65% or 80% or 90% chance of winning the electoral college. She lost. Naive view: The poll-based models and the prediction markets said Clinton would win, and she lost. The models are wrong! Slightly sophisticated view: The predictions were probabilistic. 1-in-3 events happen a third of the time. 1-in-10 […] The post Election surprise, and Three ways of thinking about probability appeared first on…

If you obtain data from web sites, social media, or other unstandardized data sources, you might not know the form of dates in the data. For example, the US Independence Day might be represented as "04JUL1776", "07/04/1776", "Jul 4, 1776", or "July 4, 1776." Fortunately, the ANYDTDTE informat makes it […] The post One informat to rule them all: Read any date into SAS appeared first on The DO Loop.
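
ANYDTDTE is SAS-specific, but the underlying idea — try candidate layouts until one parses — can be sketched in Python. The format list below covers only the four example strings above and is an assumption, not SAS's actual detection logic:

```python
from datetime import datetime

# Candidate layouts, tried in order, mimicking the spirit of SAS's
# ANYDTDTE informat for the example strings above.
FORMATS = ["%d%b%Y", "%m/%d/%Y", "%b %d, %Y", "%B %d, %Y"]

def parse_any_date(s):
    s = s.strip().rstrip(".")
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {s!r}")

examples = ["04JUL1776", "07/04/1776", "Jul 4, 1776", "July 4, 1776"]
parsed = [parse_any_date(s) for s in examples]
# All four strings parse to the same date, 1776-07-04.
```

Order matters: put the most specific layouts first, since an ambiguous string matches the first format that accepts it.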

David Rothschild and Sharad Goel write: In a new paper with Andrew Gelman and Houshmand Shirani-Mehr, we examined 4,221 late-campaign polls — every public poll we could find — for 608 state-level presidential, Senate and governor’s races between 1998 and 2014. Comparing those polls’ results with actual electoral results, we find the historical margin of […] The post David Rothschild and Sharad Goel called it (probabilistically speaking) appeared first on…

Daniel Hawkins pointed me to a post by Kevin Drum entitled, “Crime in St. Louis: It’s Lead, Baby, Lead,” and the associated research article by Brian Boutwell, Erik Nelson, Brett Emo, Michael Vaughn, Mario Schootman, Richard Rosenfeld, Roger Lewis, “The intersection of aggregate-level lead exposure and crime.” The short story is that the areas of […] The post Can a census-tract-level regression analysis untangle correlation between lead and crime? appeared…

Dear Lab Members, I know that the results of Tuesday’s election have many of you concerned about your future. You are not alone. I am concerned about my future as well. But I want you to know that I have no plans of going anywhere and I intend to de...


This is the second of a series of two posts. Yesterday we discussed the difficulties of learning from a small, noisy experiment, in the context of a longitudinal study conducted in Jamaica where researchers reported that an early-childhood intervention program caused a 42%, or 25%, gain in later earnings. I expressed skepticism. Today I want […] The post How effective (or counterproductive) is universal child care? Part 2 appeared first…

For the past several hours, one phrase has kept coming back, and it annoys me to no end: "the model is wrong…". I heard it as recently as this noon's seminar, when a colleague said that the models were quite clearly wrong, since they had not predicted the winner of the American elections. For example, on election day, the Huffington Post announced that Donald Trump had…

This afternoon, one of my Monte Carlo students at ENSAE came to me with an exercise from Monte Carlo Statistical Methods that I did not remember having written. And I thus “charged” George Casella with authorship for that exercise! Exercise 3.3 starts with the usual question (a) about the (Binomial) precision of a tail probability […]
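
The flavor of that question can be sketched in a few lines: estimate a tail probability by Monte Carlo and attach its Binomial standard error. This is a Python sketch (the book's exercises are in R) and the target P(Z > 2) is just an illustrative choice, not Exercise 3.3's actual setup:

```python
import math
import random

# Monte Carlo estimate of a tail probability p = P(Z > 2), Z ~ N(0, 1).
# Each draw is a Bernoulli(p) trial, so the estimator is Binomial and
# se(p_hat) = sqrt(p_hat * (1 - p_hat) / n).
rng = random.Random(42)
n = 100_000
hits = sum(rng.gauss(0.0, 1.0) > 2.0 for _ in range(n))
p_hat = hits / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
# True value: p = 1 - Phi(2), roughly 0.0228.
```

The point of the exercise's "precision" question is visible here: for small p the relative error sqrt((1 - p) / (n p)) is large, so rare tails need many more simulations than central probabilities.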

Nina Zumel recently mentioned the use of Laplace noise in “count codes” by Misha Bilenko (see here and here) as a known method to break the overfit bias that comes from using the same data to design impact codes and fit a next level model. It is a fascinating method inspired by differential privacy methods, … Continue reading Laplace noising versus simulated out of sample methods (cross frames)
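
A rough sketch of the idea, assuming a simple noised mean encoding (the function names and noise scale below are illustrative, not Bilenko's actual scheme): Laplace noise on the per-level sums and counts weakens the tie between the encoding and the exact training responses, which is what drives the nested-model overfit bias:

```python
import math
import random
from collections import defaultdict

def laplace(rng, scale):
    # Difference of two iid Exponential(1) draws, scaled: Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noised_count_code(levels, y, scale=1.0, seed=0):
    """Encode each level by a noised mean of the target.

    Adding Laplace noise to the per-level sum and count (the trick
    borrowed from differential privacy) decorrelates the code from the
    exact training responses used to fit the next-level model.
    """
    rng = random.Random(seed)
    sums = defaultdict(float)
    counts = defaultdict(float)
    for lev, yi in zip(levels, y):
        sums[lev] += yi
        counts[lev] += 1.0
    return {
        lev: (sums[lev] + laplace(rng, scale)) /
             max(counts[lev] + laplace(rng, scale), 1.0)
        for lev in counts
    }
```

The cross-frame alternative discussed in the post attacks the same bias differently — by computing each row's code from data that excludes that row — so the two methods trade noise for data splitting.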

The title of this post says it all. A 2% shift in public opinion is not so large and usually would not be considered shocking. In this case the race was close enough that 2% was consequential. Here’s the background: Four years ago, Mitt Romney received 48% of the two-party vote and lost the presidential […] The post Explanations for that shocking 2% shift appeared first on Statistical Modeling, Causal…

Jon Schwabish and Severino Ribecca have turned their Graphic Continuum poster into a set of cards. They're a good way to expand your visual vocabulary and find new ideas for how to represent your data. Each card shows one visualization technique as a stylized image on one side and a short definition on the other. They […]

The big story in yesterday’s election is that Donald Trump did about 2% better than predicted from the polls. Trump got 50% of the two-party vote (actually, according to the most recent count, Hillary Clinton won the popular vote, just barely) but was predicted to get only 48%. First let’s compare the 2016 election to […] The post A 2% swing: The poll-based forecast did fine (on average) in blue…

This is the first of a series of two posts. We’ve talked before about various empirically-based claims of the effectiveness of early childhood intervention. In a much-publicized 2013 paper based on a study of 130 four-year-old children in Jamaica, Paul Gertler et al. claimed that a particular program caused a 42% increase in the participants’ […] The post How effective (or counterproductive) is universal child care? Part 1 appeared first…