Abstract: I present a probability puzzle, the Rain in Seattle Problem, and use it to explain differences between the Bayesian and frequentist interpretations of probability, and between Bayesian and frequentist statistical methods. Since I am try...

The Best Presentation Award of the International Marketing and Output Database Conference IMAODBC 2016 in Gozd Martuljek, Slovenia, goes to Susanne Hagenkort-Rieger and her team from DESTATIS (Statistisches Bundesamt, Germany). In her presentation Susanne highlighted the importance of web search statistics and why intuition is often not sufficient when choosing which statistical data to emphasize. To achieve … Continue reading IMAODBC 2016: And the winner is…

I just returned from the University of Chicago conference, "Machine Learning: What's in it for Economics?" Lots of cool things percolating. I'm teaching a Penn Ph.D. course later this fall on aspects of the ML/econometrics interface. ...

My son is taking an AP Statistics course in high school this year. AP Statistics is one of the fastest-growing AP courses, so I welcome the chance to see topics and techniques in the course. Last week I was pleased to see that they teach data exploration techniques, such as […] The post Create an ogive in SAS appeared first on The DO Loop.

Anoop Balachandran writes: This is one of the abstracts of the paper I am about to publish: My question is can I really say both training programs were effective for increasing power and function? Studies of similar duration employing sedentary control showed either negative or 1-2% changes. Also, I don’t think strength and function will […] The post No statistically significant differences for get up and go appeared first on…

From the Wall Street Journal: Several weeks ago, Facebook disclosed in a post on its “Advertiser Help Center” that its metric for the average time users spent watching videos was artificially inflated because it was only factoring in video view...

After the New Hampshire primary Nadia Hassan wrote: Some have noted how minor differences in how the candidates come out in these primaries can make a huge difference in the media coverage. For example, only a few thousand voters separate third and fifth and it really impacts how pundits talk about a candidate’s performance. Chance […] The post Politics and chance appeared first on Statistical Modeling, Causal Inference, and Social…

When people screw up or cheat in their research, what do their collaborators say? The simplest case is when coauthors admit their error, as Cexun Jeffrey Cai and I did when it turned out that we’d miscoded a key variable in an analysis, invalidating the empirical claims of our award-winning paper. On the other extreme, […] The post Cracks in the thin blue line appeared first on Statistical Modeling, Causal…

Last week, my Columbia students discussed this nice article in the New York Times called "The Most Detailed Map of Gay Marriages in America". (link) The center of the article is this map: I asked the students to identify the problem that this dataviz is supposed to address. Someone responded that it tells us where gay married couples are found geographically. I asked her what is the answer to the…

Nate Cohn at the New York Times arranged a comparative study on a recent Florida pre-election poll. He sent the raw data to four groups (Charles Franklin; Patrick Ruffini; Margie Omero, Robert Green, Adam Rosenblatt; and Sam Corbett-Davies, David Rothschild, and me) and asked each of us to analyze the data how we’d like to […] The post Trump +1 in Florida; or, a quick comment on that “5 groups…

The title of this post is a line that Thomas Basbøll wrote a couple years ago. Before I go on, let me say that the fact that I have not investigated this case in detail is not meant to imply that it’s not important or that it’s not worth investigating. It’s just not something that […] The post Andrew Gelman is not the plagiarism police because there is no such…

Alexia Gaudeul writes: Maybe you will find this interesting / amusing / frightening, but the Journal of Risk and Uncertainty recently published a paper with a rather obvious multicollinearity problem. The issue does not come up that often in the published literature, so I thought you might find it interesting for your blog. The paper […] The post Multicollinearity causing risk and uncertainty appeared first on Statistical Modeling, Causal Inference,…

A straightforward but probabilistic riddle this week in the Riddler, which is to find the expected order of integer i when the sequence {1,2,…,n} is partitioned at random into two sets, A and B, each of which is then sorted before both sets are merged. For instance, if {1,2,3,4} is divided into A={1,4} and B={2,3}, […]
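The riddle above lends itself to a quick simulation check. What follows is only a rough sketch under two assumptions that the truncated excerpt does not confirm: each integer goes to A or B independently with probability 1/2, and "merged" means the sorted A is followed by the sorted B. The function names are hypothetical.

```python
import random


def merged_position(n, i, rng):
    """Return the 1-based position of integer i after one random split and merge.

    Assumption: each integer joins A or B with probability 1/2, and the
    merged sequence is sorted A followed by sorted B.
    """
    a, b = [], []
    for k in range(1, n + 1):
        # Appending in increasing order keeps both lists sorted.
        (a if rng.random() < 0.5 else b).append(k)
    merged = a + b
    return merged.index(i) + 1


def expected_position(n, i, trials=100_000, seed=0):
    """Monte Carlo estimate of the expected position of i."""
    rng = random.Random(seed)
    total = sum(merged_position(n, i, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for i in range(1, 5):
        print(i, round(expected_position(4, i), 3))
```

Under these same assumptions, linearity of expectation gives 1 + (i-1)/2 + (n-1)/4 for the expected position, which the simulation can be compared against. A different reading of "merged" would change both the code and that formula.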

The replication crisis is a big deal. But it’s a problem in lots of scientific fields. Why is so much of the discussion about psychology research? Why not economics, which is more controversial and gets more space in the news media? Or medicine, which has higher stakes and a regular flow of well-publicized scandals? Here […] The post Why is the scientific replication crisis centered on psychology? appeared first on…

Crimes Against Data Statistics has been described as the science of uncertainty. But, paradoxically, statistical methods are often used to create a sense of certainty where none should exist. The social sciences have been rocked in recent years by highly publicized claims, published in top journals, that were reported as “statistically significant” but are implausible […] The post “Crimes Against Data”: My talk at Ohio State University this Thurs; “Solving…

Someone sent me this article by psychology professor Susan Fiske, scheduled to appear in the APS Observer, a magazine of the Association for Psychological Science. The article made me a little bit sad, and I was inclined to just keep my response short and sweet, but then it seemed worth the trouble to give some […] The post What has happened down here is the winds have changed appeared first…

Although statisticians often assume normally distributed errors, there are important processes for which the error distribution has a heavy tail. A well-known heavy-tailed distribution is the t distribution, but the t distribution is unsuitable for some applications because it does not have finite moments (mean, variance, ...) for small parameter values. […] The post Simulate data from a generalized Gaussian distribution appeared first on The DO Loop.
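The original post works in SAS and its method is not shown in this excerpt, so the following is only a rough Python sketch of one standard way to simulate from a generalized Gaussian distribution. It assumes the density is proportional to exp(-(|x - mu| / alpha)^beta). The parameter names mu, alpha and beta are assumptions for illustration, not the post's notation. The sketch uses the fact that (|X - mu| / alpha)^beta then follows a Gamma(1/beta, 1) distribution.

```python
import random


def generalized_gaussian(mu, alpha, beta, rng):
    """Draw one value with density proportional to exp(-(|x - mu| / alpha) ** beta).

    Assumed parameterization: mu is location, alpha is scale, beta is shape.
    """
    # If G ~ Gamma(shape=1/beta, scale=1), then G ** (1/beta) has the
    # distribution of |X - mu| / alpha.
    g = rng.gammavariate(1.0 / beta, 1.0)
    # A symmetric random sign places the draw on either side of mu.
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return mu + sign * alpha * g ** (1.0 / beta)


def sample(n, mu=0.0, alpha=1.0, beta=2.0, seed=0):
    """Draw n independent values using a seeded generator."""
    rng = random.Random(seed)
    return [generalized_gaussian(mu, alpha, beta, rng) for _ in range(n)]


if __name__ == "__main__":
    xs = sample(10_000, beta=1.0)  # beta = 1 gives a Laplace distribution
    print(min(xs), max(xs))
```

In this parameterization beta = 2 gives a normal distribution with variance alpha^2 / 2, beta = 1 gives a Laplace distribution, and values of beta below 2 give tails heavier than the normal while every moment stays finite.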

Suppose you did a pilot study with 10 subjects and found a treatment was effective in 7 out of the 10 subjects. With no more information than this, what would you estimate the probability to be that the treatment is effective in the next subject? Easy: 0.7. Now what would you estimate the probability to be […]
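The 0.7 above is just the sample proportion. As a hedged illustration of how an estimate like this can be computed, and of how it shifts once prior information enters, here is a minimal sketch. The Beta(a, b) prior, with the uniform prior a = b = 1 as the default, is an assumption added for contrast. It is not taken from the post, whose continuation is cut off here. The function names are hypothetical.

```python
from fractions import Fraction


def sample_proportion(successes, trials):
    """Plain frequency estimate: successes divided by trials."""
    return Fraction(successes, trials)


def beta_posterior_mean(successes, trials, a=1, b=1):
    """Posterior mean of the success probability under a Beta(a, b) prior.

    Assumption: the default a = b = 1 (uniform) prior is illustrative only.
    """
    return Fraction(successes + a, trials + a + b)


if __name__ == "__main__":
    print(float(sample_proportion(7, 10)))    # 0.7
    print(float(beta_posterior_mean(7, 10)))  # 8/12, about 0.667
```

With no prior information the frequency estimate is 7/10. The uniform-prior posterior mean pulls it slightly toward 1/2, and a more informative prior would pull it further.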

Journals should not corral shorter papers into sections like "Shorter Papers". Doing so sends a subtle (actually unsubtle) message that shorter papers are basically second-class citizens, somehow less good, or less important, or less so...