December 7, 2014
Until a few weeks ago, I’d never even heard of a “power morcellator.” Nor was I aware of the controversy that has pitted defenders of a woman’s right to choose a minimally invasive laparoscopic procedure for removing fibroids, enabled by the power morcellator, against those who decry the danger it poses in spreading an undetected uterine cancer throughout a […]

December 6, 2014
I've put in a lot of time over the years as an Editor, Associate Editor, or Editorial Board member for a number of economics and statistics journals, ranging from the Journal of Econometrics and Econometric Theory to the Journal of International Tr...

December 6, 2014
I think most of you understand this one already, but there still seems to be some confusion about how plagiarism works, so here goes . . . Basbøll links to a Twitter feed by Adam Kotsko, a scholar of religion who’s written about the work of controversial philosopher Slavoj Zizek. Kotsko appears to be annoyed […] The post Plaig: it’s not about the copying, it’s about the lack of attribution…

## SAS PROC MCMC in R: Nonlinear Poisson Regression Models

December 6, 2014
In exercise 61.1, the problem is that the model mixes badly. The SAS manual demonstrates the poor mixing and then uses a modified distribution to fix the model. In this post the same problem is tackled in R with MCMCpack, RJags, RStan and LaplaceDe...
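
The excerpt does not reproduce the SAS model, so the sketch below is not the exercise's model. It is a minimal random-walk Metropolis sampler, written in plain Python rather than with MCMCpack, JAGS, or Stan, for a Poisson regression with the log-linear mean λᵢ = exp(α + βxᵢ) on simulated data. The flat prior, step size, iteration counts, burn-in, and the true values α = 0.5 and β = 0.8 are all illustrative assumptions.

```python
import math
import random


def draw_poisson(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def log_posterior(alpha, beta, xs, ys):
    """Poisson log-likelihood with mean exp(alpha + beta * x) under a flat prior.

    The -log(y!) terms are dropped because they cancel in the Metropolis ratio.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        eta = alpha + beta * x
        total += y * eta - math.exp(eta)
    return total


def metropolis(xs, ys, n_iter=10000, step=0.05, seed=1):
    """Random-walk Metropolis with independent Gaussian proposals for (alpha, beta)."""
    rng = random.Random(seed)
    alpha, beta = 0.0, 0.0
    current = log_posterior(alpha, beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        a_new = alpha + rng.gauss(0.0, step)
        b_new = beta + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, xs, ys)
        # Accept with probability min(1, exp(proposed - current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposed - current:
            alpha, beta, current = a_new, b_new, proposed
            accepted += 1
        draws.append((alpha, beta))
    return draws, accepted / n_iter


# Simulated data with illustrative true values alpha = 0.5, beta = 0.8.
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 2.0) for _ in range(200)]
ys = [draw_poisson(math.exp(0.5 + 0.8 * x), data_rng) for x in xs]

draws, acceptance_rate = metropolis(xs, ys)
kept = draws[3000:]  # discard burn-in
alpha_hat = sum(a for a, _ in kept) / len(kept)
beta_hat = sum(b for _, b in kept) / len(kept)
print(f"alpha ~ {alpha_hat:.2f}, beta ~ {beta_hat:.2f}, acceptance {acceptance_rate:.2f}")
```

Because α and β are strongly correlated when x is not centered, a random walk that perturbs each coordinate independently moves slowly along the posterior ridge. That kind of correlation is one common cause of poor mixing in samplers like this, and centering x is one standard remedy (not necessarily the fix used in the SAS manual).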

## Archetypal Analysis: Similarity Defined by Distances from Contrasting Ideals

December 5, 2014
Carl Jung was at least partially correct. We do tend to think in terms of the extremes as shown in this archetypal wheel with rulers versus outlaws and heroes versus caregivers at different ends of bipolar dimensions. Happily, we are not required to ac...

## A parable on confidence intervals: why "confidence" is misleading

December 5, 2014
Null hypothesis significance testing (NHST) is increasingly falling out of style with methodologically minded behavioral and social scientists. Many diverse critiques have been leveled against significance testing; the debate is increasingly about what should replace it. Building on work with my colleagues (see here and here), I discuss and critique one replacement option that has been persistently suggested over the years: confidence procedures. We begin with a parable. Susan and Mark were…
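
The “confidence” in a 95% confidence procedure is a property of the procedure: over many repeated samples, about 95% of the intervals it produces contain the true parameter. A minimal simulation makes that concrete, assuming normally distributed data with known σ and the textbook z-interval; the particular μ, σ, n, and number of replications are arbitrary choices, and the sketch does not reproduce the post's parable or its critique.

```python
import random


def z_interval(sample, sigma, z=1.96):
    """Textbook 95% interval for a normal mean when sigma is known."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z * sigma / n ** 0.5
    return mean - half_width, mean + half_width


def coverage(mu=10.0, sigma=2.0, n=25, reps=10000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= mu <= high  # True counts as 1
    return hits / reps


rate = coverage()
print(f"long-run coverage: {rate:.3f}")  # close to 0.95
```

Any single computed interval either contains μ or does not; the 0.95 describes how often the procedure succeeds across repetitions, not a probability attached to one realized interval.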

## Interview with Cole Trapnell of UW Genome Sciences

December 5, 2014
Cole Trapnell is an Assistant Professor of Genome Sciences at the University of Washington. He is the developer of multiple incredibly widely used tools for genomics, including TopHat, Cufflinks, and Monocle. His lab at UW studies cell differentiation, reprogramming, and other transitions between stable or metastable cellular states using a combination of computational and experimental techniques.

## A matrix computation on Pascal’s triangle

December 5, 2014
A colleague asked me a question regarding my recent post about the Pascal triangle matrix. While responding to his question, I discovered a program that I had written in 1999 that computed with a Pascal triangle matrix. Wow, I've been computing with Pascal's triangle for 15 years! I don't know […]
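
The colleague's question and the 1999 program are not shown in the excerpt, so the following is only a general sketch of two standard Pascal-matrix facts, written in Python rather than whatever language the original used. With the lower-triangular matrix L[i][j] = C(i, j), the product L Lᵀ is the symmetric Pascal matrix with entries C(i + j, i) (a consequence of Vandermonde's identity), and the inverse of L has entries (−1)^(i−j) C(i, j). The helper names are hypothetical.

```python
from math import comb


def pascal_lower(n):
    """Lower-triangular Pascal matrix: L[i][j] = C(i, j), which is 0 for j > i."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]


def pascal_symmetric(n):
    """Symmetric Pascal matrix: S[i][j] = C(i + j, i)."""
    return [[comb(i + j, i) for j in range(n)] for i in range(n)]


def pascal_lower_inverse(n):
    """Inverse of the lower Pascal matrix: entries (-1)**(i - j) * C(i, j)."""
    return [[(1 if (i - j) % 2 == 0 else -1) * comb(i, j) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Plain-list matrix product, exact for integer entries."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


n = 5
L = pascal_lower(n)
identity = [[int(i == j) for j in range(n)] for i in range(n)]
assert matmul(L, transpose(L)) == pascal_symmetric(n)    # L L^T = S
assert matmul(L, pascal_lower_inverse(n)) == identity    # L L^{-1} = I
for row in pascal_symmetric(4):
    print(row)
# [1, 1, 1, 1]
# [1, 2, 3, 4]
# [1, 3, 6, 10]
# [1, 4, 10, 20]
```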

## The persistence of the “schools are failing” story line

December 5, 2014
I happened to come across a post from 2011 about some work of Roland Fryer, a prominent economist who works in education research. In an article, Fryer made the offhand remark that “test scores have been largely constant over the past thirty years,” a claim that was completely contradicted by one of the graphs in […]

## Prediction competitions

December 5, 2014
Competitions have a long history in forecasting and prediction, and have been instrumental in forcing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community has the annual KDD Cup, which has generated attention on a wide range […]

## “Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance” (Dec 3 Seminar slides)

December 5, 2014
Below are the slides from my Rutgers seminar for the Department of Statistics and Biostatistics yesterday, since some people have been asking me for them. The abstract is here. I don’t know how explanatory a bare outline like this can be, but I’d be glad to try and answer questions[i]. I am impressed at how interested in foundational matters I […]

## More on Prediction From Log-Linear Regressions

December 4, 2014
My therapy sessions are actually going quite well. I'm down to just one meeting with Jane a week, now. Yes, there are still far too many log-linear regressions being bandied around, but I'm learning to cope with it! Last year, in an attempt to be helpfu...
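
The excerpt cuts off before the details, but a well-known point about prediction from a log-linear regression is that exponentiating the fitted value of log(y) estimates the conditional median of y, not its mean. If the errors on the log scale are N(0, σ²), the conditional mean is exp(xβ + σ²/2). The sketch below illustrates this with simulated data; the normal-error assumption, the true coefficients (1.0 and 2.0), σ = 0.8, and the helper name `simple_ols` are illustrative, and the post itself may discuss other corrections.

```python
import math
import random


def simple_ols(xs, ys):
    """Intercept and slope of a least-squares line (one regressor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulate log(y) = 1.0 + 2.0 * x + e with e ~ N(0, sigma^2).
rng = random.Random(7)
sigma = 0.8
xs = [rng.uniform(0.0, 1.0) for _ in range(5000)]
ys = [math.exp(1.0 + 2.0 * x + rng.gauss(0.0, sigma)) for x in xs]

# Fit the log-linear model and estimate sigma^2 from the residuals.
log_ys = [math.log(y) for y in ys]
a, b = simple_ols(xs, log_ys)
s2 = sum((ly - (a + b * x)) ** 2 for x, ly in zip(xs, log_ys)) / (len(xs) - 2)

x0 = 0.5
naive = math.exp(a + b * x0)            # estimates the conditional median
corrected = naive * math.exp(s2 / 2.0)  # estimates the conditional mean (normal errors)
true_mean = math.exp(1.0 + 2.0 * x0 + sigma ** 2 / 2.0)
print(f"naive {naive:.2f}, corrected {corrected:.2f}, true mean {true_mean:.2f}")
```

Plugging the estimate s² into exp(s²/2) is itself slightly biased in finite samples, which is one reason more refined corrections exist.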

## Initial steps towards reproducible research

December 4, 2014
In anticipation of next week’s Reproducible Science Hackathon at NESCent, I was thinking about Christie Bahlai’s post on “Baby steps for the open-curious.” Moving from Ye Olde Standard Computational Science Practice to a fully reproducible workflow seems a monumental task, but partially reproducible is better than not-at-all reproducible, and it’d be good to give people […]

## A comment on preparing data for classifiers

December 4, 2014
I have been working through (with some honest appreciation) a recent article comparing many classifiers on many data sets: “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim; Journal of Machine Learning Research 15(Oct):3133–3181, 2014 (which we will call “the DWN paper” in this note). This paper applies 179 …

## Degrees of Freedom and Information Criteria

December 4, 2014

Degrees of freedom and information criteria are two fundamental concepts in statistical modeling, and both are taught in introductory statistics courses. But what are their exact abstract definitions, from which specific calculation formulas can be derived in different situations? I often use fit criteria like AIC and BIC to choose between models. […]
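
One common way to make these definitions precise, offered here as general background rather than as the definitions the post goes on to use, is the covariance definition of degrees of freedom together with the standard likelihood-based forms of AIC and BIC. It assumes the yᵢ are independent with common variance σ², and k counts all estimated parameters.

```latex
% Degrees of freedom of a fitting procedure y \mapsto \hat{y}
\mathrm{df} = \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \operatorname{Cov}(\hat{y}_i, y_i)

% Linear smoother \hat{y} = S y with \operatorname{Cov}(y) = \sigma^{2} I:
\mathrm{df} = \frac{1}{\sigma^{2}} \operatorname{tr}\!\left(S \, \sigma^{2} I\right) = \operatorname{tr}(S)

% OLS with full-rank n \times p design X: S = H = X (X^{\top} X)^{-1} X^{\top}
\operatorname{tr}(H) = \operatorname{tr}\!\left((X^{\top} X)^{-1} X^{\top} X\right) = \operatorname{tr}(I_p) = p

% Information criteria with maximized likelihood \hat{L} and k estimated parameters
\mathrm{AIC} = -2 \log \hat{L} + 2k, \qquad \mathrm{BIC} = -2 \log \hat{L} + k \log n
```

Both criteria trade fit against the parameter count; with natural logs, BIC's penalty exceeds AIC's once log n > 2, that is, for n ≥ 8.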

## Repost: A deterministic statistical machine

December 4, 2014
Editor's note: This is a repost of our previous post about deterministic statistical machines. It is inspired by the recent announcement that the Automatic Statistician received funding from Google. In 2012 we also applied to Google for a small research award to study this same problem, but didn't get it. In the interest of extreme openness

## Designing a study to see if “the 10x programmer” is a real thing

December 4, 2014
Lorin H. writes: One big question in the world of software engineering is: how much variation is there in productivity across programmers? (If you google for “10x programmer” you’ll see lots of hits). Let’s say I wanted to explore this research question with a simple study. Choose a set of participants at random from a […]

## The Rock Hyrax Problem

December 4, 2014
This is the third of a series of articles about Bayesian analysis. The previous article is here. Earlier this semester I posed this problem to my Bayesian statistics class at Olin College: Suppose I capture and tag 10 rock hyrax...
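
The excerpt stops partway through the problem statement, so the recapture numbers below are hypothetical: suppose a second sample of 10 animals contains 2 that are already tagged. Under those assumptions, plus a uniform prior on the population size N up to an arbitrary cap of 1,000, a minimal grid sketch uses the hypergeometric likelihood common in mark-recapture models. It is not necessarily how the original article solves the problem, and `hyrax_posterior` is a hypothetical helper.

```python
from math import comb


def hyrax_posterior(tagged=10, second_sample=10, recaptured=2, max_n=1000):
    """Posterior over population size N on a grid, uniform prior on [n_min, max_n].

    Likelihood: probability that `recaptured` of the `second_sample` animals are
    tagged, when `tagged` of N animals carry tags (hypergeometric).
    """
    # N must cover the tagged animals plus the untagged ones seen in sample two.
    n_min = tagged + (second_sample - recaptured)
    likelihood = {
        n: comb(tagged, recaptured)
        * comb(n - tagged, second_sample - recaptured)
        / comb(n, second_sample)
        for n in range(n_min, max_n + 1)
    }
    total = sum(likelihood.values())
    return {n: like / total for n, like in likelihood.items()}


posterior = hyrax_posterior()
map_n = max(posterior, key=posterior.get)
mean_n = sum(n * p for n, p in posterior.items())
print(f"MAP estimate: {map_n}, posterior mean: {mean_n:.0f}")
```

With a flat prior the mode matches the classical Lincoln–Petersen estimate of 10 × 10 / 2 = 50 (N = 49 and N = 50 tie exactly in exact arithmetic), while the posterior mean depends heavily on the arbitrary upper bound because the likelihood's right tail decays slowly.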

## Useful for referring—12-04-2014

December 4, 2014
- Tutorial: How to detect spurious correlations, and how to find the …
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- Jackknife logistic and linear regression for clustering and predict…
- From the trenches: 360-degrees data science
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict…
- A […]

December 4, 2014
I have been asked to speak at the 2014 LRA Conference on the topic of Academic Blogging.

- Time: 1:15-2:15
- Location: Islands Ballroom Salon B – Lobby Level
- My Slides: http://clari.buffalo.edu/blog
- My Précis: http://clari.buffalo.edu/blog/materials/precis.pdf

The talk is part of a larger …

## Study of a plot

December 3, 2014
I began to think about a good way of plotting campaign expenditures for a paper I'm working on. I thought it would be something like the following: simple but meaningful even when there are outliers in both tails. Though I like the old standbys, Tukey's boxplot and scatter plots, I had already used them the last time […]

## If observational studies are outlawed, then only outlaws will do observational studies

December 3, 2014
My article “Experimental reasoning in social science” begins as follows: As a statistician, I was trained to think of randomized experimentation as representing the gold standard of knowledge in the social sciences, and, despite having seen occasional arguments to the contrary, I still hold that view, expressed pithily by Box, Hunter, and Hunter (1978) that […]

## Where a scatter plot fails

December 3, 2014
Found this chart in the magazine that Charles Schwab sends to customers. When there are two variables and their correlation is of interest, a scatter plot is usually recommended. But not here! The text labels completely dominate this chart and...