I have been thinking for a while about how hard it is to find statisticians to interview for the blog. When I started the interview series, it was aimed at statisticians in the early stages of their careers. It is …

Washington Post columnist Richard Cohen brings up one of my research topics: In New York City, blacks make up a quarter of the population, yet they represent 78 percent of all shooting suspects — almost all of them young men. We know them from the nightly news. Those statistics represent the justification for New York […]The post “Stop and frisk” statistics appeared first on Statistical Modeling, Causal Inference, and Social…

Andrew Gelman, a Columbia professor, wrote an important post about causal thinking (link) that I highly recommend reading. While he approaches the topic from a researcher's perspective, his framing of the issue is very practical, as I will demonstrate in this post. Gelman's main point is that there are two modes of causal thinking: Forward causality asks the question, if we change X, how does that change Y? This is typically answered…

Update: you can find the next post in this series here. You probably have a favorite Simpsons character. Maybe you hope to someday block out the sun, Mr. Burns style, maybe you enjoy Homer's skill in averting meltdowns, or maybe you identify with Lisa's struggles for acceptance. Through its characters, the Simpsons made a huge impact on a generation, and although the show is still running, my best memories will…

Mark Blumenthal writes: What do you think about the “random rejection” method used by PPP that was attacked at some length today by a Republican pollster? Our just published post on the debate includes all the details as I know them. The Storify of Martino’s tweets has some additional data tables linked to toward the […] The post A poll that throws away data??? appeared first on Statistical Modeling, Causal Inference,…

Imagine a world in which people are taught that there are two kinds of counting: there’s potato-counting, and there’s counting other stuff (beans, points, cards, etc.). Potatoes are special, so potato-counting gets its own courses, under the name “Kartoffelanalysis”. When you take a Kartoffelanalysis 101 course, nobody mentions that you could use the same techniques […]

Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility […] The post Priors appeared first on Statistical Modeling, Causal Inference, and Social Science.

The preface to Elements of Statistical Learning opens with the popular quote In God we trust, all others bring data. — William Edwards Deming The footnote to the quote is better than the quote: On the Web, this quote has been widely attributed to both Deming and Robert W. Hayden; however Professor Hayden told us […]

Over at the McGraw-Hill blog, I wrote about how to consume Big Data (link), which is the core theme of my new book. In that piece, I highlight two recent instances in which bloggers demonstrated numbersense in vetting other people's data analyses. (Since the McGraw-Hill link is not working as I'm writing this, I placed a copy of the post here in case you need it.) Below is a detailed…

When you hear about Big Data, you almost always hear about the supply side: Behold the data in un-pronounceable units of bytes! Admire the new science inspired by all the data! Missing from this narrative is the consumption side. A direct consequence of Big Data will be the explosion of data analyses—there will be more people producing more data analyses more quickly. This will be a world of confusing and…

The determinant is a number associated with a square (n×n) matrix. It can tell us whether the columns are linearly dependent, whether the corresponding homogeneous system has any nonzero solutions, and whether the matrix is invertible. See the Wikipedia entry for more details on this. Computing a determinant is key to a lot of linear algebra, and by extension, to a lot of machine learning. It is easy to…
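To make the invertibility check concrete, here is a minimal Python sketch. It is not taken from the excerpted post: the function names `determinant` and `is_invertible`, the tolerance `tol`, and the choice of Gaussian elimination with partial pivoting are all assumptions made for illustration.

```python
def determinant(matrix):
    """Determinant of a square matrix via Gaussian elimination with partial pivoting."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    a = [[float(x) for x in row] for row in matrix]  # work on a copy
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # the whole column is zero below the diagonal: singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # each row swap flips the sign of the determinant
        det *= a[col][col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_invertible(matrix, tol=1e-12):
    """A matrix is invertible exactly when its determinant is nonzero.

    The tolerance `tol` is an illustrative choice, not a universal constant.
    """
    return abs(determinant(matrix)) > tol
```

For example, `is_invertible([[1, 2], [2, 4]])` is `False` because the second column is twice the first. In practice a tested linear algebra library is usually a better choice than hand-rolled elimination.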

This Saturday the New York Times published an opinion piece wondering "do clinical trials work?". The answer, of course, is: absolutely. For those who don't know the history, randomized controlled trials (RCTs) are one of the reasons why life spans skyrocketed …

Consider two broad classes of inferential questions: 1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth? 2. Reverse causal inference. What causes Y? Why do more attractive people earn […] The post Forward causal reasoning statements are about estimation; reverse causal questions are about model…

This is cross-posted on my two blogs. For my fans on either of my two blogs, I'm giving away a free signed copy of my new book, Numbersense. (See my book announcement.) All you have to do is to answer...

This is cross-posted on my two blogs. For my fans on either of my two blogs, I'm giving away a free signed copy of my new book, Numbersense. All you have to do is to answer 3 questions, based on a few sample pages (see the PDF here; also on Slideshare). Click on the quiz to enter. The contest is open until Friday, July 19, 2013 (11:59 PM PST). This…

It is often useful to partition observations for a continuous variable into a small number of intervals, called bins. This familiar process occurs every time that you create a histogram, such as the one on the left. In SAS you can create this histogram by calling the UNIVARIATE procedure. Optionally, [...]
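The binning step itself can be sketched outside SAS. The Python snippet below is an illustration only and does not reproduce what the UNIVARIATE procedure does: the function name `bin_counts` and the equal-width binning rule are assumptions.

```python
def bin_counts(values, num_bins):
    """Partition values into equal-width bins; return (edges, counts)."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value would index one past the end, so clamp it
            # into the last bin (the last interval is closed on the right).
            idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


edges, counts = bin_counts([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)
```

Here each bin is half-open, `[edge_i, edge_{i+1})`, except the last, which also includes the maximum. Other binning rules (custom cut points, quantile bins) are equally reasonable.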