This is a machine-learning-vs-traditional-statistics kind of blog post inspired by Leo Breiman's "Statistical Modeling: The Two Cultures". If you're like: "I had enough of this machine learning vs. statistics discussion, BUT I would love to see beau...

Introduction: Continuing the recently started series on numerical integration, this post introduces rectangular integration. I will describe the concept behind it, show an R function that implements it, and use that function to check that the distribution actually integrates to 1 over its support set. This post follows from my […]
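
As a rough illustration of the idea in this excerpt, here is a minimal sketch of rectangular integration. It is written in Python rather than the post's R, it uses the midpoint variant of the rule, and it checks a Beta(2, 5) density. All three choices are assumptions made for illustration, since the excerpt does not say which rule or which distribution the post uses. The function names are hypothetical.

```python
def rectangular_integral(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    # Evaluate f at the midpoint of each subinterval and add up the rectangle areas.
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


def beta_2_5_pdf(x):
    """Density of a Beta(2, 5) distribution on [0, 1] (illustrative choice only)."""
    # The normalizing constant is 1 / B(2, 5) = 30.
    return 30.0 * x * (1.0 - x) ** 4


# A valid density should integrate to (approximately) 1 over its support.
total = rectangular_integral(beta_2_5_pdf, 0.0, 1.0)
print(total)
```

Increasing `n` narrows the rectangles and should bring the result closer to 1.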

In our new ethics column for Chance, Eric Loken and I write about our current favorite topic: One of our ongoing themes when discussing scientific ethics is the central role of statistics in recognizing and communicating uncertainty. Unfortunately, statistics, and the scientific process more generally, often seems to be used more as a way of laundering […] The post The AAA Tranche of Subprime Science appeared first on Statistical Modeling, Causal Inference,…

Tesla is hiring a data scientist. That is all. I'm not sure I buy the idea that Python is taking over from R among people who actually do regular data science. I think it is still context-dependent. A huge …

This weekend, Martin Grandjean posted an interesting piece on his blog about the use of statistics (for propaganda purposes). The exercise is not new, but Martin raises questions that are, unfortunately, important and complex. In a paragraph titled “making the numbers talk… any which way” (which I borrowed as my title; I admit I hesitated over “with great power comes great responsibility”), there is a (quick) analysis of a chart, shown below.…

Andrew Anthony tells the excellent story of how Nick Brown, Alan Sokal, and Harris Friedman shot down some particularly silly work in psychology. (“According to the graph, it all came down to a specific ratio of positive emotions to negative emotions. If your ratio was greater than 2.9013 positive emotions to 1 negative emotion you […] The post “The British amateur who debunked the mathematics of happiness” appeared first on Statistical…

Nassim Nicholas Taleb recently wrote an article advocating that we abandon standard deviation in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like it: it prefers models that […]
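
To make the two measures in this excerpt concrete, here is a minimal sketch that computes both on one small sample. The data are made up for illustration, and the population form of the standard deviation (dividing by n) is an assumed choice so that both measures use the same divisor.

```python
import statistics

# Made-up sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population standard deviation: square root of the mean squared deviation.
sd = statistics.pstdev(data)

# Mean absolute deviation: mean of the absolute deviations from the mean.
mad = sum(abs(x - mean) for x in data) / len(data)

print("standard deviation:", sd)
print("mean absolute deviation:", mad)
```

Squaring weights large deviations more heavily than taking absolute values does, so the standard deviation is never smaller than the mean absolute deviation on the same data.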

Some facts and some speculation. Definition: Volatility is the annualized standard deviation of returns; it is often expressed in percent. A volatility of 20 means that there is about a one-third probability that an asset’s price a year from now will have fallen or risen by more than 20% from its present value. In …
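
The definition above can be sketched in a few lines. This is only an illustration under stated assumptions: returns are treated as roughly normal, daily returns are annualized with 252 trading days per year (a common convention, not something the excerpt states), and the daily returns are made up. The "about one-third" figure then corresponds to the probability that a normal variable lands more than one standard deviation from its mean.

```python
import math
import statistics

# Assumption: 252 trading days per year, a common convention not taken from the post.
TRADING_DAYS_PER_YEAR = 252


def annualized_volatility(daily_returns_pct):
    """Annualize the sample standard deviation of daily returns (in percent)."""
    daily_sd = statistics.stdev(daily_returns_pct)
    # Under independent returns, variance scales with time, so sd scales with sqrt(time).
    return daily_sd * math.sqrt(TRADING_DAYS_PER_YEAR)


def prob_beyond_one_sd():
    """P(|Z| > 1) for a standard normal Z, via the error function."""
    return 1.0 - math.erf(1.0 / math.sqrt(2.0))


# Made-up daily returns in percent, for illustration only.
daily_returns_pct = [0.5, -1.2, 0.8, 1.5, -0.7, 0.3, -1.0, 0.9]

vol = annualized_volatility(daily_returns_pct)
p = prob_beyond_one_sd()
print("annualized volatility (%):", vol)
print("probability of a move beyond one standard deviation:", p)
```

The second number is close to 0.32, which is where the "about a one-third probability" reading comes from when returns are approximately normal.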

Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured […] The post Transformations for non-normal data appeared first on Statistical Modeling, Causal Inference, and Social…

A colleague asked if I had any material for a course in sample surveys. And indeed I do. See here. It’s all the slides for a 14-week course, also the syllabus (“surveyscourse.pdf”), the final exam (“final2012.pdf”) and various misc files. Also more discussion of final exam questions here (keep scrolling thru the “previous entries” until […] The post A course in sample surveys for political science appeared first on Statistical Modeling,…

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1jccIBN. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE. This post uses animated choropleths to visualize violent crime r...

Consider a standard linear regression setting with \(K\) regressors and sample size \(N\). We will say that an estimator \(\hat{\beta}\) is consistent for a treatment effect ("T-consistent") if \(\operatorname{plim} \hat{\beta}_k = {\partial E(y|x) }/{\partial x_k}\...
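
As a rough numerical illustration of the probability-limit statement above, here is a minimal simulation sketch. It assumes the simplest possible case, which is not taken from the post: a single regressor (\(K = 1\)), a correctly specified linear conditional mean \(E(y|x) = 2x\) so that \(\partial E(y|x)/\partial x = 2\), and independent standard normal noise. The function names and the true slope are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x (one regressor plus an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slope(n, true_slope=2.0, seed=0):
    """Draw n observations from y = true_slope * x + noise and fit OLS."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
    return ols_slope(xs, ys)


# The estimate should settle near the true derivative, 2.0, as N grows.
for n in (100, 10_000, 100_000):
    print(n, simulate_slope(n))
```

In this assumed setup the fitted slope approaches 2 as the sample size increases, which is what consistency for the derivative of the conditional mean looks like in practice.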