There's a lot of discussion and also big hope about what is called Big Data and the role of Data

The top-seeded comedian vs. an unseeded philosopher. Pryor would be much more entertaining, that's for sure ("Arizona State Penitentiary population: 80 percent black people. But there are no black people in Arizona!"). But Karl Popper laid out the philosophy that is the foundation for modern science. His talk, even if it is dry, might ultimately

OK, it's been a busy email day. From Brandon Nakawaki: I know your blog is perpetually backlogged by a few months, but I thought I'd forward this to you in case it hadn't hit your inbox yet. A journal called Basic and Applied Social Psychology is banning null hypothesis significance testing in favor of descriptive

Win-Vector LLC's Nina Zumel and John Mount are proud to announce their new data science video course Introduction to Data Science is now available on Udemy. We designed the course as an introduction to an advanced topic. The course description is: Use the R Programming Language to execute data science projects and become a data

MONTHLY MEMORY LANE: 3 years ago: February 2012. I am to mark in red three posts (or units) that seem most apt for general background on key issues in this blog. Given our Fisher reblogs, we've already seen many this month. So, I'm marking in red (1) The Triad, and (2) the Unit on Spanos' misspecification tests. Please see those posts for

Yesterday's is a super-tough call. I'd much rather hear Stewart Lee than Aristotle. I read one of Lee's books, and he's a fascinating explicator of performance. Lee gives off a charming David Owen vibe—Phil, you know what I'm saying here—he's an everyman, nothing special, he's just been thinking really hard lately and wants to share

The Graphic Continuum is a poster created by Jon Schwabish and Severino Ribecca (the man behind the Data Visualisation Catalogue). It lists almost 90 different chart types and organizes them into six large groups: distribution, time, comparing categories, geospatial, part-to-whole, and relationships. Some of them are connected across groups where there are further similarities. The poster is printed very nicely and

John Sukup writes: I came across a chart recently posted by Boston Consulting Group on LinkedIn and wondered what your take on it was. To me, it seems to fall into the "suspicious" category but thought you may have a different opinion. I replied that this one baffles me cos I don't know what the

Chapter 1 of Numbersense (link) uses the example of U.S. News ranking of law schools to explore the national pastime of ranking almost anything. Since there is no objective standard for the "correct" ranking, it is pointless to complain about "arbitrary" weighting and so on. Every replacement has its own assumptions. A more productive path forward is to understand how the composite ranking is created, and shine a light on the

Consider the following dataset, with (only) ten points:
    x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
    y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
    plot(x,y,pch=19,cex=2)
We want to get, say, two clusters. Or more specifically, two sets of observations, each of them sharing some similarities. Since the number of observations is rather small, it is actually possible to get an exhaustive list of all partitions, and to minimize some criterion, such as the within-cluster variance. Given a vector with clusters, we compute…
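The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the code from the original post: it assumes the criterion is the total within-cluster sum of squares, and the names within_ss, best_ss and best_labels are made up for this example.

```r
# Data from the text
x <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
n <- length(x)

# Assumed criterion: total within-cluster sum of squared distances
# to each cluster's centroid, for a label vector with values 1 or 2
within_ss <- function(labels) {
  total <- 0
  for (k in unique(labels)) {
    idx <- labels == k
    total <- total + sum((x[idx] - mean(x[idx]))^2 + (y[idx] - mean(y[idx]))^2)
  }
  total
}

# Enumerate every split into two non-empty groups. Point 1 is pinned to
# cluster 1 so that each partition is visited once, which leaves
# 2^(n-1) - 1 = 511 candidates for n = 10.
best_ss <- Inf
best_labels <- NULL
for (code in 1:(2^(n - 1) - 1)) {
  # Binary digits of `code` decide which of points 2..n go to cluster 2
  bits <- (code %/% 2^(0:(n - 2))) %% 2 == 1
  labels <- c(1, ifelse(bits, 2, 1))
  ss <- within_ss(labels)
  if (ss < best_ss) {
    best_ss <- ss
    best_labels <- labels
  }
}

# Show the best partition found
plot(x, y, pch = 19, cex = 2, col = c("red", "blue")[best_labels])
```

With only ten points the loop is instantaneous, but the number of candidates doubles with each extra observation, so this brute-force approach is only practical for very small datasets.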

In a dose-finding clinical trial, you have a small number of doses to test, and you hope to find the one with the best response. Here "best" may mean most effective, least toxic, closest to a target toxicity, some combination of criteria, etc. Since your goal is to find the best dose, it seems natural to compare dose-finding

Yesterday's winner is a tough one. Really, these two guys could've met in the final. Some arguments in the comments in favor of Freud: From Huw, "he has the smirks, knowing looks, and barely missed sidelong glances." And Seth points out the statistical connection: "Some people might say that theory is getting lost in the

If you're in NYC or Sydney, there are some Stan-related talks in the next few weeks. New York 25 February. Jonah Gabry: shinyStan: a graphical user interface for exploring Bayesian models after MCMC. Register Now: New York Open Statistical Programming Meetup. 12 March. Rob Trangucci: #5: Non-centered parameterization aka the "Matt trick." Register Now: Stan

Lee Beck writes: I'm curious if you have any thoughts on the statistical meaning of sentences like "a small but growing collection of studies suggest [X]." That exact wording comes from this piece in the New Yorker, but I think it's the sort of expression you often see in science journalism ("small but mounting", "small

We didn't get any great comments yesterday, so I'll have to go with PKD on the grounds that he was the presumptive favorite, and nobody made any good case otherwise. And today we have the second seed among the Religious Leaders vs. an unseeded entry in the Founders of Religions category. Truly a classic matchup.

The talk is tomorrow, Tues 24 Feb, 2:40-4:00pm in 200 Fisher Hall: "Unbiasedness": You keep using that word. I do not think it means what you think it means. Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University. Minimizing bias is the traditional first goal of econometrics. In many cases, though, the