(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
We had a recent discussion about statistics packages where people talked about the structure and capabilities of different computer languages. One thing I wanted to add to this discussion is some sociology. To me, a statistics package is not just its code; it’s also its community, and it’s what people do with it.
R, for example, is nothing special for graphics (again, I think in retrospect my graphs would be better if I’d been making them in Fortran all these years). What makes R graphics work so well is that there’s a clear path from the numbers to the graphs: there’s a tradition in R of postprocessing.
In comparison, consider SAS. I’ve never directly used SAS, but whenever I’ve seen it used, whether by people working for me or with me or just people down the hall who left SAS output sitting in the printer, there’s been no postprocessing. It doesn’t look interactive at all. The user runs some procedure and then there are pages and pages and pages of output. The point about R graphics is not that they’re so great; it’s that R users such as myself graph what we want. In fact, lots of default R graphics are horrible. (Try applying the default plot() function to the output of a linear model if you want to see some yuck.) I think SAS is horrible, not because of anything inherent in its structure as a computer program but because I see what people do with it. In contrast, I see people using Stata creatively and flexibly, so I have much warmer feelings toward Stata (even though I don’t actually know how to use it myself).
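To see what that parenthetical about plot() means in practice, here is a minimal R sketch. It assumes the built-in `cars` dataset and a simple regression chosen only for illustration; neither comes from the original discussion. The first part shows the default diagnostic panels, and the rest pulls the numbers out of the fit and graphs them directly, which is one small instance of the postprocessing habit described above.

```r
# Illustrative only: `cars` ships with base R; the model is an arbitrary example.
fit <- lm(dist ~ speed, data = cars)

# Default plot method for an lm object: four diagnostic panels.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Postprocessing: extract the estimates and draw the graph we actually want.
est <- coef(fit)  # named vector: (Intercept), speed
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     pch = 20, bty = "l")
abline(a = est[["(Intercept)"]], b = est[["speed"]])
```

The design choice is simply to treat the fitted object as a source of numbers rather than as something to be printed or plotted wholesale.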
The internals of a program do have something to do with how it’s used, I’m sure. I assume that Excel really is crappy; it’s not just that people use it to make crappy graphs. But as a user, I don’t really care about that. I just know to avoid it.