I think it has been beaten to death that the incentives in academia lean heavily toward producing papers and less toward producing and maintaining software. There are people who are way, way more knowledgeable than I am about building and maintaining software. For example, Titus Brown hit a lot of the key issues in his interview. The open source community is also filled with advocates and researchers who know way more about this than I do.
This post is more about my views on changing how the data analysis community thinks about code and software. I have often been frustrated with statisticians and computer scientists who write papers where they develop new methods and seem to demonstrate that those methods blow away all their competitors. But then no software is available to actually test whether that is true. Even worse, sometimes I just want to use their method to solve a problem in our pipeline, but I have to code it from scratch!
I have also had several cases where I emailed the authors for their software and they said it “wasn’t fit for distribution” or they “don’t have code” or the “code can only be run on our machines”. I totally understand the first and last. My code isn’t always pretty (I have zero formal training in computer science, so messy code is actually the most likely scenario), but I always say, “I’ll take whatever you got and I’m willing to hack it out to make it work”. I am often still turned down.
So I have a new policy when evaluating CVs of candidates for jobs, or when I’m reading a paper as a referee. If the paper is about a new statistical method or machine learning algorithm and there is no software available for that method, I simply mentally cross it off the CV. If I’m reading a data analysis and there isn’t code that reproduces the analysis, I mentally cross it off. In my mind, new methods and analyses without software are just vaporware. Now, you’d definitely have to cross a few papers off my own CV based on this principle. I do that. But I’m trying really hard going forward to make sure nothing gets crossed off.
In a future post I’ll talk about the new issue I’m struggling with: maintaining all that software I’m creating.
Please comment on the article here: Simply Statistics