I came across a really interesting post by Chris Lewis, titled “Let me see your papers, let me see your source”. What started as a reply on his blog turned into a story I felt was worth a blog post of its own. Here goes:
The problem Chris describes is very real, and it looms over Computer Science research. I also imagine it is something that has slowly crept into the scientific community (I haven’t been around long enough to truly tell). I don’t know how reproducibility is ensured in other fields (there are probably papers on medicine that don’t share the formula), but I think in any field, the academic value of what is published is severely impaired if the results cannot be checked by others. Somehow, in CS, we have come to accept this as a fact of life.
I am convinced that people don’t share code not so much out of bad intentions, but because not sharing has become the norm. Universities and research institutions also don’t support sharing as well as they should. My own faculty does not even offer an open repository for me to share code publicly (I’ve solved this for myself by creating a GitHub “organisation” for it). I’ve also written plenty of code myself that I am not exactly proud of. In the heat of approaching experiment deadlines, I have cut more corners than I’d like to show the world, especially since I also write code professionally and have at least the illusion of a reputation to uphold ;) It’s not so hard to see why so few references to code accompany published research. One thing I would never do, though, is deny anyone the chance to reproduce my results if they asked for it.
Which brings me back to the anecdote Chris mentions in his post: publishing a paper describing a tool without providing the reader access to that tool, or at least a detailed description of how it works, should be impossible. Denying a friendly inquiry from a fellow researcher or interested person should feel even more wrong. I’m not sure if, like Chris says, it’s an institutional problem, but I feel a shift of attitude is definitely needed.
I was at a conference on multimedia and video retrieval a few years ago, where I seemed to be the only one bothered by the utter lack of suitable material and data. There is a lot of research into techniques for segmenting, clustering and processing video, but the data is never shared because of licensing issues with the source video. In trying to create an interface that operates on data that has been segmented by these techniques, one of the biggest challenges turned out to be finding suitable data to work from (I eventually found it. You guessed correctly: it is copyrighted). This should not be allowed to happen. If this is an omen of what is in store for CS research, we’re in trouble. Research is great, but we should never make it impossible for others to stand on the shoulders of giants.