Re: Let me see your papers, let me see your source

I came across a really interesting post by Chris Lewis, titled Let me see your papers, let me see your source, and what started as a reply on his blog turned into a story I felt deserved a blog post of its own. Here goes:

The problem Chris describes is very real, and it is a looming one in Computer Science research. I imagine it has slowly crept into the scientific community (I haven't been around long enough to tell for sure). I don't know how reproducibility is ensured in other fields; there are probably medical papers that don't share the formula. But in any field, I think the academic value of what is published is severely impaired if others cannot check the results. Somehow, in CS, we have come to accept this as a fact of life.

I am convinced that people don't withhold code out of bad intentions so much as out of an attitude that has become the norm. Universities and research institutions also don't support sharing as well as they should. My own faculty does not even offer an open repository for sharing code publicly (I've solved this for myself by creating a GitHub "organisation" for it). I've also written plenty of code that I am not exactly proud of. In the heat of approaching experiment deadlines, I have cut more corners than I'd like to show the world, especially since I also write code professionally and have at least the illusion of a reputation to uphold ;) So it's not hard to see why so little research comes with references to its code. One thing I would never do, though, is deny anyone the chance to reproduce my results if they asked for it.

Which brings me back to the anecdote Chris mentions in his post: publishing a paper that describes a tool, without giving the reader access to that tool or at least a detailed description of how it works, should be impossible. Turning down a friendly inquiry from a fellow researcher or interested person should feel even more wrong. I'm not sure whether, as Chris says, it's an institutional problem, but I feel a shift in attitude is definitely needed.

A few years ago I was at a conference on multimedia and video retrieval, where I seemed to be the only one bothered by the utter lack of suitable material and data. There is a lot of research into techniques for segmenting, clustering and processing video, but the data is never shared because of licensing issues with the source video. When I tried to build an interface that operates on data segmented by these techniques, one of the biggest challenges turned out to be finding suitable data to work from (I eventually found some. You guessed correctly: it is copyrighted). This should not be allowed to happen. If this is an omen of what is in store for CS research, we're in trouble. Research is great, but we should never make it impossible for others to stand on the shoulders of giants.

Update: Sometimes just sharing code is not enough, as this comic wittily remarks. What about solutions for that? Could something like Sumatra be the answer? There are interesting discussions on Hacker News.

3 Responses to “Re: Let me see your papers, let me see your source”

  1. Chris Lewis says:

    Thanks for reading my post! I totally agree with everything you’ve said here, and making a GitHub organization is definitely A Good Thing.

    Data is a much trickier thing, but I did notice that tools like Taverna appear to be becoming more popular in fields like biology and statistics, so there is hope for those fields. I tried it to see whether it would be helpful for CS, but it's basically a visual programming language, so for programmers it feels like a pair of particularly blunt scissors.

  2. Michel says:

    I had never heard of Taverna before, but my first impression is consistent with your remark that it is perhaps more suitable for fields like biology and statistics. I noticed it does have a command-line interface as well, so that should at least partly satisfy some CS urges ;)

    Either way, I think having bad code will always be better than having no code. I think the Willow Garage comic accurately captures the irony of this situation :)

    Regarding data, there is a whole movement towards more open access, the best example being Science Commons (http://wiki.creativecommons.org/Research). There are still some hurdles to clear, though, as existing licenses often suffer from problems like "attribution stacking" that make them impractical (see http://sciencecommons.org/projects/publishing/open-access-data-protocol/). Either way, these are exciting times and it's not all bad news :)

  3. Bas Peschier says:

    I came across similar discussions with my Affective Interaction teacher, but there the point is slightly more high-level. Papers do not convey half of what is important about such projects. The department calls for more interaction during conferences, so people can get a real feel for what you have done, rather than just the dry specifics of a presentation.

    CS and other fields whose projects involve more resources than just knowledge should open up those resources, so that others can reproduce the work and, for starters, get a real grip on what you are actually doing. Objects, source, observation data, the works.
