Wednesday, April 17, 2013

The Science of Crowdsourcing Versus the Crowdsourcing of Science

Today Science Translational Medicine published our methods paper for the Sage/DREAM Breast Cancer Challenge.  There were two things we wanted to see from the challenge:  one was an improved model of breast cancer prognosis, the other was to find out how people would interact in a competition where their submissions were immediately made public and free for the other competitors to use as they pleased.  I won't repeat what's in that paper here, but I do want to comment on the difference between the science of crowdsourcing and the crowdsourcing of science.

Having competed in more than my fair share of modeling and programming competitions, I've seen attempts to foster meaningful collaboration in problem solving before.  The TopCoder Soybean Challenge is one example, and EteRNA is another.  The Sage/DREAM Breast Cancer Challenge is the first I've seen where a winner would be declared and everyone's submissions remained publicly available throughout.

One thing must be said about online coding competition competitors:  they are a crafty bunch.  Like a genie from a cursed bottle, they will give you exactly what you ask for, even if that's not what you want.  Further, the crowd will adopt whatever strategy maximizes its chance of winning; nobody plays for second place.  In this case, we saw exactly the behavior one would expect when everybody's submissions and source code are public:  obfuscation of code, and a single leading team putting in most of the work that (nearly) everybody else copied from.  As in many other online competitions, notably EteRNA and FoldIt, a small number of our competitors did the vast, vast majority of the work.  The science of crowdsourcing--motivating a group of people to perform the task you want--is thus still a work in progress.  Our model of collaboration did not solve the fundamental problem of getting a large group of people to work together efficiently.  It found some very good people who came up with a very good solution, which has been done many times before in other competitions.  The Sage/DREAM paper is a paper about the science of crowdsourcing.

The thing I'd like to discuss now is the crowdsourcing of science:  how to motivate people to work together to identify a problem, fund a solution, and democratically work toward solving it.  My litmus test for a good system would be this:

"Can a new PhD student come to your project, learn the background they need from your documentation, contribute to it for his thesis, and also receive a stipend or other compensation for his work?"

I posit that the most difficult barrier to cross in setting up this system would be funding.  In the current system, most academic research money comes from government grants.  I have discussed some of the flaws of this system previously.  There are, however, a lot of patient advocacy groups and private foundations that I believe would fund a single ambitious project like "Cure This Disease" if only there were a demonstrated mechanism in place to make progress on it.  Currently these groups give grants to academic labs for smaller, less ambitious projects, since no single lab may have the resources or manpower to tackle the grand problem.

In order to attack these grand problems, I would favor an engineering approach much like what you see in open-source projects.  In crowdsourced science, users would post questions instead of feature requests, then write documentation, write code, or propose lab experiments to answer them.  If a user sees a flaw in documentation or methods, they can post a bug report so that others may come along and fix it.  Funders may then choose to compensate users who write good documentation and code, as an incentive for their continued participation, or to pay a lab to run an experiment that they believe could answer a question.  In the future, independent labs might even bid against each other to run these experiments.
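To make that workflow concrete, here is a minimal sketch in Python of how the moving parts (questions, answers, bug reports, and funder bounties) might be modeled.  Every class, field, and function name here is hypothetical and purely illustrative; this is not a description of any existing platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical data model for the proposed crowdsourced-science workflow.
# All names are invented for illustration.

@dataclass
class BugReport:
    """A flaw someone noticed in an answer's documentation or methods."""
    reporter: str
    description: str
    resolved: bool = False

@dataclass
class Answer:
    """Documentation, code, or a proposed lab experiment that addresses a question."""
    author: str
    content: str
    kind: str  # "documentation", "code", or "experiment_proposal"
    votes: int = 0
    bug_reports: List[BugReport] = field(default_factory=list)

@dataclass
class Bounty:
    """Funding a backer attaches to a question, to be paid out for useful contributions."""
    funder: str
    amount_usd: float
    awarded_to: Optional[str] = None

@dataclass
class Question:
    """A scientific question posted by a user (analogous to a feature request)."""
    title: str
    description: str
    author: str
    answers: List[Answer] = field(default_factory=list)
    bounties: List[Bounty] = field(default_factory=list)

def best_answer(question: Question) -> Optional[Answer]:
    """Pick the top answer by community votes, skipping ones with open bug reports."""
    viable = [a for a in question.answers
              if all(b.resolved for b in a.bug_reports)]
    return max(viable, key=lambda a: a.votes, default=None)

if __name__ == "__main__":
    q = Question("Which pathway drives resistance to drug X?",
                 "Summarize the evidence and propose a validation experiment.",
                 author="new_phd_student")
    q.answers.append(Answer("lab_a", "Pathway Y, based on cohort Z.", "documentation", votes=12))
    q.answers.append(Answer("lab_b", "Knockdown screen proposal.", "experiment_proposal", votes=7))
    q.bounties.append(Bounty(funder="patient_foundation", amount_usd=5000.0))
    top = best_answer(q)
    print(f"Leading answer by {top.author}: {top.content}")
```

The point of the sketch is simply that questions, contributions, criticisms, and money can all be first-class objects in one open system, the same way issues, pull requests, and reviews are in an open-source project.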

Solid documentation and clear answers would be the key to the success of this model.  The institution I propose would be able to educate all stakeholders, both funder and scientist, to make the best democratic decision on how to proceed.  This clarity is something that much of the scientific literature and research lacks.  Indeed, the difficulty in replicating many results is a hot issue right now.  The ideal model would also challenge the existing power structure in academia, since this democratic forum would choose the best answers in real time (much like StackOverflow).  Reputation can and should still play a role, but any obviously better answer will still float to the top when all solutions sit side by side.
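As one hypothetical illustration of that principle (the scoring rule and weights below are invented for the example, not a proposal for a specific formula), reputation can nudge the ranking without being able to outweigh a clearly better, more heavily up-voted answer:

```python
def rank_score(votes: int, author_reputation: float, rep_weight: float = 0.1) -> float:
    """Hypothetical ranking: community votes dominate; reputation is a mild tie-breaker."""
    return votes + rep_weight * author_reputation

# A strong answer from an unknown newcomer still outranks a weaker one
# from an established lab, because votes carry most of the weight.
print(rank_score(votes=40, author_reputation=1.0))    # 40.1 -- newcomer, strong answer
print(rank_score(votes=12, author_reputation=100.0))  # 22.0 -- big name, weaker answer
```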

Indeed, the crowdsourcing of science would be a bold experiment.  It has never been tried before.  I believe that a democratic marketplace of minds, ideas, coders, writers, validators, experimentalists, and funders is the future of science in the same way that open source is the future of software, and I would be thrilled to contribute my time to such a project when one does arise.
