Wednesday, April 17, 2013

The Science of Crowdsourcing Versus the Crowdsourcing of Science

Today Science Translational Medicine published our methods paper for the Sage/DREAM Breast Cancer Challenge.  There were two things we wanted to see from the challenge:  one was an improved model of breast cancer prognosis, and the other was to find out how people would interact in a competition where each submission was immediately made public and free for the other competitors to do with as they pleased.  I won't repeat what's in that paper here, but I do want to comment on the difference between the science of crowdsourcing and the crowdsourcing of science.

Having competed in more than my fair share of modeling and programming competitions, I've seen attempts to foster meaningful collaboration in problem solving before.  The TopCoder Soybean Challenge is one example, and EteRNA is another.  The Sage/DREAM Breast Cancer Challenge is the first I've seen where a winner would be declared and everyone's submissions would remain publicly available throughout.

One thing must be said about online coding competition competitors:  they are a crafty bunch.  Like a genie from a cursed bottle, they will give you exactly what you ask for even if that's not what you want.  Further, the crowd will adopt a strategy that will maximize their chance of winning.  Nobody plays for second place.  In this case, we saw exactly the behavior one would expect in a situation where everybody's submissions and source were public:  obfuscation of code, and a single leading team putting in most of the work that (nearly) everybody else copies from.  Like many other online competitions, notably EteRNA and FoldIt, a small number of our competitors did the vast, vast majority of the work.  The science of crowdsourcing--motivating a group of people to perform the task you want--is thus still a work in progress.  Our model of collaboration did not solve the fundamental problem of making a lot of people work together efficiently.  It found some very good people who came up with a very good solution, which has been done many times before in other competitions.  The Sage/DREAM paper is a paper about the science of crowdsourcing.

The thing I'd like to discuss now is the crowdsourcing of science:  how to motivate people to work together to identify a problem, fund a solution, and democratically work towards the goal of solving the problem.  My litmus test for a good system would be this:

"Can a new PhD student come to your project, learn the background they need from your documentation, contribute to it for his thesis, and also receive a stipend or other compensation for his work?"

I posit that the most difficult barrier to cross in setting up this system would be funding.  In the current system, most academic research money comes from government grants.  I have discussed previously some of the flaws of this system.  There are a lot of patient advocacy groups and private foundations, however, that I believe would fund a single ambitious project like "Cure This Disease" if only there were a demonstrated mechanism in place to make progress on it.  Currently these groups give grants to academic groups for smaller, less ambitious projects, since no single lab anywhere may have the resources or manpower to tackle the grand problem.

In order to attack these grand problems, I would favor an engineering approach much like you see in open-source projects.  In crowdsourced science, users would post questions instead of feature requests, then write documentation, code or propose lab experiments to answer them.  If a user sees a flaw in documentation or methods, they can post a bug report so that others may come along and fix it.  Funders may then choose to compensate users who write good documentation and code as an incentive for their continued participation, or to pay a lab to run an experiment that they believe could answer a question.  In the future, independent labs might even bid against each other to run these experiments.

Solid documentation and clear answers would be the key to the success of this model.  The institution I propose would be able to educate all stakeholders, both funder and scientist, to make the best democratic decision on how to proceed.  This clarity is something that much scientific literature and research lacks.  Indeed, the difficulty in replicating many results is a hot issue right now.  The ideal model would also challenge the existing power structure in academia since this democratic forum would choose the best answers in real time (much like StackOverflow).  Reputation can and should still play a role, but any obviously better answer will still float to the top when all solutions sit side-by-side.

Indeed, the crowdsourcing of science would be a bold experiment.  It has never been tried before.  I believe that a democratic marketplace of minds, ideas, coders, writers, validators, experimentalists and funders is the future of science in the same way that open source is the future of software, and I would be thrilled to contribute my time to such a project when one does arise.

Monday, April 15, 2013

Computation at the Edge of Chaos

I was thinking recently about one of my favorite research papers I've ever read--Chris Langton's Computation at the Edge of Chaos.  The conclusion of the paper is fairly simple:  that physically, a computer needs to have both memory and long-range transportation of information in order to operate.

The first remarkable thing about this result is how broad it is in the definition of a "computer".  Bacteria on a Petri dish, for example, could be a computer because they can move around, interact with each other and their environment, and last a long time.  The second remarkable thing is that the kinds of things that can be a computer exist in a balance between things that are boring and sit around doing nothing forever (like empty space) and things that are so chaotic that their behavior is dominated by noise (like static on your TV screen).  It turns out that in these sorts of situations, the boring parts act as memory and the noisy parts allow for the transfer and transformation of information.

To show this, Langton used groups of simple simulations like Conway's Simulation of Life1.  Conway's Simulation of Life doesn't actually simulate life--it's just a toy model in much the same way that Monopoly is a toy model of being a real estate tycoon.  It turns out that this model ends up producing things that look kind of like cells on a Petri dish!  The simulation plays out on a two-dimensional grid with the following rules:  gray boxes are empty space and green boxes are living.  Each second, any living box with fewer than two living boxes next to it dies of loneliness.  Any living box with more than three living neighbors dies of overcrowding.  Any empty box with exactly three living boxes next door becomes alive due to reproduction.  In any other case, empty boxes remain empty and living boxes remain living.
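The rules above are short enough to state directly in code.  Here is a minimal sketch in Python (the names are my own shorthand; this is not the limejs code behind Interactive Figure 1) of one second of the simulation on a grid whose edges wrap around, just as in the figure:

```python
# One step of Conway's Simulation of Life.  Boxes are 1 (living) or 0 (empty),
# and the grid wraps around at the edges like the one in Interactive Figure 1.
def life_step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the 8 neighboring boxes, wrapping around the edges.
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            if grid[r][c]:
                # Loneliness (fewer than 2) and overcrowding (more than 3) kill.
                new[r][c] = 1 if n in (2, 3) else 0
            else:
                # Reproduction: exactly three living neighbors.
                new[r][c] = 1 if n == 3 else 0
    return new

# A 2-by-2 square of living boxes survives unchanged:  every living box
# has exactly three living neighbors.
square = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
assert life_step(square) == square
```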

Sorry! Google Drive stopped allowing me to host scripts from it in August 2016. You can now view Interactive Figure 1 at
Interactive Figure 1:  Simulations of Life.  If the "life" goes off of one edge of the picture, it will appear on the opposite side.  Press the green "Randomize!" button at the bottom to restart the simulation with a random initial condition.  Press the gray "Conway" button to use Conway's Simulation of Life.  Press the dark blue "Coldest!" button to use the Cold Simulation of Life.  Press the dark red "Hottest!" button to use the Hot Simulation of Life.  You can press the light blue "Make Colder" or light red "Make Hotter" buttons to make the simulation slightly colder or slightly hotter.
Interactive Figure 1 is an implementation of Conway's Simulation of Life (as well as some tweaks that we will get to later) that I have coded in limejs2.  What you see in the figure is probably a pretty lush place, with lots of "cells" and various other organic-looking activities going on.  If it's not very lush-looking, or if it's stabilized and not much is going on anymore, press the gray "Conway" button on the bottom of the simulation now to restart it.  As it proceeds, notice how there are both long-lived stable "cells" as well as explosions of growth that last a long time and that may affect things a long way from where they started.  This probably looks pretty random to you, but let me assure you that it is not.  Every second, a fixed computation occurs that changes the world in a very predictable way.  Let's take a look at two simple examples of "cells" that you may find in your simulation to convince you that what's going on here is anything but random!

Figure 2:  Two of the most common "cells" in Conway's Simulation of Life, and why they behave the way they do.
Figure 2 shows two of the most common "cells" in Conway's Simulation of Life, the block and the blinker.  The 2-by-2 square block stays the way it is because each living box is neighbored by exactly 3 other living boxes.  The blinker spins forever because each of the two ends has only one living neighbor and dies, but at the same time the empty boxes next to the center box each have exactly 3 living neighbors and come to life.  This cycle of death and rebirth can continue forever.  Conway's Simulation of Life is pretty suspenseful to watch in a way:  you can pick your favorite "cell" and root for its survival as the world around it changes.  It is this very property that lends Computation at the Edge of Chaos its name:  there exist both chaos (those explosions of growth that travel a long way) and stability (those long-lived "cells") in our game, and it turns out that Conway's Simulation of Life is also Turing complete, which means that if you arranged these cells and structures strategically you could actually use them as a computer!  The "cells" would act as your memory, and the noisy parts that are changing rapidly would act as your processor.  As we shall see, this is a very special property that exists in a narrow subset of possible simulations between boring, trivial nothingness and meaningless, noisy chaos.
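Both claims in Figure 2 are easy to verify in a few lines of code.  Here is a self-contained sketch (names are my own, not from the figure's source) confirming that the blinker really does cycle with period two:

```python
# Verify the blinker from Figure 2:  three living boxes in a row flip to a
# column, and then back again, repeating forever with period 2.
def life_step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            # Born with exactly 3 living neighbors; survive with 2 or 3.
            new[r][c] = 1 if (n == 3 or (grid[r][c] and n == 2)) else 0
    return new

blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
once = life_step(blinker)    # the row has become a column...
twice = life_step(once)      # ...and the column becomes a row again
assert once != blinker and twice == blinker
```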

You may be tempted to guess that basically any set of rules like Conway's shown in Figure 2 would do weird and interesting things in their own right.  It turns out that interesting ones like Conway's Simulation of Life are actually pretty rare.  To demonstrate this, I'm going to make some very slight changes to the rules of Conway's Simulation of Life and see that the behavior is dramatically altered.  First, I'm going to posit that we take the rules of the Simulation of Life and cool everything down so that living boxes need exactly three living neighbors to survive; otherwise they will freeze to death and die.  Everything else remains the same.  I will call this the Cold Simulation of Life.  I'm also going to tint the empty cold boxes blue to remind you that they are cold!  To see what happens in the Cold Simulation of Life, press the darkest blue button on the bottom of Interactive Figure 1 labeled "Coldest".

Restart the Cold Simulation of Life a few times by pressing the green "Randomize" button to convince yourself that, unlike Conway's Simulation of Life, the Cold Simulation is a pretty boring place.  There may be a few hardy, shivering cells that survive, but the slight change to the rules of Conway's Simulation of Life that requires boxes to have exactly 3 living neighbors has made this a harsh and inhospitable world.  Like Conway's Simulation of Life, the remaining cells could act as "memory"--they last forever--but unlike Conway's Simulation there is no long-range interaction between them.  The disturbances you introduce when you press the "Randomize" button quickly leave you with another cold, desolate world to look at.   This property is called "quiescence".
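To see the quiescence concretely, here is a sketch of the one-rule change (my own throwaway names again):  survival now requires exactly three living neighbors, just like birth.  The blinker, which cycles forever under Conway's rules, freezes to death in two seconds:

```python
# One step of the Cold Simulation of Life:  birth still takes exactly 3
# living neighbors, but now survival does too, so the update is the same
# test whether the box is living or empty.
def cold_step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            new[r][c] = 1 if n == 3 else 0  # exactly 3 is the only way to live
    return new

blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
frozen = cold_step(cold_step(blinker))
assert sum(map(sum, frozen)) == 0  # every box has frozen to death
```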

Now let's try changing the Simulation of Life in the other direction:  we'll heat it up!  We will take the rules described in Figure 2 again and add one simple change.  Because of the bountiful energy of a hotter world, empty boxes become alive through reproduction when they have either 2 or 3 living neighbors.  This time I will tint the empty boxes red so that you remember that these are hot.  Let's see what happens!  Press the darkest red button at the bottom of Interactive Figure 1 labeled "Hottest" now.

The Hot Simulation of Life is pretty crazy to look at.  You can hit the green button to restart it with random initial conditions, but you probably won't be able to see a difference when you do.  The Hot Simulation's behavior appears as random and chaotic as the static on your TV.  In fact, it's about as interesting to watch as the static on your TV.  Nothing in this world lasts.   This world has waves of new growth constantly sweeping across it and changing everything, but it has no obvious stable cells or structures that could be considered "memory".  This world, then, has the long-range transportation we need for computation, but not the memory.  This property is appropriately called "chaos".
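The instability is just as easy to demonstrate in code.  In this self-contained sketch (names are my own), the 2-by-2 square, which sits perfectly still under Conway's rules, erupts after a single second once empty boxes can reproduce with only two living neighbors:

```python
# One step of the Hot Simulation of Life:  survival is unchanged (2 or 3
# living neighbors), but empty boxes now come alive with 2 or 3 as well,
# so the update is the same test whether the box is living or empty.
def hot_step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            new[r][c] = 1 if n in (2, 3) else 0
    return new

# The stable 2-by-2 square from Figure 2, on a 6x6 grid.
square = [[0] * 6 for _ in range(6)]
for r, c in ((2, 2), (2, 3), (3, 2), (3, 3)):
    square[r][c] = 1
grown = hot_step(square)
assert sum(map(sum, square)) == 4
assert sum(map(sum, grown)) == 12  # 4 survivors plus 8 newly born boxes
```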

By now, you've probably figured out where this is headed.  I have shown you Conway's Simulation of Life, an interesting world that has interacting "cells" and which can act as a computer.  I have shown you the Cold Simulation of Life which has a few stable cells, but no long-range interactions.  Finally, you saw the Hot Simulation of Life which has lots of interaction but no stability.  It turns out that Conway's Simulation lies at the boundary between the "quiescent" Cold Simulation and the "chaotic" Hot Simulation--at the edge of chaos.

The Edge of Chaos is visible empirically in two different ways.  One is called "critical slowing down".  You will notice that Conway's Simulation of Life takes a long time to reach equilibrium compared to the Cold Simulation of Life, and the Hot Simulation of Life never slows down at all!  Go back to Interactive Figure 1 and press the gray "Conway" button for Conway's Simulation of Life and observe how long it takes for the initial disturbance to slow down.  Now press the light blue "Make Colder" button a couple of times and notice how much more quickly the system reaches a stable equilibrium after just a few clicks.  You will also notice a few light blue cold boxes appearing in the simulation as you make it colder and colder.  Next, press the "Conway" button again to restart, then heat it up by pressing the "Make Hotter" button a few times.  You will notice that it takes much, much longer to reach equilibrium if you add a small number of hot boxes, and indeed after several presses the chaotic behavior will go on seemingly forever without reaching equilibrium.

It takes very few presses of the "Make Colder" button to make our simulation quickly reach a stable equilibrium, and even fewer presses of the "Make Hotter" button to keep our simulation from ever stabilizing.  This narrow band of temperatures you have observed, where our simulation proceeds in an interesting fashion, is the edge of chaos.

I will show you the edge of chaos using another measure, the mutual information.  The mutual information tells you how much the state of a box (alive or empty) helps you guess the state of the box at the next second.  In a quiescent Cold Simulation, whether the box is alive or not doesn't tell you much because the box will probably be dead the next second no matter what.  In the chaotic Hot Simulation, the box's status doesn't tell you much because the Hot Simulation is so wildly fluctuating that whether the box is alive or dead the next second is basically random.  In Conway's Simulation, we strike a happy balance between stability and fluctuations that maximizes the mutual information.  By randomly sprinkling in some cold or hot boxes just like you did when you visualized critical slowing down, we can gradually "cool" or "heat" Conway's Simulation and visualize the mutual information as it changes.
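Here is a rough sketch of how such an estimate can be computed (this is my own simplified stand-in, not the code behind Figure 3):  run a simulation, tally how often each (state now, state one second later) pair occurs across every box and every step, and plug the counts into the standard mutual information formula.

```python
import math
import random

def step(grid, lives):
    """One second of a Simulation of Life.  lives(alive, n) says whether a
    box in state `alive` with n living neighbors is alive next second."""
    rows, cols = len(grid), len(grid[0])
    return [[1 if lives(grid[r][c],
                        sum(grid[(r + dr) % rows][(c + dc) % cols]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                            if (dr, dc) != (0, 0))) else 0
             for c in range(cols)] for r in range(rows)]

def mutual_information(lives, size=20, steps=100, seed=0):
    """Estimate, in bits, how much a box's state now tells you about its
    state one second later, pooled over every box and step of one run."""
    rng = random.Random(seed)
    grid = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for _ in range(steps):
        nxt = step(grid, lives)
        for r in range(size):
            for c in range(size):
                counts[(grid[r][c], nxt[r][c])] += 1
        grid = nxt
    total = sum(counts.values())
    mi = 0.0
    for (a, b), n in counts.items():
        if n:
            pa = (counts[(a, 0)] + counts[(a, 1)]) / total  # P(state now)
            pb = (counts[(0, b)] + counts[(1, b)]) / total  # P(state next)
            mi += (n / total) * math.log2((n / total) / (pa * pb))
    return mi

conway = mutual_information(lambda alive, n: n == 3 or (alive and n == 2))
cold   = mutual_information(lambda alive, n: n == 3)
hot    = mutual_information(lambda alive, n: n in (2, 3))
```

For a box with only two states the mutual information can be at most 1 bit; the interesting question, as Figure 3 shows, is where between zero and that ceiling each rule set lands.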
Figure 3:  Using mutual information to visualize the edge of chaos3.  This is a scatterplot of mutual information calculated from 100 steps of a simulation where some proportion of boxes have been randomly swapped from Conway's Simulation to either a Cold or a Hot Simulation.  The mutual information of the simulation rises gradually as we go from the Cold Simulation to Conway's Simulation, but it takes very little of the Hot Simulation to completely destroy the mutual information and thus the complex and interesting behavior of Conway's Simulation of Life.   The computation at the edge of chaos, so to speak, lies on that narrow peak in the center of the figure.
Figure 3, to me, is what physics is all about:  using an approximation of reality to draw broad, general conclusions.  None of our simulations--Conway's, Cold or Hot--were actually computers, or even anything very realistic in terms of real-world behavior.  However, they each embody a kind of behavior that we do see in the real world.  The Cold Simulation was a model of long-term stability.  The Hot Simulation was a model of long-distance fluctuations.  As we cooled or heated Conway's Simulation of Life, we found that its interesting behavior was quite fragile, as only a few hot boxes or a handful of cold boxes quickly destroyed the lifelike behavior.  The fact that our simulations showed organized, high-information behavior in a narrow range of "temperatures" at the edge between stability and chaos showed us that both of these behaviors are simultaneously essential in order to have a useful computer.

Many things can be computers.  Not just your phone and desktop PC, but also brains and life itself.  We can be assured of one truth that spans all of these kinds of computers, though--that however they work, they must all exist on a fragile boundary between stability and chaos.   

Technical Footnotes

1:  The usual name given to this simulation is "Conway's Game of Life."  However, I feel that this is misleading since there is no notion of "playing" it.  Poetic license is used to spare the reader from confusion.

2:  Source code is provided for completeness, but it is neither very elegant nor very enlightening as this is both my first excursion into limejs/HTML5 and was also not really meant to be reused.  I just wanted a figure as quickly as possible.  To quote Lilli when she heard exactly how I produced my data and figure:  "This must have been a fun break from doing things the right way."  (It was.)  Here is the code for the simulation you see above, and here is the code that generated the data you see in Figure 3.

3:  I realize that this is not how Chris Langton visualized the edge of chaos in his paper.  For the benefit of a more general audience, I have translated his methods into a more accessible model in order to ease understanding.  This post is meant to be digestible to the undergraduate or high school student, as well as the physicist.

Sunday, April 7, 2013

Tulip Festival

It's April and the tulip and daffodil fields are blooming in Skagit County.  Simiao and I drove up today to see them--they're beautiful, but you definitely need something else to do if you're going to make a two-hour drive from Seattle.  It's certainly worth seeing, but there are only so many flowers that one cares to look at in a day, and each field looks basically the same as all of the others!  Fort Casey or Deception Pass would be a great nearby destination to visit once one gets tired of looking at the flowers.  I took a few portraits of Simiao while we were up there and thought I'd share.