Thursday, January 29, 2015

Bayesian Network Reconstruction Using Systems Genetics Data: Comparison of MCMC Methods

Several former colleagues and I collaborated on a paper that was finally published in Genetics on January 28.  You can read it for free online here.

I won't regurgitate the entire work here, but I did want to take a moment to say why this research is important for many fields, not just genetics.  In fact, the Bayesian network reconstruction problem arises in numerous places including finance, machine learning, and crystallography.  The premise of it is this:  suppose you had a black box that, based on some inputs that are supplied to it, would perform a possibly random calculation and then give some results back to you.  Now it is your job to figure out how the machine works.  How are the outputs related to the inputs, and more importantly what is the underlying calculation that happens to produce these outputs?  Keep in mind that this machine could be doing nearly anything under the hood:  performing entirely different calculations based on one input, rolling dice and giving a random result, or throwing your inputs out entirely.

To computational biologists right now, the cell is such a black box.  We are able to measure a lot of things about it and do a lot of things to it, but we are unable to actually look at all of the things that are happening inside a cell in real time.  Not only are they happening at really fast timescales, we don't have microscopes capable of watching all of the relevant interactions that happen inside a cell at once, for example DNA unwinding, RNA synthesis, ribosomes making proteins, protein folding, protein-protein and extra-cellular interactions.  As a result, we are left measuring things such as single-nucleotide polymorphism and expression data and then trying to infer how the genes in the cell worked after the fact.

How we reconstruct a cell's regulatory network is remarkably naive:   we construct a completely random network, tweak it to make it more plausible (e.g., simulated annealing or gradient descent), and then repeat this many many times.  Finally, we propose that the links in the resulting networks that appeared most often are likely to be the true connections in the real network.

This isn't a perfect approach, but it has yielded new insights in the past.  Perhaps someday someone will come up with a greatly improved approach to reconstructing Bayesian networks, and it will be a boon to technology and mankind worthy of a Nobel prize.

No comments:

Post a Comment