As in other areas of science, mathematical models can be used to make
inferences about complex dynamical systems when they are quantitatively
fitted to data. This approach allows us to formally and quantitatively
test and compare competing hypotheses, and allows us to make
quantitative predictions for empirical testing. It is the most powerful
and rapid way of culling wrong hypotheses.
The greatest challenge of modelling is that a vast number of
mathematical models are consistent with the qualitative understanding
we currently have of biological processes. So a good fit of any one
model to data need not, on its own, be very informative. This challenge is the core
of my work: the iteration of hypothesis generation and testing
(mathematically and empirically) in order to understand biological
processes and their interactions.
Experimental biology is generating large quantities of high-quality
dynamical data. Conventional statistical analysis of such data ignores
its dynamical nature because the methodologies of fitting nonlinear
mechanistic models to noisy multivariate data have, until recently,
been poorly developed. This is an inefficient use of data that is often
costly and time-consuming to collect. Recent advances in Bayesian
population-based McMC allow us to fit nonlinear dynamical models to
multivariate time-series data. This means we can make inferences about
the dynamics underlying the data and quantitatively compare competing
hypotheses. In addition, we can extract far more information from
the data than conventional statistical analyses can. This can be done rapidly and
cheaply, thus helping to reduce, refine and replace animal experiments
as well as allowing the reuse of existing data. These methods do,
however, generate a new set of challenges. The most significant of
these is the complexity of the computer algorithms required to
adequately characterise the posterior density. We have spent the last
three years researching, implementing and tuning a new algorithm to
work efficiently on multicore processors, whilst allowing flexibility
in model coding so that it can be used on a wide range of problems.
Unfortunately, the code is not yet ready for public release; that may
change if I can find someone to fund the development of a
user-friendly interface. But if you want to collaborate, either in
developing the code (particularly recoding for use on GPUs) or applying
it on your own data, I'd be very happy to hear from you.
There are two critical issues that make model-based inference of
biological systems challenging: the strong nonlinearity of the
dynamical systems and the high-dimensional parameter space. These have
several important consequences for nonlinear parameter estimation and
biological inference:
- analytical solutions rarely exist, so solutions must be approximated numerically, which is time-consuming,
- the likelihood function can be very complex (for example, with
multiple local maxima and ridges), which often traps search or sampling
algorithms in suboptimal regions of parameter space,
- multiple solutions may exist, all of which need to be visited by the search or sampling algorithm,
- searching or sampling parameter space is very slow for high-dimensional systems.
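To make the first of these points concrete, here is a minimal sketch of what a single likelihood evaluation involves for a nonlinear model with no closed-form solution. Everything in it is an illustrative assumption rather than part of our own code: the Lotka-Volterra equations stand in for a generic mechanistic model, the fixed-step Runge-Kutta integrator and the independent Gaussian noise model are the simplest choices available, and all function names and parameter values are hypothetical.

```python
import math
import random


def lotka_volterra(state, params):
    """Right-hand side of a predator-prey model (illustrative stand-in)."""
    x, y = state
    a, b, c, d = params
    return (a * x - b * x * y, -c * y + d * x * y)


def rk4_step(rhs, state, params, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(state, params)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)), params)
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)), params)
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)), params)
    return tuple(s + h / 6.0 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def solve(rhs, state0, params, times, substeps=50):
    """Return the state at each time in `times` (increasing, starting at t = 0)."""
    states, state, t = [], tuple(state0), 0.0
    for t_obs in times:
        h = (t_obs - t) / substeps
        for _ in range(substeps):
            state = rk4_step(rhs, state, params, h)
        states.append(state)
        t = t_obs
    return states


def log_likelihood(params, times, data, state0, sigma):
    """Gaussian log-likelihood; every call requires a full numerical solve."""
    predicted = solve(lotka_volterra, state0, params, times)
    sse = sum((o - p) ** 2
              for obs, pred in zip(data, predicted)
              for o, p in zip(obs, pred))
    n = len(data) * len(state0)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - 0.5 * sse / sigma ** 2


# Synthetic bivariate time series generated from known (made-up) parameters.
rng = random.Random(0)
times = [0.5 * k for k in range(21)]
state0, sigma = (2.0, 1.0), 0.1
true_params = (1.0, 0.5, 1.0, 0.25)
predicted = solve(lotka_volterra, state0, true_params, times)
data = [tuple(v + rng.gauss(0.0, sigma) for v in s) for s in predicted]

ll_true = log_likelihood(true_params, times, data, state0, sigma)
ll_wrong = log_likelihood((1.3, 0.5, 1.0, 0.25), times, data, state0, sigma)
print(ll_true, ll_wrong)
```

Each call to `log_likelihood` integrates the whole system, and a sampler may need many thousands of such calls. That cost is why the efficiency of the search or sampling algorithm matters so much.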
If these issues are not adequately addressed then a good
characterisation of the posterior density is unlikely. The consequences
are compromised model assessment, compromised model comparison and a
lack of confidence in the validity of the biological inferences. A
common method of model fitting, maximum likelihood, suffers from all of
these problems to such an extent that it becomes almost unusable when
used to model nonlinear, multivariate dynamical systems. Bayesian
population-based McMC, on the other hand, overcomes all of these
problems.
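The idea of running a population of chains can be illustrated with parallel tempering, one of the simplest members of the population-based McMC family. The sketch below is not our algorithm (which is not public) and makes several assumptions for illustration: a one-dimensional, two-mode target stands in for a multimodal posterior, and the temperature ladder, step size and function names are all hypothetical choices. Chains at small `beta` see a flattened target and cross between modes easily; exchange moves then pass those states down to the `beta = 1` chain.

```python
import math
import random


def log_target(x):
    """Unnormalised log density with two well-separated modes at -4 and +4."""
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(rng, log_ratio):
    """Metropolis accept/reject for a given log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(log_p, betas, n_iter, step=0.5, x0=-4.0, seed=1):
    """Return the samples of the first chain in `betas` (beta = 1 is the target)."""
    rng = random.Random(seed)
    xs = [x0] * len(betas)
    lps = [log_p(x0)] * len(betas)
    cold = []
    for _ in range(n_iter):
        # Mutation: random-walk Metropolis within each chain, targeting p(x)**beta.
        for i, beta in enumerate(betas):
            prop = xs[i] + rng.gauss(0.0, step / math.sqrt(beta))
            lp = log_p(prop)
            if accept(rng, beta * (lp - lps[i])):
                xs[i], lps[i] = prop, lp
        # Exchange: propose swapping the states of each adjacent pair of chains.
        for i in range(len(betas) - 1):
            log_ratio = (betas[i] - betas[i + 1]) * (lps[i + 1] - lps[i])
            if accept(rng, log_ratio):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                lps[i], lps[i + 1] = lps[i + 1], lps[i]
        cold.append(xs[0])
    return cold


def fraction_in_right_mode(samples):
    """Fraction of samples lying in the mode at +4."""
    return sum(x > 0.0 for x in samples) / len(samples)


# A single chain versus a tempered population, both started in the left mode.
single = parallel_tempering(log_target, [1.0], n_iter=20000)
tempered = parallel_tempering(
    log_target, [1.0, 0.5, 0.25, 0.12, 0.06, 0.03], n_iter=20000)
frac_single = fraction_in_right_mode(single)
frac_tempered = fraction_in_right_mode(tempered)
print(frac_single, frac_tempered)
```

Both modes carry equal mass, so a sampler that characterises the target well should spend roughly half its time in each. The single chain started in the left mode essentially never leaves it, whereas the tempered population visits both.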
Here are some books and papers that I have found useful in developing our ideas and algorithms.
Jeffreys, H. (1998).
The Theory of Probability, 3rd edition, OUP.
Lindley, D. V. (2006).
Understanding Uncertainty, Wiley-Blackwell.
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2003).
Bayesian Data Analysis, 2nd edition, Chapman & Hall.
Gilks, W. R., Richardson, S. and Spiegelhalter, D. (1995).
Markov Chain Monte Carlo in Practice, Chapman & Hall.
Friel, N. and Pettitt, A. N. (2008). Marginal likelihood estimation via power posteriors,
J. R. Statist. Soc. B 70, 589-607.
Girolami, M. (2008). Bayesian inference for differential equations,
Theor. Comput. Sci. 408, 4-16.
Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration,
Syst. Biol. 55, 195-207.
Liang, F. and Wong, W. H. (2001). Real-parameter evolutionary Monte Carlo with applications to Bayesian mixture models,
J. Am. Stat. Assoc. 96, 653-666.
Fearnhead, P. (2008). Editorial: Special issue on adaptive Monte Carlo methods,
Stat. Comput. 18, 341-342.
MacKay, D. J. C. (1992). Bayesian Interpolation,
Neural Computation 4, 415-447.
And a word of warning about computing the harmonic mean estimate of the
marginal likelihood: it is something I did, but I soon learnt my lesson.
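One commonly cited failure of the harmonic mean estimate can be seen in a toy problem where the marginal likelihood is known exactly. The sketch below assumes a conjugate model chosen purely for illustration: observations y_i ~ N(theta, 1) with prior theta ~ N(0, tau^2). The function names and the data are hypothetical. Widening the prior from tau = 10 to tau = 1000 lowers the exact log marginal likelihood by several units, but the harmonic mean estimate, which sees the prior only through the (almost unchanged) posterior samples, hardly moves.

```python
import math
import random


def exact_log_evidence(y, tau):
    """Closed-form log marginal likelihood for y_i ~ N(theta, 1), theta ~ N(0, tau^2)."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau ** 2)
            - 0.5 * (ss - tau ** 2 * s ** 2 / (1 + n * tau ** 2)))


def log_likelihood(y, theta):
    """Log-likelihood of the data for a given mean theta (unit variance)."""
    return (-0.5 * len(y) * math.log(2 * math.pi)
            - 0.5 * sum((v - theta) ** 2 for v in y))


def harmonic_mean_log_evidence(y, tau, n_samples=5000, seed=0):
    """Harmonic mean estimate from exact draws from the Gaussian posterior."""
    rng = random.Random(seed)
    precision = len(y) + 1.0 / tau ** 2
    mean, sd = sum(y) / precision, 1.0 / math.sqrt(precision)
    neg_ll = [-log_likelihood(y, rng.gauss(mean, sd)) for _ in range(n_samples)]
    # log of the sample mean of 1 / likelihood, computed stably in log space
    m = max(neg_ll)
    log_mean_inv_lik = (m + math.log(sum(math.exp(v - m) for v in neg_ll))
                        - math.log(n_samples))
    return -log_mean_inv_lik


# Made-up data set of 20 observations.
data_rng = random.Random(42)
y = [data_rng.gauss(1.0, 1.0) for _ in range(20)]

exact_narrow = exact_log_evidence(y, 10.0)
exact_wide = exact_log_evidence(y, 1000.0)
hm_narrow = harmonic_mean_log_evidence(y, 10.0)
hm_wide = harmonic_mean_log_evidence(y, 1000.0)
print(exact_narrow, exact_wide)
print(hm_narrow, hm_wide)
```

Because the estimate averages the reciprocal of the likelihood, it is dominated by the few samples with the lowest likelihood, so it can also vary a great deal from run to run. The power posterior and thermodynamic integration papers listed above describe alternatives.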