Current and recent research projects
Evolutionary population genetics
It is now possible to obtain complete genome sequences at moderate cost, and we are taking advantage of this in two projects.
1. We have sequenced the genomes of 10 wild house mice, and are using these to address several fundamental questions concerning the interactions between natural selection, new mutations, finite population size and genetic linkage in the mammalian genome. Studying wild house mice is advantageous because their effective population size is extremely large (two orders of magnitude larger than that of humans, for example), so the signature of natural selection in the genome is considerably easier to detect and quantify. Genome sequencing is being carried out at the Sanger Institute using Illumina technology (principal collaborator David Adams).

2. With Nick Colegrave, we are carrying out mutation accumulation experiments in several single-celled algal species (including Chlamydomonas reinhardtii), and will use these in an in-depth study of the process of spontaneous mutation. We are also studying the evolutionary genomics of Chlamydomonas by sequencing complete genomes from natural isolates.
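
The point made in project 1, that selection leaves a clearer signature when the effective population size is large, can be illustrated with the standard diffusion approximation for the fixation probability of a new mutation. The sketch below is only an illustration, not part of our analysis pipeline: it assumes additive selection with heterozygous effect s, a diploid population whose census and effective sizes are both equal to ne, and arbitrarily chosen values of s and ne.

```python
import math

def fixation_probability(s, ne):
    """Diffusion approximation for a new mutation at initial frequency 1/(2*ne).

    Assumes additive selection (heterozygote effect s) and census size == ne.
    """
    if s == 0:
        return 1.0 / (2.0 * ne)          # neutral case
    try:
        # (1 - exp(-2s)) / (1 - exp(-4*ne*s)), written with expm1 for accuracy
        return math.expm1(-2.0 * s) / math.expm1(-4.0 * ne * s)
    except OverflowError:
        return 0.0                        # strongly deleterious: effectively never fixes

def relative_to_neutral(s, ne):
    """Fixation probability divided by the neutral expectation 1/(2*ne)."""
    return fixation_probability(s, ne) * 2.0 * ne

# The same weakly deleterious mutation in a small and a large population
# (s and ne are illustrative values, not estimates for any species).
for ne in (1e4, 1e6):
    print(ne, relative_to_neutral(-1e-5, ne))
```

With these illustrative values, the mutation behaves almost neutrally in the smaller population but is very efficiently removed in the larger one, which is why patterns of selection are easier to quantify when the effective population size is large.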

The genome-wide mutation rate
Evaluating the role of new mutations in evolution requires knowledge of the genome-wide rate of new mutations. This parameter has proved to be extremely difficult to estimate, and direct molecular estimates of mutation rates are badly needed.
Our lab has directly estimated the per-nucleotide mutation rate in Drosophila melanogaster by analysing inbred mutation accumulation (MA) lines that had randomly accumulated spontaneous mutations for ~200 generations. We have applied two molecular methods to do this:
1. Denaturing high performance liquid chromatography (DHPLC, a technique for scanning the genome for new mutations). Shown below are an example of a DHPLC chromatogram revealing a mutation in one of the MA lines, and a chromatogram produced by Sanger DNA sequencing confirming that it is a C→T transition:


In this case DHPLC gave slightly different traces for the three wild-type genotypes that we analysed in the experiment, as a consequence of differences between the three genotypes in sequence composition.
2. Illumina sequencing. We sequenced the genomes of three MA lines by Illumina (Solexa) high-throughput technology. We detected 174 new mutations, giving us a much more accurate estimate of the mutation rate than we obtained by DHPLC. This also allowed us to study several features of the biological basis of spontaneous mutagenesis in Drosophila. Graphical output from the program Maq, of 36 base-pair Illumina reads aligned to the D. melanogaster mitochondrial genome, is shown below:
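
Separately from the figure, the arithmetic behind a mutation accumulation estimate can be sketched as follows: the per-site rate is the number of new mutations divided by the total number of opportunities for mutation (lines × generations × sites that could be scored). The counts of 174 mutations, three lines and ~200 generations come from the text above; the number of callable sites per line is not given here, so the value used below is a purely hypothetical placeholder, and the printed figure should not be read as our published estimate. The sketch also assumes that every line went through the same number of generations and has the same number of callable sites.

```python
import math

def per_site_mutation_rate(n_mutations, n_lines, generations, callable_sites):
    """Return (rate, standard error) per site per generation.

    The standard error treats the mutation count as Poisson distributed.
    """
    if min(n_lines, generations, callable_sites) <= 0:
        raise ValueError("lines, generations and sites must all be positive")
    opportunities = n_lines * generations * callable_sites
    rate = n_mutations / opportunities
    std_err = math.sqrt(n_mutations) / opportunities
    return rate, std_err

# callable_sites=1.0e8 is a HYPOTHETICAL placeholder, not a value from the study.
rate, std_err = per_site_mutation_rate(
    n_mutations=174, n_lines=3, generations=200, callable_sites=1.0e8
)
print(f"rate = {rate:.2e} per site per generation (SE {std_err:.1e})")
```

Because the standard error shrinks with the square root of the mutation count, detecting many more mutations by whole-genome sequencing gives a correspondingly tighter estimate than a scan that finds only a handful.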

The distribution of effects of deleterious mutations and rates and effects of mutations that are driven by positive selection
The fixation of advantageous mutations leads to evolutionary adaptation, while populations are also subject to a continual flux of deleterious mutations. In collaboration with Adam Eyre-Walker (Univ. Sussex), we have developed methods that use genome sequence data to infer the distribution of effects of mutations: both deleterious mutations, with effects ranging down to neutrality, and advantageous mutations that contribute to adaptation. We are taking advantage of the vast amounts of new genomic sequence data that are becoming available for individuals sampled from populations of several species. Our methods have been widely applied by groups studying evolutionary genetics, and we are working on ways to generalise the kinds of models that can be fitted to data.
The evolution of recombination and sex
We are using computer simulations to study a fundamental problem in evolutionary biology: why do sex and recombination exist? With Sally Otto (Univ. British Columbia) we have shown that selective interference between linked deleterious mutations favours mutations that increase recombination (recombination modifiers), and that the advantage of recombination increases, apparently without limit, as population size increases. Since all real populations in nature are very large (at least those that stand a chance of persisting), this is a general explanation for the evolution of recombination, and it is becoming increasingly accepted. We are currently studying the evolutionary maintenance of obligate sexual reproduction, a problem that has been even more difficult to explain.

Alignment of noncoding DNA sequences
Aligning noncoding DNA is problematic because the pattern of insertion/deletion (indel) events is unknown. Consider the following pair of sequences, which have two plausible alignments:
Alignment 1: ATA----CAG
Alignment 2: A--TA--CAG
The number of nucleotide differences (shown in red) differs dramatically between the two alignments, but without other information it is impossible to say which is more likely to be the true alignment. With Toby Johnson and Jun Wang, we developed a method (MCALIGN) to tackle the noncoding DNA alignment inference problem by aligning sequences according to an explicit model of indel evolution. The model has two parameters: theta, the rate of indels relative to nucleotide substitutions, and w, a vector specifying the relative frequencies of indels of a given length. Parameter values can be derived from noncoding DNA sequence data from species that are related to the species of interest, but that are sufficiently closely related to one another for alignment by standard heuristic methods to be effectively unambiguous.
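
To make the two parameters concrete, the sketch below shows one naive way of summarising an unambiguous pairwise alignment into an indel-to-substitution ratio (playing the role of theta) and a table of relative indel-length frequencies (playing the role of w). This is a simple counting illustration, not the estimation procedure used in MCALIGN; the function names and the example sequences are hypothetical, and each contiguous run of gaps in one sequence is assumed to be a single indel event.

```python
from collections import Counter

GAP = "-"

def summarise_alignment(seq_a, seq_b):
    """Count substitutions and indel event lengths in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    indel_lengths = []
    run_owner = None   # which sequence carries the current gap run
    run_len = 0
    for x, y in zip(seq_a, seq_b):
        if x == GAP and y == GAP:
            continue                       # column gapped in both: ignore
        owner = "a" if x == GAP else "b" if y == GAP else None
        if run_owner is not None and owner != run_owner:
            indel_lengths.append(run_len)  # the previous gap run has ended
            run_len = 0
        if owner is not None:
            run_len += 1
        elif x != y:
            substitutions += 1
        run_owner = owner
    if run_owner is not None:
        indel_lengths.append(run_len)      # gap run reaching the alignment end
    return substitutions, indel_lengths

def naive_indel_parameters(seq_a, seq_b):
    """Return (theta, w): indels per substitution and length frequencies."""
    substitutions, indel_lengths = summarise_alignment(seq_a, seq_b)
    if substitutions == 0:
        raise ValueError("no substitutions: the ratio is undefined")
    theta = len(indel_lengths) / substitutions
    counts = Counter(indel_lengths)
    w = {length: n / len(indel_lengths) for length, n in sorted(counts.items())}
    return theta, w

# Hypothetical aligned sequences, for illustration only.
theta, w = naive_indel_parameters("ACGTTAGC--TTGACA", "ACG-TAGCGATTGCCA")
print(theta, w)
```

In practice many aligned regions would be pooled before taking these ratios, since a single short alignment contains too few events to give stable values.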