Current and recent research projects

Alignment 1

ATA----CAG
AGCTAAGCCG

Alignment 2

A--TA--CAG
AGCTAAGCCG
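
As a rough illustration, the two candidate alignments above can be compared by counting mismatched columns and gap runs. This is only a minimal sketch: the function name summarize_alignment is hypothetical, and it treats each run of consecutive gaps in one row as a single indel event, which is an assumption made for illustration.

```python
def summarize_alignment(top, bottom, gap="-"):
    """Return (mismatches, indel_lengths) for one pairwise alignment.

    Assumption: a maximal run of gap characters in one row is counted
    as a single indel event whose length is the length of the run.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")

    mismatches = 0
    indel_lengths = []
    run = 0          # length of the gap run currently being read
    run_row = None   # row (0 = top, 1 = bottom) holding that gap run

    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        gap_row = 0 if a == gap else 1 if b == gap else None

        if gap_row is not None and gap_row == run_row:
            run += 1                      # the current gap run continues
            continue

        if run:                           # a gap run has just ended
            indel_lengths.append(run)
        run, run_row = (1, gap_row) if gap_row is not None else (0, None)

        if gap_row is None and a != b:    # ungapped column with a difference
            mismatches += 1

    if run:                               # gap run reaching the end
        indel_lengths.append(run)
    return mismatches, indel_lengths


alignment_1 = ("ATA----CAG", "AGCTAAGCCG")
alignment_2 = ("A--TA--CAG", "AGCTAAGCCG")

print(summarize_alignment(*alignment_1))  # (3, [4])
print(summarize_alignment(*alignment_2))  # (1, [2, 2])
```

Alignment 1 implies three nucleotide differences and one indel of length 4, whereas Alignment 2 implies one nucleotide difference and two indels of length 2.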


The number of nucleotide differences (shown in red) differs dramatically between the two alignments, but without other information it is impossible to say which is more likely to be the true alignment. With Toby Johnson and Jun Wang, we developed a method (MCALIGN) to tackle the problem of inferring noncoding DNA alignments by aligning sequences according to an explicit model of indel evolution. The model has two parameters: theta, the rate of indels relative to nucleotide substitutions, and w, a vector specifying the relative frequencies of indels of each length. Parameter values can be estimated from noncoding DNA sequence data from species that are related to the species of interest, but that are closely enough related to one another that alignment by standard heuristic methods is effectively unambiguous.
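
To make the two parameters concrete, the sketch below computes naive counting estimates of theta and w from pairwise alignments that are assumed to be unambiguous. It is not the estimation procedure used by MCALIGN. The function name estimate_indel_parameters and the toy training alignment are hypothetical, and the sketch assumes that each run of consecutive gaps in one row is a single indel event and that each mismatched column is a single substitution.

```python
from collections import Counter
from itertools import groupby


def estimate_indel_parameters(alignments, gap="-"):
    """Naive estimates of (theta, w) from unambiguous pairwise alignments.

    theta: indel events per nucleotide substitution.
    w:     dict mapping indel length to its relative frequency.
    """
    substitutions = 0
    length_counts = Counter()

    for top, bottom in alignments:
        if len(top) != len(bottom):
            raise ValueError("aligned sequences must have equal length")

        # Classify every alignment column.
        labels = []
        for a, b in zip(top, bottom):
            if a == gap:
                labels.append("gap_top")
            elif b == gap:
                labels.append("gap_bottom")
            elif a != b:
                labels.append("sub")
            else:
                labels.append("same")

        # Consecutive identical labels form runs; a gap run is one indel.
        for label, run in groupby(labels):
            n = len(list(run))
            if label == "sub":
                substitutions += n
            elif label.startswith("gap"):
                length_counts[n] += 1

    indels = sum(length_counts.values())
    if substitutions == 0:
        raise ValueError("no substitutions observed; theta is undefined")

    theta = indels / substitutions
    w = {length: count / indels
         for length, count in sorted(length_counts.items())}
    return theta, w


# Toy training alignment, invented purely for illustration.
training = [("ACGTACGT--ACGTACGTAC",
             "ACCTACGTTTACGAACG-AC")]
theta, w = estimate_indel_parameters(training)
print(theta, w)
```

In practice many alignments from closely related species would be pooled, so that the length distribution w is estimated from a large number of indel events rather than a handful.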