Consensus Sequence

When we began our computational DNA signature development task in August 2000, no available algorithm could efficiently align multiple bacterial genomes. After experimenting with programs that did not scale even to tiny viral genomes, we began using DIALIGN, although it could take days or weeks to align large viral genomes. In 2002 a new multiple-genome aligner, MGA, became available. This fast, anchor-based algorithm works well for a collection of whole genomes similar enough to have exact-match "anchor" regions evenly distributed and present in each genome. For example, aligning six variola (smallpox) genomes (~190 Kbp each) took longer than a week with DIALIGN and required breaking each genome into three pieces; MGA aligned the same six genomes in less than 30 minutes. We note that all existing alignment tools assume colinearity and do not handle rearrangements or duplications well. MGA currently cannot align genomes too distant to have sufficient exact anchors evenly distributed, nor can it align complete genomes together with incomplete ones (i.e., draft genomes or sets of sequence fragments). We currently use DIALIGN and/or MGA to align multiple whole-genome instances of any target for which more than one genome sequence is available.

FIGURE 15.4 A simplified diagram of the LLNL DNA signature pipeline.
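The anchor idea can be illustrated with a small sketch. This is not MGA's algorithm or the pipeline's code; it is a toy Python version that assumes fixed-length k-mers as anchors, and the function names (`unique_kmer_positions`, `shared_anchors`, `colinear_chain`) are hypothetical. It finds exact matches that occur exactly once in every genome and then keeps the longest set of anchors that appear in the same order in all genomes, which is the colinearity assumption noted above. Rearranged or duplicated regions fall out of the chain, and distant genomes yield too few anchors to chain at all.

```python
from collections import defaultdict


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def shared_anchors(genomes, k):
    """Return exact-match k-mers found exactly once in every genome.

    Each anchor is a tuple of start positions, one per genome, and the
    list is sorted by position in the first genome.
    """
    tables = [unique_kmer_positions(g, k) for g in genomes]
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return sorted(tuple(t[kmer] for t in tables) for kmer in common)


def colinear_chain(anchors):
    """Longest subset of anchors whose positions increase in every genome.

    Simple O(n^2) dynamic program; anchors must be sorted by their
    position in the first genome, as shared_anchors returns them.
    """
    if not anchors:
        return []
    best = [1] * len(anchors)   # best chain length ending at each anchor
    prev = [-1] * len(anchors)  # back-pointer to the previous anchor
    for j, a in enumerate(anchors):
        for i in range(j):
            b = anchors[i]
            if all(x < y for x, y in zip(b, a)) and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    j = max(range(len(anchors)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(anchors[j])
        j = prev[j]
    return chain[::-1]


if __name__ == "__main__":
    # Toy sequences, far shorter than any real genome.
    toy = ["AACGTT", "GGACGTCC"]
    print(colinear_chain(shared_anchors(toy, 4)))
```

In a real aligner the regions between consecutive chained anchors would then be aligned by a slower, more sensitive method; that step is omitted here.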
