Until very recently, most alignment algorithms were designed to compare individual protein or DNA sequences, each typically containing a single gene, either to each other or to a database of sequences. When applied to long genomic sequences or entire genomes, most of these programs cannot produce accurate alignments and consume excessive memory and time, although companies such as Interagon, Paracel, and Timelogic offer specialized hardware to speed up pairwise algorithms. Algorithms designed to run faster typically trade sensitivity for speed: a shorter running time means that some alignments may be missed.
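The tradeoff between speed and sensitivity can be illustrated with exact k-mer seeding, a filtering step that many fast aligners use in some form. The sketch below is only an illustration under stated assumptions: the function names `kmer_index` and `shared_seeds` and the two toy sequences are hypothetical and are not taken from any particular tool.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_seeds(a, b, k):
    """Return (pos_in_a, pos_in_b) pairs where a and b share an exact k-mer."""
    index = kmer_index(a, k)
    hits = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            hits.append((i, j))
    return hits


# Hypothetical toy sequences: seq_b differs from seq_a at every fifth base.
seq_a = "ACGTTGCAAGCTTAGGCTAA"
seq_b = "ACGTAGCAACCTTATGCTAT"

short_seeds = shared_seeds(seq_a, seq_b, 4)  # short seeds still find the similarity
long_seeds = shared_seeds(seq_a, seq_b, 8)   # long seeds find nothing here
```

Longer seeds leave fewer candidate matches to examine, which is where the speed comes from, but in this example no stretch of eight identical bases survives the substitutions, so the related region would be missed entirely.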
Genome-length alignment tools have usually been designed to satisfy one of several goals: some simply aim to find all similar or identical stretches of DNA between two genomes; others specifically target coding sequences (such as exons) and search for conserved exon order between two species; still others focus on intergenic and intronic regions to detect conserved regulatory signals. In the context of microbial diagnostic design, pairwise comparisons are likely to be used primarily when a newly sequenced isolate is compared against a very close relative. The main problems associated with computing such an alignment include: (a) genome rearrangements (e.g., exon shuffling or syntenic breaks resulting from intramolecular recombination), (b) large insertions or deletions (sequences that share several regions of local similarity separated by unrelated regions), (c) repetitive sequences (e.g., duplicated genes or operons, transposons, and simple and complex repeats), (d) tandem repeats, and (e) inherent problems of gene regulatory elements, including their small size and relative resistance to small insertions, deletions, or substitutions.
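As a small illustration of problem (c), the following sketch counts k-mers that occur more than once within a single sequence. Any such k-mer matches several positions at once, so it cannot place a region of the other genome unambiguously. The function name `repeated_kmers` and the toy sequence are hypothetical, and real tools handle repeats in more elaborate ways.

```python
from collections import Counter


def repeated_kmers(genome, k):
    """Return {k-mer: count} for every k-mer occurring more than once in genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Hypothetical toy sequence containing two copies of the segment GATTACA.
genome = "CCGATTACATTGGATTACAGG"

ambiguous = repeated_kmers(genome, 7)
```

Here the only repeated 7-mer is the duplicated segment itself; a match to it from another genome could belong to either copy.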