Direct Sequence Annotation Tools for Functional Genomics

With the advent of large-scale sequencing techniques, many sequenced pathogen genomes are now available. This allows comparative genomics to be carried out on a wide scale, but it also requires that the data on individual genomes be represented in a suitable format.

Analyzing novel sequences from a large sequencing effort or a complete genome involves a number of different tasks. The first is the identification of transcripts (including splicing in eukaryotic genomes) and the determination of reading frames using programs such as Genescan, Orpheus, Genepredict and Prophet [5]. The next step is the analysis of mRNA, including identification of regulatory elements. A helpful tool we have developed for this is the RNA analyzer [6], which identifies individual regulatory elements in RNA sequences using a decision tree and individual subprograms that execute sequence and secondary structure searches for various elements. It exploits fast folding routines from the Vienna package [7]. Similarly, the program UTRscan [8] identifies a number of further regulatory elements in the mRNA.
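The two-stage idea described above, a sequence search followed by a structure check, can be illustrated with a minimal sketch. This is not the RNA analyzer's implementation. The function names, the loop motif `CAGUG[ACU]` and the stem-length thresholds are illustrative placeholders, and the crude base-pairing test stands in for the thermodynamic folding that real tools obtain from packages such as Vienna.

```python
import re

# Watson-Crick pairs plus G-U wobble pairs, as allowed in an RNA stem.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(five_prime, three_prime):
    """Count consecutive base pairs, walking outward from the loop."""
    count = 0
    for a, b in zip(reversed(five_prime), three_prime):
        if (a, b) not in PAIRS:
            break
        count += 1
    return count


def find_hairpin_elements(rna, loop_motif, min_stem=4, max_stem=8):
    """Return (loop_start, loop_sequence, stem_length) for each candidate.

    Step 1 (sequence search): locate every occurrence of the loop motif.
    Step 2 (structure check): accept a hit only if the flanking bases
    can form a stem of at least `min_stem` consecutive pairs.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    # The lookahead lets overlapping motif occurrences be reported too.
    for match in re.finditer(f"(?=({loop_motif}))", rna):
        start, end = match.start(1), match.end(1)
        left = rna[max(0, start - max_stem):start]
        right = rna[end:end + max_stem]
        n_pairs = stem_pairs(left, right)
        if n_pairs >= min_stem:
            hits.append((start, match.group(1), n_pairs))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: 5-bp stem around a 6-nt loop.
    example = "AAGGCACCAGUGUGUGCCAA"
    print(find_hairpin_elements(example, "CAGUG[ACU]"))
```

A motif hit without a supporting stem is rejected at the second step, which mirrors, in a much simplified form, how a decision tree can combine independent sequence and structure tests before reporting an element.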

Pathogenomics should also profit from insights into new regulatory features such as riboswitches, that is, metabolite-mediated translational repression by aptamer-forming RNA structures [9]. Detection software is available for these structures as well [10].

