Finding Particular Sequence Patterns

Another class of sequence analysis problems involves only individual sequences. These sequences are analyzed primarily on the basis of either statistical signals, as in exon and intron prediction, or the chemical and physical properties of the residues, such as hydrophobicity.
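As an illustration of a property-based analysis of a single sequence, the following minimal sketch computes a sliding-window hydropathy profile for a protein. It assumes the Kyte-Doolittle hydropathy scale and a window of 9 residues; the scale, the window length, and the function name hydropathy_profile are illustrative choices and are not prescribed by the text.

```python
# Kyte-Doolittle hydropathy values (assumed scale; other scales exist).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every window of `window` residues.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    if window > len(scores):
        return []
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    for start, value in enumerate(hydropathy_profile("MKTLLILAVVAAALA", window=9)):
        print(start, round(value, 2))
```

Windows with a high mean value indicate hydrophobic stretches; a longer window smooths the profile at the cost of positional resolution.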

Recognition of functional signals in genes is important for gene identification. Traditional methods for finding functional sites rely on consensus sequences [13] or weight matrices [14] that reflect the conserved nucleotides of a signal. Most gene prediction systems combine information about functional signals with the regularities of coding and intron regions. Single-gene prediction programs normally use dynamic programming to find an optimal combination of preselected exons [15]. GENSCAN [16] was the first algorithm to predict multiple eukaryotic genes and remains one of the most widely used gene prediction systems.
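The weight-matrix idea can be sketched as follows. This is a minimal illustration, not the method of any cited program: it assumes a set of aligned, equal-length example sites, a uniform background distribution, and a pseudocount of 1 per base. The function names and the toy sites are hypothetical.

```python
import math


def build_weight_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds weight matrix from aligned, equal-length sites.

    Each column maps a base to log2(observed frequency / background),
    with a uniform background and additive pseudocounts (both assumed).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return matrix


def score_window(matrix, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_site(matrix, sequence):
    """Return (start, score) of the highest-scoring window in `sequence`."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the weight matrix")
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_window(matrix, sequence[i:i + width]))
    return best, score_window(matrix, sequence[best:best + width])


if __name__ == "__main__":
    # Hypothetical aligned sites, used only for illustration.
    toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    pwm = build_weight_matrix(toy_sites)
    print(best_site(pwm, "GGCTATAATGC"))
```

Unlike a consensus sequence, which only records the most common base at each position, the matrix keeps a score for every base, so partially matching sites still receive a graded score. In the toy example the best window starts at index 3 (0-based), where the sequence matches the most frequent base in every column.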

As the volume of genomic data grows, automated gene finding or gene structure prediction systems are of increasing importance. Large-scale bioinformatics systems are needed to manage and integrate large volumes of genomic data efficiently. A recent review by Rust [17] provides more details on this field.

