Some Bioinformatics Websites
GenBank and linked databases
http://www.tigr.org/tdb
Genome Database (GDB) (human genome)
Flybase (Drosophila genome)
RCSB Protein Data Bank: http://www.rcsb.org/pdb/
Protein Information Resource (PIR)
A variety of analyses may be performed on DNA sequences. Some simple examples are as follows:
A. Searching for related sequences. Any DNA sequence may be compared with other sequences available in the data banks. Searches can also be run on protein sequences after translation of coding DNA. If another protein is found with a related sequence this may give some idea of the function of the protein under investigation. Of course, this assumes that the function of the other protein has already been deciphered! Another major use of sequence comparisons is to trace the evolution both of individual genes and of the organisms that carry them (see Ch. 20).
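Comparing a query with data-bank entries comes down to scoring how well two sequences align. The sketch below uses the Smith-Waterman local alignment algorithm with a simple linear gap penalty to rank entries in a tiny data bank. The scoring values (match +2, mismatch -1, gap -2), the entry names and the sequences are all hypothetical. Real data-bank searches use much faster heuristic tools (BLAST is a common example) and proper substitution matrices, so treat this only as an illustration of the scoring idea.

```python
from typing import Dict, List, Tuple


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring parameters are illustrative assumptions, not standard values.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, databank: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank every data-bank entry by its local alignment score to the query."""
    hits = [(name, local_alignment_score(query, seq))
            for name, seq in databank.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data bank; names and sequences are invented for illustration.
databank = {
    "seqA": "TTGACAATGGCTAGCTAGGATCC",
    "seqB": "CCCCCCCCCCCCCCCC",
    "seqC": "ATGGCTAGCTTG",
}

for name, score in search("ATGGCTAGCTAG", databank):
    print(name, score)
# seqA 24
# seqC 21
# seqB 2
```

The query occurs intact inside seqA, differs from seqC at a single position and shares almost nothing with seqB, and the scores reflect exactly that ordering.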
B. Codon bias analysis can locate coding regions. Because the third base of a codon is often redundant, most amino acids can be encoded by several synonymous codons, and coding regions use some of these codons preferentially. Random, intergenic DNA shows no such preference, so codon frequencies differ between coding and non-coding DNA. A codon bias index computed from these frequencies gives a reasonable first estimate of whether a stretch of DNA is likely to be coding or non-coding.
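One simple way to turn codon frequencies into an index is sketched below. The assumptions are deliberately loud. Codon frequencies are "trained" on a set of known coding sequences (the training gene here is an invented toy). Random DNA is modelled as using all 64 codons equally, which presumes equal base composition. The index is the mean log2 ratio of a codon's frequency in coding DNA to its frequency under that random model, so positive values suggest coding DNA and negative values suggest non-coding DNA. This is a hypothetical illustration, not the specific index the text has in mind; published measures such as the Codon Adaptation Index are defined differently.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 codons
CODON_SET = set(CODONS)


def codons(seq: str, frame: int = 0) -> List[str]:
    """Split seq into non-overlapping codons, starting at the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def train_codon_freqs(coding_seqs: Iterable[str],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate codon frequencies from known coding sequences.

    The pseudocount keeps unseen codons from getting a frequency of zero.
    """
    counts: Counter = Counter()
    for s in coding_seqs:
        counts.update(c for c in codons(s) if c in CODON_SET)
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def codon_bias_index(seq: str, coding_freqs: Dict[str, float], frame: int = 0) -> float:
    """Mean log2(coding frequency / uniform frequency) over the codons of seq.

    Assumes a uniform 1/64 codon model for random, non-coding DNA.
    """
    scored = [c for c in codons(seq, frame) if c in coding_freqs]
    if not scored:
        return 0.0
    uniform = 1 / len(CODONS)
    return sum(math.log2(coding_freqs[c] / uniform) for c in scored) / len(scored)


# Hypothetical training data: one invented "gene" that favours GCT and AAA.
freqs = train_codon_freqs(["ATGGCTGCTAAAGCTAAAGCTGCTAAATAA"])

print(codon_bias_index("GCTAAAGCTGCT", freqs))  # positive: resembles the training genes
print(codon_bias_index("TTAGGCCCATTA", freqs))  # negative: codons never seen in training
```

In practice such a score would be computed in a sliding window and in each of the three reading frames, and the training table would come from many genes of the organism being studied.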
C. Searching for known consensus sequences. A variety of short consensus sequences, or sequence motifs, are known. Analysis of DNA sequences may reveal promoters, ribosome binding sites (in prokaryotes only), terminators and other regulatory regions. Inverted repeats in DNA suggest possible stem and loop structures, which are often sites for the binding of regulatory proteins. Analysis of protein sequences may indicate binding sites for metal ions, cofactors, nucleotides, DNA, etc.
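Both kinds of DNA search described above can be sketched in a few lines. `find_motif` converts a consensus written in IUPAC nucleotide codes (for example R = A or G, N = any base) into a regular expression and reports every match, including overlapping ones. `find_inverted_repeats` looks for a short stem followed, after a loop, by its own reverse complement, which is the pattern that could fold into a stem and loop. The function names, the loop-length limits and the requirement for perfect stem pairing are simplifying assumptions. The example uses the well-known E. coli -35 (TTGACA) and -10 (TATAAT) promoter consensus sequences, but the test sequence itself is invented and is not a real promoter.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, consensus: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) matches on this strand only."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def find_inverted_repeats(seq: str, stem: int, min_loop: int = 3,
                          max_loop: int = 8) -> List[Tuple[int, str, int]]:
    """Return (start, stem_sequence, loop_length) for perfectly paired stem-loops."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the downstream arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == revcomp(left):
                hits.append((i, left, loop))
    return hits


promoter = "GCTTGACATTTATAATGCG"  # invented toy sequence
print(find_motif(promoter, "TTGACA"))  # [2]
print(find_motif(promoter, "TATAAT"))  # [10]
print(find_motif(promoter, "TATRNT"))  # [10]  (degenerate IUPAC pattern)
print(find_inverted_repeats("AAGAGCTTTTGCTCAA", stem=4))  # [(2, 'GAGC', 4)]
```

To cover both strands, the same motif search can be run on `revcomp(seq)` and the positions mapped back. Real tools also tolerate mismatches in motifs and score stem-loops by their predicted stability rather than demanding perfect pairing.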
Despite the vast amount of information available from the analysis of DNA sequences, we still need to investigate how genes are regulated at the genome level and how the encoded proteins function. Just as the totality of genetic information is known as the genome, so the sum of the transcribed sequences is the transcriptome and the total protein complement of an organism is the proteome. These are discussed in Chapters 25 and 26 respectively.