Annotation Platforms Suitable for Pathogenomics

Another area of bioinformatics research is making it easy to compare the output of different direct and higher-level annotation tools for specific genomes and sequences in pathogenomics. Powerful genome annotation platforms are already available, such as the noncommercial GenDB [20] and MAGPIE [22], or commercial ones such as PEDANT [21] and BioScout (Lion Biosciences AG, Heidelberg). This work should also be seen in the context of related activities in different laboratories that exploit the advantages of XML (Extensible Markup Language) in bioinformatics [47]: there are new XML schemas for bioinformatics [48] and for EMBL data [49], as well as a protein markup language [50]. An XML broker exists for the integration of microarray data [51], and integrated systems exist for biological pathways [23]. New integrated data platforms for proteomics [52], XML-based remote procedure calls [53], and an SQL (Structured Query Language)-based server for the online integration of life science data [54] have also recently become available.
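To illustrate why XML suits the exchange of annotations between tools, the following minimal sketch parses a small annotation record with Python's standard xml.etree.ElementTree module. The record format (the <annotation>, <gene>, <product>, and <source> elements and their attributes) is hypothetical and is not one of the published schemas cited above; the gene entries are placeholders, not real annotations.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation record: element and attribute names are
# illustrative only and do not follow any published bioinformatics schema.
RECORD = """<annotation genome="example_pathogen">
  <gene id="g001" start="120" end="1430" strand="+">
    <product>hypothetical adhesin</product>
    <source tool="BLAST" evalue="1e-40"/>
  </gene>
  <gene id="g002" start="1600" end="2210" strand="-">
    <product>putative transporter</product>
    <source tool="HMMER" evalue="3e-12"/>
  </gene>
</annotation>"""


def parse_genes(xml_text):
    """Return one dict per <gene> element, with coordinates as integers."""
    root = ET.fromstring(xml_text)
    genes = []
    for gene in root.findall("gene"):
        source = gene.find("source")  # which tool produced this call
        genes.append({
            "id": gene.get("id"),
            "start": int(gene.get("start")),
            "end": int(gene.get("end")),
            "strand": gene.get("strand"),
            "product": gene.findtext("product"),
            "tool": source.get("tool") if source is not None else None,
            "evalue": float(source.get("evalue")) if source is not None else None,
        })
    return genes


def confident_genes(genes, max_evalue):
    """Keep only genes whose supporting hit is at least as strong as max_evalue."""
    return [g for g in genes if g["evalue"] is not None and g["evalue"] <= max_evalue]


if __name__ == "__main__":
    for g in parse_genes(RECORD):
        print(g["id"], g["start"], g["end"], g["strand"], g["product"], g["tool"])
```

Because the output of every tool is read into the same fields, results from different annotation tools can be compared directly, for example by filtering all calls on the same evalue threshold with confident_genes.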

A number of useful, user-friendly, integrated public tools for general genome and pathway analysis that are readily available on the World Wide Web are summarized in Table 1.2 as a first primer for any further analyses.

Tab. 1.2 Useful links and databases for genome and pathway analysis.

Links and general overview of databases
  Computational Molecular Biology at NIH: http://molbio.info.nih.gov/molbio/db.html
  Polish Academy of Sciences: http://www.ibb.waw.pl/biodat/05-02.html
  Sanger Institute: http://www.sanger.ac.uk/Info/Links/data-bases.shtml
  Washington University, St. Louis: http://library.wustl.edu/subjects/life/genet.html

NCBI databases: http://www.ncbi.nlm.nih.gov/
  Sequence annotation
    GenBank: sequence annotation for nucleotides, protein sequences, and whole genome sequences
  Gene annotation
    UniGene: gene-oriented clusters of transcript sequences
    OMIM (Online Mendelian Inheritance in Man): catalog of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases
    Entrez Gene: map, sequence, expression, structure, function, citation, and homology information, and related web sites for genes

Sanger Institute gene annotation: http://www.ensembl.org/
  Transcript/translation information, location, SNPs, orthologue prediction, disease matches, and related web sites for genes

Stanford software and tools: http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch
  Unification tool that dynamically collects and compiles data from many scientific databases (batch searches possible)

KEGG pathway database: http://www.genome.jp/kegg/

SwissProt protein sequence database: http://us.expasy.org/sprot/

The well-known National Center for Biotechnology Information (NCBI) creates public databases and software tools for the dissemination of biomedical information. Its databases are built from sequences submitted by individual laboratories and through data exchange with the other two major international nucleotide sequence databases, those of the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ). In addition to GenBank, NCBI supports further annotation databases and makes them available to the scientific community. A number of software programs are also offered, in particular different implementations [2] of BLAST and of iterative BLAST (PSI-BLAST). The accompanying tutorials and the notes on the parameters and qualifiers each program accepts are also very useful. For example, the Entrez browser system offered here (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar) retrieves not only journal articles but also protein and nucleotide sequences from a specific pathogen of interest.
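Entrez retrievals of this kind can also be scripted rather than run through the browser. The sketch below is not from the original text: it assumes NCBI's E-utilities web service (esearch.fcgi and efetch.fcgi under https://eutils.ncbi.nlm.nih.gov/entrez/eutils/), which offers programmatic access to the same Entrez databases, and it uses only the Python standard library. The helper names are hypothetical, and the organism query in the usage note is only an example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities (programmatic access to Entrez databases).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that returns the UIDs matching an Entrez query."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_fasta_url(db, ids):
    """Build an EFetch URL that returns the given UIDs as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def parse_esearch_ids(xml_data):
    """Extract the <Id> values from an ESearch XML response (str or bytes)."""
    root = ET.fromstring(xml_data)
    return [node.text for node in root.findall("./IdList/Id")]


def fetch(url, timeout=30):
    """Download a URL and return the raw response bytes (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, esearch_url("protein", "Helicobacter pylori[Organism]") yields a query whose XML response, passed through fetch and parse_esearch_ids, lists matching protein UIDs; those UIDs can then be handed to efetch_fasta_url to download the sequences in FASTA format. NCBI's usage guidelines limit how many requests a client may send per second, so batch scripts should pause between calls.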
