Bioinformatics: Data Mining Among Genome Sequences

Susanne Kneitz and Thomas Dandekar

1.1 Systematic Genome Analysis of Pathogens as a Basis for Pharmacogenomic Strategies

Rather than identifying individual resistance events and fighting them with a classical pharmacological strategy, systematic genome analysis of the pathogen offers a more ambitious but very powerful pharmacogenomic strategy for identifying new treatments. The basis for the systematic analysis of genomes is solid experimental data, in particular data relating to pathogenicity factors. Current methods allow large-scale data collection on a genomic scale, covering the complete genome sequence (e.g., by DNA capillary sequencers), an overview of the transcriptome (e.g., by expressed sequence tag (EST) sequencing or serial analysis of gene expression (SAGE)), and insights into the proteome (e.g., large-scale two-dimensional gel analysis coupled with mass spectrometry). The available genome sequences of various pathogens provide a wide range of novel targets for drug design, which can be identified by means of microarray analysis. For example, a recent paper describes the application of functional genomics tools such as microarrays and proteomics to the development of new drugs that are not only active against drug-resistant Mycobacterium tuberculosis but can also shorten the course of M. tuberculosis therapy [1].

On a note of caution, the quality of large-scale data is often not as good as that of single observations. Examples for DNA data are uncertainties in contig assembly, repetitive sequences, and gene prediction; examples for transcriptome data are representational bias and the failure to detect low-copy messengers. Proteome data have particular problems, e.g., membrane proteins and highly charged proteins are not well resolved. Multiple gel spots may indicate modifications of the same protein. In addition, certain protein modifications (glycosylation, phosphorylation, etc.) are not easily detected.

Bioinformatic tools are nevertheless the key to systematic analysis of these data, with the aim of countering the development of resistance as effectively as possible and of devising strategies against the pathogen on a rational, pharmacogenomic basis. They exploit a number of approaches such as the analysis of gene expression, pathway modeling, and detailed, database-guided biochemical analysis of the various steps leading to survival of the pathogen. Sequence information about the genes, ESTs, or proteins involved is essential for starting the genome analysis. Iterative sequence alignments [2] offer a much improved way of detecting sequence similarities: all newly identified related candidate sequences are aligned iteratively, and position-specific scoring matrices are used to search for families of related proteins in different genomes. Specific motifs in proteins are recognized by tools such as PROSITE [3]. Detailed analysis of gene expression with software such as R and Bioconductor [4] allows predictions about activated or repressed genes, gene clusters, groups, and pathways in pathogens. This is followed by a detailed analysis of all pathogen genes involved (and, if data are available, interacting host genes), enzymes, and pathways by the application of methods such as pathway alignment and elementary mode analysis [5].
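
The idea of a position-specific scoring matrix can be illustrated with a minimal Python sketch. It is not the iterative alignment method of [2]; it only shows how a matrix might be derived from an ungapped alignment of related sequences and used to scan another sequence. The function names, the pseudocount, the uniform background frequency, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build log-odds scores per alignment column.

    Assumptions: the alignment is ungapped, all sequences have equal
    length, and the background frequency is uniform (1/20).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts avoid log(0) for residues never seen in a column.
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the column scores for one window of the same width as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence.

    Assumes the sequence contains only the 20 standard residues and is
    at least as long as the matrix.
    """
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_window(pssm, sequence[i:i + width]))
    return best, score_window(pssm, sequence[best:best + width])


if __name__ == "__main__":
    # Hypothetical toy family; not taken from any real protein family.
    family = ["GKSGT", "GKTGT", "GKSGS", "GKSGT"]
    matrix = build_pssm(family)
    print(best_hit(matrix, "MAAAGKSGTLLL"))
```

In an iterative search, the sequences found with such a matrix would be added to the alignment and the matrix rebuilt before the next search round, so that more distant family members can be picked up in later rounds.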
