Obtaining Protein Information from a Domain Server

As an example of the information readily available from a protein sequence, consider the output of AnDom [13], which shows the regions of a given sequence that are homologous to known three-dimensional structural domains. For instance, plasmoredoxin from Plasmodium falciparum [15] can be annotated from its sequence using the AnDom server. GenBank entry AAF87222 contains the plasmoredoxin sequence [thioredoxin-like redox-active protein (Plasmodium falciparum)]. The server is located at [...]. In the output, a green bar shows that the main central part of the structure is homologous to the domain in crystal structure 1o8x from SCOP family c.47.1.10, which is tryparedoxin I from Crithidia fasciculata. Its residues 22-141 structurally match residues 39-157 of the plasmoredoxin sequence, further supporting the predicted and experimentally confirmed role of plasmoredoxin in redox metabolism [15].

The color coding reflects the predicted three-dimensional structure of each domain according to homology: green represents domains that contain a mixture of α-helical and β-strand parts (SCOP class 3), blue represents folding class α/β (SCOP class 4; helix follows strand follows helix and so on, often in catalytic domains), and violet would correspond to a multidomain protein (SCOP class 5). The positions of the main structural domains are shown graphically at the top of the output. Next, the output lists the similarities to all protein structural domains found, including the names of the domains, their significance, and references to the sequence comparisons. Finally, the server gives the detailed domain alignments according to their homology to known structures. In this way, the first structural clues to potential drug targets of a pathogen become immediately available. Several other structure predictors are also available [16].
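The color legend described above can be written down as a small lookup, shown here as a minimal sketch in Python. It is not part of the AnDom server: the function names are hypothetical, only the three colors stated in the text are included, and the mapping from the leading letter of a SCOP identifier (a, b, c, ...) to a class number (1, 2, 3, ...) is an assumption made for illustration.

```python
# Color legend as stated in the text; classes the text does not mention
# are deliberately left out and resolve to None.
SCOP_CLASS_COLORS = {3: "green", 4: "blue", 5: "violet"}


def scop_class_index(scop_id):
    """Return the 1-based class number of a SCOP identifier such as 'c.47.1.10'.

    Assumption: the leading letter encodes the class in alphabetical order
    (a -> 1, b -> 2, c -> 3, ...).
    """
    letter = scop_id.split(".")[0].strip().lower()
    return ord(letter) - ord("a") + 1


def color_for(scop_id):
    """Return the legend color for a SCOP identifier, or None if not listed."""
    return SCOP_CLASS_COLORS.get(scop_class_index(scop_id))


if __name__ == "__main__":
    # The tryparedoxin family from the plasmoredoxin example.
    print(color_for("c.47.1.10"))
```

Under these assumptions the family c.47.1.10 falls into class 3, which agrees with the green bar reported for the plasmoredoxin example.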

Protein interactions are important because they reveal the partners of a protein within the infectious organism or the host, as well as between pathogen and host proteins. In prokaryotes, conservation of gene order [17], gene fusion events, and the combined presence or absence of clusters of genes are good predictors of direct interactions at the protein level between the encoded gene products. Gene context conservation is easily detected with the STRING software [18]. STRING uses a large database of more than 100 genomes to determine whether the gene neighborhood is conserved, or a gene fusion event has occurred, in genomes related to the organism and the protein gene used as the query. For predictions of protein interactions in eukaryotic pathogens such as trypanosomes or plasmodia, gene context is less reliable; however, recent versions of the STRING software include predictions based on text mining, two-hybrid assays, coexpression, and homology, which allow helpful predictions about the physical association, or at least the functional association, of proteins in a protein cluster in eukaryotes as well. Functional association allows predictions about substrate specificity (e.g., a tryptophan pathway metabolite) by looking at the cluster with which the gene of interest is associated (in this example, tryptophan-metabolizing enzymes). In this way even proteins of unknown function can be put into a functional context and their substrate predicted, provided that the exact biochemical function of at least one protein in the cluster predicted by STRING is known. These predictions complement sequence analysis methods, in which homology allows prediction of protein function but is less informative about the exact substrate specificity.
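The idea behind gene neighborhood conservation can be illustrated with a toy calculation. The sketch below is not the STRING algorithm and does not use its data: the genomes, the gene labels (including the uncharacterized `orfX`), and the function `neighborhood_support` are all hypothetical. Each genome is treated as a simple linear list of ortholog labels, ignoring strand and chromosome circularity. The function counts in how many genomes two genes lie next to each other, out of the genomes that contain both.

```python
def neighborhood_support(genomes, gene_a, gene_b, max_gap=1):
    """Count genomes in which gene_a and gene_b are neighbors.

    genomes: dict mapping a genome name to an ordered list of gene labels.
    max_gap: largest difference in list position still counted as neighboring.
    Returns (neighboring, both_present).
    """
    neighboring = 0
    both_present = 0
    for order in genomes.values():
        if gene_a in order and gene_b in order:
            both_present += 1
            distance = abs(order.index(gene_a) - order.index(gene_b))
            if distance <= max_gap:
                neighboring += 1
    return neighboring, both_present


# Hypothetical gene orders in three related genomes (illustrative only).
toy_genomes = {
    "genome_1": ["trpA", "trpB", "orfX", "rpoB"],
    "genome_2": ["rpoB", "orfX", "trpB", "trpA"],
    "genome_3": ["trpA", "rpoB", "trpB", "orfX"],
}

if __name__ == "__main__":
    for partner in ("trpB", "trpA", "rpoB"):
        hits, total = neighborhood_support(toy_genomes, "orfX", partner)
        print(f"orfX next to {partner}: {hits} of {total} genomes")
```

In this toy data the uncharacterized `orfX` sits next to `trpB` in every genome, which is the kind of conserved context that would place it in a functional cluster with tryptophan-metabolizing enzymes. A real analysis would also have to establish orthology and correct for how closely related the genomes are.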

Recognizing all the hidden features in a pathogenic genome is a continuous process, as new software, input from additional experimental data, and the exponential growth of databases allow new insights. For example, five years after the original annotation of the pathogenic genome of Mycoplasma pneumoniae, about a third of its annotation could be substantially improved by these new data and techniques [5, 19]. Automated data and software platforms for annotation, such as GenDB [20], Pedant [21], and Magpie [22], together with integrated databases [23], increase annotation speed and allow systematic comparison of genomes. The combination of these and similar tools allows rapid annotation of genomes and rapid identification of pathogen-specific features such as host interaction factors and toxins.
