The Long Road from Sequence to Function

Less than ten years after the publication of the first complete nucleotide sequence of a free-living organism, that of Haemophilus influenzae Rd [1], more than 220 complete genomes have been sequenced and made publicly available (see the Genomes OnLine Database), a development that has radically changed the way we think about and practice biology. This apparently endless expansion, with several hundred additional genome projects currently underway, is rapidly leading to the identification of the entire gene repertoires of almost all known microbial pathogens of humans. This sequencing frenzy is driven by the common belief that genome sequences will, in a future that is hopefully close at hand, be the key to the rational design of novel antimicrobial therapies, which could have a fundamental impact on health care in this century [2]. Such therapies are expected to follow from a better understanding of a wide range of bacterial and fungal biological processes, including the mechanisms of pathogenesis, which will in turn result from the identification of the genes that contribute to these processes and the functional characterization of the corresponding proteins. However, this hope relies on our ability to convert raw sequence data into polished biological information. This promises to be a formidable challenge since, more often than not, bioinformatic analysis of a sequence provides incomplete information, if any, about gene function. Indeed, in most sequenced genomes, approximately 40% of the open reading frames (ORFs) could not be assigned any potential function, either because they are unique to the microbe studied or because they are similar only to other genes whose function is also unknown. Moreover, most of the functions that could be predicted rest mainly, if not entirely, on circumstantial evidence which, no matter how plausible, needs to be experimentally validated or, for that matter, invalidated.

Studying, amongst other things, gene or protein expression, protein localization, or protein-protein interactions may generate information relevant to the mechanisms of bacterial and fungal pathogenesis [3]. However, the resulting data (e.g., a list of genes transcriptionally active during infection) always need to be subsequently tested by mutagenesis, since the role of a gene in a defined virulence trait is established only once it has been demonstrated by mutational analysis. Therefore, the most direct and efficient path to gene function remains to define the phenotypic alterations resulting from gene mutation, which can be done on a gene or a genome scale, as discussed below.
