When we introduced the functional genomics pipeline of figure 1.5 (see p. 17), we made it clear that without biological validation, the hypothesis generation and testing procedure for which the pipeline was intended would come to a halt. A small but important subtask in bridging the flow from computational to biological validation is ensuring that the genes referred to in the analysis (e.g., the lists of genes belonging to each cluster) map to the appropriate physical clones, regardless of whether the nomenclature used for analysis was GenBank, UniGene, TIGR, Stanford yeast ORF codes, or any other internal or local numbering scheme. Specifically, biological validation will involve performing "wet bench" experiments, which may well depend on obtaining clones of the genes identified in the analysis.
Thus, some degree of translation is necessary to follow up on the results. It has been our experience that this process takes many times longer than the actual bioinformatics analysis. Since many of the genes or ESTs that appear on the lists after analysis are typically unknown, one can argue that a bioinformatics analysis of microarrays is never actually complete. It may be only after another week, month, or year that another group finally assigns a meaning to one of those unknown genes, and only after the analyst sees that result might a hypothesis be generated. Thus, an infrastructure is needed that periodically re-queries the accession numbers or genes to see whether anything new is known about each oligonucleotide, cDNA, or gene. Furthermore, the appropriate analysts should be informed when such an event happens. Such a system might be similar to how PubCrawler operates on the biomedical literature.
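The re-query-and-notify idea can be made concrete with a minimal sketch. It is one possible shape for such an infrastructure, not a description of any existing tool. The class name `AnnotationWatcher`, the `lookup` and `notify` callables, and the accession strings used to exercise it are all hypothetical. In practice `lookup` would wrap a query against a public database and `notify` would send e-mail or a similar alert; both are left as injected functions here so that the sketch makes no assumptions about a particular database interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: an annotation is a free-text description, or None while the
# sequence is still uncharacterized.
Annotation = Optional[str]


@dataclass
class AnnotationWatcher:
    """Tracks accessions of interest and reports when their annotation changes."""

    # Hypothetical callable: accession -> current annotation (or None).
    lookup: Callable[[str], Annotation]
    # Hypothetical callable: (analyst, accession, old annotation, new annotation).
    notify: Callable[[str, str, Annotation, Annotation], None]
    # accession -> analysts who want to hear about changes
    subscribers: Dict[str, List[str]] = field(default_factory=dict)
    # accession -> annotation recorded at the previous check
    last_seen: Dict[str, Annotation] = field(default_factory=dict)

    def watch(self, accession: str, analyst: str) -> None:
        """Register an analyst's interest and record a baseline annotation."""
        self.subscribers.setdefault(accession, []).append(analyst)
        if accession not in self.last_seen:
            self.last_seen[accession] = self.lookup(accession)

    def check_once(self) -> List[str]:
        """Re-query every watched accession; return those whose annotation changed."""
        changed = []
        for accession, analysts in self.subscribers.items():
            current = self.lookup(accession)
            previous = self.last_seen[accession]
            if current != previous:
                for analyst in analysts:
                    self.notify(analyst, accession, previous, current)
                self.last_seen[accession] = current
                changed.append(accession)
        return changed
```

A scheduler (for example, a nightly or weekly job) would call `check_once()` at whatever interval suits the laboratory. Recording the baseline when an accession is first watched means analysts are alerted only to annotations that appear or change after their analysis, which is the event of interest here.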
Here we touch upon some of the conceptually shallow but pragmatically thorny challenges that face researchers as they attempt to make sense of the results of their data-mining forays.