Functional Genomics

Now that the human genome has been sequenced (or, more accurately, now that a handful of human genomes have been sequenced), we are said to be in a postgenomic era [200]. We find this term confusing ("genome, we hardly knew ye") because, in our view, now that we know at least the draft outline of the genomes of multiple organisms, we can begin for the first time to systematically deconstruct how the genetically programmed behavior of an organism's physiology is related to the constituent genes that make up its individual version of its species' genome. In this deconstruction, several kinds of biological information are available: DNA sequence, physical maps, gene maps, gene polymorphisms, protein structure, gene expression, protein interaction effects,[1] and a vast literature in MEDLINE [129].

As we use it in this book, functional genomics refers to the overall enterprise of the deconstruction of the genome to assign biological function to genes, groups of genes, and particular gene interactions. These functions may be directly or indirectly the result of a gene's transcription. Much of functional genomics has been and will continue to be the kind of hypothesis-driven biological research pursued for the past decades.

In this book, we address a computationally intensive branch of functional genomics that has emerged as a result of the practical implementation of technologies to assess the expression of thousands of genes at a time. The ability to comprehensively measure expression affords an opportunity to reduce our dependence on a priori knowledge (or biases) and allow the organism to point us in potentially fruitful directions of investigation. That is, we describe a hypothesis-generating effort which, if carefully crafted, can then lead to a highly productive set of investigations using more conventional hypothesis-driven research. In this we have been inspired by the work of Arkin et al. [12], as have others. Starting from the raw time-series measurements of the substrates of glycolysis (see figure 1.1), Arkin et al. were able to computationally reconstruct the glycolytic pathway (figure 1.2). This reverse engineering of the metabolic pathway without prior knowledge is in contrast to the decades of exacting hypothesis-driven elucidation of this pathway by biochemists. As we will discuss later under experimental design (section 2.1.3), this metaphor turns out to be flawed, but it remains an icon for one of our major goals: to use bioinformatics applied to functional genomics data to create and re-create the kind of biological pathway charts that are common in most basic biology laboratories.

Figure 1.1: Time-series data from anaerobic metabolism. The time courses of the measured concentrations of the small-molecule inputs in the experiments, adenosine monophosphate (AMP, a source of chemical energy to catalyze reactions) and citrate (a substrate), together with the responses of the concentrations of phosphate (P, an inorganic ion) and of the substrates fructose-1,6-bisphosphate (F16BP), dihydroxyacetone phosphate (DHAP), fructose-6-phosphate (F6P), glucose-6-phosphate (G6P), and fructose-2,6-bisphosphate (F26BP). (Derived from Arkin et al. [12].)

Figure 1.2: Glycolytic pathway reconstructed ab initio from time-series data. A, The two-dimensional projection of the correlation metric construction (CMC), defined by Arkin et al., for the time series shown in figure 1.1. Each point represents the time series of a given species. The closer two points are, the higher the correlation between the respective time series. Black (gray) lines indicate negative (positive) correlation between the respective species. Arrows indicate temporal ordering among species based on the lagged correlations between their time series. B, The predicted reaction pathway derived from the CMC diagram. Its correspondence to the known mechanism of glycolysis is high. (Derived from Arkin et al. [12].)
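
To make the idea behind figure 1.2 concrete, the following is a minimal sketch of how lagged correlations between time series can yield both a distance (closer means more correlated) and a temporal ordering. It is not the CMC procedure of Arkin et al.: the distance 1 - |r|, the function names, and the synthetic sine-wave series are all assumptions chosen for illustration, and the projection to two dimensions is omitted.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag)."""
    if lag > 0:
        a, b = x[:-lag], y[lag:]
    elif lag < 0:
        a, b = x[-lag:], y[:lag]
    else:
        a, b = x, y
    return np.corrcoef(a, b)[0, 1]

def best_lag(x, y, max_lag):
    """Return (lag, correlation) with the largest absolute correlation."""
    lags = range(-max_lag, max_lag + 1)
    corrs = [lagged_correlation(x, y, k) for k in lags]
    i = int(np.argmax(np.abs(corrs)))
    return lags[i], corrs[i]

def correlation_distance(series, max_lag):
    """Pairwise distance 1 - |r| (an illustrative choice, not the CMC metric)
    and the lag at which |r| peaks for each ordered pair."""
    n = len(series)
    dist = np.zeros((n, n))
    peak_lag = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            lag, c = best_lag(series[i], series[j], max_lag)
            dist[i, j] = 1.0 - abs(c)
            peak_lag[i, j] = lag
    return dist, peak_lag

# Synthetic stand-ins for measured concentrations (hypothetical data).
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50)        # "upstream" species
y = np.sin(2 * np.pi * (t - 5) / 50)  # follows x with a delay of 5 samples
z = -x                                # anticorrelated with x, no delay

dist, peak_lag = correlation_distance([x, y, z], max_lag=10)
```

In this toy case a positive entry in `peak_lag[i, j]` means series j trails series i, which is the kind of evidence the arrows in panel A summarize. Real measurements are noisy and irregularly driven, so peak lags would need far more careful treatment than this sketch suggests.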

Furthermore, we note that genomic data can be fruitfully exploited even without the assignment of function: a prognostic test for rejection of transplanted kidneys based on the expression level of three genes is useful even if the function of those three genes is poorly known or not known at all.