Expression space

Now that we have defined the experiment design space, we define the expression space to be the space of potential expression values of all genes in a given genome. In the context of microarray measurements, this is a large multidimensional space—with each dimension corresponding to a single gene—and it can be immediately partitioned into several overlapping subspaces. For example, coordinated gene expression measurements made from normal muscle tissue under normal conditions can be represented by a subspace of expression space which will most likely have a nonempty intersection with the subspace of coordinated gene expression measurements made from diseased muscle tissue. We often repeat mantra-like to ourselves the phrase "A cell is a cell is a cell," by which we mean that in order to survive as living and organized entities, most cells share a large repertory of biological processes, whether they be components of muscle tissue, liver tissue, or white blood cells. That is, the expression subspaces of most cell types have significant overlap.

Expression space contains clues about the interactions between two or more clusters of genes. As an example, we start with the simple case of three arbitrary genes: insulin-like growth factor I (IGF-1), insulin-like growth factor binding protein-1 (IGF-BP1), and fibroblast growth factor 2 (FGF2). The regulatory relationship between these three genes[2] is, in some tissues, that one gene partially determines the expression level of the other two. This causes the three genes to move in a coordinated fashion across the range of their respective possible values. That is, if this is a particularly tightly regulated system, then the expression values of this triplet will be found only in a small subspace of all possible loci of the expression space. If it were a loosely regulated or unregulated system, the gene expression values might be found in all parts of the expression space, as shown in figure 2.2 below. The totality of the potential loci that these genes can occupy constitutes the set of all potential coordinated gene expression levels, or expression space. The subset of the expression space that a cellular system occupies under all stimuli is called the transcriptome [41, 184]. More precisely, the transcriptome encompasses the expression space of all of the putative 30,000 genes in the human genome (or 100,000 gene products if current alternative splicing estimates are correct). Each point in the human transcriptome corresponds to a combination of values that the genes' expression levels might attain under some physiological or pharmacological stimulus.

Figure 2.2: Expression space in which gene expression is loosely versus tightly coupled. If genes are tightly coupled in the expression space, they will tend to occupy a small subspace of expression space over any set of experiments. If, however, these genes are loosely coupled or even causally unrelated, then their expression levels will have little relation to one another and therefore tend to be scattered over a greater volume of the expression space.
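
The contrast between tight and loose coupling can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the three genes are generic (gene_a, gene_b, gene_c), expression values are scaled to the interval [0, 1], and the coupling rule (gene_a drives gene_b up and gene_c down, with a little noise) is hypothetical rather than a model of IGF-1, IGF-BP1, and FGF2. It divides the three-gene expression space into a grid and reports what fraction of the grid cells is occupied in each case.

```python
import random


def simulate_triplet(n, coupling_noise, seed=0):
    """Return n points (gene_a, gene_b, gene_c), each value clamped to [0, 1].

    coupling_noise=None means the genes vary independently (unregulated);
    otherwise gene_b and gene_c follow gene_a with Gaussian noise of that
    standard deviation. The coupling rule itself is a hypothetical example.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        a = rng.random()
        if coupling_noise is None:
            b, c = rng.random(), rng.random()
        else:
            b = a + rng.gauss(0.0, coupling_noise)
            c = 1.0 - a + rng.gauss(0.0, coupling_noise)
        points.append(tuple(min(1.0, max(0.0, v)) for v in (a, b, c)))
    return points


def occupied_fraction(points, bins=10):
    """Fraction of the bins**3 grid cells that contain at least one point."""
    cells = set()
    for p in points:
        cells.add(tuple(min(bins - 1, int(v * bins)) for v in p))
    return len(cells) / bins ** 3


tight = occupied_fraction(simulate_triplet(2000, coupling_noise=0.02))
loose = occupied_fraction(simulate_triplet(2000, coupling_noise=None))
print(f"tight coupling occupies {tight:.1%} of cells, loose coupling {loose:.1%}")
```

With the same number of simulated experiments, the tightly coupled triplet should stay confined to a thin band of cells, while the uncoupled triplet spreads over most of the grid. The grid resolution and noise level are arbitrary choices made for the illustration.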

The meta-goal of the experimentalist/investigator, then, is to define the experiment design space that provides the best and most complete picture of the entire transcriptome or of a subset of the expression space of interest. This is the point in experimental design where domain-dependent knowledge, i.e., specific or a priori biological knowledge, is of significant utility. The investigator must ask herself or himself: "What is the minimal set of experimental conditions, i.e., the smallest subspace of the experiment design space, that will cause the largest set of relevant subspaces of the expression space to be sampled?" To answer this question, the investigator must first understand how each experimental condition could affect the expression of potentially interesting genes and be able to quantify its effect. Second, the investigator must consider all possible subspaces, or interactions between genes, that are of interest. For example, suppose that we are interested only in the interactions between IGF-1 and IGF-BP1 from the example earlier in this section, and that we wish to determine how the relationship between these two genes changes under both physiological and pharmacological conditions. We would then choose those sets of experiments in the experiment design space that are most likely to cause one or both of these genes to be expressed over the broadest range of their possible values, so that all the evidence of the interactions is captured, i.e., the subspace of experiments that covers the largest area of the IGF-1 and IGF-BP1 expression subspace. A researcher might be interested in the signaling mechanisms of IGF-1 once it has bound to IGF-BP1 and in how it affects the liver's expression of other genes. These genes may code for many proteins involved in postreceptor signaling, including possibly IGF-BP1 itself.
With a finite budget and a finite source of tissue or animals, the investigator will have to choose between alternative strategies. A "mutant screening" strategy might be to assay expression levels in the liver over a set of knockout mice in which each element of the set constitutes a knockout of a different putative element of the IGF-1 signaling pathway, whereas a "physiological stimulus" strategy would be to take a single element of the set (one of the knockout strains or a wild-type animal) and subject it to a sequence of different concentrations of extracellular IGF-1. Determining which of these alternatives provides the better exploration of the expression space requires an understanding of whether the range of mutants or physiological stimuli engages the relevant portion of the genetic machinery, and of how close the experiment comes to the normal homeostatic behavior or pathological behavior that it is intended to explore.
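
The question of which few conditions sample the most of the relevant expression subspace can be sketched as a coverage problem. The example below is a minimal illustration, not a method taken from the text: it uses a greedy heuristic of our own choosing, the condition names are hypothetical, and the grid cells each condition is assumed to exercise in the IGF-1 and IGF-BP1 plane are invented for the example. In practice those expected coverages are exactly the prior biological knowledge the investigator has to supply.

```python
def greedy_design(coverage, budget):
    """Pick up to `budget` conditions, each time taking the one that adds
    the most not-yet-covered cells of the expression plane.

    coverage: dict mapping a condition name to the set of grid cells
              (IGF-1 bin, IGF-BP1 bin) it is expected to exercise.
    Returns (chosen condition names in order, set of covered cells).
    """
    chosen = []
    covered = set()
    remaining = dict(coverage)
    while remaining and len(chosen) < budget:
        # Iterate over sorted names so that ties are broken alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining condition adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Hypothetical expected coverage on a coarse grid; all values are invented.
COVERAGE = {
    "wild_type_baseline": {(0, 0), (0, 1)},
    "fasting": {(0, 0), (1, 0), (2, 0)},
    "igf1_dose_series": {(0, 1), (1, 1), (2, 2), (3, 2), (3, 3)},
    "knockout_A": {(3, 2), (3, 3)},
}

chosen, covered = greedy_design(COVERAGE, budget=2)
print(chosen, len(covered))
```

A greedy choice is not guaranteed to find the smallest possible set of conditions, but it makes the trade-off visible: a condition that only revisits cells already sampled contributes nothing, however cheap it is to run.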

Figures 2.3, 2.4, and 2.5 illustrate the pitfalls of insufficiently exercising the regulatory mechanisms of gene expression. In figure 2.3, we observe the joint behavior of two genes whose relationship has a strong component of randomness within the current range of expression levels, as is evident from the very low Pearson correlation coefficient of the linear regression calculated between them. However, if an additional stimulus is applied to the regulatory system such that gene 1 has a wider range of expression, then despite the randomness observed over the previous range, we begin to see what appears to be a linear relationship with a much larger Pearson correlation coefficient (figure 2.4). If the expression level of gene 1 is perturbed over an even wider range, it becomes apparent that the relationship is not linear but curvilinear, possibly even quadratic, as shown in figure 2.5. If the experiment design space were incorrectly selected, then this pair and many other pairs of genes like it would be assumed to be poorly or randomly related when, in fact, with the right stimulus, a tight curvilinear relationship would be found, and vice versa.
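
The effect of the sampled range on the apparent relationship can be reproduced with simulated data. The sketch below assumes a hypothetical quadratic dependence of gene 2 on gene 1 (gene2 = 0.05 * gene1**2 plus Gaussian noise); the coefficient, the noise level, and the three ranges are invented for illustration and are not the data behind the figures. It computes the Pearson correlation coefficient over a narrow, a medium, and a wide range of gene 1, and then compares the fitted slope in the lower and upper halves of the wide range as a simple check for curvature.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def sample_pair(lo, hi, n=300, noise_sd=3.0, seed=0):
    """Gene 1 spread evenly over [lo, hi]; gene 2 follows a hypothetical
    quadratic rule, 0.05 * gene1**2, plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [0.05 * x * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


for label, (lo, hi) in [("narrow", (9.5, 10.5)), ("medium", (5.0, 15.0)), ("wide", (0.0, 30.0))]:
    xs, ys = sample_pair(lo, hi)
    print(f"{label:6s} range [{lo}, {hi}]: r = {pearson_r(xs, ys):.2f}")

# Curvature check on the wide range: a straight line would give similar
# slopes in both halves, whereas a curvilinear relationship would not.
xs, ys = sample_pair(0.0, 30.0)
half = len(xs) // 2
print("lower-half slope:", round(slope(xs[:half], ys[:half]), 2))
print("upper-half slope:", round(slope(xs[half:], ys[half:]), 2))
```

Under these assumptions the correlation should be weak over the narrow range, clearly positive over the medium range, and high over the wide range, where the differing slopes of the two halves point to curvature that a single correlation coefficient does not reveal.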

Figure 2.3: Apparently random relationship in expression space between two genes.

Figure 2.4: Apparently linear relationship in expression space between two genes.
