Figure 2.5: Apparently curvilinear relationship in expression space between two genes. This curvilinear relationship would be obscured if the expression space had been insufficiently exercised.

2.1.3 Exercising the expression space

In section 2.1.1, we described how each experiment can be viewed as an exploration of the space of all possible expression patterns. In exploring this expression space, the investigator is constrained to the space of possible experiments. Informally, the goal of experiments should be to maximally exercise the genome, bringing out the couplings or correlations between genes that are of greatest relevance to the biological process being studied. We have found the following "watch metaphor" to be of some use in explaining the interaction between expression space and experiment design space. We leave it to the reader to judge whether it is helpful or simply a confusing distraction.

The watch metaphor. Suppose that we were interested in discovering how an old analog watch of the sort illustrated in figure 2.6 worked. Simply watching the hands of the watch going around might allow us to generate some hypotheses about the mechanisms that generate this motion or behavior. Empirically, we could derive the relationship between the movement of the minute hand and the hour hand, but the number of mechanisms that could explain the behavior would be large. A more invasive investigation would be to take a hammer to the watch and examine the pieces that fall out of the watch's case (figure 2.7). The combination of the previously observed empirical behavior and the examination of these pieces might give us a better, if still grossly incomplete, hypothesis (not to mention the loss of a costly watch). That is, the number of possible mechanisms explaining the observed behavior of the watch hands would have been substantially reduced. Furthermore, if one had no other similar watch, these hypotheses might not be verifiable on account of its destruction. A more sophisticated approach might be to pry open the back of the watch and observe the movement of the gears and escape mechanisms. With these observations, and one or both of the prior kinds of investigations, an even better understanding can be obtained of how the various components of the watch are mechanistically related to one another. All of these experiments are observational studies, and at best we could generate some fairly accurate hypotheses without ever demonstrating causality. Causality would be most convincingly demonstrated by intervening in the operating mechanism, e.g., winding the watch (partly or until the mechanism breaks), shaking the watch, immobilizing one of the gears, or replacing a gear with one of different-sized teeth. These interventions are progressively more technically challenging but also provide better insights and often confirmation of the hypotheses generated during the earlier observational studies.

Figure 2.6: Noninvasive monitoring of mechanistic behavior. By observing the concerted action of the watch hands, several competing weak hypotheses about the underlying mechanism can be generated. The more observations made at different times, the smaller the number of possible hypotheses generated.

Figure 2.7: Decomposition of the watch. An invasive exploration of mechanism will reveal most of the components of that mechanism. In the process of the invasive investigation some of the components will be damaged. Also, the relationship of these components inside a working watch must be inferred and is not directly observed because of the invasive nature of this investigation.

The analogy with expression studies, with the indulgence of the reader, can be made as summarized in table 2.1. The entries of the table describe different kinds of experiments, i.e., distinct points in the potential experiment space. Some of these experiments allow the components of the watch to work in their usual fashion, i.e., within their usual "expression space," whereas others bring them into an operating mode or expression space in which they do not usually function. The question the watch investigator would have to ask himself or herself is what part of the expression space of the watch will best inform him or her of the normal function of the watch. It may be that smashing the watch is the only part of the experiment space that is cost effective or feasible for the investigator. However, the smashed or overwound state of the watch may not accurately reflect the normal, intact, and possibly more interesting state of the watch's expression space.


Table 2.1: Exploring the experiment space with a watch

Experiment Space in Watch | Experiment Space Biological Analogy
Observing the movement of the hour, minute hands | Hypothesizing cellular function based on gross cellular behavior
Smashing the watch, examining the pieces | Sequencing some of the genes in the cell or measuring a few proteins or RNA transcripts
Seeing all the gears working from the back of the opened watch | Parallel measurement of thousands of genes at a time; time series measurements
Partially winding up the watch | Physiological stimulus
Winding the watch until it is near breaking or until it breaks, thereby bringing the internal gears of the watch into behavior they would not typically exhibit | Pharmacological stimulus that brings the gene expression patterns into an operating region outside the usual physiological range
Removing a gear | Creating an organism with a gene "knocked out"
Replacing a gear with one of different weight | Creating a transgenic organism

To take these analogies one step further, there are multiple arrangements of gears that could account for the observed interactions between these gears and their effect on the hour hand and minute hand. Similarly, although perturbations of the genome can provide insight into the interactions among the major genes involved in a given pathological pathway, they do not reveal the entire pathway. Rather they expose a significantly large family of possible mechanisms that could account for the observed behavior. We address this issue in greater depth in the section on pathway re-engineering (section 4.13.2). Nevertheless, as suggested earlier, there are some misconceptions about what can be done on the basis of microarray expression studies implicit in some of the current genomic and bioinformatics literature.
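The point about multiple gear arrangements can be made concrete with a small calculation. The minute hand completes 12 revolutions for each revolution of the hour hand, and the sketch below enumerates two-stage gear trains that all produce exactly this 12:1 ratio. The function name `two_stage_trains`, the restriction to two stages, and the range of tooth counts are illustrative assumptions of ours, not a description of any particular watch.

```python
from itertools import product


def two_stage_trains(ratio, teeth):
    """Return (pinion1, wheel1, pinion2, wheel2) tooth counts whose
    combined reduction (wheel1/pinion1) * (wheel2/pinion2) equals `ratio`.

    `teeth` is the set of tooth counts considered; it is a hypothetical
    choice made only for this illustration.
    """
    trains = []
    for p1, w1, p2, w2 in product(teeth, repeat=4):
        # Integer arithmetic keeps the comparison exact.
        if w1 * w2 == ratio * p1 * p2:
            trains.append((p1, w1, p2, w2))
    return trains


# Assumed tooth counts: even numbers from 8 to 40.
candidates = two_stage_trains(12, range(8, 41, 2))
print(f"{len(candidates)} distinct two-stage trains give a 12:1 ratio")
print("examples:", candidates[:3])
```

Every entry in `candidates` reproduces the same externally observed behavior of the hands, so observing the hands alone cannot single out one arrangement. Additional kinds of measurement are needed to narrow the family of candidate mechanisms.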

Misconceptions in "reverse engineering." To date, there have been several efforts to try to reconstruct pathways from gene expression measurements [57]. Given the success of recapitulating the first few steps of the glycolytic pathway from substrate measurements [12], it would appear plausible to do this. However, without careful consideration of the pathways that one is trying to reconstruct, there is significant risk of a methodological and metaphorical flaw in this analysis.

It is important to realize that pathways could exist on several molecular and physiological levels. The genes that are regulated in one pathway may play a role in another pathway. As an example, let us consider the genes coding for lactic acid dehydrogenase: The gene LDHA is expressed mostly in muscle and is located on chromosome 11p15.4, while LDHB is located on chromosome 12p12.2 and is expressed mostly in heart. LDHA is known to have binding sites for HIF-1, CREB1/ATF-1, and p300/CREB binding protein [62], as well as binding sites for NRE, NREBP/c-FOS, AP1, AP2, and other transcription factors in its promoter region [104]. LDHB is thought to have a binding site for SP1 [159]. There is also likely to be post-transcriptional regulation of these two genes. Because of the differences in promoter regions, it is safe to assume that LDHA and LDHB each participate in their own gene expression regulatory networks.

Both LDHA and LDHB code for protein subunits. The final protein product is made up of four subunits. Various combinations of the two types of subunits are assembled into five different lactic acid dehydrogenases, from LDH-1 (containing four LDHB subunits), to LDH-5 (containing four LDHA subunits). These five enzymes are found in a binomial distribution in mammals. Additionally, the final protein itself is an enzyme that assists in the conversion of lactic acid to pyruvate, called lactic acid dehydrogenase, and defined by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology as Enzyme Commission (EC) 1.1.1.27. Lactic acid and pyruvate happen to be substrates in the glycolytic pathway of anaerobic energy production.
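The binomial distribution of the five isoenzymes follows directly if the four subunits of each tetramer are drawn independently from a shared pool of LDHA and LDHB subunits. The sketch below works this out in Python. The function name `ldh_isoenzyme_fractions` and the parameter `p_b` (the fraction of LDHB subunits in the pool) are our own illustrative choices, and independent random assembly is an assumption of the sketch rather than a measured property of any tissue.

```python
from math import comb


def ldh_isoenzyme_fractions(p_b):
    """Expected fractions of LDH-1 ... LDH-5 under random tetramer assembly.

    p_b is the assumed fraction of LDHB subunits in the subunit pool.
    LDH-1 contains four LDHB subunits; LDH-5 contains four LDHA subunits.
    """
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("p_b must lie in [0, 1]")
    fractions = {}
    for n_b in range(4, -1, -1):       # number of LDHB subunits: 4, 3, 2, 1, 0
        name = f"LDH-{5 - n_b}"        # 4 LDHB -> LDH-1, ..., 0 LDHB -> LDH-5
        fractions[name] = comb(4, n_b) * p_b**n_b * (1 - p_b) ** (4 - n_b)
    return fractions


# Hypothetical pool with equal amounts of both subunits.
equal_pool = ldh_isoenzyme_fractions(0.5)
for name, fraction in equal_pool.items():
    print(f"{name}: {fraction:.4f}")
```

With an equal pool the mixed tetramer LDH-3 is the most abundant form, while a pool dominated by one subunit shifts the distribution toward LDH-1 or LDH-5. The expression levels of the two genes therefore set the isoenzyme mix only indirectly, through the assembly step.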

Thus, in this one simple example, one finds at least three pathways, as shown in figure 2.8. The operator at the enzyme level pathway participates as one enzyme in a major biochemical substrate pathway, that of glycolysis. That operator, however, is assembled stochastically in the assembly pathway from subunit components. Finally, those subunits are produced within their own genetic regulatory pathways involving a multitude of transcription factors. Considering the three layers of pathways, it should now become apparent that it may not be possible to "reconstruct" the glycolytic pathway simply from measuring multiple gene expressions. However, on a more optimistic note, given enough gene expression data, one should be able to reconstruct the first layer of these pathways, i.e., that of gene expression regulation and involving transcription factors and promoter regions.

Figure 2.8: An example of why discovery of pathways solely using gene expression measurements is difficult. At least three "pathways" are involved in the conversion of lactic acid to pyruvate. The highest enzyme level pathway involves the action of an enzyme called lactic acid dehydrogenase, designated EC 1.1.1.27. However, when one traverses to a lower-level pathway, one learns that the role of this enzyme is performed by five separate protein products, which appear in a binomial distribution. When one traverses lower, through the assembly pathway, one learns this distribution is present because of the various possible combinations of LDHA and LDHB, two individual protein subunits, of which four must be put together to make an active enzyme. Then, when one traverses even lower, through the genetic regulatory pathway, one learns that each of these subunits is regulated differently, and appears on different chromosomes. Not shown are the pathways involved in transcription, translation, movement of proteins to the appropriate locations, and movement of substrates into and out of the cells and organelles, and many other pathways.

How vigorously should a model system be stimulated or perturbed? Let's revisit the phrase "maximally exercising the genome." What constitutes a maximum perturbation will in part be defined by the goals of the investigator. If she or he wishes to understand the interactions between the genes under normal physiological conditions, then it may be that only relatively minor perturbations followed across time will be informative. Even though extreme perturbations are easier to measure, they might not accurately represent interactions that occur over the normal dynamic range of activities of the RNA species, structural proteins, and enzymes programmatically generated by the genome.
There are, however, several circumstances when it may be advantageous to perturb the relationships between the various constituents of the transcriptome much more violently and dramatically than is found under normal physiological or homeostatic conditions. An example of such a motivation might be to measure the effect of pharmacological doses, rather than physiological doses, of a pharmaceutical. The intended effect may lie outside the normal operating range of the genetic machinery, as is often the case in pharmacology. Other times, the goal may be to examine biological systems which naturally fall far outside the usual operating parameters of the genomic machinery. This includes a variety of disease states such as a malignancy or diabetes, where there are single or multiple massive perturbations of the usual homeostatic system, and therefore a perturbation of the model system that corresponds to the pathological condition would be appropriate. It is worthwhile considering some extreme examples to illustrate the importance of this decision.

For example, if one is trying to understand the relationship between a set of genes thought to be involved in insulin signaling, it may not be particularly helpful to start with a sample population of human adipose tissue (fat), which is known to be insulin sensitive, if the samples are obtained from individuals who are all in the fasting state, when insulin levels are low or absent. That is, there may be insignificant insulin exposure or stimulation of the insulin-responsive cellular machinery. Consequently, the experiment design subspace sampled will be too small to adequately explore the expression space of interest. Furthermore, if the sample population is ethnically homogeneous or, in the case of a mouse experiment, shares the same genetic background, then these samples will only inform us of the reproducibility and noise of the biology and the measurement system. In contrast, if the same population were exposed to varying degrees of insulin, then the behavior of the genetic program in response to insulin might be better understood. Alternatively, taking a population known to have varying degrees of insulin resistance would also provide a more diverse range or heterogeneity of interactions in insulin signaling, which would enable the investigator to explore the space of possible relationships between the genes. This would then allow the application of a machine-learning algorithm to elucidate these dependencies. If the degree of insulin resistance is clinically measured (e.g., as obtained in a carefully performed hyperinsulinemic euglycemic clamp [3]) in close temporal proximity to the expression measurements, then the investigator will probably have the most accurate control of the experiment design space. The goal then is to stress or stimulate the cellular regulatory system, which in this instance is glucose homeostasis and insulin signaling, so that the genes "show us" how they interoperate under normal, pathological, or pharmacological circumstances. There are several other obvious variations of these kinds of stimulations, such as:

1. Examine the expression profile in adipose tissue at different points in time after the glucose exposure to understand the timing of the genetic program. We should not expect all genes to act at the same time or rate, and therefore sampling the changes in the transcriptome across multiple time points at the appropriate frequency will be much more revealing than just one or two points in time.

2. Assess gene expression at different fixed concentrations of insulin.

3. Assess gene expression in ethnically diverse populations. The difference in genetic backgrounds of these populations will give a sense of how much variation in the signaling may be due to genes outside the set that is being studied in a particular expression experiment. This implies that we would need to hypothesize that changes in the genetic background affect the functioning and thus the functional genomics of the system. This presupposes that

♦ insulin production and insulin effect are the combined result of the activity of several genes;

♦ polymorphisms in these genes are distributed heterogeneously across human populations;

♦ such polymorphisms result in significant changes in gene expression or function, or both.

4. Explore the pharmacological effects of insulin. Beyond a certain dosage level, increasing levels of insulin may no longer result in increased glucose transport. However, increasing insulin may affect other physiological systems within the same cell, e.g., the transport of ions across the cell membrane. Taking tissue samples at pharmacological concentrations of insulin and comparing their expression profiles to those obtained with insulin concentrations in the physiological range would allow determination of genetic mechanisms that respond to the pharmacological dosing.
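The saturation described in item 4 can be illustrated with a simple dose-response curve. The sketch below assumes a hyperbolic relationship between dose and effect. This functional form, the function name `saturating_response`, and the parameters `e_max` and `ec50` are assumptions made for illustration only, and the doses are in arbitrary units rather than measured insulin concentrations.

```python
def saturating_response(dose, e_max=1.0, ec50=1.0):
    """Hyperbolic dose-response: effect = e_max * dose / (ec50 + dose).

    e_max is the assumed maximal effect and ec50 the assumed dose giving
    half of that effect; both are hypothetical values.
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return e_max * dose / (ec50 + dose)


# Tenfold dose steps, in arbitrary units relative to ec50.
doses = [0.1, 1.0, 10.0, 100.0, 1000.0]
responses = [saturating_response(d) for d in doses]

# Extra effect gained by each tenfold increase in dose.
gains = [later - earlier for earlier, later in zip(responses, responses[1:])]

for dose, response in zip(doses, responses):
    print(f"dose {dose:>7}: effect {response:.3f}")
print("gain per tenfold step:", [round(g, 3) for g in gains])
```

Under these assumptions, each tenfold step in the low range adds a large increment of effect, whereas the last step adds almost nothing. Expression changes seen only at the highest doses are therefore more likely to reflect mechanisms other than the saturated one.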

In summary of our discussion, we note that an appropriate set of (microarray) experiments is that in which the cellular response to the stimuli involved in the experiments will exercise the interaction between genes over the range of interest and relevance to the biologist-investigator. This critical aspect of the experimental design will trump any subsequent choice of bioinformatic analytic technique. Conversely, no bioinformatics analysis will be sufficient to extract functional genomic knowledge if the expression space is not adequately explored.
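A small numerical illustration may help show why analysis cannot compensate for an under-explored expression space. The sketch below uses a hypothetical curvilinear dependence of one gene on another (the function `gene_b_level` is invented for this purpose and is not fitted to any data) and compares the Pearson correlation obtained when gene A is sampled over a narrow window with that obtained over its full range.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def gene_b_level(gene_a_level):
    """Hypothetical curvilinear dependence of gene B on gene A."""
    return gene_a_level * (10 - gene_a_level)


# Narrow design: gene A is exercised only over a small part of its range.
narrow_a = [0.0, 0.5, 1.0, 1.5, 2.0]
# Wide design: gene A is exercised over its full (assumed) range.
wide_a = [float(level) for level in range(11)]

r_narrow = pearson_r(narrow_a, [gene_b_level(a) for a in narrow_a])
r_wide = pearson_r(wide_a, [gene_b_level(a) for a in wide_a])

print(f"narrow window: r = {r_narrow:.3f}")
print(f"full range:    r = {r_wide:.3f}")
```

In the narrow window the two genes appear almost perfectly linearly related, and nothing in the data hints at the bend. Over the full range a straight line no longer summarizes the relationship at all, which is the signal that prompts a closer look at the curvature. The same analysis applied to the two designs leads to different conclusions, and only the wider design contains the information needed to see the curvilinear relationship.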
