Two examples: Inter-array and intra-array

For heuristic purposes, we will describe inter-microarray and intra-microarray analysis situations which capture the essential features of S1 and S2 from the introduction of this chapter. While these examples involve only Affymetrix oligonucleotide microarrays, the analytic approach and problems are more general in scope, and the methodologies can generally be extended to many other gene expression assaying technologies with straightforward domain-specific modifications.

Study S1: Yeast cell cycle. A mycologist wishes to find groups of yeast genes whose expression profiles are similar over a 24-hour period. For this, she or he obtains gene expression measurements using Affymetrix S98 yeast genome microarrays for a synchronized sample of yeast cells over a 24-hour period by sampling the total RNA from this population at 30-minute intervals. A total of 48 separate time points were each sampled twice (for duplicate measurements): T1, ..., T48 (and replicates T'1, ..., T'48).

Study S2: A drug intervention study. A pharmacologist wishes to characterize the effect(s) of drug X 3 hours after it has been introduced into normal adult wild-type mice by the expression levels of liver cell genes that can be probed using Affymetrix Mu11K microarrays. The gene expression profiles of liver cells from normal adult mice that are not treated with drug X are used as the control state. Call the preintervention or control state A, and the postintervention state B. For replicate measurements, liver samples were obtained from MA adult mice without drug X application, and liver samples from another MB adult mice were obtained after drug X was applied.

Detailed specifications of the Affymetrix technology are available in the Affymetrix GeneChip analysis manual [10]. A technical outline of this technology, namely on the interpretation of .cel and .chp files, is found in section 3.1.2. Affymetrix Mu11K murine chips, e.g., have total probe sets for N = 13,179 unique transcripts, including "housekeeping genes," whereas the yeast genome S98 array has N ≈ 6400.

The study properly begins at the experimental design stage: Mouse liver explants in study S2 are harvested and prepared under identical conditions, most obviously similar time of day, environmental conditions, and precautionary procedures taken to minimize the effects of non-X-related artifacts and noise in the gene expression, and mutatis mutandis for the yeast cells in study S1. Controlling for extraneous factors, which are potentially microarray-detectable in inter-array study S2, is clearly more difficult, and some other factors that should be considered include the age, physical condition, and sex of the candidate mice. Despite the most stringent replicate protocol, there always exists some biological or biochemical variation between any two mice, any two cells taken from a common host, or even between any two synchronized yeast cells. In practice, one can only try one's best to exclude explants from mice whose observed or known physiological state is believed to be associated with a different response pattern to X and which might skew the experimental data significantly. With the obvious modifications, whole yeast cells are harvested and duplicates are collected similarly at each of the 48 time points. Following a well-designed analysis scheme, one would hope to be able to classify such negligible variations as noise. To this end, it is important that replicate experiments be run for any chip study. Replicate measurements allow one to quantify the extent to which variations are detectable by microarrays, and to decide whether this noise will confound the reported measurements and attendant analyses. Study S2 has MA and MB replicate experiments for states A and B, respectively. Chapter 2 presents a comprehensive treatment of experimental design in the presence of noise.

For the remainder of this chapter, we will concentrate primarily on inter-array study S2 and develop the mathematical tools, e.g., normalization techniques and measures of (dis)similarity, which will be used in chapter 4 to address analyses that are essentially intra-array, such as study S1. We note, however, that most of the questions and methodologies in inter-array analyses are equally relevant to the intra-array instance, with the obvious contextual modifications. In analyses involving Affymetrix oligochip data, the .chp file is the usual starting point for a typical microarray analysis (see section 3.1.2). From this file, we will be considering the Probe Set Name, Avg Diff, and Abs Call columns. Consolidating the columns of the MA and MB .chp files from study S2 into a table, we have table 3.3.1:

Let Aj and Bj denote the Avg Diff value corresponding to a microarray experiment before and after drug X intervention, respectively, with duplicates j = 1, 2. Note that the replicate hybridizations A1 and A2 might have been scanned into image files using different user-defined settings or under different ambient conditions beyond the control of the experimenter. Therefore, the most natural question that one asks in any inter-chip study is whether one can validly compare expression levels, represented by the Avg Diff, for any one gene across .chp files that correspond to separate microarray experiments. For instance, for Gene 9785 in table 3.3.1 we may ask:

• Q1. Do order relations (e.g., <, >, =) of reported Avg Diff values for Gene 9785 in different experimental conditions imply comparisons between the true expression levels of the measured gene under these conditions? In particular, one can convincingly argue that the expression level of Gene 9785 is less intense than the level for Gene 9784 because the intra-chip Avg Diff of Gene 9785 (250.5) is smaller than that of Gene 9784 (1211.1) in experiment B1 (assuming that all probes are equally effective). It is not immediately clear, however, whether it is reasonable to claim that the true expression level of Gene 9785 in condition B2 is greater than in condition B1, even though the inter-chip numerical Avg Diff of Gene 9785 in B2 (141.2) is greater than the Avg Diff in B1 (110.7).

• Q2. How would one characterize or distinguish the expression (e.g., fold) change from pre- to postintervention states for Gene 9785 that is most likely due to a drug X intervention from an expression change that is caused by measurement variations or, more broadly, noise?

Table 3.5: Avg Diff (Abs Call) data for inter-array study S2 where MA = MB = 2

Probe Set Name    Avg Diff (Abs Call)
Gene 1            64.3 (P)     248.2 (P)
...
Gene 9784         1211.1 (P)   1250.0 (P)
Gene 9785         250.5 (P)    193.3 (P)
Gene 9786         '54.3 (A)    -3.1 (A)    1.1 (A)    '2.0 (A)
...
Gene 13,179       "0.9 (P)     '07.7 (A)

Note that in an ideal system, where every chip assays every transcript equally well and the experiments have been designed so that the chips record only expression changes that are a direct consequence of drug X intervention, with all other attendant thermodynamic or physical variations being negligible and not microarray-detectable, these questions possess trivial answers. Namely, we expect that, being replicates, the Avg Diffs satisfy B1 = B2 and A1 = A2 for each gene, and, assuming that Aj, Bj are positive-valued, Bj/Aj quantifies the fold change for each gene after intervention. Such utopian scenarios do not appear to be achievable at any time in the near future. In the following sections, we describe several ways to address the two preceding questions under different assumptions and conditions that are commonly used in the bioinformatics literature.
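
The ideal-system relations above can be written out as a minimal sketch. The function names fold_change and replicates_agree are hypothetical, and the tolerance parameter is an assumption added for illustration; with the default tolerance of zero the check is the exact equality B1 = B2 (or A1 = A2) expected in the ideal case.

```python
import math


def fold_change(a, b):
    """Return b / a, the fold change from state A to state B.

    Defined here only for positive Avg Diff values, matching the
    positivity assumption in the text.
    """
    if a <= 0 or b <= 0:
        raise ValueError("fold change requires positive Avg Diff values")
    return b / a


def replicates_agree(x1, x2, rel_tol=0.0):
    """Check whether two replicate Avg Diff values agree.

    rel_tol=0.0 demands exact equality (the ideal system); a positive
    rel_tol is an ASSUMED relaxation, not a method from the text.
    """
    return math.isclose(x1, x2, rel_tol=rel_tol, abs_tol=0.0)
```

For example, with invented values A1 = A2 = 200.0 and B1 = B2 = 500.0, both replicate checks pass and the fold change is 500.0 / 200.0 = 2.5. Real data will generally fail the exact check, and a negative Avg Diff makes the ratio undefined here, which is why the two questions above need the treatment developed in the following sections.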
