Prototypical Objectives and Questions in Microarray Analyses

We will now discuss common analytic approaches and issues surrounding the basic problems of finding similarities and differences in reported gene expression levels between the following (refer to table 3.4):

• S1. Genes in a microarray experiment, e.g., intra-array comparisons of the expression levels in experiment Em of genes Gk versus Gr in table 3.4, or between genes Gk and Gr in experiments Em through Ey.

• S2. Experiments for a gene, e.g., inter-array comparisons of the expression level of Gk in experiments Em versus Ey, or between experiments Em and Ey across all genes.
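Viewed as operations on a genes-by-experiments matrix, S1 compares rows and S2 compares columns. The following is a minimal sketch of that idea, not a prescribed method: the toy values, the extra gene Gs, the extra experiments En and Ep, and the choice of Pearson correlation as the similarity measure are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: rows are genes, columns are experiments.
# Gene Gs, experiments En and Ep, and all numbers are illustrative only.
expression = {
    "Gk": [120.0, 240.0, 480.0, 960.0],
    "Gr": [60.0, 118.0, 245.0, 470.0],
    "Gs": [900.0, 450.0, 220.0, 115.0],
}
experiments = ["Em", "En", "Ep", "Ey"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def gene_profile(gene):
    """Row of the matrix: one gene across all experiments."""
    return expression[gene]


def experiment_profile(name):
    """Column of the matrix: all genes within one experiment."""
    j = experiments.index(name)
    return [levels[j] for levels in expression.values()]


# S1: compare genes Gk and Gr across experiments Em through Ey.
s1_similarity = pearson(gene_profile("Gk"), gene_profile("Gr"))

# S2: compare experiments Em and Ey across all genes.
s2_similarity = pearson(experiment_profile("Em"), experiment_profile("Ey"))
```

In this toy data Gk and Gr rise together, so the S1 correlation is strongly positive, while Em and Ey rank the genes in nearly opposite order, so the S2 correlation is negative. Any other similarity measure could be substituted for `pearson` without changing the row-versus-column structure.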

Table 3.3: Performance of the 12 identity masks. Each pair of identical experiments gave rise to 12 candidate ID masks. Six of these 12 were derived by method 1 (three with 3 SD and three with 2.5 SD). The other six were derived by method 2 (three with scale 1.00 and three with scale 0.975). Shown here are the percentages of original data points lying outside of the mask region for each of the 12 candidate ID masks derived for experiments A, B, C. SD = standard deviation; an intensity range (window size) of 2000 instead of 1000 was used in experiments B and C for the method 1 trials. (Derived from Tsien et al. [181].)

                Exp A                            Exp B                            Exp C
Range size      1000       5000       9000       2000;1000  5000       9000       2000;1000  5000       9000
SD = 3          3.1        2.7        2.2        93.2       80.5       1.7        99.6       100.0      2.0
SD = 2.5        11.1       3.8        3.3        97.3       97.6       2.7        100.0      100.0      2.4
Scale = 1.0     19.2       6.2        0.7        100.0      99.0       2.4        100.0      100.0      2.4
Scale = 0.975   19.2       6.2        0.7        100.0      99.3       2.4        100.0      100.0      2.9

Note that for expositional clarity, we are assuming a one-to-one correspondence between an experiment and a microarray assay.

As noted previously, the two most prominent characteristics of microarrays are that they enable the parallel assay of the levels of numerous RNA species, and that their reported expression levels are noisy and irreproducible. In view of these characteristics, the principal objective of microarray data analysis is to extract or discover knowledge about a biological system from the wealth of gene expression information obtained from each set of chip experiments. Specifically, information here consists primarily of noise-ridden gene expression data, which may be pangenomic and which may be supplemented by biological facts known a priori. Knowledge, on the other hand, refers to those aspects of the system under investigation that gene expression data can elucidate: correlations between genes or conditions, genes whose expression levels distinguish dissimilar biological states, or, at a finer level, regulatory mechanisms and directions of causality between genes, or between conditions and genes. At present, the question of which biological aspects can be unraveled by expression data has been neither fully explored nor clearly defined.

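Table 3.4: A prototypal microarray experiment data set. Each cell holds the reported expression level of the row's gene in the column's experiment.

            ...   Experiment Em   ...   Experiment Ey   ...
...
Gene Gk           Gk in Em              Gk in Ey
...
Gene Gr           Gr in Em              Gr in Ey
...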

In the posthybridization analysis, one will inevitably perform the basic intra- and inter-array comparisons S1 and S2. We encounter situation S2, for instance, in studies that seek gene markers distinguishing biological or physiological state A from state B, where experiments E1 through EmA are mA replicate expression measurements in state A, and EmA+1 through EmA+mB are mB replicate measurements in state B. A time course study, where experiments E1 through Em are the expression levels sampled at m time points t1, t2, ..., tm, is a typical scenario where S1 is used. Furthermore, both S1 and S2 could apply in a study where experiments E1 through E2m cover m different temperatures, each sampled twice. As mentioned in section 3.2, given the irreproducible nature of microarray-reported measurements, any attempt to reach the above objective in a consistent and rigorous manner will first require well-defined and mathematically workable notions of reproducibility and similarity of gene expression levels as reported by microarrays.
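To make the state A versus state B scenario concrete, the sketch below splits one gene's measurements into mA replicates from state A and mB replicates from state B and scores their separation. The use of Welch's t statistic is an assumption chosen for illustration (the text does not name a test), and the measurement values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(state_a, state_b):
    """Welch's t statistic for two groups of replicate measurements.

    Assumption: this statistic is one illustrative way to score a
    candidate marker gene; it is not prescribed by the text.
    """
    m_a, m_b = len(state_a), len(state_b)
    standard_error = sqrt(variance(state_a) / m_a + variance(state_b) / m_b)
    return (mean(state_a) - mean(state_b)) / standard_error


# Hypothetical levels of one gene in experiments E1 .. E(mA+mB):
# the first m_a entries are state A replicates, the rest are state B.
gene_levels = [310.0, 295.0, 330.0, 120.0, 135.0, 110.0]
m_a = 3

state_a, state_b = gene_levels[:m_a], gene_levels[m_a:]
t_stat = welch_t(state_a, state_b)
```

A large absolute value of `t_stat` relative to the replicate spread suggests the gene separates the two states. With only a handful of replicates per state, such a score is a ranking aid rather than a firm significance claim.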

These definitions should capture or abstract, but not contradict, biological experience; an immediate corollary would be an empirical approach to determining the significance of a reported change in gene expression level. Since the full details of the thermodynamic or physical states of a chip experimental system cannot yet be practically obtained, nor can the sum total of all possible noise sources be characterized, any definition of reproducibility will typically have to be predicated upon preanalysis assumptions about the behavior of noise in that system.
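One way such a preanalysis assumption can be turned into an empirical significance rule is sketched below. It assumes, purely for illustration, that noise can be summarized by the spread of log2 ratios between two repeated measurements of the same sample, independent of intensity. This is not the identity-mask procedure of Table 3.3, and the function names, the cutoff k, and the data are hypothetical.

```python
from math import log2
from statistics import mean, stdev


def log_ratios(xs, ys):
    """Per-gene log2 ratio between two sets of reported levels."""
    return [log2(y / x) for x, y in zip(xs, ys)]


def noise_envelope(replicate_1, replicate_2, k=3.0):
    """Mean +/- k SD of log2 ratios between two identical experiments.

    Assumption: noise is intensity independent and roughly symmetric on
    the log scale. Real microarray noise often violates both.
    """
    ratios = log_ratios(replicate_1, replicate_2)
    center, spread = mean(ratios), stdev(ratios)
    return center - k * spread, center + k * spread


def exceeds_noise(level_a, level_b, envelope):
    """True if the change from level_a to level_b leaves the envelope."""
    low, high = envelope
    ratio = log2(level_b / level_a)
    return ratio < low or ratio > high


# Hypothetical repeated measurements of the same sample (same genes).
rep1 = [100.0, 210.0, 405.0, 790.0, 1600.0]
rep2 = [110.0, 190.0, 420.0, 820.0, 1500.0]
envelope = noise_envelope(rep1, rep2)

fourfold_change = exceeds_noise(100.0, 400.0, envelope)
small_change = exceeds_noise(100.0, 105.0, envelope)
```

Under these assumptions a fourfold change falls outside the envelope while a 5% change does not. Changing the noise assumption, for example by letting the envelope vary with intensity, changes which reported differences count as significant.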
