Replicate Experiments, Reproducibility, and Noise

A ubiquitous and underappreciated problem in microarray analysis is that microarrays report nonequivalent levels of an mRNA, or of a gene's expression, for a system under replicate experimental conditions. In the bioinformatics literature, this irreproducibility of microarray data is widely attributed to noise. For example, if a common sample probe or target pool is split and hybridized simultaneously to two separate chips of the same make, and both chips are scanned under the same protocol, the result will almost surely be two graphic representations of RNA abundance that do not look exactly alike. Once these images are fed into image-processing and statistical software, they translate into two quantitatively different expression data files (e.g., .cel or .chp files on the Affymetrix platform). This situation is akin to two Northern blot assays for a particular RNA species in a split total RNA sample, processed under an identical protocol and conditions, yielding different X-ray film images of the RNA intensity and therefore two different expression intensities reported by a phosphor imager, which is clearly undesirable.
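One common way to quantify how far two split-sample chips disagree (a widely used convention, not a procedure prescribed by this text) is to compute, for each probe, the log ratio M = log2(x1/x2) and the mean log intensity A = (log2 x1 + log2 x2)/2, the two axes of an MA plot. For perfectly reproducible chips every M would be 0. The sketch below assumes two hypothetical intensity vectors; the function names, the fold threshold, and the values are illustrative only.

```python
import math
import statistics


def ma_values(chip1, chip2):
    """Per-probe log ratio M and mean log2 intensity A for two replicate chips."""
    m = [math.log2(x / y) for x, y in zip(chip1, chip2)]
    a = [(math.log2(x) + math.log2(y)) / 2 for x, y in zip(chip1, chip2)]
    return m, a


def replicate_summary(chip1, chip2, fold=2.0):
    """Spread of M and the fraction of probes differing by at least `fold`."""
    m, _ = ma_values(chip1, chip2)
    threshold = math.log2(fold)
    return {
        "sd_M": statistics.stdev(m),
        "frac_fold": sum(abs(v) >= threshold for v in m) / len(m),
    }


# Hypothetical intensities for the same four probes on two chips
# hybridized with one split target pool.
chip1 = [100.0, 200.0, 400.0, 800.0]
chip2 = [110.0, 190.0, 400.0, 1600.0]
print(replicate_summary(chip1, chip2))
```

Here one of the four hypothetical probes differs twofold between chips even though the target pool was identical.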

The irreproducibility of a measurement reading, when it is not entirely attributable to poor experimental design or practice, is neither new nor unique to microarray technology. One encounters the same problem in any measurement of a physical quantity, especially when the quantity exists on microscopic scales or when the measuring device is highly sensitive. Consider a thought experiment that illustrates the latter point. Take 500 mL of well-mixed NaCl solution (i.e., salt water). Split this solution into two samples, A and B, and measure the NaCl concentration in both samples with a series of measuring devices of increasing sensitivity, where device $k$ reports concentrations to an accuracy of $10^{-k}$ kg mL$^{-1}$ for $k \in \{0, 1, 2, \dots\}$. Let $C_A(k)$ denote the NaCl concentration in sample A as reported by device $k$, and define $C_B(k)$ likewise for sample B. When the device sensitivity is low, say $k = 0, 1, 2$, the device would probably not detect any difference in concentration between A and B, i.e., $C_A(k) = C_B(k)$; more concretely, say, $C_A(0) = C_B(0) = 5$ and $C_A(2) = C_B(2) = 5.01$. As the device sensitivity increases, say for $k > 9$, normally negligible physical factors, such as small pockets of uneven NaCl concentration in A and B, may fall within the device's detection limit and be recorded, so that $C_A(k) \neq C_B(k)$; more concretely, say, $C_A(10) = 5.0100000001 \neq C_B(10) = 5.0100000002$.
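The thought experiment can be mimicked by rounding two hypothetical "true" concentrations to the resolution $10^{-k}$ of device $k$. The true values 5.01000000012 and 5.01000000018 below are invented purely so that the readings reproduce the illustrative numbers above, and the helper names are hypothetical. Python's `decimal` module is used so that the rounding is exact rather than subject to binary floating-point artifacts.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def reading(true_conc: Decimal, k: int) -> Decimal:
    """Concentration reported by device k: the true value rounded to 10**-k."""
    return true_conc.quantize(Decimal(1).scaleb(-k), rounding=ROUND_HALF_EVEN)


def first_distinguishing_device(c_a: Decimal, c_b: Decimal, max_k: int = 20):
    """Smallest k whose readings of samples A and B differ, or None if none do."""
    for k in range(max_k + 1):
        if reading(c_a, k) != reading(c_b, k):
            return k
    return None


# Hypothetical true concentrations of samples A and B (units as in the text).
C_A_TRUE = Decimal("5.01000000012")
C_B_TRUE = Decimal("5.01000000018")

for k in (0, 2, 9, 10):
    print(k, reading(C_A_TRUE, k), reading(C_B_TRUE, k))
print("first device that tells A and B apart:",
      first_distinguishing_device(C_A_TRUE, C_B_TRUE))
```

With these assumed values, devices 0 through 9 report identical concentrations, and device 10 is the first to register the tiny difference between the two halves of the same solution.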

The usual remedy in these situations is to perform as many repeat, or replicate, measurements as are feasible and practical, in the hope that a statistic of these repeats (e.g., their mean) converges to the true value of the quantity as the number of replicate measurements grows, by the law of large numbers, provided that certain conditions on the stochastic independence and distribution of the repeats are met. This approach may reduce, on average, certain kinds of noise, such as random measurement error, but it cannot practically resolve more diverse manifestations of noise, such as those originating from biological variation in the model samples under investigation. There is also the added issue of material cost: because each microarray experiment is relatively expensive, researchers today rarely perform more than three replicate chip assays per experimental condition.
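A small simulation can illustrate both halves of this argument. It assumes, hypothetically, that every replicate reading equals the true value, plus a fixed offset `bias` shared by all replicates (for instance, a deviation of the particular biological sample that every technical replicate inherits), plus independent Gaussian noise with standard deviation `sigma`. Under these assumptions the spread of the replicate mean shrinks roughly as sigma/sqrt(n), while the shared offset remains however many replicates are averaged. The function name and all parameter values are illustrative.

```python
import random
import statistics


def simulate_replicate_means(true_value, sigma, bias, n_replicates,
                             n_trials=20000, seed=0):
    """Return one replicate average per simulated experiment.

    Each reading = true_value + bias + Gaussian noise N(0, sigma^2).
    The bias term is a hypothetical offset shared by all replicates,
    so averaging cannot remove it.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        readings = [true_value + bias + rng.gauss(0.0, sigma)
                    for _ in range(n_replicates)]
        means.append(statistics.fmean(readings))
    return means


if __name__ == "__main__":
    TRUE, SIGMA, BIAS = 5.0, 0.2, 0.05  # hypothetical values
    for n in (1, 3, 30):
        means = simulate_replicate_means(TRUE, SIGMA, BIAS, n)
        print(f"n={n:2d}  average={statistics.fmean(means):.4f}  "
              f"spread={statistics.stdev(means):.4f}  "
              f"sigma/sqrt(n)={SIGMA / n ** 0.5:.4f}")
```

The "average" column stays near 5.05 rather than 5.0 for every n, while the "spread" column tracks sigma/sqrt(n); going from the typical three replicates to thirty narrows the spread but leaves the offset untouched.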

In studies involving microarrays, replicate experiments are especially important for the following reasons, among others:

• Microarray data are often used in the earliest stages of studies that are exploratory rather than hypothesis driven. In such cases, the resulting analyses and conclusions typically generate hypotheses (e.g., candidate ESTs) for more focused laboratory investigation at later stages of the study. Pursuing a false-positive conclusion at those later stages can be costly in both resources and time.

• It is good scientific practice, whenever feasible, to replicate an experiment to verify an earlier, unconfirmed quantitation.

• Data from replicate experiments provide a better quantitative understanding of the extent of noise (see section 3.2.5) inherent in both the system under investigation and the measuring device [116]. The effect of noise typically increases with the sensitivity of the measuring device. As we have previously noted, the amount of total RNA needed for a Northern blot assay of a single RNA is more than enough, with today's technology, for a typical microarray assay of the levels of more than $10^4$ different mRNA species; a microarray is therefore a far more sensitive, and correspondingly more noise-prone, measuring device. Furthermore, the noise effect may not scale linearly with the detected expression level and may be sequence dependent.
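To see how noise can fail to scale linearly with expression level, one can look at a two-component (additive plus multiplicative) error model of the kind proposed by Rocke and Durbin for array intensities. This is a swapped-in illustration, not the noise model of this text or of [116]. The sketch assumes observed intensity y = alpha + mu * exp(eta) + eps, with eta ~ N(0, sigma_eta^2) and eps ~ N(0, sigma_eps^2), and evaluates the exact standard deviation and coefficient of variation (CV) of y as functions of the true expression mu. The function names and parameter values are hypothetical.

```python
import math


def two_component_sd(mu, sigma_eta, sigma_eps):
    """SD of y = alpha + mu*exp(eta) + eps, eta ~ N(0, sigma_eta^2), eps ~ N(0, sigma_eps^2)."""
    s2 = sigma_eta ** 2
    # Variance of the lognormal term mu*exp(eta), plus the additive variance.
    multiplicative_var = mu ** 2 * math.exp(s2) * (math.exp(s2) - 1.0)
    return math.sqrt(multiplicative_var + sigma_eps ** 2)


def two_component_cv(mu, alpha, sigma_eta, sigma_eps):
    """Coefficient of variation (SD / mean) of y under the same model."""
    mean = alpha + mu * math.exp(sigma_eta ** 2 / 2)
    return two_component_sd(mu, sigma_eta, sigma_eps) / mean


# Hypothetical parameters: background, multiplicative and additive noise.
ALPHA, SIGMA_ETA, SIGMA_EPS = 50.0, 0.15, 40.0
for mu in (10, 100, 1_000, 10_000, 100_000):
    print(f"mu={mu:>7}  sd={two_component_sd(mu, SIGMA_ETA, SIGMA_EPS):9.1f}  "
          f"cv={two_component_cv(mu, ALPHA, SIGMA_ETA, SIGMA_EPS):.3f}")
```

Under these assumptions, the additive term dominates at low mu, so the SD is nearly flat and the CV is large; at high mu the multiplicative term dominates, the SD grows in proportion to mu, and the CV levels off near sqrt(exp(sigma_eta^2) - 1). Neither the SD nor the CV is therefore constant across expression levels.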

The terms replicate, reproducibility, and noise are explained in the next section.
