Reproducibility across repeated microarray experiments

Absolute expression level and fold difference

Akin to situation E5 above, suppose that a common total RNA sample is split and the expression levels of the different RNA transcripts within each split sample are measured on a separate microarray of the same lot. For such situations, the current bioinformatics literature typically reports the Pearson's correlation coefficient of expression intensities of one chip reading versus the other as being close to r = 1.0, i.e., the separate readings are highly correlated, and uses this to conclude that the experimental assay is highly reproducible. We recall that in order to obtain any sort of meaningful statistic out of the Pearson's correlation coefficient, one typically has to assume that the paired expression data point for each gene is independent of those of the other genes. But how are we to account for individual genes whose expressions as reported by each chip are not exactly equal?

To date, few studies have focused on the reproducibility of microarray measurements [116]. In a publicly available document [9], Incyte Pharmaceuticals demonstrated a high concordance between RNA expression measurements on cDNA chips using Cy3 and Cy5 dye signals. Based on this finding, Incyte estimates the limit of detection of fold differences at 1.8, meaning to them that 95% of fold differences between samples of 2.0-fold and higher are significant. From measuring the expression levels of 120 genes in various cancer cell lines using cDNA spotted filters, Bertucci et al. [24] showed that close to 98% of the measurements showed less than a 2.0-fold difference upon repeat. Richmond et al. [152], in their study of differentially expressed genes in Escherichia coli, filtered out genes below a minimum absolute expression threshold and ones with less than a 5.0-fold difference. Geiss et al. [75] used a Cy3/Cy5 system to measure genes differentially expressed during human immunodeficiency virus (HIV) infection, and determined that fold differences of as little as 1.5 were statistically significant. This minimal fold threshold was determined by excluding 95% of the expression measurements that were reported, and not by using information-theoretic methodologies. Studies citing fold differences between control and test groups as low as 1.7-fold continue to be published, e.g., [117].

The foregoing comparative studies do not address the reproducibility of fold differences if entire microarray experiments were repeated. Butte et al. [40] investigated this topic using Affymetrix Hu35K oligonucleotide microarrays containing 35,714 unique probes to measure RNA expression levels of muscle biopsies from 4 patients: P1, P2, P3, P4. Duplicated measurements were made from each patient sample: P21, P22, P23, P24.

A linear regression normalization procedure, outlined in section 3.4.1, was applied to all the chip data with respect to the data from patient P1. Intra-patient logarithmic fold differences (LFDs) were then calculated between the duplicated measurements for each of the four patients. Furthermore, inter-patient LFDs were calculated between all 6 possible pairs of patients (P1 vs. P2, P1 vs. P3, etc.) and between their duplicate measurements (P21 vs. P22, P21 vs. P23, etc.). The LFD was used throughout this analysis so that up- and down-regulation by the same fold are numerically equal in magnitude but oppositely signed.
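The sign symmetry of the LFD can be seen in a minimal sketch. It assumes base 10 logarithms, as in the caption of figure 3.10; the function name and the expression values are hypothetical and chosen only for illustration.

```python
import math

def log_fold_difference(numerator, denominator):
    """Base-10 logarithmic fold difference (LFD) of two positive expression values."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("expression values must be positive to take a log ratio")
    return math.log10(numerator / denominator)

# Hypothetical readings for one gene in two samples (not data from the study).
up = log_fold_difference(1000.0, 100.0)    # 10-fold up-regulation
down = log_fold_difference(100.0, 1000.0)  # 10-fold down-regulation

print(f"raw folds: {1000.0 / 100.0} and {100.0 / 1000.0}")
print(f"LFDs:      {up:+.3f} and {down:+.3f}")
```

The raw folds 10 and 0.1 lie asymmetrically about 1, whereas the corresponding LFDs are equal in magnitude and opposite in sign.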

Ideally, if the oligochip assays were perfectly reproducible across the duplicates, one would expect Pearson's correlation coefficients of 1.0 between the original and repeated expression measurements of each of the four patients, and the inter-patient LFDs calculated on the original and on the duplicated measurements to be the same, i.e., if gene A is noted to be 10 times higher in P2 versus P1, then it should ideally and similarly be 10 times higher in P22 versus P21. However, the Pearson's correlation coefficients between duplicate expression measurements of the four patients across all 35,714 probes were .76, .84, .78 and .82 (see figure 3.9). Furthermore, the cross-probe Pearson's correlation coefficients for the six paired (replicate) inter-patient LFDs were near 0, as shown in figure 3.10, where the first graph shows the plot of log(P2/P1) versus log(P22/P21). Further analysis showed that this poor correlation in the replicated LFDs was primarily due to small absolute expression values: when folds are calculated using a pair of numbers where the denominator quantity is small (i.e., the gene was found by the microarray to be not highly expressed), a high fold difference is typically the result. This is particularly problematic as the effect of noise in microarray assays today appears to be more pronounced at lower absolute expression levels.
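The effect described above can be reproduced qualitatively with a small simulation. This is only a sketch under stated assumptions, not the analysis of Butte et al. [40]: the additive Gaussian noise model, the noise level, the floor value, the range of expression levels, and the size of the true inter-patient differences are all assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def measure(true_level, rng, noise_sd=30.0, floor=1.0):
    """One simulated chip reading (assumed model): additive noise, floored so logs exist."""
    return max(floor, true_level + rng.gauss(0.0, noise_sd))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = []
for _ in range(5000):
    true_1 = 10 ** rng.uniform(1.0, 4.0)         # patient 1: 10 to 10,000 units (assumed)
    true_2 = true_1 * 10 ** rng.gauss(0.0, 0.2)  # patient 2: modest real difference (assumed)
    p1, p21 = measure(true_1, rng), measure(true_1, rng)  # original and duplicate chip
    p2, p22 = measure(true_2, rng), measure(true_2, rng)
    rows.append((p1, p21, p2, p22))

def lfd_correlation(subset):
    """Correlate log10(P2/P1) from the original chips with log10(P22/P21) from the duplicates."""
    original = [math.log10(p2 / p1) for p1, p21, p2, p22 in subset]
    duplicate = [math.log10(p22 / p21) for p1, p21, p2, p22 in subset]
    return pearson(original, duplicate)

# Duplicate intensities of patient 1 on a log scale, then LFDs overall and by expression level.
r_intensity = pearson([math.log10(r[0]) for r in rows], [math.log10(r[1]) for r in rows])
r_lfd_all = lfd_correlation(rows)
r_lfd_low = lfd_correlation([r for r in rows if sum(r) / 4 < 100.0])
r_lfd_high = lfd_correlation([r for r in rows if sum(r) / 4 > 1000.0])

print(f"duplicate intensities:          r = {r_intensity:.2f}")
print(f"replicated LFDs, all genes:     r = {r_lfd_all:.2f}")
print(f"replicated LFDs, low readings:  r = {r_lfd_low:.2f}")
print(f"replicated LFDs, high readings: r = {r_lfd_high:.2f}")
```

Under these assumptions the duplicate intensities correlate far better than the replicated LFDs, and restricting the comparison to genes with high readings restores most of the LFD correlation. The exact values depend entirely on the assumed parameters.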

Figure 3.9: Expression measurements made in duplicate from the same RNA samples do not always correlate well. RNA samples from four patients were placed on duplicate oligochips and the expression of 35,714 ESTs was measured. Each point represents an EST. The duplicate expression measurements are plotted here on a log-log scale (base 10). r = .69, .73, .73, .69. (From Butte et al. [40].)

Figure 3.10: When expression measurements do not correlate well, fold differences correlate even more poorly. Fold differences of 35,714 ESTs were calculated between the six possible pairings of the four patients. Fold differences are expressed as base 10 logarithms, so that ESTs that did not change between models are plotted in the center of each graph. Fold differences from the duplicated measures are shown on the x- and y-axes. Even though the correlation coefficients were high between original and repeated expression values, the correlation coefficients were very low between original and repeated calculated fold differences. (From Butte et al. [40].)

It is worth noting that even when the reported absolute expression measurements are reasonably reproducible between replicate measurements, the fold differences calculated from these reported measurements may not be as reproducible across replicate experiments. As mentioned above, the fold or ratio of these expression quantities is very sensitive to small numerical perturbations, especially in its denominator quantity. This is best explained with an illustration: Say that we carry out a pair of experiments P1, P2, and that the expression levels of two genes Gj and Gk reported in the two experiments are (1.0, 2.0) for Gk and (500.0, 1000.0) for Gj, so that the fold change from P1 to P2 is 2.0 for both genes (= 2.0/1.0 = 1000.0/500.0). Suppose we further carry out a pair of replicate experiments P21, P22, and that the reported expression levels are now (1.5, 2.0) for Gk and (500.5, 1000.0) for Gj. Note that the denominator quantity in the replicate P21 experiment is perturbed by +0.5 from the reported measurement in P1, an effect one typically sees in the presence of noise. Let us suppose that the overall absolute expression measurements P1 to P21 and P2 to P22 are reasonably reproducible, i.e., the intensity-intensity Pearson's correlation coefficients between the duplicate experiments are greater than .99. Note, however, that the resulting fold changes calculated using these new values are now 1.333 (= 2.0/1.5) for Gk and 1.998 (= 1000.0/500.5) for Gj. Clearly, the robustness of the fold is dependent upon the absolute expression measurement, as we see here that the fold of the highly expressed gene Gj is more stable to noise than that of the weakly expressed gene Gk. For a detailed discussion of the appropriateness of using the fold as a measure of expression change for microarray data, we refer the reader to section 3.5.
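The arithmetic of this illustration can be checked with a few lines of code. The sketch below uses only the numbers given in the text; the function and variable names are hypothetical, and the two genes are labeled by their expression level rather than by name.

```python
def fold(p1, p2):
    """Fold change from the first experiment to the second (a plain ratio)."""
    return p2 / p1

# (P1, P2) readings and their replicates (P21, P22), taken from the illustration.
genes = {
    "low-expression gene": ((1.0, 2.0), (1.5, 2.0)),
    "high-expression gene": ((500.0, 1000.0), (500.5, 1000.0)),
}

changes = {}
for name, (original, replicate) in genes.items():
    f_original = fold(*original)
    f_replicate = fold(*replicate)
    # Relative change of the fold caused by the +0.5 perturbation of the denominator.
    changes[name] = (f_replicate - f_original) / f_original
    print(f"{name}: fold {f_original:.3f} -> {f_replicate:.3f} "
          f"({changes[name]:+.1%})")
```

The same +0.5 perturbation of the denominator moves the fold of the low-expression gene from 2.0 to about 1.333, a change of roughly a third, while the fold of the high-expression gene moves by only about 0.1%.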

The key points of this subsection are as follows:

• When entire microarray experiments are not replicated, one would not be aware of, or be able to quantify, the extant irreproducibility in the experimental data and design, and false biological conclusions may potentially be generated.

• Even if the gene expression measurements from a single sample placed on two microarrays appear to have a reasonable correlation coefficient, the fold differences calculated using those reported measurements may not reproduce as well between replicates [40].

Based on this experience, we now recommend at least two to three replicates for each data point. Triplicates are certainly preferable, for in worst-case scenarios with only duplicates it is unclear which of the two measurements is more correct. Preferably, the RNA sample for each replicate should be generated separately to maximize noise inclusiveness in the replicate model.
