Background on Fold

In this section, we describe several approaches to resolving the second question, Q2, posed in subsection 3.3.1: how does one identify an expression level, or a fold change between different experimental states (for instance, pre- versus postintervention in our ongoing example study S2), that is most likely or significantly due to a change in the biological state of the target rather than to measurement error or noise?

Most methods that have been presented in the bioinformatics literature assume that, with all probes being equally effective, a gene which has a statistically larger change in expression level is more likely to have that change due to a difference in biological state rather than to noise. It should be noted, however, that unless we have enough replicate measurements to be able to determine the robustness of a gene expression reading, we cannot entirely rule out the possibility that noise may cause large changes in reported expression levels. Conversely, not all small changes in expression level are produced by noise alone and, in fact, gene expression changes which are below a preset threshold of noise might very well trigger significant physiological effects in a system. Typically, biological states are macroscopic in scale, and transitions from one state to another are, arguably, continuous phenomena, whereas noise occurs at microscopic scales and is discontinuous.

For an informative contrast and guide as to how one might proceed in microarray fold analysis, let us briefly review how fold change for a particular mRNA transcript is determined using the traditional Northern blot. Typically in a blotting run, one starts by isolating total RNA, on the order of 10 µg per gel lane, from the system of interest. Each gel lane corresponds to an experimental or replicate condition. Depending on the gel lane to be loaded, the sample load may be spiked with standard controls such as glyceraldehyde-3-phosphate dehydrogenase (GAPDH) for background normalization or loading control calculations. The gel loading step is usually performed manually in approximately uniform quantities. An electric current is passed through the gel complex and, by electrophoresis, RNA fragments move at different speeds in the complex according to their size. Afterward, the separated RNA is transferred from the gel onto a nitrocellulose membrane to produce a blot. Labeled probes are constructed with radioactively labeled nucleotides and are designed to be specific to each RNA. The control and test RNA samples are incubated with the probes. The blot is then washed to remove probes that have not specifically hybridized and exposed to an X-ray film; its image is scanned, and the intensity fold change is quantified by a molecular or phosphor imager.

In each gel lane, the intensity of the mRNA of interest is first normalized with respect to its loading control probe intensity. Rather than an absolute and precise quantification, it would be more appropriate to describe the task of the phosphor imager as fold estimation. In the scientific literature, it is common to quote integer-valued fold changes such as 2 or 100. It is rare to find reports of 1.01- or 10.307-fold in the literature. For the average biological scientist, these numerical fold values have a qualitative rather than a precise quantitative meaning. At each stage of the blotting protocol, many occasions exist for error and noise to enter into the analysis, and subsequently into the final scanned image and quantification, in ways that are not entirely understood or avoidable. These would render such decimal precision in fold values biologically redundant or meaningless unless one possesses the error statistic for the blotting and imaging protocol. Another factor governing fold precision is the machine sensitivity of the imager.

Recall the setup for a typical microarray analysis. As observed previously, a chip experiment may be regarded as the reverse of performing ~10^4 different Northern blots in parallel on a common two-dimensional substrate of area ~10^-4 m^2 of probes upon which one hybridizes the labeled target. Each microarray probe set would correspond to its associated mRNA sample in one gel lane without the loading control and, in the case of Affymetrix oligochips, with an additional set of mismatched probe controls for nonspecific hybridization. Each whole chip has its set of control or housekeeping probes corresponding to the loading control in a gel lane of the Northern blot.
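The loading-control normalization described above for the Northern blot can be illustrated with a minimal sketch. The function names and the band intensities below are hypothetical and chosen only for illustration; the sketch assumes that each lane yields one intensity for the mRNA of interest and one for its loading control, both in arbitrary imager units.

```python
def normalized_intensity(target, loading_control):
    """Band intensity of the mRNA of interest relative to the lane's loading control."""
    if loading_control <= 0:
        raise ValueError("loading control intensity must be positive")
    return target / loading_control


def estimated_fold(test_lane, control_lane):
    """Fold estimate between two lanes, each given as (target, loading_control)."""
    test = normalized_intensity(*test_lane)
    control = normalized_intensity(*control_lane)
    if control <= 0:
        raise ValueError("normalized control intensity must be positive")
    return test / control


# Hypothetical intensities (arbitrary units), not taken from any study.
control_lane = (1200.0, 4000.0)  # (mRNA of interest, loading control)
test_lane = (3150.0, 4200.0)

fold = estimated_fold(test_lane, control_lane)
print(f"estimated fold: about {fold:.1f}")
```

Reporting the result to one decimal place at most reflects the point made above: without an error statistic for the protocol, further digits carry little biological meaning.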

In view of their similarities, one may reasonably expect any noise and reproducibility issue affecting a Northern blot to manifest itself as prominently in a microarray experiment. Due to the microscopic dimensions of each probe feature on a microarray, small irregularities during scanning or fabrication could lead to discontinuous and possibly contradictory outcomes that might not be immediately or practically detectable from among ~10^4 other distinct probes. It is equally instructive to reexamine the starting points for the representative fold calculations in both the Northern blot and microarray technologies.

For microarrays, fold analysis starts off from the level of a text file (e.g., a .chp file in the case of the Affymetrix technology) which contains information such as a probe identifier and corresponding indicators of the level of sample probe or transcript that was detected. Theoretically, this text file is a numerical representation of the scanned chip image, specifically of its detected levels of different RNAs, and is generated by the chip manufacturer's software program which implements (often) proprietary statistical image-processing algorithms that are typically opaque to the user. There is always some loss of information in going from the true image to its machine image file, and to the numerical representation of the image file. The relevance of this loss is, of course, context-specific. Normally, in chip data analysis, the bioinformatician will perform statistical tests, classification, or clustering algorithms based on these preprocessed numerical representations of RNA levels alone. Thus the generic chip data analysis conclusions are at least twice removed from the source or microarray image of RNA levels.

In contrast, the end product of a Northern blot analysis is an X-ray film or an image of the blot, together with an estimated quantification of fold change for the RNA of interest after the image is processed by a phosphor imager, whose working principles might appear to be more transparent and seem closer to the physical phenomenon under investigation. With microarrays, it is also possible to begin fold calculations from the image file, but doing so raises two practical obstacles. First, this approach may require the bioinformatician to develop his or her own image-processing program, which might mean acquiring a working knowledge of several specialized disciplines, including image analysis and software engineering, a daunting task in itself. Second, the bioinformatician will need to obtain specific information from chip manufacturers about probe identities and their respective coordinates on the array for every different make of chip that he or she will use. Creating this location-to-probe lookup dictionary may be time-intensive, as is the case with some Affymetrix oligonucleotide microarrays, where the 16 to 20 probes of each probe set for a transcript are split up and dispersed throughout the physical array.
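To make the location-to-probe lookup concrete, here is a minimal sketch of the kind of dictionary one might build. The record layout (x, y, probe set identifier, probe number) and all values are hypothetical; a real chip description supplied by a manufacturer will have its own format, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical layout records: (x, y, probe_set_id, probe_number).
# Probes of the same set are deliberately scattered across the array.
layout = [
    (12, 407, "set_A", 1),
    (311, 58, "set_A", 2),
    (97, 220, "set_B", 1),
    (450, 9, "set_A", 3),
    (5, 333, "set_B", 2),
]


def build_lookups(records):
    """Return (location -> probe) and (probe set -> ordered locations) maps."""
    by_location = {}
    by_probe_set = defaultdict(list)
    for x, y, probe_set, number in records:
        by_location[(x, y)] = (probe_set, number)
        by_probe_set[probe_set].append((number, (x, y)))
    # Order each probe set's cells by probe number.
    for cells in by_probe_set.values():
        cells.sort()
    return by_location, dict(by_probe_set)


by_location, by_probe_set = build_lookups(layout)
print(by_location[(311, 58)])
print([xy for _, xy in by_probe_set["set_A"]])
```

Keeping both directions of the mapping lets one go from an image coordinate to its probe, and from a transcript's probe set to all of its dispersed cells.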

It should be noted that while the fold, or ratio of intensities, is a natural way to quantify relative changes in expression level within the framework of Northern blots, it might not be an appropriate relative measure, nor might it retain the same meaning, when ported over as is into microarray data analysis. In calculating the fold change for a gene between conditions A and B as assayed by microarrays A1 and B1, one would very frequently simply take the arithmetic ratio of the numerical representations of the gene's reported expression in the A1 and B1 chips. There are problems in doing this. For instance, Affymetrix represents the intensity of a transcript by the Avg Diff, which may be a nonpositive number, in which case a ratio involving nonpositive quantities is physically meaningless. Taking the data from study S2 for example, how does one quantify the fold of Gene 9784 from duplicate states B1 (-4.6) to B2 (18.3)? Furthermore, the ratio is not defined when the denominator is zero. For clarity, we shall mainly concentrate on examples of fold analyses on the .chp text files in the Affymetrix technology of study S2.
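The difficulty with the naive ratio can be seen in a few lines. The sketch below uses the Gene 9784 values quoted above; the function name `naive_fold` and the choice to return `None` for uninterpretable cases are assumptions made for illustration, not part of any Affymetrix software.

```python
def naive_fold(a, b):
    """Arithmetic ratio b / a, or None when it cannot be read as a fold.

    A ratio is only interpretable as a fold change when both reported
    intensities are strictly positive.
    """
    if a <= 0 or b <= 0:
        return None
    return b / a


# Gene 9784 in study S2, duplicate states B1 and B2 (Avg Diff values).
b1, b2 = -4.6, 18.3

print(b2 / b1)              # a negative "fold", which has no physical meaning
print(naive_fold(b1, b2))   # None: flagged as uninterpretable
print(naive_fold(10.0, 0.0))    # None: a zero intensity is also rejected
print(naive_fold(50.0, 100.0))  # an ordinary case with positive intensities
```

Returning `None` only marks the problem cases; it does not resolve them, and deciding how such genes should be treated is a separate matter.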
