DNA Microarray Platforms

In the following subsections, we develop the DNA microarray model introduced in Section 2 by detailing the two most commonly used DNA microarray platforms. The basic principles of the cDNA array remain true to those first described by Schena and Brown. The authors helped ensure wide distribution of the technology by making available protocols for building a robot spotter to produce DNA chips, a scanner to analyze them, and the software to run the experiment (17). With these protocols, a number of early investigators with the technical expertise were able to set up in-house chip production and analysis facilities for $50,000-$100,000 (18). Once the initial facility was in place, the marginal cost of producing individual chips was low. Commercially available equipment also appeared in the late 1990s, and many institutions adopted the cDNA platform by establishing internal chip production facilities. A number of biotech firms also produced individual chips for sale; however, the individual chip market has been dominated from the beginning by an alternative proprietary technology, the oligonucleotide array, which we will discuss in Sections 3.4-3.7.

The availability of two alternative DNA microarray platforms has led to an ongoing debate over which is superior. We will touch on the strengths and weaknesses of the individual platforms in the following subsections; however, it is premature to declare one technology a clear victor. Instead, the choice of cDNA array vs oligonucleotide array is influenced most by local availability, cost, and institutional experience. Although there are significant differences between the platforms, the underlying design and execution of DNA microarray assays are similar and follow the format shown in Table 1 (19).

Table 1

Elements of DNA Microarray Assay

1. Chip manufacturing: probe selection, chip production

2. Sample preparation: RNA isolation and labeling

3. Chip sample hybridization

4. Scanning of hybridized chip (image acquisition)

5. Image analysis: transduction of spots to genomic information (RNA expression values)

6. RNA expression standardization and normalization

7. Data analysis

3.1. cDNA ARRAY CHIP PRODUCTION An investigator wanting to use the cDNA array platform can purchase a commercially available chip or, more commonly, manufacture the chip independently, as illustrated in Fig. 1. This review summarizes the concepts needed to appreciate the limitations of the technology; protocol details are widely available elsewhere. Before constructing a cDNA microarray, the investigator needs to select DNA clones, known as "probes" in the vocabulary of the microarray. For the purposes of this review, we will refer to the cDNA (or mRNA) from the sample as the "target," with the spotted DNA termed the "probe." We note that there is no universal consensus on these terms and readers might find them used differently elsewhere. Clone selection is governed by logistic and scientific considerations, of which clone availability is chief. Surprisingly, clone number is a minor component, as 30,000-50,000 probes can be spotted on a single slide, a number that overlaps the estimated total number of human genes. In some cases, researchers use relatively small numbers of clones that are of particular interest or readily available. More commonly, clones are purchased in bulk from a commercial supplier, such as the 9128 Human Unigene 1 clone set from Incyte. The quality of these sets can vary significantly, and it is usually of interest that the source be identified. In their usable form, the clones are single-stranded DNA of 0.6-5 kb representing entire genes derived from cDNA, partially sequenced genes, or ESTs.

Once selected, the DNA itself needs to be prepared for spotting onto a solid medium, usually a coated glass slide, in a manner represented in Fig. 1. Approximately 100-500 ng of DNA are isolated and purified from bacterial vectors and placed into wells in preparation for transfer to the slide. The transfer from well to a preselected location on the glass slide is via a robot spotter using a variety of spotting technologies ranging from a fountain pen pipet mechanism, to capillary tube, to noncontact ink jet printer (18,20). Each transfer modality was designed to overcome the challenge of working with minute liquid volumes while obtaining spots of consistent size, 100-200 µm. Each spot, even at this size, comprises millions of copies of the DNA sequence of interest.

Prior to the spotting process, the surface of the glass slide is coated with an organic compound such as silane to permit stable DNA binding. Probes might make multiple contacts with the organic matrix along their lengths, with longer sequences interacting the most. Multiple surface interactions could decrease target DNA binding proportional to sequence length, not mRNA concentration, a potential probe quality concern. Other local phenomena such as variable drying could also interfere with eventual DNA hybridization (21).

In addition to the technical challenges of generating high-quality clones and spotting them accurately, there are critical data management challenges in microarray design (22). For example, a clone might represent only a segment of a gene or one of several alternate splices; therefore, it is often insufficient to identify a spot by the gene's name alone. Furthermore, many clones are ESTs without a consensus gene name. To explicitly identify a spot, the investigator needs to link the probe sequence both to its location on the slide and to the well from which the clone was selected. Any break in the chain of information can make interpretation of the assay impossible.
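The chain of information from clone to well to slide location can be sketched as a small record type. This is a minimal illustration only: the field names (`clone_id`, `plate`, `well`, `block`, `row`, `col`), the clone identifiers, and the gene label are hypothetical and do not reflect the schema of any particular microarray database.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SpotRecord:
    """One spotted probe; all field names are hypothetical."""
    clone_id: str                    # identifier of the clone (may be an EST)
    plate: str                       # source plate holding the clone
    well: str                        # well the clone was drawn from
    block: int                       # print block on the slide
    row: int                         # row within the block
    col: int                         # column within the block
    gene_name: Optional[str] = None  # None for ESTs without a consensus name


def index_by_location(spots):
    """Map each slide location (block, row, col) to its spot record.

    Raises ValueError if two records claim the same location, because an
    ambiguous location breaks the chain from spot back to sequence.
    """
    index: Dict[Tuple[int, int, int], SpotRecord] = {}
    for spot in spots:
        key = (spot.block, spot.row, spot.col)
        if key in index:
            raise ValueError(f"duplicate slide location {key}")
        index[key] = spot
    return index


spots = [
    SpotRecord("clone-0001", "plate-1", "A01", 1, 1, 1, gene_name="TP53"),
    SpotRecord("clone-0002", "plate-1", "A02", 1, 1, 2),  # unnamed EST
]
by_location = index_by_location(spots)
print(by_location[(1, 1, 2)].clone_id)  # prints "clone-0002"
```

Keying on slide location rather than gene name reflects the point made above: a gene name might be missing or shared by several clones, whereas a location identifies exactly one spot.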

3.2. cDNA ARRAY SAMPLE PREPARATION AND HYBRIDIZATION To prepare RNA for use in a cDNA array, total RNA is isolated from a sample of interest by one of the standard methods (23). Usually, 50-100 µg of total RNA (2 µg of mRNA) is sufficient, although this could vary by protocol.

Fig. 1. cDNA microarray manufacturing process. See text for description. (Courtesy of the NCBI Gene Expression Omnibus.) (Figure appears in color in insert following p. 172.)

Because of the rapid degradation of RNA, only fresh or fresh frozen samples generally provide RNA of sufficient quality for analysis. Even samples that have been collected for the specific purposes of RNA analysis should be evaluated for overall RNA quality.

Most contemporary cDNA array experiments require the simultaneous analysis of RNA from two separate sources on the same chip: an experimental biologic specimen and a reference. In separate reactions for experimental and reference, total RNA is reverse-transcribed to cDNA using either random primers or an oligo-dT primer. The oligo-dT primer takes advantage of the presence of a poly-A tail of mRNA to selectively transcribe only mRNA species. Reverse transcription of the experimental and reference RNA is carried out in the presence of different fluorescent labels, most commonly cyanine 3 (Cy3), which fluoresces green, or cyanine 5 (Cy5), which fluoresces red. Once labeled, experimental and reference cDNAs are combined in solution and poured over the chip to allow hybridization, with experimental and reference targets competing for the same probes. After a prescribed hybridization time, the slide is dried and prepared for scanning.

3.3. cDNA ARRAY IMAGE ACQUISITION AND ANALYSIS A variety of optical systems have been developed for detecting fluorescence, of which we will consider only one (24,25). A laser tuned to the wavelength of the Cy3 dye is passed over the slide while a photodetector quantifies green fluorescence at a resolution of approx 5-20 µm. In this manner, a 16-bit tagged image file format (TIFF) image is created in which the intensity of each green spot is proportional to the quantity of labeled cDNA found at a given location on the array. The process is repeated with a laser calibrated to the Cy5 dye fluorescence. The two color channels, one green and one red, are superimposed to create a single image such as that shown in Fig. 2. Note that the image used in Fig. 2 was selected to include a high percentage of the artifacts discussed here and grossly understates typical image quality. In the two-channel image, the relative amounts of red and green correspond to the relative amounts of mRNA from the experimental and reference specimens. When experimental and reference groups are labeled with Cy3 and Cy5, respectively, a spot that is pure green represents mRNA expressed only in the experimental sample. Similarly, a pure red spot corresponds to an mRNA species found only in the reference sample. If neither sample expresses the corresponding mRNA, the spot will be black. If the mRNA is expressed in both the experimental and reference samples, the spot will appear as shades of yellow, orange, or brown.

Figure 2 illustrates the heterogeneity of spot shapes in addition to the variation in color and intensity already described. In some cases, spot size varies with signal intensity, with more intense signals having a saturation effect similar to that produced when photographing a bright light. In other cases, however, even dim spots appear quite irregular. Irregularities are most commonly the result of subtle mechanical effects encountered in the original chip spotting process. Small changes in the quantity of DNA spotted, the pressure applied to the slide, or variation in the individual pins used in spotting cause variation in spot size. Fluorescence for a given spot is therefore proportional not only to the amount of mRNA in the sample of interest but also to spot topography. To account for variation in spot size in the analysis, the concept of a reference sample was introduced. In place of total fluorescence at each spot, the ratio of fluorescence between an experimental and a reference sample is calculated, a value independent of variations in spot size and shape.
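The reasoning behind the ratio can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that both channels scale linearly with spot area; the per-area signal values are hypothetical.

```python
def fluorescence_ratio(experimental, reference):
    """Ratio of experimental to reference fluorescence for one spot."""
    if reference <= 0:
        raise ValueError("reference fluorescence must be positive")
    return experimental / reference


# Hypothetical signal per unit of spot area in each channel.
EXPERIMENTAL_PER_AREA = 40.0
REFERENCE_PER_AREA = 10.0

ratios = []
for spot_area in (0.5, 1.0, 2.0):  # small, typical, and large spots
    # Assumed: total fluorescence in each channel grows with spot area.
    experimental = EXPERIMENTAL_PER_AREA * spot_area
    reference = REFERENCE_PER_AREA * spot_area
    ratios.append(fluorescence_ratio(experimental, reference))

print(ratios)  # prints [4.0, 4.0, 4.0]
```

Total fluorescence changes fourfold across the three spots, yet the ratio does not, because the area term cancels between numerator and denominator.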

Fig. 2. Example of cDNA microarray image file. Magnified view showing approx 10% of the "spots" from a selected cDNA microarray. Note: This image was selected for its unusually large number of artifacts, including smears and streaks. It is not meant to represent the quality of a typical array. See text for details. (Figure appears in color in insert following p. 172.)

Although ratios overcome the problem of irregular spots, they introduce a new set of challenges (26). The first is that ratios can be unstable measures. When RNA from one of the samples comprising the ratio is present in small quantities relative to the other, the ratio will be unstable. For example, if the experimental sample has a fluorescence of 10 units and the reference has a fluorescence of 0.2 units, then the ratio of experimental to reference fluorescence is 50. An absolute error of only 0.1 units in the reference channel (1% of the experimental signal) could have resulted in measures of 0.1-0.3, and thus in ratios of 33-100. A second concern in using ratios is the failure to differentiate between expression changes that occur on a small or large absolute scale. For example, a ratio of 0.1 does not differentiate between the comparison of 1/10 and 100/1000. Although not all challenges in the interpretation of ratios have been addressed, many have been accommodated through analytic tools and study design.
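The arithmetic of the unstable ratio can be checked with a short sketch. The function name `ratio_range` and the second, well-expressed reference value are our own illustrative choices; the first set of numbers is taken from the example in the text.

```python
def ratio_range(experimental, reference, abs_error):
    """Return the (low, high) ratios obtained when the reference
    measurement is off by plus or minus `abs_error` units."""
    if reference - abs_error <= 0:
        raise ValueError("error must be smaller than the reference signal")
    low = experimental / (reference + abs_error)
    high = experimental / (reference - abs_error)
    return low, high


# Weak reference signal (values from the text): the ratio swings widely.
low, high = ratio_range(10.0, 0.2, 0.1)
print(round(low, 1), round(high, 1))  # prints 33.3 100.0

# Hypothetical well-expressed reference: the same absolute error
# barely moves the ratio.
stable_low, stable_high = ratio_range(10.0, 20.0, 0.1)

# The ratio also discards absolute scale: 1/10 and 100/1000 are identical.
same_ratio = (1 / 10) == (100 / 1000)
```

The same 0.1-unit error that triples the ratio for a weak reference changes it by less than 1% when the reference signal is strong, which is the motivation for choosing a broadly expressing reference sample.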

The problem of the unstable ratio has many examples in cancer biology, as well as a potential solution through careful study design. The comparison of a tumor expressing an oncogene at very high levels to normal tissue that does not express it would be potentially unstable. One solution, applied in the study design phase, is to avoid the use of "normal" tissue as the reference for the ratio. Investigators have opted to use cell lines or groups of cell lines that express a wide variety of genes to avoid the problem of low expression in the reference sample. Cell lines also have the theoretical advantage of offering a reproducible reference, whereas "normal" human tissue might be quite variable depending on the source (27).

Using "normal" RNA as the reference has the advantage of placing the comparison of experimental and control groups on the same slide. However, as the previous example suggests, the reference sample is not necessarily the experimental control, but rather a means of correcting for variable spot size. Although the direct comparison is initially appealing, there are theoretical considerations beyond those of estimating stable fluorescence ratios. First, by measuring the same normal control on every slide, the investigator has many measurements of normal, yet only single measurements of individual disease samples. Repeated measures of normal in this way might represent an inefficient use of data. Second, fluorescence ratios include two error components, one each from the experimental and the reference samples. In an analysis comparing disease to normal, the two error components are a normal part of study interpretation. When contrasting two disease states (each using normal as a reference), the error introduced by the ratio represents unnecessary noise in the system. Study designs to increase efficiency and address these concerns have been developed, although they are admittedly more complicated to interpret. One such example, a series loop experiment, uses the reference RNA for one chip as the experimental RNA on a second chip (28). Measurement error and optimal reference RNA are considered further in later sections, and we refer the interested reader to full reviews (29).

In addition to the grid of green, red, yellow, and blank spots, most cDNA image files demonstrate a variety of irregular streaks and spots. Figure 2 was selected explicitly to illustrate an extreme range of such defects; a more typical field of view would have far fewer artifacts or none at all. Here, we see irregularities resulting from background noise from sources such as dust, local drying effects, and mechanical spotting difficulties. Much of this noise can be attenuated through software and analytical techniques involved in image processing, but arrays should generally be inspected for severe artifacts. Commercial arrays are often shipped with quality control measures in place to minimize these concerns.

Fig. 3. Summary of Affymetrix oligonucleotide microarray protocol.

Once the two-color image has been generated, the final step in preprocessing the array is to calculate fluorescence and ratio values for each of the spots and associate those values with the genes that they are intended to represent. Calculation of a summary fluorescence statistic is more involved than might be apparent at first glance, as the spots are highly variable over their areas. There might be a dim focus in the center, a gradient of intensity, or other complex topography that needs to be reduced to a single value of fluorescence for both the red and green channels. The public-domain software ScanAlyze written by Michael Eisen has been widely used for this purpose, but other programs are available. Finally, the data are exported to a spreadsheet that links the gene identities of the spots with the fluorescence ratios as well as values of each channel.
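Reducing a spot to a single value per channel can be sketched as follows. This is one common simplification, assuming a background-corrected median of pixel intensities; it is not a description of how ScanAlyze or any other program computes its statistics, and the pixel values are hypothetical.

```python
from statistics import median


def summarize_spot(foreground, background):
    """Background-corrected median intensity for one channel.

    The median is used so that a dim center or a few bright pixels do
    not dominate the summary; negative results are floored at zero.
    """
    return max(median(foreground) - median(background), 0.0)


# Hypothetical pixel intensities for one spot and its local background.
green_fg = [120, 900, 950, 980, 1010, 400]  # includes a dim focus
green_bg = [95, 100, 105, 110]
red_fg = [300, 310, 320, 330, 340, 305]
red_bg = [48, 50, 52, 50]

green = summarize_spot(green_fg, green_bg)
red = summarize_spot(red_fg, red_bg)
ratio = green / red if red > 0 else float("nan")
print(green, red)  # prints 822.5 265.0
```

Both the ratio and the individual channel values are kept, mirroring the exported table described above, so that low-intensity spots with unstable ratios can be identified later.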

3.4. OLIGONUCLEOTIDE ARRAYS: AFFYMETRIX PLATFORM AND CHIP PRODUCTION In parallel with the development of the cDNA microarray, an alternate proprietary technology called the oligonucleotide microarray was made available (Figs. 3 and 4) (30,31). Unlike cDNA arrays, the chips themselves cannot be manufactured by individuals; they must be purchased from the manufacturer (Affymetrix of Santa Clara, CA). Although the oligonucleotide arrays share all of the features of a microarray assay described in Table 1, the differences between the platforms influence study design and data interpretation.

Two technological advances paved the way for the manufacture of oligonucleotide arrays: photolithography and confocal fluorescence scanning. Photolithography, as illustrated in Fig. 5, is a process in which oligonucleotides are synthesized directly onto a solid matrix. The process starts with the substrate, a quartz wafer chosen for its optical properties and hydroxylated surface, which acts as a linker to which the oligonucleotides are attached. The quartz surface is coated in silane as in the cDNA example, and then a synthetic linker with a photochemically removable group is attached. A chrome mask with a grid pattern of apertures is precisely aligned over the chip. Ultraviolet radiation is applied to the system such that only the areas directly under the apertures of the mask, each 18-20 µm across, are irradiated. The photosensitive group is removed from the linker in the irradiated sectors only, unveiling a binding site for a single nucleotide. The surface of the chip is then bathed in a single-nucleotide species (adenine, thymine, guanine, or cytosine) to which an additional photochemically removable group is attached. A new chrome mask can now be applied and the entire process repeated. In this manner, short oligonucleotides (usually 25 nucleotides long [called "25-mers"]) can be constructed. The sequence of the nucleotides is dictated by the order in which the masks and nucleotides are applied. The individual chips are 1.28 cm² and capable of harboring over 500,000 unique oligonucleotide locations. Again, as with the cDNA arrays, each location on the array contains millions of copies of the unique oligonucleotide.
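How the order of masks and nucleotides dictates the sequence at each location can be sketched with a toy simulation. It assumes nucleotides are offered in a fixed repeating order (A, C, G, T), which the text does not specify; the feature names and three-base sequences are hypothetical stand-ins for real 25-mers.

```python
def mask_schedule(targets, order="ACGT"):
    """Return a list of (nucleotide, open_features) synthesis steps.

    At each step the mask exposes only those features whose next
    required base equals the nucleotide about to be applied.
    """
    for name, seq in targets.items():
        if not seq or any(base not in order for base in seq):
            raise ValueError(f"invalid sequence for feature {name!r}")
    progress = {name: 0 for name in targets}  # bases built so far
    steps = []
    while any(progress[n] < len(seq) for n, seq in targets.items()):
        for base in order:
            open_features = [
                n for n, seq in targets.items()
                if progress[n] < len(seq) and seq[progress[n]] == base
            ]
            if open_features:  # a step with no open apertures is skipped
                steps.append((base, open_features))
                for n in open_features:
                    progress[n] += 1
    return steps


def replay(steps, names):
    """Rebuild each feature's sequence by replaying the schedule."""
    built = {name: "" for name in names}
    for base, open_features in steps:
        for name in open_features:
            built[name] += base
    return built


targets = {"f1": "ACG", "f2": "AAT", "f3": "GCA"}
steps = mask_schedule(targets)
rebuilt = replay(steps, targets)
```

Replaying the schedule reproduces every target sequence, which is the sense in which the sequence is encoded entirely in the series of masks. Under the assumed four-base cycle, no feature needs more than four steps per base, regardless of how many features are on the chip.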

Whereas the cDNA microarray spots are composed of DNA sequences of 600-5000 bases, the oligonucleotide sequences of 25 bases seem quite short. The short sequence of uniform length addresses the concern (raised in Section 3.1) that DNA hybridization varies with strand length, but it raises concerns about lack of specificity. The oligonucleotide microarrays improve specificity by combining groups of 25-mer probes to form a "probe set." It is the probe set, of which each probe is only a part, that is specific for a gene target. Probes in a set are selected from a variety of transcribed regions along the gene using software algorithms and empiric testing, which are best described in the technical notes provided by Affymetrix. Figure 6 is an example of the distribution of probes along the length of a gene for a sample probe set. Probe sets vary in the number of probes they contain depending on the chip. The U95 human chips, for example, used probe sets consisting of 16 probes, representing 400 bases along the length of the target gene, a size sufficiently large to convey specificity. For the sake of this review, we will discuss probe sets as they were constructed for the U95 GeneChip® array.

Fig. 5. Photolithography as applied to Affymetrix GeneChip manufacturing.

A major difference between the oligonucleotide array and the cDNA array is that only one RNA sample is hybridized per array in the oligonucleotide platform. The consistency of the photolithographic process obviates the need to control for spot size. Background noise, as well as nonspecific binding to any individual 25-mer, necessitates the inclusion of an additional feature on the Affymetrix chips. For each of the 16 probes in the probe set, there is a second probe called a "mismatch" (with the original probe being called the "perfect match"). The mismatch probe has exactly the same sequence along its 25-base length except for a single substitution at position 13. Hybridization to the mismatch probe is used to calibrate the perfect match, accounting for nonspecific binding and background noise.
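The relationship between a perfect match probe and its mismatch can be sketched in a few lines. The text specifies only a single substitution at position 13; replacing that base with its complement is an assumption made here for illustration, and the probe sequence is hypothetical.

```python
# Assumed substitution rule: swap the central base for its complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match, position=13):
    """Return a copy of a 25-mer with the base at the given 1-based
    position substituted; all other bases are unchanged."""
    if len(perfect_match) != 25:
        raise ValueError("expected a 25-mer")
    i = position - 1  # convert to a 0-based index
    return perfect_match[:i] + COMPLEMENT[perfect_match[i]] + perfect_match[i + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical perfect match probe
mm = mismatch_probe(pm)
differences = [i + 1 for i in range(25) if pm[i] != mm[i]]
print(differences)  # prints [13]
```

Position 13 is the center of a 25-mer, so the substitution leaves 12 matching bases on either side of it.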

Fig. 6. Example of Affymetrix probe distribution along a gene. Each small box marks the location of a probe along the length of the gene. There are 16 boxes corresponding to the 16 probes in a probe set. (Figure appears in color in CD ROM.)

3.5. OLIGONUCLEOTIDE ARRAYS: AFFYMETRIX PLATFORM: SAMPLE PREPARATION AND HYBRIDIZATION Sample preparation varies somewhat from the protocol used in cDNA array analysis, as seen in Fig. 3. The starting material is again total RNA, although here 5-10 µg of high-quality RNA is sufficient. Again, the total RNA is reverse-transcribed using the oligo-dT primer, although no fluorescent label is added in this step. The cDNA is transcribed in vitro to cRNA in the presence of biotinylated ribonucleotide triphosphates; the incorporated biotin allows the cRNA to be labeled with streptavidin-conjugated fluorescent markers after hybridization. In a final preparatory step, the cRNA is fragmented and then hybridized against the oligonucleotide array.

3.6. OLIGONUCLEOTIDE ARRAYS: IMAGE ACQUISITION AND ANALYSIS The chip is then washed and the hybridized cRNA labeled via the biotin-streptavidin system. Fluorescence proportional to the degree of hybridization is recorded by confocal scanning, in which a laser induces fluorescence on one side of the chip while an optical scanner records signal intensity on the other. In this way, a single-channel black-and-white image is generated. Each spot represents an oligonucleotide probe, the intensity of which is, in theory, proportional to the concentration of mRNA present in the original biologic sample. The raw image produced in this way is saved as a .dat image file, with information on pixel intensity measured similarly to the .tiff file generated in the cDNA example (Fig. 4).

Unlike the cDNA array, the oligonucleotide array does not require that the intensity of each spot be interpreted in relation to a reference. Intensity can be interpreted directly as a linear unitless measure called "expression." However, because a given gene target is represented by a probe set of perfect matches and mismatches, the gene expression value needs to be computed from data provided by the entire set of 32 probes. This process begins at the image processing stage, when probe-level expression is calculated by software such as the proprietary Microarray Suite from Affymetrix or the public-domain program dChip (32). This step is analogous to that performed on cDNA arrays by ScanAlyze; unlike ScanAlyze, however, the output is not a table of gene fluorescence values or ratios but an intermediate file with the file extension .CEL, containing probe-level data only. Both Microarray Suite and dChip integrate the perfect matches and mismatches by a variety of additive and averaging methods, such as the AvDiff employed by Microarray Suite v4.0. Using the AvDiff method, the differences in intensity between the perfect match and mismatch probes of each pair are summed and averaged to calculate a final summary statistic, "gene expression," for each gene on the array.

$$\mathrm{AvDiff} = \frac{1}{J} \sum_{j=1}^{J} \left(\mathrm{PM}_j - \mathrm{MM}_j\right)$$

where J is the number of suitable probe pairs in the probe set and PM_j and MM_j are the perfect match and mismatch intensities of the jth pair. There is no consensus standard for integrating across probes in a probe set, with several reasonable alternatives currently in use (33-35). The final output of the assay is a data table with an expression value for each of the genes on the array. Because the oligonucleotide arrays are a commercial product in which the probe locations have been standardized and incorporated into the software, the burdens of data management described in the cDNA example are clearly simplified.
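The AvDiff calculation can be written out directly from the formula. This is a minimal sketch: it averages over every probe pair supplied and omits whatever rule the software uses to decide which pairs are "suitable," and the intensity values are hypothetical.

```python
def avdiff(perfect_match, mismatch):
    """Average difference between paired perfect match (PM) and
    mismatch (MM) intensities for one probe set."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("PM and MM lists must be the same length")
    if not perfect_match:
        raise ValueError("at least one probe pair is required")
    differences = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(differences) / len(differences)


# Hypothetical intensities for a probe set reduced to four probe pairs.
pm_intensities = [1200, 950, 1800, 400]
mm_intensities = [300, 250, 600, 380]

expression = avdiff(pm_intensities, mm_intensities)
print(expression)  # prints 705.0
```

The fourth pair contributes a difference of only 20, illustrating how a probe with heavy nonspecific binding pulls the average down; a mismatch brighter than its perfect match would contribute a negative term.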

Further data standardization and normalization, described in a later section, are routinely required prior to analysis. For those planning to use oligonucleotide array data generated by other investigators, it is important either to obtain the raw image files or to have a full understanding of how the gene expression values were calculated. Similar warnings apply to the use of cDNA ratio values, which vary depending on the software used to interpret the image file.

3.7. OLIGONUCLEOTIDE ARRAYS: AGILENT PLATFORM The chrome mask used by Affymetrix is limiting in that the design of new chips requires that new masks first be manufactured. To the extent that building new masks is expensive and slow, the system is relatively inflexible for probe optimization or new array customization. Alternative technologies to produce oligonucleotide arrays have been developed (36-39). Of these, the most widely available is the inkjet method developed by Agilent Technologies, in which oligonucleotides are printed onto the surface of a glass slide. Essentially, the process mimics that of commercial inkjet printers, with the four ink colors replaced by the four nucleotides (A, T, C, and G).

In place of the photoprotective site used in the Affymetrix platform, a covalently bound trityl group blocks the 5' hydroxyl group of the nucleotide. As with the other array platforms, the process starts with a silane-coated glass slide with hydroxyl groups available to initiate the oligonucleotide construction at specific loci. The printer head scans the surface of the slide, delivering 100 pL volumes of single-nucleotide species. Unlike the Affymetrix platform, all four nucleotides can be deposited on the slide in a single pass, one base per spot on the array. Nucleotides are deposited in excess of available binding sites, forming 100-µm-diameter spots with 30-µm intervening spaces. Once the nucleotides are bound to the glass, excess nucleotide is washed away and the trityl moiety is removed from the 5' end of all oligonucleotides. With the trityl group removed, the 5' hydroxyl group is available for strand elongation when the next nucleotide is deposited from the printer system. The combination of precision electronics, surface tension, and orientation of linker molecules ensures that microdroplets are precisely delivered and that oligonucleotide construction proceeds efficiently.

The advantage of this system over that used by Affymetrix is that changes in the chip design require only changes in the software running the printer array, not the construction of a new series of masks. Underperforming probes can be refined with relative ease, new probes tested, and the entire system iteratively optimized at relatively low cost. Product development of this type has led Agilent to favor gene measurement using oligonucleotides of 60 bases in length rather than the probe set methodology employed by Affymetrix. Arrays produced in this manner appear to perform comparably to the other platforms, although investigators generally have less experience with them.
