Solid-phase Array Sequencing Devices

The demand for ultra-high-throughput re-sequencing for personal medicine is driving the development of non-electrophoretic platforms capable of sequencing, in parallel, multiple single (amplified) molecules held at defined locations. These integrated platforms combine technologies ranging from microfluidic arrays and solid-phase chemistries to sequencing-by-synthesis (SBS) chemistries, ultra-sensitive optics and CCD data capture (see Table 1). Newly commercialized nanosequencing technologies already offer markedly increased sequence output, as required for rapid genome-sequencing projects and for large genome re-sequencing programs.

3.1. Ultra-sensitive detectors and sequencers

Hebert and Braslavsky (2007) provide a detailed description of equipment being developed to achieve longer reads from very large numbers of single DNA molecules by improving the sensitivity of SMD of fluorescence resonance energy transfer on a total internal reflection microscope. In parallel with these developments, versatile nucleotide analogs with better incorporation kinetics and better fluorescent signal output are being developed (Kumar and Fuller, 2007). Different positions of dye labeling have also been explored. For example, a number of terminal phosphate-labeled nucleotides with three or more phosphates, and with linkers of varied length between the terminal phosphate and the dye, have been synthesized (Kumar et al., 2005; Kumar and Fuller, 2007; Edwards et al., 2007). These nucleotides have utility as substrates for DNA sequencing where the base addition is unitary, until the reporter is eliminated from the terminal position and a terminal phosphate is regenerated. Williams and colleagues at LiCOR (http://www.licor.com/) are developing charge-switch technology to detect the release of reaction products when nucleotides are incorporated into single DNA strands, while Metzker's (2005) group is developing novel fluorescent, photolabile nucleotide terminators for SBS. Edwards et al. (2007) also describe the use of reversibly terminating dye-tagged nucleotides for single nucleotide extension SBS, as well as improvements to DNA polymerases that will support their accurate incorporation into DNA. One focus of their research is novel chemistry that allows a fluorescent molecule attached to a nucleotide to be detected and then removed with a flash of light after its addition to a growing DNA molecule; another is the development of integrated sequencing systems that combine terminating nucleotides with array platforms.

3.2. Sequencing by synthesis

"Sequencing by synthesis" is an approach common to primer extension methods such as SNuPE and pyrosequencing. It relies on a unitary base addition chemistry that allows single nucleotide additions to growing chains to be monitored on each oligonucleotide feature, simultaneously with the addition of one of the four differentially labeled terminating nucleotides. Church and colleagues (Mitra et al., 2003; Shendure et al., 2005; Zhang et al., 2006a) initiated the integration of solid-state DNA sequencing using polymerase colonies ("polonies") and cycles of fluorescent dNTP incorporation with high signal sensitivity, which allow multiple polonies to be sequenced in parallel. Large-scale arrays of discrete polonies can be extended cyclically, offering the prospect of cost-effective genome-scale single-array sequencing (Figure 3). "FISSEQ" involves the addition of one nucleotide; the extended fragments are then all detected simultaneously using CCD optics, and the terminating moiety and fluorescent tag are then removed chemically from each attached nucleotide in readiness for the addition of the next nucleotide in the following cycle. The series of base additions is interrupted only for signal scanning (and data acquisition) and for chemical treatment of the slides to remove dye signals prior to the next extension step. An alternative "sequencing by ligation" technology (Shendure et al., 2005) has been rapidly developed, and an advanced version is currently undergoing commercial development by the Applied Biosystems subsidiary, Agencourt Biosciences/Agencourt Personal Genomics (http://www.agencourt.com/), as the "Supported Oligo Ligation Detection" (SOLiD) process for massively parallel sequencing by stepwise ligation.

Fig. 3. (A) Polony amplification. A library of linear DNA molecules with universal priming sites is PCR amplified within a polyacrylamide gel. Each single template molecule gives rise to a polymerase colony or "polony". (B) Fluorescent in situ sequencing. Polonies are denatured, and a sequencing primer is annealed. Polonies are sequenced by serial additions of a single fluorescent nucleotide. Reprinted from Mitra et al. (2003). Copyright (2003), reprinted with permission from Elsevier.
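The cyclic scheme described in this section, in which one of four differentially labeled terminating nucleotides is added and imaged per cycle, reduces base calling at each array feature to choosing the brightest of four color channels. The following is a minimal sketch of that idea only: the channel order, the intensity values and the `min_ratio` ambiguity threshold are all illustrative assumptions, not parameters of any platform discussed here.

```python
# Assumed channel order for the four dye labels (illustrative only).
CHANNELS = "ACGT"


def call_bases(cycles, min_ratio=1.5):
    """Call one base per cycle from four-channel intensities.

    `cycles` is a sequence of 4-tuples (one intensity per channel).
    A cycle is called 'N' when the brightest channel is not at least
    `min_ratio` times the second brightest (hypothetical threshold).
    """
    read = []
    for intensities in cycles:
        ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if intensities[best] >= min_ratio * max(intensities[second], 1e-9):
            read.append(CHANNELS[best])
        else:
            read.append("N")
    return "".join(read)


# Made-up intensities for one feature over five cycles.
example_cycles = [
    (950, 40, 35, 50),    # clearly A
    (30, 45, 880, 60),    # clearly G
    (25, 910, 40, 30),    # clearly C
    (400, 35, 30, 380),   # two channels similar: ambiguous
    (20, 30, 45, 870),    # clearly T
]
print(call_bases(example_cycles))  # AGCNT
```

Because every feature on the array is imaged in the same cycle, the same per-cycle call is applied independently to each feature, which is what makes the approach parallel.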

The 454 Life Sciences Corporation (Margulies et al., 2007) has developed a solid-phase parallel microarray system of microfluidic wells in which 3 × 10⁴ features per cm² (capture wells) are used to capture single-stranded fragments of sheared genomic DNA attached to microbeads, which are then PCR-amplified in situ by emulsion PCR, resulting in some 10⁷ copies of a unique fragment attached at each bead (Figure 4). The DNA sequence is read by cycles of incorporation of single native deoxynucleotide triphosphates (dA, dG, dC or dT) at each feature and detected by pyrosequencing (Hyman, 1988; Nyren et al., 1993; Ronaghi, 2001). Gharizadeh et al. (2004) noted that the interference from primer-dimers and loop structures, which give rise to false sequence signals during pyrosequencing, could be reduced by employing Sequenase polymerase, and that homopolymeric regions could be read through for more than five T bases. Interestingly, Eriksson et al. (2004) reported that the new analog 7-deaza-2'-deoxyadenosine-5'-triphosphate (c7dATP) has low substrate specificity for luciferase, while the inhibition of apyrase was reduced significantly, and read lengths of up to 100 bases were obtained by pyrosequencing for several templates from fungi, bacteria and viruses. In order to prevent the incorporation of erroneous bases, limiting amounts of each dNTP are added at each addition cycle. This measure, however, can contribute to dephasing, that is, loss of synchronicity of nucleotide addition across all copies of a template attached to a bead. Homopolymer tracts are sinks for incorporation of the low amount of dNTP, and failure to complete extension across all copies of a homopolymer may result in mixed or dephased signals in later addition rounds and cause bead dropout.

Fig. 4. The GS20 Sequencer. The bead-based pyrosequencing process involves the flow of sequencing reagents containing buffers and nucleotides in a fixed sequential order across the PicoTiterPlate™ device during a sequencing run. Each of the hundreds of thousands of beads (each with millions of clonal copies of a DNA fragment) is located in a fixed microwell and all are sequenced in parallel. The addition of one (or more) nucleotide(s) results in a reaction that generates a light signal, which is then recorded by the CCD camera. The strength of the light signal is proportional to the number of nucleotides incorporated. For example, a short homopolymer region incorporates a number of nucleotides in proportion to its length, in a single nucleotide flow. The speed of sequencing and process chemistry is simplified by use of native dNTPs. Copyright (2005), reprinted with permission from the 454 Life Sciences Corporation.
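Since the light signal in each nucleotide flow is proportional to the number of nucleotides incorporated, a flow-by-flow signal trace (a flowgram) can be turned into sequence by rounding each normalized signal to an integer homopolymer length. The sketch below illustrates that principle under stated assumptions: the flow order `TACG`, the normalization of one incorporation to a signal of about 1.0, and the example signal values are all hypothetical, and real base callers must also correct for the dephasing described above.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert normalized flow signals into the synthesized base sequence.

    Each signal is assumed to be scaled so that a single incorporation
    gives roughly 1.0; the flow order is an illustrative assumption.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporated nucleotides.
        n_incorporated = max(0, int(signal + 0.5))
        bases.append(base * n_incorporated)
    return "".join(bases)


# Made-up flowgram for one bead: T, A, C, G, T, A, C, G flows.
flows = [1.02, 0.05, 2.10, 0.08, 0.97, 3.05, 0.03, 1.10]
print(decode_flowgram(flows))  # TCCTAAAG
```

The rounding step is where homopolymer errors arise: the gap between a signal of 5 and 6 is proportionally much smaller than between 1 and 2, so long runs of one base are harder to size from signal strength alone.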

Despite these technicalities, the outcomes from the "Genome Sequencer GS20(Mb)" pyrosequencing platform have developed rapidly. In August 2003, the 454 Life Sciences Corporation announced the 35 kb genome sequence of adenovirus obtained by pyrosequencing, and by 2004 could sequence entire bacterial genomes several Mb in length (Margulies et al., 2005a; Andries et al., 2005), with some 20 Mb of total sequence generated on a single PicoTiterPlate within 3.5 h. By the end of 2005, the sequencing of simple eukaryote genomes such as yeast (12 Mb) and other small eukaryote microorganism genomes could be undertaken using one Sequencer GS20 machine within one week. Poinar et al. (2006) also recently reported using the platform to sequence some 14 Mb of ancient DNA from preserved mammoth tissues, indicating the rapid development of applications for ultra-high-throughput short-read sequencing (see Section 5). Recently, the 454 Life Sciences Corporation introduced an updated version of the GS20v1.02, with improved single-read accuracy, new gasket formats, software algorithms with additional applications and an LIMS interface. They also announced development of a new version of their sequencer, the GS100, for analysis of larger genomes than the current GS20 can handle, with an expected release in 2007. The range of applications to which SBS technology can be applied has been growing rapidly as researchers explore the benefits of a low-cost platform that provides extremely deep sequence coverage of small libraries. It appears well suited to identifying genetic variation in mixed samples because of its high depth of coverage (Thomas et al., 2006). This coverage depth to some extent overcomes the limitations of the shorter reads, particularly with nucleic acids that are short or low in repetition, such as RNA species, fragmented DNAs and the simpler genomes of microorganisms. Goldberg et al. (2006) also evaluated the integration of GS20 data with conventional Sanger whole-genome shotgun (WGS) sequencing data for genomic assembly, concluding that a hybrid approach combining standard capillary Sanger WGS data with GS20 data was more cost-effective and generated higher-quality, lower-cost assemblies of microbial genomes.
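The throughput figures quoted above (about 20 Mb per plate in 3.5 h, and a 12 Mb yeast genome within a week) can be related by a back-of-the-envelope calculation of how many runs a given depth of coverage requires. The sketch below is illustrative only: the 20-fold target coverage is a hypothetical value chosen for the example, and the estimate counts instrument time alone, ignoring library preparation and failed reads.

```python
import math


def runs_for_coverage(genome_mb, target_coverage, mb_per_run=20.0, hours_per_run=3.5):
    """Estimate runs and instrument hours needed for a raw coverage depth.

    Defaults take the per-run yield and run time quoted in the text;
    `target_coverage` is supplied by the caller (an assumption here).
    """
    runs = math.ceil(genome_mb * target_coverage / mb_per_run)
    return runs, runs * hours_per_run


# Yeast-sized genome (12 Mb) at a hypothetical 20-fold raw coverage.
runs, hours = runs_for_coverage(12.0, 20)
print(runs, hours)  # 12 42.0
```

Under these assumptions, twelve runs total about 42 instrument hours, which is in line with the one-week figure once sample preparation is allowed for.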

Ju and colleagues (Meng et al., 2006; Edwards et al., 2007) have developed an approach to DNA SBS using reversible fluorescent nucleotide terminators to address the limitations of current DNA sequencing techniques. The photocleavable fluorescent nucleotide analog 3'-O-allyl-dGTP-PC-Bodipy-FL-510 has been developed as a reversible terminator for SBS. The nucleotide is incorporated efficiently by DNA polymerase into an extending DNA strand, where it terminates the polymerase reaction. Following the unitary addition, the fluorophore is photocleaved quantitatively by irradiation at 355 nm and the allyl group is rapidly and efficiently removed by a Pd-catalyzed reaction under DNA-compatible conditions to regenerate a free 3'-OH group on the ribose, which reinitiates the polymerase addition reaction. Successive cycles of such addition-cleavage-reactivation steps could be used successfully to sequence a homopolymeric region of a DNA template (Meng et al., 2006). This reversible-termination SBS technology promises to be a viable approach for high-throughput DNA sequencing.
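The addition-cleavage-reactivation cycle can be pictured as a loop that adds exactly one base per cycle, which is why a homopolymeric region is read base by base rather than as one combined signal. The sketch below is an idealized model, not the authors' method: it assumes perfect termination and complete cleavage in `reversible_terminator_read`, and the per-cycle efficiency passed to `in_phase_fraction` is a hypothetical value used only to show how small losses compound over many cycles.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reversible_terminator_read(template, n_cycles):
    """Idealized reversible-terminator cycles on a single template.

    `template` is written in the order the polymerase traverses it.
    Each cycle: incorporate one terminator (extension stops), record
    its label, then remove dye and 3' block so the next cycle can start.
    """
    read = []
    position = 0
    for _ in range(n_cycles):
        if position >= len(template):
            break
        read.append(COMPLEMENT[template[position]])  # one base, then termination
        position += 1  # dye cleavage and 3'-OH regeneration allow the next addition
    return "".join(read)


def in_phase_fraction(step_efficiency, n_cycles):
    """Fraction of template copies still in step after n cycles,
    assuming each cycle independently succeeds with `step_efficiency`."""
    return step_efficiency ** n_cycles


# A run of four C residues takes four separate cycles to read.
print(reversible_terminator_read("CCCCA", 5))    # GGGGT
print(round(in_phase_fraction(0.99, 50), 3))     # 0.605
```

Even at an assumed 99% efficiency per cycle, only about 60% of copies remain in phase after 50 cycles, which illustrates why quantitative cleavage and efficient incorporation matter for read length.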

Recently, Aksyonov et al. (2006) reported a new DNA SBS method in which the sequences of DNA templates were obtained by determining the number of nucleotides extended within the primers at each array spot in sequential DNA polymerase-catalyzed nucleotide incorporation reactions, using single fluorescein-labeled dNTP species. The fluorescein label can be destroyed following the readout of each addition step by a photo-stimulated reaction. Self-quenching was avoided by diluting the labeled dNTP with unlabeled reagent.

3.3. Single DNA molecule sequencing

The need for analysis of single DNA molecules has stimulated the development of technologies with a lead-time of 3-5 years, which have unique single-molecule sensitivity. Several solid-state methods for single DNA molecule sequencing have been reported recently, again with the promise of highly parallel, genome-scale efficiencies. Several corporations (see Table 1) are developing state-of-the-art array instruments for sequencing of individual molecules of DNA or cDNA (RNA). The technologies developed by Solexa currently involve amplified single template molecules and use several innovations. The first involves a zero-mode waveguide (Levene et al., 2003), which confines optical excitation and detection to the few zeptoliters of fluid surrounding the polymerase at the interface between the attached DNA molecules and the surface of the chip (Figure 5). The second involves the development of cluster DNA amplification, whereby DNA molecules are modified by adapters attached at either end and then replicated in situ via a bridging process between surface-attached complementary adapters. The third is the use of extremely sensitive CCD low-intensity imagers (Jansen et al., 1989), which can capture low-intensity signals from single-molecule sequence extension events. These detectors can capture images with densities of 10⁸ pixels per cm². Single DNA molecule imaging can potentially achieve simultaneous analysis of up to 100,000 distinct target molecules every second (Bennett et al., 2005).

Helicos Corp is developing a procedure called "true single molecule sequencing by synthesis" (tSMS). The procedure involves working directly on fragments of genomic DNA, eliminating DNA amplification (Braslavsky et al., 2003). The use of single DNA molecules means that they can be packed closely on the solid surface, with an entire human genome arrayed across a single glass substrate chip. The tSMS technology relies on cyclic SBS, using some 1.2 billion strands of DNA attached to a quartz slide and directly interrogating each of the single molecules after each nucleotide addition step. Despite the need for highly sensitive detectors and greater statistical coverage to confirm sequence, the lack of template amplification provides a number of benefits, including the absence of PCR bias, and thus potentially fewer errors, and the absence of dephasing issues, since individual molecules are either read in one round or read in a subsequent round of the cyclic process. The high template-packing density on the slide surface, with up to 10⁸ molecules per cm² projected (Kartalov and Quake, 2004), will provide the largest amount of sequence information per data image. This high-density and high-throughput sequencing will allow the detection of rare genomic mutations and polymorphisms, as well as rare transcripts if cDNAs are arrayed. The method is expected to have running costs around 1000 times lower than Sanger sequencing. Recently, Helicos announced the successful sequencing of the 6.4 kb M13 phage genome, including short homopolymeric sequences. These analyzers are expected to reach sequencing rates of 10⁹ or more base reads per day, the equivalent of a billion-lane sequencer that reads the sequence of each molecule at the speed of the addition reaction. Although the efficiency and uniformity of extension are currently poor, it is expected that if each molecule could be extended by an average of 50 nucleotides, this would allow parallel discovery and detection of genetic variation on 10⁸ molecules that can be aligned to a known reference sequence (such as the human genome). Currently, SBS methods that generate short reads of between 25 and 100 bp may permit de novo sequencing of entire genomes of low repetition. Methods to eliminate repetitive DNA from genomes have also emerged as complementary technologies: techniques such as methyl DNA depletion (Emberton et al., 2005) and high-Cot fractionation (Braun et al., 1978; Peterson et al., 2002) can effectively enrich for the unique genome fraction of organisms ranging from mammals to those with highly repetitive genomes, such as crops and other plant species. These fractionation techniques require highly fragmented DNA, which conveniently falls in a size range compatible with the short-read array sequencing platforms.
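The projections above can be put into perspective with simple arithmetic relating the number of arrayed molecules and the mean extension length to raw sequence yield and genome coverage. The sketch below is a rough illustration: the molecule counts and the 50-nucleotide mean extension come from the text, while the human genome size of about 3 × 10⁹ bp is an outside approximation introduced here, and the calculation assumes every molecule yields a usable read.

```python
def raw_yield_and_coverage(n_molecules, mean_read_nt, genome_bp):
    """Return (total bases read, raw fold-coverage of the genome)."""
    total_bases = n_molecules * mean_read_nt
    return total_bases, total_bases / genome_bp


# Assumption not taken from the text: approximate haploid human genome size.
HUMAN_GENOME_BP = 3_000_000_000

# 10^8 molecules, each extended by an average of 50 nucleotides.
total, coverage = raw_yield_and_coverage(10**8, 50, HUMAN_GENOME_BP)
print(total, round(coverage, 2))      # 5000000000 1.67

# 1.2 billion strands on one slide, same mean extension.
total_slide, coverage_slide = raw_yield_and_coverage(1_200_000_000, 50, HUMAN_GENOME_BP)
print(total_slide, coverage_slide)    # 60000000000 20.0
```

Raw coverage of this kind overstates usable coverage, because short reads from repetitive regions cannot all be placed uniquely on the reference.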

Genome assembly is a key outcome associated intimately with the manner of genomic sequencing, and is reviewed by McGrath (2007) in this volume. The vast majority of DNA sequencing is still performed using Sanger methods, while array pyrosequencing is still a relatively new technology. This raises the question of whether sequencing data from both technologies can be combined and assembled together. The 454 Life Sciences Corporation suggests that the same assembly tool allows flowgrams and chromatograms to be assembled together (Desany et al., 2005), and that this combined technology approach can improve assemblies that suffer from cloning biases such as unclonable regions. Further support for incorporating flowgram data into existing tools is underway, for example, with the release of the Staden package 1.6.0. Recent announcements from the 454 Life Sciences Corporation indicate that they intend to develop software with features to address many of these issues. These include new algorithms that improve sequence assembly and contig building, and algorithms that will facilitate sequencing of large numbers of short DNA fragments, such as serial analysis of gene expression (SAGE) tags, cap analysis of gene expression (CAGE) tags and microRNA analysis.

Fig. 5. Solexa's genetic analysis technology is based on massively parallel short-read sequencing, using its Clonal Molecular Array technology (Steps 1-6) and novel reversible terminator-based sequencing chemistry (Steps 7-11). The approach relies on attachment of randomly fragmented genomic DNA to a planar optically transparent surface (Steps 1-2) and solid-phase amplification (Steps 3-6) to create an ultra-high-density sequencing flow cell with >10 million clusters per cm², each containing ~1000 copies of template. These templates are sequenced using a very robust four-color DNA SBS technology that employs reversible terminators with removable fluorescence (Steps 7-11). This approach ensures high accuracy and avoidance of artefacts with homopolymeric repeats. High sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics (Step 10). Short sequence reads are aligned against a reference genome and genetic differences called using a specially developed data pipeline (Step 12). Alternative sample preparation methods allow the same system to be used for a range of other genetic analysis applications, including gene expression. Copyright (2006), reprinted with permission from the Solexa Corporation.
