Introduction

Routine studies of individual genomes are central to the investigation of genetic variability and genetic susceptibility to disease, but the inability to sequence large amounts of DNA rapidly and cost-effectively is a major hindrance to this goal. The completion of the human genome project in 2001 (Lander et al., 2001; Venter et al., 2001) required upwards of $300 M in investment over two years, and sequencing a human genome today is estimated to cost between $10 M and $25 M and to take about a year, still very far from the $1000 genome objective (Chan, 2005). However, a paradigm shift has occurred recently: to understand the function of DNA, it is not enough to produce the full sequence of a few individuals. Rather, an immense number of genomes must be sequenced so as to relate variations in sequence and expression profiles, i.e. RNA re-sequencing, to the function of the genes. De novo sequencing has therefore been overshadowed by the potential for fast and inexpensive re-sequencing. Finding heterogeneities and inter-genomic variations will be the engine for new discoveries in the function of DNA (Bentley, 2004; Rogers and Venter, 2005).

While long read lengths are critical in de novo sequencing, they are less important in re-sequencing applications. Sequences as short as 16 bases (van Dam and Quake, 2002) can be uniquely identified and mapped onto a template sequence, and thus a method that provides a massive number of short reads will be as effective as a method that produces the same amount of sequence with longer reads. It is expected that new and revolutionary methods will improve on Sanger sequencing in the main areas of cost and throughput, while some might also increase read lengths. Excellent reviews of the new techniques were recently published (Shendure et al., 2004; Chan, 2005). This chapter will focus mainly on aspects of one of these methods: single-molecule sequencing by cyclic synthesis.
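The 16-base figure can be motivated by a simple counting argument, sketched below in Python. This is an illustration under stated assumptions, not the derivation used in the cited work: it treats the template as a random sequence, takes a genome size of roughly 3 billion bases as an assumed input, and ignores repeats and the reverse strand. The function name `min_unique_read_length` is hypothetical.

```python
def min_unique_read_length(genome_size_bp: int) -> int:
    """Return the smallest read length k for which the number of possible
    k-mers (4**k) is at least the number of positions in the template.

    Assumption: the template behaves like a random sequence, so a k-mer is
    expected to occur about once when 4**k >= genome_size_bp. Real genomes
    contain repeats, so this is only a lower bound on a useful read length.
    """
    k = 1
    while 4 ** k < genome_size_bp:
        k += 1
    return k


# Assumed template size: roughly 3 billion bases (human haploid genome).
ASSUMED_GENOME_SIZE_BP = 3_000_000_000

if __name__ == "__main__":
    k = min_unique_read_length(ASSUMED_GENOME_SIZE_BP)
    print(f"Shortest read length with 4**k >= template size: {k}")
    print(f"4**{k - 1} = {4 ** (k - 1):,}; 4**{k} = {4 ** k:,}")
```

Under these assumptions the threshold falls at 16 bases, because 4^15 is about 1.07 billion and 4^16 is about 4.29 billion. Repetitive regions and sequencing errors push the practically useful read length higher.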

Single-molecule sequencing is a goal that has been pursued for almost two decades as a possible candidate to replace the ubiquitous Sanger method (Jett et al., 1989). Different schemes have been proposed to achieve this goal, for example: (1) applying exonuclease to flow-stretched labeled DNA and detecting the fluorescent product downstream (Augustin et al., 2001; Werner et al., 2003), (2) stretching DNA molecules in nanofabricated devices and reading fluorescent tags at the output (Chan et al., 2004), (3) recording the ionic current through nanochannels while a single DNA molecule is threaded through them (Meller et al., 2000), (4) following the synthesis of DNA in real time by local confinement of illumination (Levene et al., 2003), and (5) monitoring the incorporation of fluorescently labeled nucleotides on single DNA molecules step by step in cyclic extensions (Braslavsky et al., 2003). Of these, the demonstration that sequence information can be obtained from single DNA molecules by cyclic synthesis (Braslavsky et al., 2003) led to the development of the first working scheme for large-scale single-molecule sequencing (Harris et al., 2007, to be published).

DNA sequencing by cyclic synthesis (SBS) differs from the Sanger method, which relies on length separation of amplified DNA strands that terminate with a particular color according to the last base in the chain. Instead, in SBS the synthesis itself is monitored by various methods, such as pyrosequencing (Leamon et al., 2003) or polony sequencing (Mitra et al., 2003). These methods monitor many reactions in parallel and thus accelerate the sequencing rate and reduce cost. Of all the cycle-extension approaches, single-molecule sequencing has the highest sequence information density, i.e. the number of sequence reads per unit area. Polymerase colony sequencing (Mitra et al., 2003) has a density of about 1-2 polonies/mm², whereas picotiter plates (Leamon et al., 2003) have a density of up to 480 wells/mm². The theoretical limit on density in single-molecule sequencing is the diffraction limit of light. For 670 nm emission, this limit is λ/2, or 335 nm. Even assuming a more conservative one-micron separation between molecules, this entails an increase in density of three orders of magnitude over picotiter plates. Furthermore, monitoring several fields of view with a single camera introduces a major increase in throughput and opens the way for parallel sequencing of tens of millions of single DNA strands. Each DNA strand is read for about 25 bases, thus generating sequences that can readily be aligned to a reference sequence. Single-molecule sequencing is also the only cyclic-sequencing method that does not require nucleotide incorporation to be synchronous on all strands. Loss of synchrony is a major factor limiting read lengths in other schemes (Mitra et al., 2003). Asynchronous operation can also be used to reduce error rates, since reactions can be terminated before side effects such as misincorporation occur.
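The density comparison above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes molecules sit on a square grid at the stated one-micron spacing and takes the diffraction limit as half the emission wavelength, as in the text. The function names are hypothetical.

```python
import math


def diffraction_limit_nm(emission_wavelength_nm: float) -> float:
    """Half the emission wavelength, the limit quoted in the text."""
    return emission_wavelength_nm / 2.0


def molecules_per_mm2(spacing_um: float) -> float:
    """Areal density for molecules on an assumed square grid with the
    given centre-to-centre spacing in microns (1 mm = 1000 microns)."""
    per_mm = 1000.0 / spacing_um
    return per_mm ** 2


# Figures taken from the text.
PICOTITER_WELLS_PER_MM2 = 480.0
EMISSION_WAVELENGTH_NM = 670.0
MOLECULE_SPACING_UM = 1.0

if __name__ == "__main__":
    limit = diffraction_limit_nm(EMISSION_WAVELENGTH_NM)
    density = molecules_per_mm2(MOLECULE_SPACING_UM)
    gain = density / PICOTITER_WELLS_PER_MM2
    print(f"Diffraction limit: {limit:.0f} nm")
    print(f"Single-molecule density at 1 micron spacing: {density:.0f} per mm^2")
    print(f"Gain over picotiter plates: {gain:.0f}x "
          f"(about {math.log10(gain):.1f} orders of magnitude)")
```

With one-micron spacing the grid holds 10^6 molecules/mm², roughly 2000 times the 480 wells/mm² of picotiter plates, which is consistent with the three orders of magnitude quoted.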

In this chapter, we will begin by introducing the advantages of single-molecule imaging and the theory behind the imaging systems and methods that are used in single-molecule sequencing by synthesis (SBS). We follow with an examination of the sequencing method itself and several variants that have been proposed in the last few years. We will then discuss the data analysis methodology and the sources of errors in base calling. We conclude with an overview of the applications and the performance of the technique.
