The sequencing system described in this chapter is capable of a throughput two orders of magnitude higher than has been attained with conventional Sanger-based sequencing technology. The system is predicated on the idea of parallelizing every step in the sequencing process, from sample preparation, through template amplification and sequencing, to data analysis. The methods and hardware were all developed to handle hundreds of thousands of fragments simultaneously. They were made possible by significant improvements to solid-support pyrophosphate sequencing, which, in turn, have extended read lengths beyond what had previously been reported. Sequencing reads of 80-120 bases has been reduced to routine practice, and with 84 cycles of nucleotide additions it has been possible to achieve read lengths of 200 bases. On occasion, at 168 cycles, individual reads that are 100% accurate over more than 400 bases have been generated. Short read lengths do not, a priori, prohibit the de novo assembly of bacterial genomes. In fact, the larger oversampling afforded by the throughput of this system typically results in a draft sequence with fewer contigs than one obtained with conventional Sanger sequencers.

The main goal behind the drive for low-cost, high-throughput sequencing has been to decrease the cost of sequencing sufficiently to enable individualized human genome sequencing. Along the way to this goal, many potential applications of the technology are becoming available, or will be enabled as soon as bioinformatics development catches up with the large quantities of data and the new possibilities that inexpensive and quick sequencing allows. For example, affordable microbial sequencing, either re-sequencing for SNP identification or de novo sequencing of more variable strains, enables comparative genomics on strains of varying virulence, drug resistance, and host-species preference. The emulsion-based cloning employed in the 454 system, with its inherent ability to enable sequencing from single molecules in a complex mixture, opens up the possibility of massive oversampling of tag sequences or specific regions of interest, and does so in a quick and cost-effective manner. The detection of low-abundance sequence variants in complex samples, as described above, is of considerable value in many scientific areas and holds the promise of becoming a powerful diagnostic tool in virology and oncology clinics, for instance, advancing the prospect of effective personalized medicine. The first demonstration of sequencing from complex mixtures achieved a sensitivity below 1% for the HIV quasi-species present within a patient as a function of time and drug response. This level of sensitivity is barely achievable by Sanger-based methods, and then only by cloning fragments into bacteria. Microarray-based sequencing methods are not as sensitive or as quantitative as the direct sequencing of clonally amplified single molecules.
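The link between oversampling and the detection of low-abundance variants can be illustrated with a simple sampling calculation. The sketch below is not part of the 454 software; it assumes that each read at a position samples one molecule independently (a binomial model) and it ignores sequencing errors, which in practice raise the number of supporting reads one would require. The function names `prob_detect` and `depth_needed` are hypothetical.

```python
import math


def prob_detect(depth, freq, min_reads=1):
    """Probability that a variant present at fraction `freq` appears in at
    least `min_reads` of `depth` reads, under a binomial sampling model."""
    # Sum the probabilities of observing fewer than min_reads variant reads.
    p_fewer = sum(
        math.comb(depth, k) * freq**k * (1 - freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def depth_needed(freq, min_reads=1, target=0.95):
    """Smallest read depth at which prob_detect reaches `target`."""
    depth = min_reads
    while prob_detect(depth, freq, min_reads) < target:
        depth += 1
    return depth


if __name__ == "__main__":
    # Assumed scenario: a variant carried by 1% of the molecules in the sample.
    for min_reads in (1, 3, 5):
        d = depth_needed(0.01, min_reads=min_reads)
        print(f"at least {min_reads} variant read(s) with 95% probability: depth {d}")
```

Under these assumptions, seeing a 1% variant even once with 95% probability already takes roughly 300 reads over the position, and requiring several supporting reads pushes the depth higher still, which is why the throughput of the system matters for this application.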

The applications enabled by this new sequencing technology, and its usefulness to the drug discovery and development process, are only beginning to be discovered. Once the technology is widely available and its power is known, additional applications will be developed. The ability to sequence de novo opens up a wide array of possibilities for new discovery and creative approaches to important and unaddressed problems in many areas of research and development. In the public health arena, 454 sequencing can be applied to the worldwide tracking and monitoring of the spread of specific strains of pathogenic microorganisms. In bio-defense, one can envision the rapid identification of the strain of an isolated suspected bioterrorism agent, or the identification of pathogens by sequencing complex mixtures. Sequencing the complete or partial genomes of large populations of individuals will impact our understanding of the genetic basis of human diseases. In cancer biology, rapid and inexpensive sequencing will shed light on the mutations that may give rise to cancer, help identify novel oncogenes and tumor suppressor genes, and clarify the basis of drug response. With the prospect of longer read lengths, whole-genome exon sequencing projects, with one amplicon per exon, can be envisioned in the foreseeable future, allowing comprehensive population-sized genetic studies and the large-scale mapping of disease susceptibility genes. Ultimately, it is conceivable that the scalability of this technology will enable the sequencing of individual human genomes to become part of the routine practice of medicine.

On July 20, 2006, the Max Planck Institute for Evolutionary Anthropology in Leipzig and the 454 Life Sciences Corporation announced an ambitious project to sequence the Neandertal genome at 1x coverage using our advanced pyrosequencing technology. The Neandertals constitute the hominid group most closely related to currently living humans, and it is anticipated that much can be learned by comparing their genome to those of the human and the chimpanzee. Of particular interest will be the regions where Neandertal is closer to chimpanzee than to human, as these may indicate regions that have evolved in humans after the split from Neandertal. The project faces major challenges because an estimated 90-95% of the available DNA is microbial, and furthermore the fossil DNA has been degraded into small fragments with chemical modifications of the nucleobases. Nevertheless, proof-of-principle experiments generated 1 million bases of Neandertal sequence, owing to the single-molecule-based emulsion PCR and the high throughput of 454 picoliter sequencing. Generation of the first three billion bases of Neandertal DNA is expected to be completed within two years.
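The scale of the challenge posed by microbial contamination can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that the fraction of reads that are Neandertal equals the non-microbial fraction of the extracted DNA (5-10%, from the 90-95% microbial estimate above), takes three billion bases as the 1x target, and ignores reads lost to quality filtering or ambiguous alignment. The function name `raw_bases_required` is hypothetical.

```python
def raw_bases_required(target_bases, endogenous_fraction):
    """Total bases that must be sequenced so that the endogenous share of
    the output reaches `target_bases`."""
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    return target_bases / endogenous_fraction


if __name__ == "__main__":
    target = 3_000_000_000  # first three billion Neandertal bases (1x)
    # Assumed endogenous fractions: 10% and 5% of the extracted DNA.
    for endogenous in (0.10, 0.05):
        total = raw_bases_required(target, endogenous)
        print(f"endogenous {endogenous:.0%}: about {total / 1e9:.0f} billion raw bases")
```

Under these assumptions the project would need on the order of 30 to 60 billion raw bases to recover three billion bases of Neandertal sequence.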
