Nonrepeat DNA sequencing

Although the sequences of the bacterial genomes of E. coli (Shendure et al., 2005) and M. genitalium (Margulies et al., 2005) obtained with massively parallel SBS arrays are extremely impressive, these technologies have several important limitations. Despite some 10-fold coverage (in raw base pairs) of the 10 Mb, essentially unique-sequence genome of E. coli by Shendure et al. (2005), the random nature of library capture and of in vitro amplification methods such as RCA and emulsion PCR results in only 91% of the genome being covered at least once. Other methods must be introduced to ensure representation of "recalcitrant" genomic regions that are under-amplified by these enzyme-catalysed processes, as even a sequencing depth greater than 20-fold typically fails to recover the absent sequence (gaps). In addition, Margulies et al. (2005) noted that the use of short-read pyrosequencing data for de novo assembly of M. genitalium into long continuous sequence would be confounded by repeated motifs and would necessitate hand-finishing the assembly of the numerous contigs generated. For both platforms, the fragmentation of the template and the short reads themselves contribute to a loss of sequence context and demand very high levels of genome coverage to provide statistical support for accurate assembly.
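The gap between nominal and observed coverage can be put in perspective with a back-of-the-envelope calculation. The sketch below assumes an idealised Poisson (Lander-Waterman style) model in which reads land uniformly at random and edge effects are ignored; this model and the function names are illustrative assumptions, not part of the cited studies.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes read starts are uniformly random, so the depth at any base
    is approximately Poisson distributed with mean `coverage`.
    """
    return 1.0 - math.exp(-coverage)


def coverage_for_fraction(fraction: float) -> float:
    """Uniform coverage that would leave `1 - fraction` of bases uncovered."""
    return -math.log(1.0 - fraction)


if __name__ == "__main__":
    # Expected breadth of coverage at several nominal depths.
    for depth in (1, 5, 10, 20):
        print(f"{depth:>2}x -> {expected_fraction_covered(depth):.5%} covered")

    # Effective uniform depth implied by an observed 91% breadth.
    print(f"91% breadth ~ {coverage_for_fraction(0.91):.2f}x uniform coverage")
```

Under these assumptions, 10-fold uniform coverage would be expected to leave only about 0.005% of bases unsampled, whereas 91% breadth corresponds to an effective uniform depth of roughly 2.4-fold. The difference is consistent with the strongly non-uniform sampling attributed above to library capture and amplification, although the simple model cannot say which step is responsible.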

1.4. Limitations to the assembly of short-read data

When sequencing the complex genomes of eukaryotes, even the longer 100-150 bp reads cannot easily be assembled into larger contigs unless known contiguous regions are deliberately sequenced. None of the SBS technologies described above can be used to sequence long regions of simple-sequence DNA, homopolymer regions such as poly A, or other longer repeat regions, as they cannot intrinsically determine the length of any repeat region that is longer than the average read length. Pyrosequencing is also particularly sensitive to homopolymer elements, as the quantitation of nucleotide number is not linear for elements longer than 6-7 nucleotides (Ehn et al., 2004; Margulies et al., 2005). For SBS technologies such as polony FISSEQ sequencing (Mitra et al., 2003), the DNA polymerases employed for single-molecule amplification and for sequence extension can also potentially be halted by "unamplifiable" and "unsequenceable" regions, leaving unsequenced gaps or regions of lower coverage (Shendure et al., 2005). However, the nature of single-nucleotide extension chemistry militates against inhibition of extension by the polymerase. Notably, pyrosequencing technology (Leamon et al., 2003), FISSEQ (Mitra et al., 2003) and Solexa's "Cluster DNA Amplification" (Mitchelson et al., 2007) each employ a single-molecule PCR amplification process to multiply template molecules. This step could also contribute to the reduced representation of unamplifiable regions. The large gigabase-sized genomes of many eukaryotes, which may contain 30-60% repeated DNA, are examples of genomes that cannot readily be sequenced de novo using present SBS methods.
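The point that reads cannot resolve a repeat longer than themselves can be illustrated with a toy example. The sketch below assumes error-free reads of fixed length taken from every position, and compares only the set of distinct reads (read depth is ignored); the sequences, read length and function name are made up for illustration.

```python
def read_set(genome: str, read_len: int) -> set:
    """Return every distinct error-free read of length `read_len`."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Hypothetical unique flanking sequences around a poly-A tract.
LEFT = "ACGTTGCC"
RIGHT = "GGATCCTA"
READ_LEN = 6

tract_8 = LEFT + "A" * 8 + RIGHT    # tract longer than the read length
tract_20 = LEFT + "A" * 20 + RIGHT  # much longer tract
tract_4 = LEFT + "A" * 4 + RIGHT    # tract shorter than the read length

# Tracts longer than the read yield identical reads, so their lengths
# cannot be told apart from the read content alone.
long_tracts_indistinguishable = (
    read_set(tract_8, READ_LEN) == read_set(tract_20, READ_LEN)
)

# A tract shorter than the read is spanned flank to flank by at least
# one read, which pins down its length.
short_tract_resolved = "C" + "A" * 4 + "G" in read_set(tract_4, READ_LEN)

if __name__ == "__main__":
    print("8 vs 20 A's indistinguishable:", long_tracts_indistinguishable)
    print("4 A's resolved by a spanning read:", short_tract_resolved)
```

In this simplified setting no read bridges both unique flanks once the tract exceeds the read length, so the 8-base and 20-base tracts produce the same reads. Relative read depth could in principle hint at tract length, but the uneven coverage described above makes such inference unreliable.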
