A new generation of sequencers has only just entered the market (see this volume, Margulies et al., 2007), and will allow novel approaches to be taken to the analysis of ancient and complex DNA sources. The 454 Life Sciences Corporation "GS20" is the first of these machines to make an impact (Poinar et al., 2005), with models by Solexa, Helicos and Agencourt Personal Genomics soon to follow. These systems have a great advantage in artefact avoidance over current technologies because most analyses no longer require a cloning step. Their preference for short DNA fragments, together with downstream software for eliminating artefact sequences, opens a doorway to ancient DNA "paleogenomics". Genomic sequencing of extinct species is a new and exciting field that promises to answer many questions in molecular evolution, adaptation, speciation and genomic evolution. Two recent studies have demonstrated the feasibility of sequencing genomes of extinct species, but they also highlight significant problems and limitations that will likely restrict the approach to high-profile species and/or exceptionally well-preserved samples. These problems revolve around DNA contamination, DNA damage and degradation, and the difficulty of working with trace amounts of DNA, and they will be of extreme importance in any attempt to sequence the genomes of extinct hominids.
Metagenomics approaches, whether using traditional bacterial cloning techniques or the new parallel sequencing technique developed by the 454 Life Sciences Corporation, differ from targeted PCR-based approaches in that all genomic sequences in a sample are sequenced anonymously. For ancient DNA samples this creates a significant problem, because contaminating microbial, environmental and human DNA will be sequenced alongside the endogenous genomic DNA. Unambiguously separating authentic from contaminating DNA is a major bioinformatics challenge. One solution requires an annotated genome sequence of a close relative with which to identify and classify the genomic sequences obtained. However, even with such a framework, much sequence data may be falsely included or excluded. A second solution requires sequencing multiple individuals and using interindividual sequence similarity to identify genomic regions that are likely to be endogenous to the sample rather than derived from contamination. The problem of contaminating DNA is not trivial. Noonan et al. (2005) attempted genome sequencing of two extinct cave bears. Only 1-6% of all sequences obtained could be identified as probably cave bear in origin, and a massive 60-65% had no match to any sequence in the public databases. Poinar et al. (2005) sequenced 28 Mb from an exceptionally well-preserved mammoth specimen, only 45% of which was identified as mammoth DNA. Clearly, contaminating DNA is a major problem, and significantly more sequencing and bioinformatics power will be required to gain sufficient coverage of any ancient genome of interest. The highly fragmented nature of ancient DNA and its high level of damage will significantly affect both coverage and sequence accuracy of the genome. Metagenomics of extinct species requires multi-fold coverage to correct for sequence modifications that have accumulated during the sample's history.
Assembling many millions of short overlapping fragments may be difficult or impossible for large parts of the genome, particularly in repetitive regions.
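The two quantitative points above, that contamination inflates the sequencing effort and that multi-fold coverage is needed to correct damage, can be illustrated with a minimal Python sketch. It is not the method used in either cited study. The function names `raw_bases_needed` and `majority_consensus`, the `min_depth` threshold, and the assumption that reads are already aligned and padded with `-` are all hypothetical choices made for illustration.

```python
from collections import Counter


def raw_bases_needed(genome_size_bp, target_coverage, endogenous_fraction):
    """Estimate total bases to sequence so that the endogenous reads alone
    reach the target mean coverage (simplifying assumption: every endogenous
    base maps uniquely and uniformly across the genome)."""
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    return genome_size_bp * target_coverage / endogenous_fraction


def majority_consensus(aligned_reads, min_depth=3):
    """Column-wise majority vote over pre-aligned reads of equal length.

    '-' marks positions a read does not cover. Positions with fewer than
    min_depth covering reads, or with a tie for the top base, are reported
    as 'N' rather than guessed.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be padded to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")
            continue
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")  # ambiguous: two bases equally supported
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Hypothetical reads: single-read differences (as damage might produce)
# at columns 2 and 6 are outvoted by the other reads.
reads = ["ACGTAC", "ACGTAT", "ATGTAC", "--GTAC"]
consensus = majority_consensus(reads)
```

Under these assumptions the required effort scales inversely with the endogenous fraction: a sample in which 5% of reads are endogenous needs nine times the raw sequencing of one at 45% for the same coverage. A simple vote of this kind only helps where several independent reads overlap, which is why low-coverage regions are left as `N`.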
Most ancient DNA specimens are not as well preserved as the mammoths, nor can such large samples be taken for bulk DNA extraction. The vast bulk of ancient DNA comes from non-frozen conditions, where residual DNA is preserved in far smaller amounts and fragment sizes, and with a higher content of microbial and other contaminating DNA. Genome sequencing via the 454 Life Sciences Corporation approach requires relatively large amounts of DNA, something that most specimens or samples cannot provide. In some cases, larger samples may be sacrificed, but for the majority of specimens, access to metagenomics technology will require the development of new methods to amplify the trace amounts of ancient DNA from normal specimens without introducing locus or allele bias.
The immediate challenge for researchers involved in molecular microbial ecology will be to adapt their approaches to take advantage of the new technologies without becoming swamped in data or losing sight of the potential biases introduced by these new systems. It is also important to note that these machines are not technically limited to the generation of raw genomic or metagenomic data; no doubt imaginative ecologists will adapt the capacity of this equipment to reveal specific aspects of microbial community structure at a depth and scale unavailable via current profiling methods. Cautious interpretation of the output will no doubt be informed by the experience of the past 13 years, since "community scale" DNA profiling of microbiota first began (Muyzer et al., 1993).