Much of the time and cost of sequencing goes into closing gaps and raising the entire genome to a specified level of quality (e.g., an error rate of less than 1 in 10,000 bases). This level of quality is assumed in traditional academic sequencing, where full annotation of the organism's genes is the goal, and it is highly desirable for at least one strain/isolate of any pathogen. But are this level of quality, effort, and cost required for every additional strain/isolate of a pathogen? The answer depends largely on the resolution of the diagnostics being designed, the nature of the pathogen, and the available time and funding.
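To give a sense of scale for this quality standard, the arithmetic below applies it to a hypothetical 5 Mb bacterial genome. The genome size is an assumption chosen for illustration, not a figure from this text.

```latex
\[
5 \times 10^{6}\ \text{bases} \times \frac{1\ \text{error}}{10^{4}\ \text{bases}} = 500\ \text{errors}
\]
```

So a finished genome of that size meeting the standard would contain fewer than 500 base errors in total.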
For detection diagnostics, it is desirable to sequence as many strains/isolates as possible so that there can be high confidence that broad-range signatures will detect all known variants. Finishing a small number of bacterial genomes and drafting the rest will provide the best balance of sensitivity and cost for determining detection signatures. The advantages of finishing a genome are numerous,8 but once the first representative genome of a species is complete, the benefit of finishing additional strains drops off rapidly. Two or three additional complete genomes may help give an accurate picture of species diversity, but beyond that, draft sequencing is probably sufficient. Because viral genomes are shorter and some classes of viruses are highly variable, finished sequence is usually obtained for viruses. Note that while draft sequencing is an excellent vehicle for mapping SNPs and VNTRs, it is less effective at identifying large insertions and deletions. Because a draft sequence is broken into pieces and is missing many segments, it is impossible to tell whether a sequence absent from a draft reflects a gap in the assembly or a genuine deletion.
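The sensitivity check and the draft-sequence caveat described above can be sketched in Python. This is a minimal illustration under simplifying assumptions: signatures are matched exactly (on either strand) rather than with the mismatch-tolerant alignment a real signature pipeline would likely use, and the strain names, sequences, and helpers (`Assembly`, `signature_status`, `detection_report`) are hypothetical. The point it shows is that a miss in a finished genome can be treated as a true absence, while a miss in a draft has to be reported as ambiguous.

```python
from dataclasses import dataclass

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Assembly:
    """Hypothetical container for one strain's assembly."""
    name: str
    contigs: list[str]  # in this toy model a finished genome is one closed sequence
    finished: bool


def signature_status(signature: str, asm: Assembly) -> str:
    """Classify an exact, either-strand search for `signature` in one assembly."""
    targets = (signature, revcomp(signature))
    if any(t in contig for contig in asm.contigs for t in targets):
        return "present"
    # A miss in a finished genome is a true absence; in a draft it may
    # simply fall in an assembly gap, so it is reported as ambiguous.
    return "absent" if asm.finished else "absent_or_gap"


def detection_report(signature: str, strains: list[Assembly]) -> dict[str, str]:
    """Status of one candidate signature across every sequenced strain."""
    return {asm.name: signature_status(signature, asm) for asm in strains}


# Toy usage with made-up sequences.
strains = [
    Assembly("strain_A", ["ACGTTGCAAGGCTTAC"], finished=True),
    Assembly("strain_B", ["TTGCAAGG", "CATCATCAT"], finished=False),
    Assembly("strain_C", ["GGGGCCCC", "ATATATAT"], finished=False),
]
for name, status in detection_report("TTGCAAGG", strains).items():
    print(name, status)
```

Here the result for strain_C cannot distinguish a real deletion from an assembly gap, which is exactly the limitation of draft data noted above.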
It should also be stressed that sequencing close, non-pathogenic near-neighbors is a vital part of designing signatures with high specificity. While draft sequencing is generally sufficient for isolating DNA regions specific to the pathogenic relative (see the comparative genomics discussion below), finishing the nearest neighbors may be worth the cost. As an anecdotal example, when LLNL finished the Yersinia pseudotuberculosis genome, it was discovered that one "unique" Yersinia pestis signature was a 100% match to this non-pathogenic neighbor. Because of the nature of the tools used to look for signature matches, this exact match had gone undetected throughout the draft stage; it came to light only when a gap was closed during finishing. As tools for handling draft genomes improve, increased emphasis will probably be placed on maximizing strain/isolate coverage through draft sequencing, which will make the choice of which strain(s) to finish an important one.
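One plausible way a near-neighbor match can go unnoticed in draft data is sketched below: if a search looks for exact matches within individual contigs, a match that spans a contig break stays invisible until the gap is closed. This is an assumed mechanism for illustration only, not a description of the tools LLNL actually used, and the sequences and the `matches` helper are hypothetical toy data.

```python
# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(signature: str, contigs: list[str]) -> bool:
    """Exact match on either strand, searched within each contig separately."""
    targets = (signature, revcomp(signature))
    return any(t in contig for contig in contigs for t in targets)


# Finished near-neighbor sequence (toy data).
finished = "AAAACCCCGGGGTTTTACGTACGTAAAA"
# Draft of the same genome: the assembly breaks inside the region the
# signature matches, and three bases ("TTA") are missing at the gap.
draft_contigs = ["AAAACCCCGGGGTT", "CGTACGTAAAA"]
# Candidate "pathogen-unique" signature (toy data).
signature = "GGGGTTTTACGT"

print(matches(signature, draft_contigs))  # False: the match spans the gap
print(matches(signature, [finished]))     # True: exact match once the gap is closed
```

A specificity screen run only against the draft would have reported this signature as absent from the near-neighbor; against the finished sequence it fails.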