Following the bioterrorism attacks in the U.S. in October 2001, in which multiple samples of Bacillus anthracis spores were sent through the mail to a variety of targets, TIGR was asked to sequence and analyze the strain of anthrax bacterium used in these attacks. The goals of the analysis were threefold: (1) to determine whether the anthrax had been genetically engineered in any way, (2) to discover any and all differences between the attack strain and known strains, and (3) to create a unique genetic signature that could be used to characterize the attack strain. Shortly after the attacks occurred, VNTR analysis showed that the strain used in the attacks was Ames, a strain originally isolated from a cow in Texas in 1981. This strain had been sent to the U.S. Army biodefense laboratory in Ft. Detrick, Maryland, at that time, and was subsequently distributed to multiple labs around the world engaged in anthrax research. Using all known anthrax VNTR markers, the attack samples were indistinguishable from other Ames isolates. At the time of the attacks, TIGR had nearly completed sequencing an Ames isolate from Porton Down, England (originally from Ft. Detrick). One of our tasks was therefore to determine whether we could discover new genetic markers that would distinguish the Porton Down isolate from the attack strain.
The results of this study, published in June of 2002 [3], answered all three of the questions above. First, the attack strain was clearly not engineered: it was nearly identical to known laboratory strains. Second, we discovered 60 new genetic differences between Ames and other strains, including three insertions/deletions, eight new VNTRs, and 49 SNPs. The insertions were relatively small, and the only one large enough to contain a gene (1200 bp) was found to occur naturally in Bacillus cereus. All of these differences were validated by resequencing and by cross-checking against a panel of other Ames isolates. Third, at least 15 of the newly discovered markers were found to vary within previously typed Ames samples, allowing us to differentiate the strain more finely using a genetic signature built from these new markers.
Several of the methods described earlier in this chapter were critical in the analysis of the anthrax isolates. The two genomes were assembled using the Celera Assembler, and the resulting assemblies (the attack isolate and Porton Down isolate were at ~6x and 11x coverage, respectively) were aligned using MUMmer2 and NUCmer. These assemblies contained hundreds of small and large contigs, with the Porton Down isolate containing fewer contigs due to its deeper coverage. (In addition, finishing and closure work was nearly complete at that time.) Complicating the analysis was the fact that the Porton Down isolate had been cured of its plasmids; fortunately, both the pXO1 and pXO2 plasmids had been sequenced separately, from the Sterne and Pasteur strains. So the "reference" strain was really three strains: one for the 5.2-Mbp main chromosome, and two others for the 182-Kbp and 96-Kbp plasmids.
MUMmer allowed us to identify all differences between the assemblies very quickly, and we then classified each difference as a SNP, a VNTR, or an insertion. Because of the time pressure on the project, the assemblies were rerun several times and the analysis had to be repeated each time, so the speed of the assembler and the alignment software was critical. We then extracted all the underlying sequence reads for every difference and eliminated regions of 1-2x coverage, since differences there most likely represented basecalling errors. For the remaining differences, we calculated the probability of error for each SNP and VNTR and reported all the high-confidence differences, along with their probabilities, in the published analysis. Subsequent sequencing of additional B. anthracis strains at TIGR has further confirmed the SNPs, VNTRs, and indels reported in this study: nearly all have been found in at least one additional strain or isolate. These additional findings reinforce the statistical and computational methods developed for this study.
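The filtering step described above can be illustrated with a minimal Python sketch. It is not the code or the error model used in the study, which the text does not specify. It assumes that each supporting read carries a Phred quality score for the base in question and that basecalling errors are independent across reads, so the probability that every read is wrong is the product of the per-read error probabilities. The names `Candidate`, `error_probability`, and `high_confidence`, and the `max_error` threshold, are hypothetical; only the rule of discarding 1-2x coverage comes from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A candidate difference between two assemblies (hypothetical record)."""
    position: int          # coordinate in the reference assembly
    kind: str              # "SNP", "VNTR", or "indel"
    read_quals: List[int]  # Phred quality of the supporting base in each read


def error_probability(quals: List[int]) -> float:
    """Probability that all supporting reads are wrong.

    Assumes independent errors, with P(error) = 10^(-Q/10) per read, so the
    product over reads equals 10^(-sum(Q)/10).
    """
    return 10 ** (-sum(quals) / 10)


def high_confidence(
    candidates: List[Candidate],
    min_coverage: int = 3,    # drops the 1-2x regions mentioned in the text
    max_error: float = 1e-6,  # illustrative threshold, not from the study
) -> List[Tuple[Candidate, float]]:
    """Keep candidates with enough coverage and a low error probability."""
    kept = []
    for cand in candidates:
        if len(cand.read_quals) < min_coverage:
            continue  # too few reads: likely a basecalling error
        p = error_probability(cand.read_quals)
        if p <= max_error:
            kept.append((cand, p))
    return kept


if __name__ == "__main__":
    # Made-up candidates, for illustration only.
    examples = [
        Candidate(100, "SNP", [30, 30]),       # 2x coverage, dropped
        Candidate(200, "SNP", [30, 30, 30]),   # 3x coverage, high quality
        Candidate(300, "VNTR", [10, 10, 10]),  # 3x coverage, low quality
    ]
    for cand, p in high_confidence(examples):
        print(cand.position, cand.kind, p)
```

Reporting the probability alongside each retained difference mirrors the way the published analysis listed high-confidence differences together with their error probabilities. A real pipeline would also need to handle reads that disagree with one another, which this sketch ignores.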