Comparison of Complete Genome Sequences

The chromosome of Escherichia coli K-12 is the best-studied microbial genome. Accordingly, the genome of E. coli K-12 strain MG1655 was the first E. coli genome to be completely sequenced [30]. Earlier results had already revealed an unexpected level of structural and genetic diversity among the genomes of different E. coli strains, which is also mirrored by genome sizes within the species that range from 4.6 to 5.5 Mbp [31].

The genomes of four E. coli strains - the nonpathogenic K-12 strain MG1655, two strains of enterohemorrhagic E. coli O157:H7 (EDL933 and Sakai) and the uropathogenic E. coli O6:K2:H1 strain CFT073 - have been completely sequenced [30, 32-34] (see Table 5.1). The availability of these complete genome sequences allows detailed comparison of genetic and structural genome variability not only among different E. coli strains, but also among different pathotypes. A genomic comparison of strains CFT073, EDL933, and MG1655 revealed that only 39.2% (2996 genes) of their combined set of proteins are common to all three strains [34], underlining the astonishing diversity among E. coli isolates. The two E. coli O157:H7 genomes are extremely similar. However, comparison of either O157:H7 sequence with that of E. coli K-12 reveals that an extraordinary amount of gene loss and gain has occurred since these strains last shared a common ancestor about 4.5 million years ago. The 5.5-Mbp O157:H7 genome is nearly 1 Mbp larger than that of E. coli K-12. While roughly 4.1 Mbp of the chromosome is very similar between O157:H7 and K-12, this conserved "backbone" is interrupted by hundreds of "islands" and "islets" of sequence specific to one strain or the other. About 0.53 Mbp of E. coli MG1655-specific sequence is absent from the EDL933 genome, which itself contains over 1.4 Mbp of DNA without a counterpart in the K-12 genome. These EDL933-specific sequences are clustered in 177 regions ranging from 50 bp to nearly 90 kbp in length and comprise more than 1000 putative ORFs. These include some genes previously associated with O157:H7 virulence as well as many new candidate pathogenicity factors, such as genes associated with iron utilization and host cell adherence. Surprisingly, two large islands were discovered, each containing nearly identical genes encoding urease, in a strain typically characterized by its lack of urease activity in clinical assays. However, this gene cluster could be expressed in an E. coli K-12 background, indicating that the regulation of these determinants, which may contribute to the acid tolerance of enterohemorrhagic E. coli (EHEC), differs between E. coli backgrounds [35].
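The shared-protein figure quoted above is, in essence, a set calculation: the genes present in every strain divided by the genes present in at least one strain. The following Python fragment is a minimal sketch of that arithmetic only. The gene identifiers are invented toy labels, the function name core_genome_fraction is hypothetical, and the sketch assumes that orthologous genes have already been assigned a common identifier, which is the difficult step in a real comparison and is not shown here.

```python
# Toy illustration: the identifiers g1..g10 are invented, not real annotations.
# Assumption: orthologs across strains already share one identifier.

def core_genome_fraction(proteomes):
    """Return (core, pan, fraction) for a dict of strain name -> set of gene IDs."""
    gene_sets = list(proteomes.values())
    core = set.intersection(*gene_sets)  # genes present in every strain
    pan = set.union(*gene_sets)          # genes present in at least one strain
    return core, pan, len(core) / len(pan)

proteomes = {
    "MG1655": {"g1", "g2", "g3", "g4"},
    "EDL933": {"g1", "g2", "g5", "g6", "g7"},
    "CFT073": {"g1", "g2", "g3", "g8", "g9", "g10"},
}

core, pan, fraction = core_genome_fraction(proteomes)
print(sorted(core), len(pan), fraction)

# Applying the same ratio to the figures quoted in the text:
# 2996 shared genes at 39.2% imply a combined set of about 2996 / 0.392 proteins.
implied_combined = round(2996 / 0.392)
print(implied_combined)
```

In the toy data, two of ten distinct genes are shared by all three strains, giving a fraction of 0.2. The figures in the text imply a combined set of roughly 7600 proteins for the three sequenced strains.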

A large fraction of the genomic differences can be accounted for by the activity of mobile genetic elements. Nearly 40% of the O157-specific elements are found in one of at least 18 cryptic prophages or in the single intact bacteriophage (933W), which also carries the most characteristic virulence genes of O157:H7, i.e., those coding for the Shiga toxins [33]. Whole-genome structure comparison of several O157 strains demonstrated that the O157-specific sequences are highly conserved among the strains, but also revealed unexpectedly high genomic diversity. Prophages in particular exhibit extensive structural and positional diversity, suggesting that prophage variation is one of the most important factors generating genome diversity among O157 strains [36, 37].

The 5.2-Mbp genome sequence of uropathogenic E. coli (UPEC) strain CFT073 [34] supported these views: comparison with an O157:H7 genome and a K-12 genome revealed 2996 genes of the core chromosome common to all three strains. However, the comparison also revealed 1.303 Mbp of DNA present only in the UPEC strain and absent from both of the other strains. E. coli K-12 and O157:H7 are more closely related to each other than either is to this extraintestinal pathogenic E. coli (ExPEC) strain. Although the island contents differ, many of the same chromosomal sites serve as insertion sites for these strain-specific elements. The genome sequence points to a number of previously unknown toxins and adhesins that may contribute to pathogenesis in the human urogenital tract. Projects are under way to sequence the full genomes of other ExPEC and intestinal pathogenic E. coli (IPEC) strains (see Table 5.2). Comparison of these genomes with the existing sequences is expected to reveal genes that are specific to extraintestinal or intestinal pathogens but absent from commensal strains, and to identify genes responsible for distinct aspects of the diseases they cause. The majority of strain- or pathotype-specific regions are found at a limited number of positions in the individual chromosomes, suggesting that the strain-specific elements accumulated over time by repeated horizontal gene transfer, frequently with successive transfers of different elements into the same locus of the core chromosome. The fate of horizontally transferred sequences depends on their cost or benefit to the bacteria. The fact that so many different horizontally acquired sequences exist in the islands that differentiate these closely related E. coli strains suggests that many of them are either temporary residents of the genome or provide an advantage specific to the individual lifestyle of particular strains.
