Fig. 4. Possible assembly of a region containing two repeats. The two repeat copies can assemble together, stacking their reads at a single location, while the intervening reads assemble separately. The intervening reads cannot be extended to the left or right because of the stacked repeat sequence.
Fig. 5. Rearrangements and excision in a genome location caused by many identical copies of a repetitive sequence.
and long interspersed nuclear elements (LINEs, 500-5000 bp in length), or LTR retroposons with long terminal repeats of approximately 700 bases in length. Other classes of repetitive sequence that can cause problems for an assembly program include gene duplications, where genes duplicate and then diverge in sequence, and long segmental duplications, in which very long portions of the genome are present as highly similar copies. In addition to collapsed areas of the genome, misassembly of regions containing repetitive DNA can therefore also result in the excision of a repeat at particular genome locations and in erroneous genome rearrangements (Figure 5).
Techniques for repairing regions of the genome that contain repetitive DNA are labor intensive and depend on finding differences between the sequencing reads that are aligned together at a particular point. For example, if three of the constituent sequencing reads at point X contain a C base and an equal number contain a T, this could indicate that two copies of a repeat have been assembled into one location. If two copies of a repeat are suspected, the constituent reads can be isolated, and each copy of the repeat can be assembled independently and then placed into the rest of the assembly. However, such differences may instead be true SNPs, and they may not be detectable at all if the shotgun sequence coverage in the region is low. The problem is especially acute for diploid mammalian genomes: the two copies of each chromosome have diverged slightly, so it is difficult to distinguish true polymorphisms between the haplotypes from incorrectly assembled, collapsed repeats. Salzberg and Yorke (2005) recommend that all assemblies be made available so that other groups can view them and correct them if necessary. The Assembly Archive at NCBI is the largest resource available at the moment to capture draft and finished genomes (Salzberg et al., 2004).
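The read-difference check described above can be illustrated with a small Python sketch. It is not the method of any particular assembler: the function names, the toy reads, and the thresholds (`min_minor_count`, `min_minor_fraction`) are assumptions chosen for illustration. It flags alignment columns in which a second base is well supported, then groups the reads by the base they carry at that column so that each group could, in principle, be reassembled as a separate repeat copy.

```python
from collections import Counter, defaultdict


def candidate_repeat_columns(columns, min_minor_count=3, min_minor_fraction=0.3):
    """Return indices of alignment columns where a second base is well supported.

    `columns` is a list of strings, one per aligned position, holding the base
    each read shows at that position. Both thresholds are illustrative
    assumptions, not published values.
    """
    hits = []
    for i, column in enumerate(columns):
        counts = Counter(b for b in column.upper() if b in "ACGT")
        if len(counts) < 2:
            continue  # all reads agree at this position
        (_, _major), (_, minor) = counts.most_common(2)
        depth = sum(counts.values())
        if minor >= min_minor_count and minor / depth >= min_minor_fraction:
            hits.append(i)
    return hits


def split_reads_by_base(reads, position):
    """Group read names by the base each read carries at `position`."""
    groups = defaultdict(list)
    for name, seq in reads.items():
        groups[seq[position]].append(name)
    return dict(groups)


# Toy data (hypothetical): six aligned reads, three with C and three with T
# at position 4, mirroring the "point X" example in the text.
reads = {
    "r1": "ACGTCA",
    "r2": "ACGTCA",
    "r3": "ACGTCA",
    "r4": "ACGTTA",
    "r5": "ACGTTA",
    "r6": "ACGTTA",
}
columns = ["".join(seq[i] for seq in reads.values()) for i in range(6)]

suspect = candidate_repeat_columns(columns)
groups = split_reads_by_base(reads, suspect[0])
```

With only two reads covering a position (one C, one T), the same function reports nothing, which reflects the low-coverage limitation noted above. A flagged column is only a candidate: in a diploid genome it may equally be a true polymorphism between haplotypes.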