Transcription of pre-mRNA is initiated at the arrow shown above exon 1). This primary transcript is then processed (by splicing) to remove noncoding introns and produce messenger RNA.
TABLE 7-3 Contribution of Introns and Repeated Sequences to Different Genomes

Species                        Gene density   Average number of    Percentage of DNA
                               (genes/Mb)     introns per gene*    that is repetitive*
Bacteria
  Escherichia coli K-12        950            0                    <1
Eukaryotes
  Fungi
    Saccharomyces cerevisiae   480            0.04                 3.4
  Animals
    Caenorhabditis elegans     200            5                    6.3
    Drosophila melanogaster    80             3                    12
    Fugu rubripes              75             5                    2.7
    Homo sapiens               8.5            6                    46
  Plants
    Arabidopsis thaliana       125            3                    nd
    Oryza sativa (rice)        470            nd                   42
*nd = not determined.

the length of DNA required to encode a gene (Table 7-3). For example, the average transcribed region of a human gene is about 27 kb (this should not be confused with the gene density), whereas the average protein-coding region of a human gene is 1.3 kb. A simple calculation reveals that only about 5% of the average human protein-encoding gene directly encodes the protein. The remaining 95% is made up of introns. Consistent with their higher gene density, simpler eukaryotes have far fewer introns. For example, in the yeast S. cerevisiae, only 3.5% of genes have introns, none of which is longer than 1 kb (see Table 7-3).
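The 5% figure follows directly from the two averages quoted above. A minimal Python sketch of that arithmetic, using only the 27 kb and 1.3 kb values from the text (the function name is illustrative, not from the source):

```python
def coding_fraction(transcribed_kb: float, coding_kb: float) -> float:
    """Fraction of a transcribed region that is protein-coding sequence."""
    if transcribed_kb <= 0:
        raise ValueError("transcribed length must be positive")
    return coding_kb / transcribed_kb


# Averages for a human gene, taken from the text.
frac = coding_fraction(transcribed_kb=27.0, coding_kb=1.3)
print(f"coding: {frac:.1%}, noncoding: {1 - frac:.1%}")
# coding: 4.8%, noncoding: 95.2%
```

The exact ratio is about 4.8%, which the text rounds to 5%.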
An explosion in the amount of intergenic sequence in more complex organisms is responsible for the remaining decrease in gene density. Intergenic DNA is the portion of a genome that is not associated with the expression of proteins or structural RNAs. More than 60% of the human genome is composed of intergenic sequences, and most of this DNA has no known function (Figure 7-4). There are two kinds of intergenic DNA: unique and repeated. About a quarter of the intergenic DNA is unique. These unique regions contain many apparently nonfunctional relics: nonfunctional mutant genes, gene fragments, and pseudogenes. The mutant genes and gene fragments arise from simple random mutagenesis or mistakes in DNA recombination. Pseudogenes arise from the action of an enzyme called reverse transcriptase (Figure 7-5 and Chapter 11). This enzyme copies RNA into double-stranded DNA (referred to as copy DNA or cDNA) but is expressed only by certain types of viruses that require it to reproduce. As a side effect of infection by such a virus, cellular mRNAs can be copied into DNA, and the resulting DNA fragments can be reintegrated into the genome at a low rate. These copies are not expressed, however, because they lack the sequences that direct expression (such sequences are generally not part of a gene's RNA product; see Chapter 12).
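Reverse transcription can be modeled, very loosely, as base pairing on sequence strings. The Python sketch below is a simplification that ignores primers, enzyme mechanics, and the fate of the RNA template; all function and constant names are hypothetical. It shows why a cDNA copy contains the mRNA sequence itself (with T in place of U) and nothing more.

```python
# Base that reverse transcriptase places opposite each RNA base (RNA template).
RNA_TEMPLATE_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Base pairing used when the first cDNA strand serves as the template.
DNA_TEMPLATE_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the DNA strand complementary to an mRNA, written 5'->3'."""
    # The new strand is antiparallel, so walk the template from its 3' end.
    return "".join(RNA_TEMPLATE_PAIR[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first: str) -> str:
    """Return the DNA strand complementary to the first cDNA strand, 5'->3'."""
    return "".join(DNA_TEMPLATE_PAIR[base] for base in reversed(first))


def double_stranded_cdna(mrna: str) -> tuple[str, str]:
    """Return (sense, antisense) strands of a cDNA copy of the mRNA."""
    antisense = first_strand_cdna(mrna)
    sense = second_strand_cdna(antisense)
    return sense, antisense


print(double_stranded_cdna("AUGGCCUAA"))
# ('ATGGCCTAA', 'TTAGGCCAT')
```

The sense strand is just the mRNA with T in place of U, so regulatory sequences that were never part of the mRNA cannot appear in the copy, which is why reintegrated copies are not expressed.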
Human genome (3,200 Mb)
  Genes and gene-related sequences (1,200 Mb)
    Genes (48 Mb)
    Related sequences (1,152 Mb): introns, UTRs, gene fragments, pseudogenes
  Intergenic DNA (2,000 Mb)
    Genome-wide repeats (1,400 Mb)
    Other intergenic regions (600 Mb)
      Microsatellites (90 Mb)
      Unique (510 Mb)
FIGURE 7-4 The organization and content of the human genome. The human genome is composed of many different types of DNA sequences, the majority of which do not encode proteins. The figure shows the distribution and amount of each of the various types of sequences. (Source: Adapted from Brown T.A. 2002. Genomes, 2nd edition, p. 23, box 1.4, © 2002 BIOS Scientific Publishers. Used by permission. www.tandf.com.)
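The percentages quoted in the surrounding text can be checked against the sizes in Figure 7-4: human genome 3,200 Mb, intergenic DNA 2,000 Mb, genome-wide repeats 1,400 Mb, microsatellites 90 Mb, and unique intergenic DNA 510 Mb. A small Python sketch of that check (constant and function names are illustrative):

```python
# Sizes in megabases (Mb), read from Figure 7-4.
GENOME_MB = 3200
INTERGENIC_MB = 2000
GENOME_WIDE_REPEATS_MB = 1400
MICROSATELLITES_MB = 90
UNIQUE_INTERGENIC_MB = 510


def percent(part_mb: float, whole_mb: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part_mb / whole_mb


checks = {
    # "More than 60% of the human genome is ... intergenic"
    "intergenic / genome": percent(INTERGENIC_MB, GENOME_MB),
    # "About a quarter of the intergenic DNA is unique"
    "unique / intergenic": percent(UNIQUE_INTERGENIC_MB, INTERGENIC_MB),
    # "Almost half of the human genome" is repeated (both classes counted)
    "repeats / genome": percent(GENOME_WIDE_REPEATS_MB + MICROSATELLITES_MB, GENOME_MB),
    # Microsatellites "represent nearly 3% of the human genome"
    "microsatellites / genome": percent(MICROSATELLITES_MB, GENOME_MB),
}
for label, value in checks.items():
    print(f"{label}: {value:.1f}%")
# intergenic / genome: 62.5%
# unique / intergenic: 25.5%
# repeats / genome: 46.6%
# microsatellites / genome: 2.8%
```

The repeated-DNA share also agrees with the 46% listed for Homo sapiens in Table 7-3.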
The Majority of Human Intergenic Sequences Are Composed of Repetitive DNA
Almost half of the human genome is composed of DNA sequences that are repeated many times in the genome. There are two general classes of repeated DNA: microsatellite DNA and genome-wide repeats. Microsatellite DNA is composed of very short (less than 13 bp), tandemly repeated sequences. The most common microsatellite sequences are dinucleotide repeats (for example, CACACACACACACACA). These repeats arise from difficulties in accurately duplicating the DNA and represent nearly 3% of the human genome.
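Because a microsatellite is defined by a short unit repeated back to back, one can be located with a backreference in a regular expression. The Python sketch below handles dinucleotide repeats only and uses an arbitrary minimum of five copies; both choices, and the function name, are assumptions for illustration rather than a standard detection criterion.

```python
import re


def find_dinucleotide_repeats(seq: str, min_copies: int = 5) -> list[tuple[int, str]]:
    """Return (start, repeat) pairs for tandem dinucleotide repeats in seq.

    A hit is a 2-bp unit repeated back to back at least `min_copies` times.
    The default of 5 copies is an illustrative threshold, not a standard.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # Group 1 captures the 2-bp unit; \1{n,} requires n further exact copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] != unit[1]:  # skip mononucleotide runs such as AAAAAAAAAA
            hits.append((match.start(), match.group(0)))
    return hits


# The dinucleotide repeat from the text, embedded in flanking sequence.
print(find_dinucleotide_repeats("GGTT" + "CA" * 8 + "TTGA"))
# [(4, 'CACACACACACACACA')]
```

Extending the unit length in the pattern (for example `[ACGT]{1,12}`) would cover the other microsatellite sizes mentioned in the text, at the cost of more ambiguous matches.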