regions (introns), scanning for ORFs is a poor method for finding genes. The best gene-finding algorithms combine all the available data that might suggest the presence of a gene at a particular genomic site. Relevant data include alignment or hybridization to a full-length cDNA; alignment to a partial cDNA sequence, generally 200-400 bp in length, known as an expressed sequence tag (EST); fitting to models for exon, intron, and splice site sequences; and sequence similarity to other organisms. Using these methods, computational biologists have identified approximately 35,000 genes in the human genome, although for as many as 10,000 of these putative genes there is not yet conclusive evidence that they actually encode proteins or RNAs.

A particularly powerful method for identifying human genes is to compare the human genomic sequence with that of the mouse. Humans and mice are sufficiently related to have most genes in common; however, largely nonfunctional DNA sequences, such as intergenic regions and introns, will tend to be very different because they are not under strong selective pressure. Thus corresponding segments of the human and mouse genome that exhibit high sequence similarity are likely to be functional coding regions (i.e., exons).
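The comparison described above can be illustrated with a minimal sketch. It assumes the human and mouse segments have already been aligned without gaps to equal length, and it simply reports sliding windows whose fraction of identical bases exceeds a cutoff. The function name `conserved_windows`, the window size, the identity cutoff, and the toy sequences are all hypothetical choices for illustration; real comparative gene finders use gapped alignments and statistical models rather than this simple count.

```python
def conserved_windows(human, mouse, window=20, min_identity=0.8):
    """Return (start, identity) for every window of an ungapped alignment
    in which the fraction of identical bases is at least min_identity.

    Assumption: `human` and `mouse` are pre-aligned strings of equal length.
    """
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    hits = []
    for start in range(len(human) - window + 1):
        h = human[start:start + window]
        m = mouse[start:start + window]
        matches = sum(1 for a, b in zip(h, m) if a == b)
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy, made-up sequences: the first 10 bases are identical (exon-like),
# the last 10 bases differ at every position (intron-like).
toy_human = "ACGTACGTAC" + "TTTTTTTTTT"
toy_mouse = "ACGTACGTAC" + "GGGGGGGGGG"
toy_hits = conserved_windows(toy_human, toy_mouse, window=10, min_identity=0.95)
```

In this toy case only the window starting at position 0 passes the cutoff, mirroring the idea that highly similar segments are candidates for functional coding regions.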

The Size of an Organism's Genome Is Not Directly Related to Its Biological Complexity

The combination of genomic sequencing and gene-finding computer algorithms has yielded the complete inventory of protein-coding genes for a variety of organisms. Figure 9-34 shows the total number of protein-coding genes in several eukaryotic genomes that have been completely sequenced. The functions of about half the proteins encoded in these genomes are known or have been predicted on the basis of sequence comparisons. One of the surprising features of this comparison is that the number of protein-coding genes within different organisms does not seem proportional to our intuitive sense of their biological complexity. For example, the roundworm C. elegans apparently has more genes than the fruit fly Drosophila, which has a much more complex body plan and more complex behavior. And humans have fewer than twice as many genes as C. elegans, which seems completely inexplicable given the enormous differences between these organisms.

Organism                    Genes
Human                       ~32,000
Arabidopsis (plant)         25,706
C. elegans (roundworm)      18,266
Drosophila (fly)            13,338
Saccharomyces (yeast)       ~6000

Protein function categories shown in the pie charts: Metabolism; DNA replication/modification; Transcription/translation; Intracellular signaling; Cell-cell communication; Protein folding and degradation; Transport; Multifunctional proteins; Cytoskeleton/structure; Defense and immunity; Miscellaneous function; Unknown.

▲ FIGURE 9-34 Comparison of the number and types of proteins encoded in the genomes of different eukaryotes. For each organism, the area of the entire pie chart represents the total number of protein-coding genes, all shown at roughly the same scale. In most cases, the functions of the proteins encoded by about half the genes are still unknown (light blue). The functions of the remainder are known or have been predicted by sequence similarity to genes of known function. [Adapted from International Human Genome Sequencing Consortium, 2001, Nature 409:860.]

Clearly, simple quantitative differences in the genomes of different organisms are inadequate for explaining differences in biological complexity. However, several phenomena can generate more complexity in the expressed proteins of higher eukaryotes than is predicted from their genomes. First, alternative splicing of a pre-mRNA can yield multiple functional mRNAs corresponding to a particular gene (Chapter 12). Second, variations in the post-translational modification of some proteins may produce functional differences. Finally, qualitative differences in the interactions between proteins and their integration into pathways may contribute significantly to the differences in biological complexity among organisms. The specific functions of many genes and proteins identified by analysis of genomic sequences still have not been determined. As researchers unravel the functions of individual proteins in different organisms and further detail their interactions, a more sophisticated understanding of the genetic basis of complex biological systems will emerge.

DNA Microarrays Can Be Used to Evaluate the Expression of Many Genes at One Time

Monitoring the expression of thousands of genes simultaneously is possible with DNA microarray analysis. A DNA microarray consists of thousands of individual, closely packed gene-specific sequences attached to the surface of a glass microscope slide. By coupling microarray analysis with the results from genome sequencing projects, researchers can analyze the global patterns of gene expression of an organism during specific physiological responses or developmental processes.

Preparation of DNA Microarrays In one method for preparing microarrays, a ≈1-kb portion of the coding region of each gene analyzed is individually amplified by PCR. A robotic device is used to apply each amplified DNA sample to the surface of a glass microscope slide, which then is chemically processed to permanently attach the DNA sequences to the glass surface and to denature them. A typical array might contain ≈6000 spots of DNA in a 2 × 2 cm grid.

In an alternative method, multiple DNA oligonucleotides, usually at least 20 nucleotides in length, are synthesized from an initial nucleotide that is covalently bound to the surface of a glass slide. The synthesis of an oligonucleotide of specific sequence can be programmed in a small region on the surface of the slide. Several oligonucleotide sequences from a single gene are thus synthesized in neighboring regions of the slide to analyze expression of that gene. With this method, oligonucleotides representing thousands of genes can be produced on a single glass slide. Because the methods for constructing these arrays of synthetic oligonucleotides were adapted from methods for manufacturing microscopic integrated circuits used in computers, these types of oligonucleotide microarrays are often called DNA chips.

[Figure 9-35 flow diagram: cells grown on glucose medium and cells grown on ethanol medium → reverse-transcribe to cDNA labeled with a fluorescent dye (green dye for glucose-grown cells, red dye for ethanol-grown cells) → hybridize to DNA microarray → wash → measure green and red fluorescence over each spot. Inset: cDNAs hybridized to DNAs for a single gene.]

A If a spot is yellow, expression of that gene is the same in cells grown either on glucose or ethanol

B If a spot is green, expression of that gene is greater in cells grown in glucose

C If a spot is red, expression of that gene is greater in cells grown in ethanol

▲ EXPERIMENTAL FIGURE 9-35 DNA microarray analysis can reveal differences in gene expression in yeast cells under different experimental conditions. In this example, cDNA prepared from mRNA isolated from wild-type Saccharomyces cells grown on glucose or ethanol is labeled with different fluorescent dyes. A microarray composed of DNA spots representing each yeast gene is exposed to an equal mixture of the two cDNA preparations under hybridization conditions. The ratio of the intensities of red and green fluorescence over each spot, detected with a scanning confocal laser microscope, indicates the relative expression of each gene in cells grown on each of the carbon sources. Microarray analysis also is useful for detecting differences in gene expression between wild-type and mutant strains.

Effect of Carbon Source on Gene Expression in Yeast The initial step in a microarray expression study is to prepare fluorescently labeled cDNAs corresponding to the mRNAs expressed by the cells under study. When the cDNA preparation is applied to a microarray, spots representing genes that are expressed will hybridize under appropriate conditions to their complementary cDNAs and can subsequently be detected in a scanning laser microscope.

Figure 9-35 depicts how this method can be applied to compare gene expression in yeast cells growing on glucose versus ethanol as the source of carbon and energy. In this type of experiment, the separate cDNA preparations from glucose-grown and ethanol-grown cells are labeled with differently colored fluorescent dyes. A DNA array comprising all 6000 genes then is incubated with a mixture containing equal amounts of the two cDNA preparations under hybridization conditions. After unhybridized cDNA is washed away, the intensity of green and red fluorescence at each DNA spot is measured using a fluorescence microscope and stored in computer files under the name of each gene according to its known position on the slide. The relative intensities of red and green fluorescence signals at each spot are a measure of the relative level of expression of that gene in cells grown in glucose or ethanol. Genes that are not transcribed under these growth conditions give no detectable signal.
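How the red and green intensities at a spot translate into a statement about relative expression can be sketched in a few lines. The example below is only an illustration under stated assumptions: red is taken to mark cDNA from ethanol-grown cells and green cDNA from glucose-grown cells (as in the legend to Figure 9-35), and the function name `classify_spot`, the twofold cutoff, and the background `floor` value are hypothetical choices, not parameters taken from the experiment.

```python
import math


def classify_spot(red, green, fold=2.0, floor=1.0):
    """Classify one microarray spot from its red and green intensities.

    Assumptions (illustrative only): red = ethanol-grown cells,
    green = glucose-grown cells; intensities below `floor` count as
    background; `fold` is the minimum ratio treated as a real difference.
    """
    if red < floor and green < floor:
        # Gene not detectably transcribed under either condition.
        return "no signal"
    # Clamp to the floor so the ratio is always defined.
    ratio = max(red, floor) / max(green, floor)
    log_ratio = math.log2(ratio)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "higher in ethanol"   # spot appears red
    if log_ratio <= -cutoff:
        return "higher in glucose"   # spot appears green
    return "similar"                 # spot appears yellow


# Made-up intensities for three hypothetical spots.
example_calls = [
    classify_spot(800, 100),
    classify_spot(100, 400),
    classify_spot(120, 100),
]
```

Working with the logarithm of the ratio treats a twofold increase and a twofold decrease symmetrically, which is why the same cutoff is applied in both directions.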

Hybridization of fluorescently labeled cDNA preparations to DNA microarrays provides a means for analyzing gene expression patterns on a genomic scale. This type of analysis has shown that as yeast cells shift from growth on glucose to growth on ethanol, expression of 710 genes increases by a factor of two or more, while expression of 1030 genes decreases by a factor of two or more. Although about 400 of the differentially expressed genes have no known function, these results provide the first clue as to their possible function in yeast biology.

Cluster Analysis of Multiple Expression Experiments Identifies Co-regulated Genes

Firm conclusions rarely can be drawn from a single microarray experiment about whether genes that exhibit similar changes in expression are co-regulated and hence likely to be closely related functionally. For example, many of the observed differences in gene expression just described in yeast growing on glucose or ethanol could be indirect consequences of the many different changes in cell physiology that occur when cells are transferred from one medium to another. In other words, genes that appear to be co-regulated in a single microarray expression experiment may undergo changes in expression for very different reasons and may actually have very different biological functions. A solution to this problem is to combine the information from a set of expression array experiments to find genes that are similarly regulated under a variety of conditions or over a period of time.

This more informative use of multiple expression array experiments is illustrated by the changes in gene expression observed after starved human fibroblasts are transferred to a rich, serum-containing growth medium. In one study, the relative expression of 8600 genes was determined at different times after serum addition, generating more than 10⁴ individual pieces of data. A computer program, related to the one used to determine the relatedness of different protein sequences, can organize these data and cluster genes that show similar expression over the time course after serum addition. Remarkably, such cluster analysis groups sets of genes whose encoded proteins participate in a common cellular process, such as cholesterol biosynthesis or the cell cycle (Figure 9-36).

▲ EXPERIMENTAL FIGURE 9-36 Cluster analysis of data from multiple microarray expression experiments can identify co-regulated genes. In this experiment, the expression of 8600 mammalian genes was detected by microarray analysis at time intervals over a 24-hour period after starved fibroblasts were provided with serum. The cluster diagram shown here is based on a computer algorithm that groups genes showing similar changes in expression compared with a starved control sample over time. Each column of colored boxes represents a single gene, and each row represents a time point. A red box indicates an increase in expression relative to the control; a green box, a decrease in expression; and a black box, no significant change in expression. The "tree" diagram at the top shows how the expression patterns for individual genes can be organized in a hierarchical fashion to group together the genes with the greatest similarity in their patterns of expression over time. Five clusters of coordinately regulated genes were identified in this experiment, as indicated by the bars at the bottom. Each cluster contains multiple genes whose encoded proteins function in a particular cellular process: cholesterol biosynthesis (A), the cell cycle (B), the immediate-early response (C), signaling and angiogenesis (D), and wound healing and tissue remodeling (E). [Courtesy of Michael B. Eisen, Lawrence Berkeley National Laboratory.]

Since genes with identical or similar patterns of regulation generally encode functionally related proteins, cluster analysis of multiple microarray expression experiments is another tool for deducing the functions of newly identified genes. This approach allows any number of different experiments to be combined. Each new experiment will refine the analysis, with smaller and smaller cohorts of genes being identified as belonging to different clusters.
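The idea of grouping genes by the similarity of their expression patterns over time can be sketched as a small hierarchical clustering routine. The text does not specify the algorithm used in the fibroblast study, so the choices here are assumptions made for illustration: similarity is measured by the Pearson correlation between expression profiles, clusters are merged by average linkage, and merging stops at a hypothetical distance cutoff. The gene names and expression values are made up.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def cluster_genes(profiles, max_distance=0.5):
    """Agglomerative clustering of genes by expression profile.

    Assumptions (illustrative only): distance = 1 - Pearson correlation,
    average linkage between clusters, and merging stops once the closest
    pair of clusters is farther apart than `max_distance`.
    """
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return mean(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        if distance(clusters[i], clusters[j]) > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Made-up log expression ratios at four time points for four hypothetical genes.
toy_profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.1, 1.9, 3.2],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [2.9, 2.2, 0.8, 0.1],
}
toy_clusters = sorted(cluster_genes(toy_profiles))
```

With these toy profiles the two rising genes fall into one cluster and the two falling genes into another. Adding profiles from further experiments lengthens each gene's vector, which is how additional data can split clusters into smaller cohorts.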
