Using specialized recombinant DNA techniques, researchers have determined vast amounts of DNA sequence, including the entire genomic sequence of humans and many key experimental organisms. This enormous volume of data, which is growing at a rapid pace, has been stored and organized in two primary data banks: GenBank at the National Institutes of Health, Bethesda, Maryland, and the EMBL Sequence Data Base at the European Molecular Biology Laboratory in Heidelberg, Germany. These databases continuously exchange newly reported sequences and make them available to scientists throughout the world on the Internet. In this section, we examine some of the ways researchers use this treasure trove of data to provide insights about gene function and evolutionary relationships, to identify new genes whose encoded proteins have never been isolated, and to determine when and where genes are expressed.
Stored Sequences Suggest Functions of Newly Identified Genes and Proteins
As discussed in Chapter 3, proteins with similar functions often contain similar amino acid sequences that correspond to important functional domains in the three-dimensional structure of the proteins. By comparing the amino acid sequence of the protein encoded by a newly cloned gene with the sequences of proteins of known function, an investigator can look for sequence similarities that provide clues to the function of the encoded protein. Because of the degeneracy in the genetic code, related proteins invariably exhibit more sequence similarity than the genes encoding them. For this reason, protein sequences rather than the corresponding DNA sequences are usually compared.
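The effect of degeneracy can be seen in a small worked example. The sketch below uses two invented DNA fragments (not real genes) and a deliberately partial codon table that covers only the codons appearing in them. Both fragments encode the same four amino acids, yet they differ at most nucleotide positions.

```python
# Partial codon table: only the codons used in this toy example.
CODON_TABLE = {
    "CTG": "L", "TTA": "L",   # leucine
    "AGC": "S", "TCT": "S",   # serine
    "CGT": "R", "AGA": "R",   # arginine
    "GGC": "G", "GGA": "G",   # glycine
}

def translate(dna):
    """Translate a DNA string, three bases at a time, into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Two hypothetical gene fragments that use different synonymous codons.
gene_a = "CTGAGCCGTGGC"
gene_b = "TTATCTAGAGGA"

protein_a = translate(gene_a)
protein_b = translate(gene_b)

dna_identity = percent_identity(gene_a, gene_b)
protein_identity = percent_identity(protein_a, protein_b)
print(f"DNA identity: {dna_identity:.1f}%  protein identity: {protein_identity:.1f}%")
```

In this contrived case only 4 of the 12 nucleotides match, while the encoded peptides are identical, which is why a protein-level comparison can detect a relationship that a DNA-level comparison would obscure.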
The computer program used for this purpose is known as BLAST (basic local alignment search tool). The BLAST algorithm divides the new protein sequence (known as the query sequence) into shorter segments and then searches the database for significant matches to any of the stored sequences. The matching program assigns a high score to identically matched amino acids and a lower score to matches between amino acids that are related (e.g., hydrophobic, polar, positively charged, negatively charged). When a significant match is found for a segment, the BLAST algorithm will search locally to extend the region of similarity. After searching is completed, the program ranks the matches between the query protein and various known proteins according to their p-values. This parameter is a measure of the probability of finding such a degree of similarity between two protein sequences by chance. The lower the p-value, the greater the sequence similarity between the two sequences. A p-value less than about 10⁻³ usually is considered significant evidence that two proteins share a common ancestor.
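The seed-and-extend idea described above can be sketched in a few lines. This is a toy illustration, not the BLAST program itself: the amino acid classes, the scores (2 for identity, 1 for same class, -1 otherwise), the word length, and the drop-off threshold are all assumptions chosen for readability. Actual BLAST uses a substitution matrix, also seeds from similar (not only identical) words, allows gaps, and computes the statistics used for ranking, none of which is attempted here.

```python
# Hypothetical, simplified amino acid classes (not a real substitution matrix).
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}

def score(a, b):
    """High score for identical residues, lower for related ones, penalty otherwise."""
    if a == b:
        return 2
    if a in CLASS_OF and CLASS_OF[a] == CLASS_OF.get(b):
        return 1
    return -1

def extend(query, subject, qi, si, step, drop=3):
    """Extend without gaps in one direction (step is +1 or -1).

    Returns (best score gained, number of residues in the best extension).
    Extension stops once the running score falls more than `drop` below its best.
    """
    total = best = best_len = n = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        total += score(query[qi], subject[si])
        n += 1
        if total > best:
            best, best_len = total, n
        elif best - total > drop:
            break
        qi += step
        si += step
    return best, best_len

def seed_and_extend(query, subject, word=3):
    """Find exact word matches (seeds), extend each locally, return the best hit.

    The result is (score, query segment, subject segment), or None if no seed matches.
    """
    best_hit = None
    for q in range(len(query) - word + 1):
        seed = query[q:q + word]
        s = subject.find(seed)
        while s != -1:
            seed_score = sum(score(c, c) for c in seed)
            right, rlen = extend(query, subject, q + word, s + word, +1)
            left, llen = extend(query, subject, q - 1, s - 1, -1)
            hit = (seed_score + left + right,
                   query[q - llen:q + word + rlen],
                   subject[s - llen:s + word + rlen])
            if best_hit is None or hit[0] > best_hit[0]:
                best_hit = hit
            s = subject.find(seed, s + 1)
    return best_hit

# Invented sequences for illustration only.
result = seed_and_extend("KTLLVDE", "PPKTLIVEEPP")
print(result)
```

Here the seed KTL is found in both sequences, and the extension continues through the related pairs L/I and D/E because they still contribute a positive score.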
To illustrate the power of this approach, we consider NF1, a human gene identified and cloned by methods described later in this chapter. Mutations in NF1 are associated with the inherited disease neurofibromatosis 1, in which multiple tumors develop in the peripheral nervous system, causing large protuberances in the skin (the "elephant-man" syndrome). After a cDNA clone of NF1 was isolated and sequenced, the deduced sequence of the NF1 protein was checked against all other protein sequences in GenBank. A region of NF1 protein was discovered to have considerable homology to a portion of the yeast protein called Ira (Figure 9-31). Previous studies had shown that Ira is a GTPase-accelerating protein (GAP) that modulates the GTPase activity of the monomeric G protein called Ras (see Figure 3-E). As we examine in detail in Chapters 14 and 15, GAP and Ras proteins normally function to control cell replication and differentiation in response to signals from neighboring cells. Functional studies on the normal NF1 protein, obtained by expression of the cloned wild-type gene, showed that it did, indeed, regulate Ras activity, as suggested by its homology with Ira. These findings suggest that individuals with neurofibromatosis express a mutant NF1 protein in cells of the peripheral nervous system, leading to inappropriate cell division and formation of the tumors characteristic of the disease.
Even when a protein shows no significant similarity to other proteins with the BLAST algorithm, it may nevertheless share a short sequence with other proteins that is functionally important. Such short segments recurring in many different proteins, referred to as motifs, generally have similar functions. Several such motifs are described in Chapter 3 (see Figure 3-6). To search for these and other motifs in a new protein, researchers compare the query protein sequence with a database of known motif sequences. Table 9-2 summarizes several of the more commonly occurring motifs.
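A motif search of this kind can be approximated by treating each motif as a pattern and scanning the query sequence for it. The sketch below is a minimal illustration under stated assumptions: the two-entry "database" is hypothetical, its patterns are two commonly cited consensus sequences written as regular expressions rather than entries copied from Table 9-2, and the query is a short stretch resembling the amino terminus of a Ras protein. Patterns should be checked against a curated motif database before any real use.

```python
import re

# Hypothetical mini-database of motifs, written as regular expressions.
# "." matches any residue; "[^P]" matches any residue except proline.
MOTIFS = {
    "P-loop (ATP/GTP binding)": r"[AG].{4}GK[ST]",
    "N-glycosylation site": r"N[^P][ST][^P]",
}

def scan_motifs(protein):
    """Return (motif name, 1-based start position, matched segment) for each hit.

    re.finditer reports non-overlapping matches only, which is adequate for a sketch.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(pattern, protein):
            hits.append((name, m.start() + 1, m.group()))
    return hits

# Short illustrative query sequence (assumed, for demonstration only).
query = "MTEYKLVVVGAGGVGKSALT"
hits = scan_motifs(query)
for name, position, segment in hits:
    print(f"{name}: position {position}, segment {segment}")
```

Because motifs are short, matches of this kind can arise by chance, so a hit is better read as a clue to function than as proof of it.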