The Candidate Gene Approach for Identifying Risk Genes

An alternative to the genome-wide scanning strategy for identifying susceptibility genes is increasingly being used. This strategy, often referred to as a "candidate gene approach," screens selected sets of genes deemed potentially relevant to a pharmacologic response or disease susceptibility for variation in DNA sequence. The selection process relies on current (and imperfect) knowledge of gene function and disease processes. Usually only the coding segments (exons) and the regions expected to regulate expression of these genes are screened for variation. Variants in these regions are expected to have a higher potential to affect gene expression or protein function than variants in intronic and intergenic regions.

Two strategies have been used to identify variants in known genes. The first strategy uses data already available in public databases. The Integrated Molecular Analysis of Genomes and their Expression (I.M.A.G.E.) Consortium was initiated in 1993 as a collaboration to identify all of the genes in the genome. The goal was to isolate and sequence all (or most) of the messenger RNAs (mRNAs), or expressed sequences, from many different tissues of an organism. In the context of cell function, the mRNAs can be considered the molds required for the process of transferring the blueprint (the functionally relevant component of the genome sequence) into wheels and doors (the proteins), which are subsequently combined to make cars (cells). Each mRNA is known to correspond to a segment of the DNA sequence, but knowledge of that correspondence is not complete. The sequence obtained for each isolated mRNA was called an expressed sequence tag (EST) and was basically a "signature" for a gene. The intended product was a catalog of the genes in an organism. The relative distribution and quantity of the individual mRNAs among different tissues provide insight into tissue-specific patterns of gene expression. Additional information is available from the I.M.A.G.E. Consortium website. Because the tissues used for isolating the mRNA were obtained from different individuals, differences in the sequence of clones should reflect genetic variation in DNA sequence among individuals. The Cancer Genome Anatomy Project (CGAP), described below, is an effort similar to the I.M.A.G.E. project, except that its goal was to isolate expressed sequences from many different types of tumors. Because the tumors were derived from different individuals, this was another source of sequence data to be screened for genetic variants (Clifford et al., 2000).
Thus, by developing computer algorithms that search the sequence data files for the cDNA clones encoded by each gene and then screen each clone for deviations from the consensus sequence, potential SNPs could be identified. Because these SNPs reside in the coding region of the genome, many of the sequence changes should result in amino acid substitutions. This has been described as the in silico search for genetic variation (Garg et al., 1999; Buetow et al., 1999; Emahazion et al., 1999).
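The consensus-deviation idea described above can be illustrated with a minimal Python sketch. It is not the algorithm used in the cited studies; it assumes the clone sequences for one gene have already been aligned to equal length, and the function name `candidate_snps`, the `min_minor_count` threshold, and the example reads are all hypothetical. Requiring the minor base to appear in more than one clone is one simple way to reduce the influence of isolated sequencing errors.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Flag alignment columns where a non-consensus base recurs.

    aligned_reads: equal-length strings, assumed to be pre-aligned clones
    of the same gene (hypothetical input format).
    Returns a list of (position, consensus_base, {alt_base: count}).
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")

    candidates = []
    for pos in range(length):
        # Count the bases seen in this column; gaps and ambiguous calls
        # (anything other than A, C, G, T) are ignored.
        column = Counter(read[pos].upper() for read in aligned_reads)
        counts = {base: n for base, n in column.items() if base in "ACGT"}
        if not counts:
            continue
        # The most frequent base is the consensus (ties broken alphabetically).
        consensus = max(sorted(counts), key=counts.get)
        alts = {base: n for base, n in counts.items()
                if base != consensus and n >= min_minor_count}
        if alts:
            candidates.append((pos, consensus, alts))
    return candidates


# Hypothetical aligned clones; position 3 differs in two of the six reads,
# position 5 differs in only one.
reads = ["ATGGCCAAG", "ATGGCCAAG", "ATGACCAAG",
         "ATGACCAAG", "ATGGCCAAG", "ATGGCTAAG"]
print(candidate_snps(reads))
```

With the default threshold only the recurring difference at position 3 is reported; lowering `min_minor_count` to 1 also reports the single-clone difference at position 5, which in real data could be either a rare variant or a sequencing error.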

A second variant detection strategy involves directed resequencing of selected genomic regions from different individuals, usually between 50 and 100 unrelated individuals. Many of these efforts have screened a subset of the DNA Polymorphism Discovery Resource collection, a set of 450 samples selected by the NIH to represent the ethnic diversity of the U.S. population (Collins et al., 1998). The DNA donors are anonymous but are expected to be generally healthy individuals. Others have screened samples from individuals selected from diverse geographic regions of the world (Nickerson et al., 1998) or samples from individuals with a disease of specific interest (Hayward et al., 1999). These efforts are focused on identification of amino acid substitution variants and differ from the SNP Consortium effort, which had a goal of identifying the large number of SNPs required for building a high-density map and which treated all regions of the genome equally. It is estimated that fewer than 5% of the SNPs in the SNP Consortium collection are located in coding regions of the genome.
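The sample sizes mentioned above determine which variants a resequencing screen can be expected to find. As a rough sketch, if each of the 2n chromosomes from n unrelated diploid individuals is assumed to carry the variant allele independently with frequency q, the chance of seeing the variant at least once is 1 - (1 - q)^(2n). The function name and the independence assumption are illustrative and are not taken from the cited studies.

```python
def detection_probability(allele_freq, n_individuals, ploidy=2):
    """Probability of observing a variant allele at least once.

    Assumes chromosomes are sampled independently (unrelated individuals)
    and that every carrier chromosome is detected by the assay.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    n_chromosomes = ploidy * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# Compare screens of 50 and 100 individuals for rare and more common alleles.
for freq in (0.005, 0.01, 0.05):
    for n in (50, 100):
        p = detection_probability(freq, n)
        print(f"q={freq:.3f}, n={n:3d} individuals: P(detect) = {p:.3f}")
```

Under these assumptions a screen of 50 individuals is likely to miss a substantial share of variants with frequencies near 1%, while variants at 5% are almost certain to be seen.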

The candidate gene strategy emphasizes identification of amino acid substitution variants and other variations of potential functional relevance in genes believed to have roles in the biology of a disease and/or expected to have a potential role in susceptibility to environmental exposure- or lifestyle factor-related disease. Examples of the disease outcomes that directed the selection of biological pathways and processes, and thus the genes included in these variation screening efforts, include cardiovascular disease, cancer, and asthma (Cambien et al., 1999; Cargill et al., 1999; Halushka et al., 1999; Shen et al., 1998). These studies to identify genetic variants have reported results from the screening of over 200 different genes. The results can be generalized as follows: (1) approximately three different amino acid substitution variants per gene were detected in the screening of 100-200 chromosomes from generally healthy individuals; (2) it is not uncommon to observe specific variants only in individuals with similar ethnic or geographic origins; (3) the average variant allele frequencies range from 3 to 6% in the different studies; (4) the genetic variation among individuals is extensive, and most individuals will exhibit complex genotypes when the multiple genes of a pathway are being studied; (5) over 60% of the substitutions involve the exchange of amino acid residues with dissimilar physical or chemical properties, suggesting that many of the substitutions should impact protein structure and function; and (6) the very large number of variants with individual allele frequencies of less than 5% account for at least 30% of the total variation among individuals. The initial goal of these variant identification efforts is the development of a catalog of variants. The entries in these catalogs can be found in a number of databases via links from NCBI.
Not unlike the catalog of genes, the catalog of variants is available as a starting point for investigators in designing experiments to address specific hypotheses.
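The observation that many substitutions exchange residues with dissimilar physical or chemical properties can be made concrete with a small Python sketch. The five property groups used here are one common textbook grouping chosen for illustration; the cited studies may have used different criteria, and the substitutions in the example are hypothetical rather than taken from any catalog.

```python
# One common textbook grouping of the 20 amino acids (an assumption here;
# other groupings would classify some substitutions differently).
PROPERTY_GROUPS = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
GROUP_OF = {aa: group
            for group, members in PROPERTY_GROUPS.items()
            for aa in members}


def classify_substitution(ref, alt):
    """Label a substitution using one-letter amino acid codes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in GROUP_OF or alt not in GROUP_OF:
        raise ValueError(f"unknown amino acid code: {ref!r} or {alt!r}")
    if ref == alt:
        return "unchanged"
    return "conservative" if GROUP_OF[ref] == GROUP_OF[alt] else "dissimilar"


def fraction_dissimilar(substitutions):
    """Fraction of residue-changing substitutions that cross property groups."""
    labels = [classify_substitution(ref, alt) for ref, alt in substitutions]
    changed = [label for label in labels if label != "unchanged"]
    if not changed:
        return 0.0
    return sum(label == "dissimilar" for label in changed) / len(changed)


# Hypothetical substitutions, written as (reference residue, variant residue).
example = [("R", "W"), ("V", "I"), ("D", "E"), ("G", "R"), ("P", "L")]
for ref, alt in example:
    print(f"{ref}->{alt}: {classify_substitution(ref, alt)}")
print("fraction dissimilar:", fraction_dissimilar(example))
```

A grouping of this kind is only a first filter; it ignores the position of the residue in the protein, which is why structure-based prediction is discussed in the next paragraph.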

With the increasing availability of the data from these variation screens, the challenge is selecting the candidate genes most likely to be relevant for a disease and then documenting the subset of variants in a gene that are causally associated with an altered phenotype. Several approaches can be used to test for the potential functional relevance of a variant gene or protein. These variants can be characterized for biochemical activity (Hadi et al., 2000), but this is expensive given the large number of variants that continue to be identified. Thus some efforts have been initiated to predict the effect of amino acid substitutions on protein structure (Hadi et al., 2000) or residual protein function (Chasman and Adams, 2001), as a prelude to detailed functional analysis of a variant protein. The availability of the three-dimensional structures for a large number of different proteins would be a very useful resource for developing strategies for selection of the subset of specific variants impacting protein activity. It has been suggested that a Structural Genomics Project with a goal of obtaining the structure of most of the proteins encoded by the genome should be initiated. The impact of the variants on biological characteristics of cells with defined genotypes could be determined. Many of the cellular end points of highest interest to pharmacology would be associated with differences in toxicity or efficacy of a compound in cells of different genotypes. Ultimately, the association of genotype(s) with toxicity or efficacy of a drug or disease incidence could be directly estimated in population-based studies. These studies could involve only a few individuals, if the gene and the phenotype are well characterized and the impact on measured outcome is large. 
For many of the common diseases, however, the disease state is the consequence of the interaction of a number of different variants existing at 5-10 genes, and establishing these associations will require molecular epidemiology studies involving thousands of cases and controls.
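For a single variant, the association between genotype and outcome in a case-control study is commonly summarized as an odds ratio. The following Python sketch computes the odds ratio and an approximate 95% confidence interval on the log scale (the Woolf method) from a 2x2 table. The counts are hypothetical, and the single-variant analysis is a simplification: studies of interacting variants at several genes require more elaborate models.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and approximate confidence interval from a 2x2 table.

    Uses the standard error of log(OR), sqrt(1/a + 1/b + 1/c + 1/d);
    z = 1.96 gives an approximate 95% interval.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical study: 1000 cases and 1000 controls.
estimate, lower, upper = odds_ratio(120, 880, 80, 920)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval width shrinks with the cell counts, modest effects of variants carried by only a few percent of individuals can be distinguished from no effect only when the numbers of cases and controls are large.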

This "variation in search of function" approach differs from the strategy commonly employed in the pregenome era, where the goal was usually to identify the genetic basis for a previously well-defined phenotypic characteristic, such as hyper- or hyporesponse to a therapeutic agent. The new genome-era strategy has been described as the genotype-to-phenotype strategy for estimating the relevance of genetic variation (Mohrenweiser and Jones, 1998). It reflects the relative ease with which large quantities of preliminary data can be generated by a relatively small number of laboratories organized to focus on large-scale data generation. Successful use of these data requires that individual investigators have easy access to the databases.
