Allele Frequencies and Their Equilibrium Properties

Genetic variation at a specific site is generally termed polymorphic if it occurs commonly in the population (i.e., at least two alternative forms with frequencies of 1% or more). Polymorphisms fall into several major classes, including single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), and variable number of tandem repeats (VNTRs). STRs and VNTRs have proven extremely useful for the localization of disease genes through linkage analysis (Gyapay et al., 1994). Linkage, however, does not have fine enough resolution to identify the disease-causing gene and allele. That identification requires fine mapping, and it is in this area that single nucleotide changes are most useful (Johnson et al., 2001). Furthermore, susceptibility alleles (whether they change disease susceptibility or response to treatment) are likely to fall into the class of single nucleotide differences. In Mendelian diseases, such single nucleotide changes result in truncated proteins, altered binding sites, or other amino acid changes that are responsible for disease. In common complex diseases, such as cardiovascular disease, hypertension, diabetes, and obesity, it is likely that the same kinds of changes exist but with more subtle effects on disease. It is also likely in common diseases that single nucleotide changes in noncoding regions of a gene have subtle (or not so subtle) effects on the regulation of that gene (Horikawa et al., 2000).

Ultimately, all polymorphisms arose as mutations. In a randomly mating, infinitely large population, gene (allele) and genotype frequencies are constant from generation to generation if no mutation or selection occurs. Hardy, an English mathematician, and Weinberg, a German physician, first demonstrated this in 1908. This notion of constancy of frequencies is the central pillar of population genetics. Genes that appear to behave in this fashion within a population are said to be in Hardy-Weinberg equilibrium.

This can be simply illustrated by considering a hypothetical locus having two alleles, A1 and A2, with frequencies p and q, respectively (so that p + q = 1). Under random mating, the three possible genotypes involving A1 and A2, namely, A1A1, A1A2, and A2A2, will occur with frequencies p², 2pq, and q², respectively. The allele frequencies p and q and the genotype frequencies p², 2pq, and q² remain constant from generation to generation under random mating. In practice, the Hardy-Weinberg principle is so robust that most polymorphisms exist in equilibrium. One can test whether a gene behaves in accordance with its Hardy-Weinberg expectations by estimating the frequencies of the alleles and then comparing the observed distribution of genotype counts to the expectations generated from the allele frequencies (i.e., are the observed numbers of each genotype in proportions equal to p², 2pq, and q²?).
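The test described above can be sketched as a chi-square goodness-of-fit comparison. This is a minimal illustration rather than a prescribed procedure: the function name `hwe_chi_square` and its arguments are hypothetical, the statistic is referred to a chi-square distribution with one degree of freedom, and both alleles are assumed to be present in the sample. When expected counts are very small, as with rare alleles, an exact test is usually preferred.

```python
import math

def hwe_chi_square(n_11, n_12, n_22):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions.

    n_11, n_12, n_22 are the observed counts of A1A1, A1A2, and A2A2.
    Returns (p, chi2, p_value), where p is the estimated frequency of A1.
    Assumes both alleles were observed (otherwise an expected count is 0).
    """
    n = n_11 + n_12 + n_22
    # Allele counting: each A1A1 carries two copies of A1, each A1A2 one.
    p = (2 * n_11 + n_12) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom: 3 classes - 1 - 1 estimated allele frequency.
    # For 1 df the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value

# Counts that match Hardy-Weinberg proportions for p = 0.9.
print(hwe_chi_square(810, 180, 10))
# A sample with no heterozygotes at all departs strongly from expectation.
print(hwe_chi_square(500, 0, 500))
```

A large chi-square value (small p-value) suggests the genotype counts are not in Hardy-Weinberg proportions, which in practice can also point to genotyping error or population stratification rather than a biological effect.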

One of the obvious implications of genotypes being distributed in proportions p², 2pq, and q² is that it leads to a very uneven distribution of individuals across the genotypic groups. Consider for a moment a gene having two alleles, with the less frequent allele occurring 1% of the time. This implies that in a sample of 1000 individuals you may never observe an individual homozygous for the less frequent allele (the expected count is only 0.1), whereas about 20 are expected to be heterozygotes and about 980 will be homozygotes for the common allele. In the context of treatment for common diseases, such a locus may not lead to opportunities for personalized medicine (although it may still be important in leading to an exploitable metabolic pathway). If the less frequent allele were to occur 10% of the time, then the expected distribution of genotypes in a sample of 1000 would be 10 homozygotes for the less frequent allele, 180 heterozygotes, and 810 homozygotes for the common allele. As the frequency increases, larger segments of the population fall into the less common genotypic groups. If there are genotype-specific responses to treatment, then these frequencies will determine in large part the strategy for screening, classification, and treatment choice.
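The arithmetic behind these examples can be reproduced with a few lines of code. The sketch below simply evaluates the Hardy-Weinberg proportions for a given frequency of the less common allele; the function name `expected_genotype_counts` is a hypothetical one chosen for illustration.

```python
def expected_genotype_counts(q, n):
    """Expected genotype counts under Hardy-Weinberg proportions.

    q is the frequency of the less common allele and n the sample size.
    Returns (rare homozygotes, heterozygotes, common homozygotes).
    """
    p = 1.0 - q
    return (n * q * q, n * 2 * p * q, n * p * p)

for q in (0.01, 0.10, 0.50):
    rare, het, common = expected_genotype_counts(q, 1000)
    print(f"q={q:.2f}: {rare:.1f} rare homozygotes, "
          f"{het:.1f} heterozygotes, {common:.1f} common homozygotes")
```

For an allele frequency of 1% the unrounded expectations are roughly 0.1, 19.8, and 980.1, which is where the rounded figures of 20 and 980 come from.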

There are no circumstances for a gene with variation in Hardy-Weinberg equilibrium that will lead to equal numbers of individuals in each of the genotypic classes. Only in the case when allele frequencies are equal will the expected distribution be symmetrical. In such a case there will be equal numbers of the two homozygous classes (1/4 in each group), and 1/2 of the total sample will consist of heterozygotes.

A simple measure of the amount of variation possible in the population for a polymorphic locus is the expected frequency of heterozygotes, which is termed "heterozygosity" (Roychoudhury and Nei, 1988). Heterozygosity is also a reflection of the amount of linkage information available in a marker for mapping genes. Regardless of the number of alleles at a locus, genetic variation (heterozygosity) is always maximized when alleles are equally frequent. For two alleles, the maximal heterozygosity is 0.50. For systems with many alleles like STRs and VNTRs the heterozygosity is often above 0.80; indeed, for mapping genes via linkage, sets of markers are chosen because of their high heterozygosity or other similar measures of their information content (Botstein et al., 1980).
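Under random mating, the expected frequency of heterozygotes equals one minus the sum of the squared allele frequencies. The following sketch computes it for a few illustrative cases; the function name `expected_heterozygosity` and the ten-allele marker are hypothetical examples, not data from any study.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, H = 1 - sum(p_i^2), under random mating."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)

print(expected_heterozygosity([0.5, 0.5]))   # two equally frequent alleles
print(expected_heterozygosity([0.9, 0.1]))   # two alleles, one uncommon
print(expected_heterozygosity([0.1] * 10))   # ten equally frequent alleles
```

With k equally frequent alleles the value is 1 - 1/k, so a heterozygosity above 0.80 requires more than five alleles, which is why multiallelic markers such as STRs reach values that a two-allele locus cannot.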

Although genetic considerations lead to unbalanced numbers in each group, statistical considerations are often most advantageous with balanced numbers. Judicious statistical design is of paramount importance for controlling costs while maintaining adequate statistical power to test hypotheses. This is of particular concern in the context of clinical trials that require extensive study of individuals in the trial. By establishing a two-stage sampling for studies, both considerations can be reconciled. For example, an initial screen of individuals for a clinical trial may involve minimal examinations of a large number of potential participants followed by genotyping. Once candidates are genotyped, a balanced subset representing the genotypic groups can be selected for the full intervention. Statistical power is maintained (likely enhanced) while minimizing the required sample sizes. Such designs will be required to test the efficacy of treatments by genotype.
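The two-stage idea can be sketched in code under simplifying assumptions: genotypes follow Hardy-Weinberg proportions, the rarest class is the homozygote for the less common allele, and selection within each genotype class is simple random sampling. The function names, the participant IDs, and the group sizes below are all hypothetical and serve only to illustrate the arithmetic.

```python
import math
import random

def screening_size(q, per_group):
    """Number of individuals to genotype so that the expected count of the
    rarest class (rare-allele homozygotes, frequency q^2 under
    Hardy-Weinberg proportions) reaches per_group."""
    return math.ceil(per_group / (q * q))

def balanced_subset(genotypes, per_group, seed=0):
    """Randomly pick per_group individuals from each genotype class.

    genotypes maps a (hypothetical) participant ID to a genotype label.
    Raises ValueError if any observed class has too few candidates.
    """
    rng = random.Random(seed)
    by_class = {}
    for pid, genotype in genotypes.items():
        by_class.setdefault(genotype, []).append(pid)
    selected = {}
    for genotype, ids in by_class.items():
        if len(ids) < per_group:
            raise ValueError(
                f"only {len(ids)} candidates with genotype {genotype}")
        selected[genotype] = rng.sample(ids, per_group)
    return selected

# Stage 1: how many people must be screened and genotyped?
print(screening_size(0.25, 10))

# Stage 2: choose a balanced subset from toy genotyping results.
labels = ["A1A1", "A1A2", "A2A2"]
toy_genotypes = {pid: labels[pid % 3] for pid in range(30)}
print(balanced_subset(toy_genotypes, 5))
```

For a less common allele at 10% and 30 participants per genotype group, the first function gives a screening size of roughly 3,000, of whom only 90 would enter the full intervention. Because that figure matches the expected count only on average, a real design would screen somewhat more to allow for sampling variation and dropout.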
