An SNP is formed by a point mutation, in which one base pair is substituted by another. By definition, a genetic variation at a single base pair locus is not considered to be an SNP unless at least two alleles have frequencies of more than 1% in a large, random population. Thus, 'private alleles' identified in small selected populations (e.g. families) are considered to be mutations and not SNPs.
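The frequency criterion above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `is_snp`, the 1000-chromosome samples and the allele counts are all hypothetical, and a real study would also need to check that the sample is large and random.

```python
from collections import Counter


def is_snp(alleles, threshold=0.01):
    """Return True if at least two alleles have a frequency above `threshold`.

    `alleles` is a sequence of observed bases at one locus, one entry per
    sampled chromosome. The 1% default follows the definition in the text.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        return False
    common = [base for base, n in counts.items() if n / total > threshold]
    return len(common) >= 2


# Hypothetical samples of 1000 chromosomes each (made-up counts).
sample_common = ["C"] * 900 + ["T"] * 100   # T allele at 10%
sample_private = ["C"] * 999 + ["G"] * 1    # G allele at 0.1%

print(is_snp(sample_common))
print(is_snp(sample_private))
```

Under these assumed counts, the first locus meets the definition of an SNP, whereas the second is treated as a private mutation.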
The vast majority of SNPs have only two alleles because the mutation rate at a particular base pair position in the genome is extremely low (on average 1 mutation per 100 million generations), so it is highly unlikely that two point mutations happen at the same position. For this reason, SNPs can be used to distinguish between populations, and the geographical history of a given population can be mapped by identifying the distribution of a particular SNP allele among existing populations. Recently, the National Geographic Society, IBM and the Waitt Family Foundation initiated a worldwide survey of human populations with the purpose of mapping all major human migrations since modern humans left Africa approximately 60 000 years ago (www.nationalgeographic.com). The markers used in this study are Y chromosome SNPs and mitochondrial DNA (mtDNA) SNPs, which are particularly useful for mapping large human migrations (Jobling and Tyler-Smith, 2003; Pakendorf and Stoneking, 2005), because the Y chromosome is inherited from father to son and the mtDNA genome is inherited from mother to any offspring without recombination. Thus, a point mutation in the Y chromosome or the mtDNA creates a new male or female lineage, respectively, and the lineage remains distinct from all other lineages in future generations.
The human genome consists of approximately 3000 million base pairs (bp) and the most recent estimate of the number of SNPs is 10 million (Lai, 2006), which gives an average of one SNP per 300 base pairs. The density of SNPs across the genome varies up to ten-fold (Sachidanandam et al., 2001) because of variations in selection pressure and in local recombination and mutation rates (Reich et al., 2002). The majority of SNPs are located in repeat regions, which are notoriously difficult to analyse. Nevertheless, SNPs are the best choice for construction of a dense set of polymorphic markers that cover the whole genome. The marker set can be used for studying association between the markers and a particular human trait or disease (International HapMap Consortium, 2005). Once an association has been found, a more detailed analysis of the region surrounding the relevant marker(s) can be performed and the polymorphism(s) responsible for the human trait(s) or disease(s) may be identified.
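The average spacing quoted above follows directly from the two estimates. As a small worked check, using only the figures given in the text (the function and constant names are illustrative):

```python
def average_snp_spacing(genome_size_bp, snp_count):
    """Average number of base pairs per SNP, assuming SNPs were evenly spread."""
    return genome_size_bp / snp_count


GENOME_SIZE_BP = 3_000_000_000   # approximately 3000 million bp (from the text)
ESTIMATED_SNPS = 10_000_000      # estimate attributed to Lai, 2006 (from the text)

spacing = average_snp_spacing(GENOME_SIZE_BP, ESTIMATED_SNPS)
print(f"About one SNP per {spacing:.0f} bp on average")
```

This is a genome-wide average only. Because the density varies up to ten-fold, the spacing in any particular region may be much larger or smaller.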
The human genome contains 30 000-35 000 genes, but the coding regions of these genes comprise only 1.1-1.5% of the genome. SNPs located outside coding regions can influence gene expression if they are located in regulatory DNA sequences (Wang et al., 2005), but the majority of SNPs probably have very little or no functional consequence for the organism.
An SNP located in a gene may have diverse effects on the cellular function of the protein encoded by the gene. If the SNP is located in the coding region of the gene, the different alleles may encode different proteins, because the trinucleotide sequence (the codon) that codes for one amino acid differs. For example (see Figure 6.1), the codon TGC can be changed to TGT, TGG or TGA by a point mutation in the third position of the codon. The original C allele codes for the amino acid cysteine. The T allele also codes for cysteine (known as a silent mutation), whereas the G allele codes for the amino acid tryptophan (known as a missense mutation) and the A allele codes for termination of protein synthesis (known as a nonsense mutation). The nonsense mutation is the most severe form of mutation and will almost always result in a non-functional protein. A missense mutation can have many kinds of consequences for the protein, including mis-folding, mis-placement and decreased or increased activity, which in turn may affect the organism in various ways. Even a silent mutation may not be neutral for the cell, because it may affect the efficiency of protein synthesis and thus alter the cellular concentration of the protein.
DNA sequence       TTC GAC TGC AAA
Protein sequence   Phe Asp Cys Lys

DNA sequence       TTC GAC TGT AAA
Protein sequence   Phe Asp Cys Lys

DNA sequence       TTC GAC TGG AAA
Protein sequence   Phe Asp Trp Lys

DNA sequence       TTC GAC TGA AAA
Protein sequence   Phe Asp Stop
Figure 6.1. Examples of how an SNP may change the amino acid sequence of a protein
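The four sequences in Figure 6.1 can be reproduced with a short Python sketch. The codon table below is deliberately partial: it covers only the codons that appear in the figure, and the function names `translate` and `classify_substitution` are illustrative rather than taken from any library.

```python
# Partial codon table: only the codons used in Figure 6.1 (standard genetic code).
CODON_TABLE = {
    "TTC": "Phe",
    "GAC": "Asp",
    "TGC": "Cys",
    "TGT": "Cys",
    "TGG": "Trp",
    "TGA": "Stop",
    "AAA": "Lys",
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "Stop":
            break
    return protein


def classify_substitution(ref_codon, alt_codon):
    """Label a single-codon change as silent, missense or nonsense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "Stop":
        return "nonsense"
    return "missense"


# The third codon of the figure with each of the four alleles in position 3.
for third_codon in ("TGC", "TGT", "TGG", "TGA"):
    dna = "TTCGAC" + third_codon + "AAA"
    print(dna, " ".join(translate(dna)), classify_substitution("TGC", third_codon))
```

The classification considers only the encoded amino acid. It does not capture the possibility, noted in the text, that a silent change may still alter the efficiency of protein synthesis.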