## Single nucleotide polymorphisms for human identification

For human identification purposes, one SNP locus is less informative than one STR locus, because the SNP locus has only two possible alleles, whereas the STR locus typically has 8-15 different alleles. The match probability P for n SNP loci (between the SNP profiles from two randomly selected individuals) can be approximated by assuming that all SNPs are in Hardy-Weinberg equilibrium and that the frequency of the least common allele, p, is constant for all loci:

$$P = \left[p^4 + \left(2p(1-p)\right)^2 + (1-p)^4\right]^n \qquad (6.1)$$
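As a rough numerical illustration of this approximation, the following Python sketch assumes that the per-locus match probability is the sum of the squared Hardy-Weinberg genotype frequencies of a biallelic locus, and that the loci are independent so the per-locus values multiply. The function name `snp_match_probability` and the chosen values of p and n are illustrative assumptions, not taken from the cited work.

```python
def snp_match_probability(p: float, n: int) -> float:
    """Approximate match probability for n independent biallelic loci.

    Assumes Hardy-Weinberg equilibrium and the same minor allele
    frequency p at every locus (both are simplifying assumptions).
    """
    q = 1.0 - p
    # Genotype frequencies are p^2, 2pq and q^2; two random individuals
    # match at a locus when they carry the same genotype.
    per_locus = p**4 + (2.0 * p * q) ** 2 + q**4
    return per_locus**n


if __name__ == "__main__":
    for p in (0.2, 0.3, 0.4, 0.5):
        print(f"p = {p:.1f}, n = 50: P = {snp_match_probability(p, 50):.2e}")
```

Running the script prints P for 50 loci at several allele frequencies, which makes it easy to see how sensitive the estimate is to p.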

This is a simple function of p and n, and by comparing P to the match probabilities obtained for STRs it can be estimated that 50 SNPs give match probabilities equivalent to 12 STRs (Gill, 2001) if p is between 0.2 and 0.5. P has its lowest value for p = 0.5, but P does not change very much when p is between 0.3 and 0.5. Thus, Equation (6.1) is a good estimate of the real match probability of a set of SNPs if the allele frequencies of the selected SNP loci are within this range, even though Equation (6.1) was derived under the assumption that p is constant for all loci.

In a paternity case, the power of exclusion, Z, can be calculated based on all possible genotype combinations of mother and child (Krawczak, 1999). For a biallelic SNP,

$$Z = p(1-p)\left[1 - p(1-p)\right]$$

and for a multi-allelic STR, Z is a function of the allele frequencies $p_i$, where $p_i$ is the frequency of the $i$th allele and m is the number of alleles.

Under the assumption that p = 0.5 for SNPs and $p_i$ = 1/m for STRs, 4-8 SNPs are needed to obtain the same power of exclusion as one STR. Using real Caucasian allele frequencies for 14 commonly used STRs, the average number of SNPs needed to obtain the same power of exclusion as one STR was 4.23 (SNPs with p = 0.5), 4.41 (SNPs with p = 0.4) or 5.04 (SNPs with p = 0.3) (Krawczak, 1999). This indicates that 50-60 SNPs with p = 0.3-0.5 have the same discriminatory power for analysis of stains (match probability) and disputed family relations (power of exclusion) as the 13 CODIS loci currently in use by most forensic laboratories.
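The biallelic expression for Z can be explored with a short Python sketch. It assumes that loci are independent, so that the probabilities of non-exclusion multiply across loci; this combination rule and the example STR value of 0.6 are assumptions made here for illustration, and the helper names are hypothetical.

```python
import math


def snp_power_of_exclusion(p: float) -> float:
    """Power of exclusion Z = p(1-p)[1 - p(1-p)] for one biallelic SNP."""
    pq = p * (1.0 - p)
    return pq * (1.0 - pq)


def combined_power_of_exclusion(z: float, n: int) -> float:
    """Combined power of n independent loci, each with power z (assumption)."""
    return 1.0 - (1.0 - z) ** n


def snps_equivalent(z_target: float, p: float) -> float:
    """Number of SNPs with allele frequency p needed to reach z_target."""
    z_snp = snp_power_of_exclusion(p)
    return math.log(1.0 - z_target) / math.log(1.0 - z_snp)


if __name__ == "__main__":
    z_str = 0.6  # hypothetical power of exclusion for a single STR locus
    for p in (0.5, 0.4, 0.3):
        print(f"p = {p:.1f}: Z = {snp_power_of_exclusion(p):.4f}, "
              f"SNPs per STR = {snps_equivalent(z_str, p):.2f}")
    print(f"50 SNPs, p = 0.5: {combined_power_of_exclusion(snp_power_of_exclusion(0.5), 50):.6f}")
```

Replacing `z_str` with a value computed from real STR allele frequencies would give a locus-specific comparison.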

There are two important reasons why it is preferred to analyse 50-60 SNP loci instead of 13 STR loci. First of all, the length of the PCR product containing an SNP locus need only be the length of the two PCR primers plus one base pair (the SNP position). In theory, a DNA sequence is unique if it is 16 bp long (4^16 ≈ 4295 million combinations, which is more than the number of base pairs in the human genome). Thus a PCR product containing an SNP locus need only be (2 × 16) + 1 = 33 bp long. In reality, PCR primer design constrains the positioning of the PCR primers (Sanchez et al., 2005), and the PCR product must be longer. Nevertheless, most SNPs can be amplified on PCR products less than 100 bp in length. In contrast, some CODIS STR alleles have up to 40 tandem repeat units, each with a size of 4 bp, and consequently the PCR products containing the STR locus need to be 200 bp or longer. In the commercial kits used in most forensic laboratories to amplify the 13 CODIS STRs, the loci are amplified in the same tube (multiplex PCR) and the lengths of the PCR products (i.e. the alleles) are determined by electrophoresis. In order to separate and identify the many PCR products, the longest PCR products have been designed to be 400-450 bp. In highly degraded DNA samples, however, the average length of DNA fragments is shorter than 150 bp, and therefore many of the STR loci are not amplified when the sample has been exposed to high temperatures or high humidity that degrade the DNA. So-called mini-STR kits have been developed (Coble and Butler, 2005; Asamura et al., 2006; Opel et al., 2006), in which the PCR products have been reduced in length, but these kits only target 4-6 STRs and the discriminatory power is significantly smaller.

Furthermore, unfinished extension products will be formed in higher numbers when PCR is performed on highly degraded DNA, because PCR primers may anneal to a strand where the target sequence is interrupted. The unfinished extension products from mini-STR kits consist almost exclusively of tandem repeat sequences and can anneal to many different positions in the STR target locus during subsequent PCR cycles. This will increase the risk of amplifying fragments with a different number of tandem repeats than was originally present in the sample. If such products are made during the first critical cycles of the PCR, false alleles may be detected and assigned to the sample. In contrast, unfinished extension products from amplification of SNP loci cannot create false SNP alleles, because they can only anneal to one unique position.
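The sequence-length argument for the minimum size of an SNP amplicon can be checked with a few lines of Python. The genome size used below (about 3.2 billion bp for the haploid human genome) is an approximate figure assumed for illustration, and the function names are hypothetical.

```python
HUMAN_GENOME_BP = 3_200_000_000  # assumed approximate haploid genome size


def min_unique_length(genome_bp: int) -> int:
    """Smallest k for which the number of possible k-mers (4**k) exceeds genome_bp."""
    k = 1
    while 4**k <= genome_bp:
        k += 1
    return k


def min_amplicon_length(primer_len: int) -> int:
    """Two flanking primers plus the single SNP position between them."""
    return 2 * primer_len + 1


if __name__ == "__main__":
    k = min_unique_length(HUMAN_GENOME_BP)
    print(f"4**{k} = {4**k:,} possible sequences")
    print(f"theoretical minimum amplicon: {min_amplicon_length(k)} bp")
```

This is only the theoretical lower bound; as noted in the text, practical primer design requires longer products.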

The second reason why SNPs are preferred over STRs is the low mutation rate of SNPs. If the investigated man and a child in a typical paternity case do not share any alleles for a given locus (genetic inconsistency), it indicates that the man is not the father. However, with a mutation rate of about 0.003 per STR locus, one genetic inconsistency between a true father and the child is observed in approximately 13 × 0.003 = 0.039, i.e. 3.9%, of the cases in which the 13 CODIS loci have been investigated. This is a highly unfortunate, but unavoidable, consequence of the relatively high mutation rate of tandem repeats. In rare cases where relatives (e.g. two brothers, or father and son) are investigated, it may even be impossible to draw a conclusion, because relatives share a high number of genetic markers and few genetic inconsistencies are expected between an investigated relative (e.g. uncle or grandfather), who is not the father, and a child. For comparison, with a mutation rate of about 0.0000001 per SNP locus, one genetic inconsistency between a true father and the child would be observed in approximately 60 × 0.0000001 = 0.000006, i.e. 0.0006%, of the cases if 60 SNPs were investigated.
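The two percentages above follow from multiplying the number of loci by the per-locus mutation rate. The Python sketch below reproduces that linear approximation and, for comparison, the exact probability of at least one mutation event under the assumption that loci mutate independently. The function names are hypothetical, and the mutation rates are the ones used in the text.

```python
import math


def approx_inconsistency(n_loci: int, mutation_rate: float) -> float:
    """Linear approximation: expected number of mutated loci per transmission."""
    return n_loci * mutation_rate


def exact_at_least_one(n_loci: int, mutation_rate: float) -> float:
    """Probability of one or more mutations, assuming independent loci.

    Uses log1p/expm1 to stay accurate for very small mutation rates.
    """
    return -math.expm1(n_loci * math.log1p(-mutation_rate))


if __name__ == "__main__":
    for label, n, mu in (("13 STRs", 13, 0.003), ("60 SNPs", 60, 1e-7)):
        print(f"{label}: approx {100 * approx_inconsistency(n, mu):.4f}%, "
              f"exact {100 * exact_at_least_one(n, mu):.4f}%")
```

For rates this small the two calculations agree closely, which is why the simple product is adequate here.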

There are three reasons why STRs are preferred over SNPs. First, the national and international databases contain STR profiles from known criminal offenders, victims and samples collected from crime scenes during the past two decades. It is an overwhelming task to type all these samples again for SNP markers, and in some cases it is impossible, because the samples may have been used up and cannot be replaced. Secondly, samples collected from crime scenes often contain DNA from more than one person, and mixtures are difficult to detect when analysing DNA markers with only two alleles (Gill, 2001). In contrast, STRs are very useful for detection of mixtures, because two individuals are likely to have three or four different alleles in some STR systems. Sometimes, it is even possible to estimate the STR profiles of the two individuals based on the amplification strength of the alleles, and it is always possible to calculate a match probability between a reference sample from an individual and the mixture. If the mixture contains DNA from more than two people, SNPs will be almost useless, whereas STRs may still be used for calculation of match probabilities (Gill et al., 2006). Thirdly, very little DNA is often recovered from crime scenes (e.g. from fingerprints or hair), and since all such samples are unique and cannot be replaced it is essential to obtain as much information as possible from every investigation performed on the sample. This is one of the reasons why amplification of multiple fragments (multiplex PCR) is important for forensic genetic analyses. In addition, multiplexing also simplifies analysis, reduces cost and decreases the number of times a sample is handled in the laboratory, which reduces the risk of contamination and mix-ups. However, it is difficult to develop PCR multiplexes with 5-10 fragments, and construction of a robust multiplex with 50-60 fragments is a very serious undertaking.
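The mixture argument can be illustrated by asking how often a two-person mixture shows three or more distinct alleles at a locus, which is the signal that reveals a mixture. The Python sketch below enumerates all combinations of four alleles (two per contributor). It assumes unrelated contributors, Hardy-Weinberg equilibrium and no allele drop-out; the STR locus with ten equally frequent alleles is a hypothetical example, not a real marker.

```python
from itertools import product


def prob_at_least_k_alleles(freqs: list[float], k: int = 3) -> float:
    """Probability that two unrelated people together show >= k distinct alleles."""
    total = 0.0
    # Four alleles in the mixture: two from each contributor.
    for alleles in product(range(len(freqs)), repeat=4):
        if len(set(alleles)) >= k:
            prob = 1.0
            for a in alleles:
                prob *= freqs[a]
            total += prob
    return total


if __name__ == "__main__":
    snp_freqs = [0.5, 0.5]   # biallelic SNP
    str_freqs = [0.1] * 10   # hypothetical STR with 10 equally frequent alleles
    print("SNP:", prob_at_least_k_alleles(snp_freqs))
    print("STR:", round(prob_at_least_k_alleles(str_freqs), 3))
```

A biallelic locus can never show a third allele, so for SNPs the presence of a second contributor has to be inferred from signal imbalance alone.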

Nevertheless, several large SNP panels have been suggested for human identification purposes (Inagaki et al., 2004; Dixon et al., 2005; Kidd et al., 2006), and recently a 52-SNP-plex assay was described (Sanchez et al., 2006) by a group of five forensic laboratories (the SNPforID consortium, www.snpforid.org), where 52 fragments are amplified in one multiplex PCR and the SNPs are detected by two multiplex SBE reactions (Figure 6.2). Each of the SNPs in the 52-SNP-plex assay maps to a unique location on the autosomal chromosomes and has a minimum distance of 100 kb from known genes. The SNPs are polymorphic in the three major population groups (Caucasian, African and Asian) with p = 0.3-0.5, and all SNPs can be amplified and detected from as little as 200 pg of genomic DNA (approximately the amount of genomic DNA in 30 human cells). Currently, this is the most promising SNP multiplex for human identification, and the European DNA Profiling Group (EDNAP), a working group under the International Society for Forensic Haemogenetics (ISFH), and commercial companies have expressed strong interest in the 52-SNP-plex.
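The conversion from DNA mass to cell equivalents can be sketched as follows. The figure of about 6.6 pg of genomic DNA per diploid human cell is a commonly quoted approximation and is an assumption here, as is the function name.

```python
PG_PER_DIPLOID_CELL = 6.6  # assumed approximate DNA content of one diploid human cell, in pg


def cell_equivalents(dna_mass_pg: float) -> float:
    """Approximate number of diploid cells that would yield dna_mass_pg of genomic DNA."""
    return dna_mass_pg / PG_PER_DIPLOID_CELL


if __name__ == "__main__":
    print(f"200 pg corresponds to about {cell_equivalents(200):.0f} cells")
```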