The third forensic application of Y-chromosomal DNA polymorphisms is the identification of the geographical origin, or genetic ancestry, of an unknown male. Since this is the most recent forensic application of Y-chromosomal markers, it deserves a somewhat more detailed summary. Identifying geographical origin (in other words, genetic ancestry) is important in forensic cases with no known suspects. In such cases it would help the police to be able to concentrate their investigation on suspects from specific groups of individuals, i.e. people of a particular geographical origin (the terms 'ethnic group' or 'ethnic identification' are often used, but they are unfortunate because ethnicity is determined by more factors than geography). Genetic testing can, to a certain extent, provide such information, at least for some geographical regions of the world. To trace the suspect(s), however, the police would usually need to extrapolate from the DNA-based geographical origin to particular externally visible characteristics. Assumptions about a suspect's appearance based on his DNA-inferred geographical origin are strictly indirect, and the approach is feasible only when a strong correlation exists between a geographical region and an externally visible trait. For instance, human skin colour is highly correlated with latitude, which also leads to differentiation between continents. As a result, European geographical origin is usually strongly associated with light skin colour, whereas African genetic ancestry is usually associated with dark skin colour. Because of this strong association, it is to some degree justified to infer that the donor of a DNA sample has light skin when DNA typing reveals a European genetic origin, and dark skin when it reveals an African genetic origin.
However, similarly strong correlations involving other phenotypic traits and geographical regions are rare.
DNA-based identification of geographical origin is usually performed by testing markers for which a specific allele or haplotype is restricted to a certain geographical region or shows large, significant frequency differences between regions. In general, such frequency distributions arise when a mutation occurs in a single individual living in a particular geographical region, who passes it on to his or her offspring; their descendants may subsequently migrate to other regions or remain where they are. A marker can increase in frequency, and thus become useful for geographical origin identification, for several reasons. For instance, the mutation can benefit the individuals carrying it, resulting in greater reproductive success. Such effects of positive selection causing high marker frequencies are known, for instance, from genes responsible for or associated with resistance to certain infectious diseases, but can be expected in any gene that interacts closely with the environment and strongly influences survival and reproduction. In the case of resistance to infectious diseases, the marker frequency depends on the strength of selection (the benefit) but also on the frequency of the disease-causing organisms (e.g. mutations in genes expressed in red blood cells that provide malaria resistance are frequent in regions with a high incidence of malaria, because of the high frequency of the malaria-causing Plasmodium spp.). However, based on existing knowledge, positive selection is unlikely to have shaped Y-chromosome diversity, and the frequency distribution of Y markers depends mainly on the mutation rate, the geographical region in which the mutation occurred, cultural factors influencing male reproductive success (e.g. residence and marriage patterns, warfare) and the migratory history of the respective (male) population.
With the availability of the first population data for Y-STRs it was noticed that, albeit rarely, some Y-STR alleles show a highly restricted geographical distribution, e.g. short DYS390 alleles in the Pacific region (Kayser et al., 1997). Significant differences in Y-STR haplotypes were also found between geographically distant populations (Kayser et al., 2001) and between some geographically close populations, such as the Germans and the Dutch (Roewer et al., 1996), although not between many other European groups (Roewer et al., 2001). Within Europe, at least three groups of populations (called metapopulations in the YHRD) were identified: Eastern Europeans, Western Europeans and Southeastern Europeans (Roewer et al., 2005). Thus, the minimal Y-STR haplotype can indicate from which European region a male and his paternal ancestors originated, at least for some of the most characteristic haplotypes and for those with a more restricted distribution.
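The idea of matching a minimal haplotype against population reference data can be sketched in Python. This is a minimal illustration under stated assumptions, not the YHRD's actual search interface: the `Haplotype` type, the `parse` and `match_counts` helpers and the toy reference records are hypothetical, and the records are invented placeholders rather than real population data. DYS385a-b is stored as a sorted pair because the two copies of this duplicated locus are not distinguished in routine typing.

```python
from collections import Counter
from typing import NamedTuple

# Locus order of the minimal haplotype as listed in the text; DYS385a-b is
# a duplicated locus and is reported as one two-allele entry.
LOCI = ("DYS19", "DYS389I", "DYS389II", "DYS390",
        "DYS391", "DYS392", "DYS393", "DYS385a-b")


class Haplotype(NamedTuple):
    single: tuple[int, ...]  # repeat counts at the seven single-copy loci
    dys385: tuple[int, int]  # DYS385a-b, stored sorted (copies are unordered)


def parse(text: str) -> Haplotype:
    """Parse a haplotype string such as '17, 13, 30, 25, 10, 11, 13, 10-14'."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} entries, got {len(parts)}")
    single = tuple(int(p) for p in parts[:-1])
    a, b = (int(x) for x in parts[-1].split("-"))
    return Haplotype(single, (min(a, b), max(a, b)))


def match_counts(query: Haplotype, reference) -> Counter:
    """Count exact matches per population in (population, Haplotype) records."""
    return Counter(pop for pop, hap in reference if hap == query)


# Hypothetical toy records, NOT real YHRD data.
reference = [
    ("Population A", parse("17, 13, 30, 25, 10, 11, 13, 10-14")),
    ("Population A", parse("17, 13, 30, 25, 10, 11, 13, 14-10")),
    ("Population B", parse("14, 13, 29, 24, 11, 13, 13, 11-14")),
]
query = parse("17, 13, 30, 25, 10, 11, 13, 10-14")
print(match_counts(query, reference))  # Counter({'Population A': 2})
```

Turning such raw match counts into population frequencies would additionally require the number of haplotypes typed in each population as the denominator; the toy counter reports only the numerators.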
With the recent expansion of the YHRD to include non-European population samples, continental information can also be obtained from Y-STR haplotype data, although such conclusions are still preliminary because the number of non-European samples is low (but growing). A number of Y-STR haplotypes show a continentally restricted frequency distribution. For instance, a YHRD search in August 2006 (Release '19') revealed that the Y-STR haplotype most characteristic of the Eastern European population cluster (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a-b = 17, 13, 30, 25, 10, 11, 13, 10-14; Roewer et al., 2005) was found in 192 of 40 108 individuals from 320 worldwide populations, of whom 186 (97%) are European (Plate 9.1a). Of the remaining six matches, one (0.5%) is from Turkey, one (0.5%) from Kazakhstan and four (2%) are Hungarian Gypsies. This haplotype was not observed elsewhere in the world. Of the 186 European matches, 123 (66%) are found in the Eastern European metapopulation, as expected (most frequent all over Poland; somewhat frequent in Ukraine; rare in Lithuania, Latvia and Slovenia), 53 (28.5%) in the Western European metapopulation (mostly all over Germany; somewhat frequent in Czechia; rare in Sweden and Italy), 7 (3.8%) in the Southeastern European metapopulation (Greece, Hungary, Macedonia, Romania), 2 (1%) in US Americans and 1 (0.5%) in Argentina; the latter three men are of self-declared European descent.
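The percentages in this paragraph use two different denominators: the European and non-European shares refer to all 192 matches, whereas the breakdown by metapopulation refers to the 186 European matches only. The short Python sketch below recomputes them from the counts given in the text; the dictionary labels are shorthand chosen for this example, not YHRD population names.

```python
# Match counts for the Eastern European modal haplotype, taken from the text
# (YHRD Release 19, August 2006). Labels are shorthand, not YHRD names.
european = {
    "Eastern European metapopulation": 123,
    "Western European metapopulation": 53,
    "Southeastern European metapopulation": 7,
    "US Americans (European descent)": 2,
    "Argentina (European descent)": 1,
}
non_european = {"Turkey": 1, "Kazakhstan": 1, "Hungarian Gypsies": 4}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


n_european = sum(european.values())                # 186
n_total = n_european + sum(non_european.values())  # 192

print(f"European: {n_european}/{n_total} = {pct(n_european, n_total)}%")
for group, n in european.items():
    print(f"  {group}: {n} ({pct(n, n_european)}% of European matches)")
for group, n in non_european.items():
    print(f"  {group}: {n} ({pct(n, n_total)}% of all matches)")
```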
The Y-STR haplotype most characteristic of the Western European metapopulation (14, 13, 29, 24, 11, 13, 13, 11-14; Roewer et al., 2005) was found in 820 of 40 108 individuals from 320 worldwide populations (Plate 9.1b), of whom 731 (89%) are European. Of the 89 remaining matches, 85 most likely reflect European ancestry through European Y-chromosomal admixture: 30 are from the USA, UK, Brazil or Colombia but from individuals of self-declared African ancestry, 3 are from African countries (Angola, Equatorial Guinea), 2 are from the UK but from individuals of self-declared Asian ancestry, 13 are Reunion Creoles, 7 are Ecuador Mestizos, 2 are Ecuador Quichuas and 28 are US Hispanics. Altogether, therefore, 99.5% of the matches for this haplotype are in individuals with European ancestry; the remaining four matches are from Turkey, China, Georgia and a Hungarian Gypsy. This haplotype was not observed elsewhere in the world. As expected, it is most frequent in Western Europeans, with 466 matches (57%) (frequent all over Portugal, Spain, France, the Netherlands, Belgium, Ireland, UK, Germany, Switzerland and Italy; less frequent in Sweden, Denmark, Norway and US Europeans; rare in Austria, Czechia, Estonia and Finland). It was also found with 16 matches (2%) in the Eastern European metapopulation (mostly Poland; rare in Slovenia and Germany), 16 matches (2%) in the Southeastern European metapopulation (rare in Greece, Hungary, Italy, Macedonia, Romania, Albania and Bulgaria) and 233 matches (28.4%) in Europeans from Argentina, Brazil, Colombia and South Africa and in US Americans of European descent.
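The same bookkeeping can be applied to the Western European haplotype. This Python sketch, again using only counts stated in the text, checks that the groups add up to 820 and recomputes the shares; here all percentages are relative to the 820 matches. The group labels are shorthand chosen for this example.

```python
TOTAL = 820  # all matches for the Western European modal haplotype

# European matches by metapopulation / region (counts from the text).
european = {"Western European": 466, "Eastern European": 16,
            "Southeastern European": 16, "European descent outside Europe": 233}
# Non-European matches attributed to European Y-chromosomal admixture.
admixed = {"self-declared African ancestry (USA, UK, Brazil, Colombia)": 30,
           "Angola, Equatorial Guinea": 3, "UK, self-declared Asian ancestry": 2,
           "Reunion Creoles": 13, "Ecuador Mestizos": 7,
           "Ecuador Quichuas": 2, "US Hispanics": 28}
OTHER = 4  # Turkey, China, Georgia, Hungarian Gypsy


def pct(part: int, whole: int = TOTAL) -> float:
    """Percentage of all matches, rounded to one decimal place."""
    return round(100 * part / whole, 1)


n_eur = sum(european.values())   # 731
n_adm = sum(admixed.values())    # 85
assert n_eur + n_adm + OTHER == TOTAL, "group counts do not add up"

print("European:", n_eur, pct(n_eur))
print("European or European-admixed:", n_eur + n_adm, pct(n_eur + n_adm))
for group, n in european.items():
    print(f"  {group}: {n} ({pct(n)}%)")
```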
In contrast, the Y-STR haplotype most frequent in US African Americans (15, 13, 31, 21, 10, 11, 13, 16-17; Kayser et al., 2002) was found with 41 matches in the YHRD (Plate 9.1c), of which 38 (92.7%) are Africans or men of known African descent: from Cameroon, Bantu South Africa, Guinea, Mozambique and Egypt, men of African descent from the USA, Brazil, Ecuador and Colombia, and UK Afro-Caribbeans. The remaining 3 matches are from one Argentinean European, one US Hispanic and one Reunion Creole, most likely indicating African Y-chromosomal admixture.
Although, as these examples show, at least some Y-STR haplotypes are informative for geographical origin identification, the relatively high mutation rate of Y-STRs (Kayser et al., 2000b) tends to blur genetic ancestry signals over many generations. It is therefore often stated that Y-STRs are more informative for detecting recent than ancient events in the genetic history of populations, whereas ancient events can be identified more reliably using Y-chromosomal single nucleotide polymorphisms (Y-SNPs), whose mutation rates are about 100 000 times lower than those of Y-STRs (Thomson et al., 2000). One of the first studies describing a geographically restricted distribution of a Y-SNP marker, and its use for investigating human population history, appeared in 1997 (Zerjal et al., 1997). Today, a large number of Y-SNP markers are known, and many of them show a continent-specific distribution. A comprehensive summary of the distribution and applications of Y-chromosomal SNP markers can be found elsewhere (Jobling et al., 2004). Here, I want to illustrate the suitability of Y-SNP markers for detecting geographical origin with three continent-specific examples:
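Why a high mutation rate blurs old signals can be made concrete with a simple calculation. The Python sketch below assumes an illustrative average Y-STR rate of 2 × 10^-3 mutations per locus per generation (a round figure chosen for illustration, not a value taken from Kayser et al., 2000b), a Y-SNP rate 100 000 times lower as stated above, independent sites and a constant rate. It reports the expected number of mutations along one paternal line and the probability that a nine-allele minimal haplotype (DYS385a and b counted separately) is still unchanged after a given number of generations. It ignores the stepwise nature of STR mutation, under which back- and parallel mutations can make unrelated haplotypes identical by chance.

```python
MU_STR = 2e-3           # assumed Y-STR rate per locus per generation (illustrative)
MU_SNP = MU_STR / 1e5   # Y-SNP rate: "100 000 times lower" than Y-STRs
N_STR = 9               # alleles in the minimal haplotype, DYS385a/b counted separately


def expected_mutations(rate: float, n_sites: int, generations: int) -> float:
    """Expected number of mutations along one paternal line."""
    return rate * n_sites * generations


def prob_unchanged(rate: float, n_sites: int, generations: int) -> float:
    """Probability that none of the sites has mutated (independent, constant rate)."""
    return (1 - rate) ** (n_sites * generations)


for g in (10, 100, 1000):
    print(f"{g:>5} generations: "
          f"STR haplotype {expected_mutations(MU_STR, N_STR, g):.2f} mutations expected, "
          f"P(unchanged) = {prob_unchanged(MU_STR, N_STR, g):.3g}; "
          f"single SNP {expected_mutations(MU_SNP, 1, g):.1e} mutations expected")
```

Under these assumptions, after 100 generations (roughly 2500 to 3000 years at 25 to 30 years per generation) a minimal haplotype is expected to carry almost two mutations, whereas a single Y-SNP would essentially never have mutated.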
1. African origins: the Y-SNP marker SRY4064 (or one of its phylogenetic equivalents, M96 or P29) defining haplogroup E (Y Chromosome Consortium, 2002) appears at high frequency almost everywhere in Africa but is absent from all regions outside Africa, except those in close geographical proximity to Africa (Plate 9.2a). This is because the mutation probably arose in Africa some time after humans migrated out of Africa, about 150 000 years ago, but before the major human migrations within Africa.
2. European origins: the Y-SNP marker M173 defining haplogroup R1 (Y Chromosome Consortium, 2002) has a high frequency in Europe (especially Western Europe) and a low to zero frequency outside Europe, except in areas with known records of European immigration, e.g. as a result of the European colonizations starting about 500 years ago, which carried the marker to regions such as Polynesia, together with more recent European admixture, e.g. in New Zealand (Plate 9.2b). This mutation most likely has an ancient origin in Eurasia, but its current frequency distribution is believed to result from a postglacial expansion starting 20 000-13 000 years ago from a refuge population somewhere on the Iberian peninsula (Semino et al., 2000), which explains the gradual decline in frequency from Western to Eastern Europe.
3. East Asian origins: the Y-SNP marker M175 defining haplogroup O (Y Chromosome Consortium, 2002) has a high frequency in East Asia, where it most likely originated, but is absent elsewhere except in regions with known East Asian influence, e.g. through the expansion of Austronesian speakers that started in East Asia about 6000 years ago and carried the marker to regions such as Polynesia (Kayser et al., 2000a) (Plate 9.2c).
In some cases a high correlation between Y-STRs and Y-SNPs has been observed, such as the statistically significant Y-chromosomal differentiation between Polish and German populations, which is assumed to be a genetic consequence of politically forced population movements during and especially after World War II (Kayser et al., 2005). Furthermore, some European metapopulations identified by their Y-STR haplotypes (Roewer et al., 2005) correlate well with specific Y-SNP haplogroups: the most characteristic Eastern European Y-STR haplotype (17, 13, 30, 25, 10, 11, 13, 10-14), together with its close relatives, is associated with Y-SNP haplogroup R1a1, whereas the most characteristic Western European Y-STR haplotype (14, 13, 29, 24, 11, 13, 13, 11-14), together with its close relatives, is associated with Y-SNP haplogroup R1(xR1a1) (Kayser et al., 2005). Although in cases of close correlation Y-STRs or Y-SNPs alone will reveal the same geographical information, combining both marker types can be more informative because of the additional information provided by closely related Y-STR haplotypes (e.g. one-step neighbours) found on a particular Y-SNP background.
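One-step neighbours are haplotypes that differ from a given haplotype by a single repeat unit at a single locus. A minimal Python sketch of how they could be collected, assuming haplotypes are stored as tuples of nine repeat counts in the locus order used in this section (DYS385a-b as a sorted pair at the end). Apart from the two modal haplotypes quoted in the text, the example haplotypes are hypothetical. Each allele is treated independently, which is a simplification: DYS389II, for instance, is commonly reported as a length that includes the DYS389I repeat region.

```python
Haplotype = tuple[int, ...]  # nine repeat counts; DYS385a-b last, sorted


def repeat_distance(h1: Haplotype, h2: Haplotype) -> int:
    """Total number of repeat-unit differences summed over all alleles."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def one_step_neighbours(query: Haplotype, haplotypes: list[Haplotype]) -> list[Haplotype]:
    """Haplotypes differing from `query` by exactly one repeat at exactly one allele.

    A total distance of 1 between integer repeat counts can only arise from a
    single allele differing by one repeat, so the distance test is sufficient.
    """
    return [h for h in haplotypes if repeat_distance(query, h) == 1]


eastern_modal = (17, 13, 30, 25, 10, 11, 13, 10, 14)
western_modal = (14, 13, 29, 24, 11, 13, 13, 11, 14)
candidates = [
    (17, 13, 30, 24, 10, 11, 13, 10, 14),  # hypothetical: DYS390 one repeat shorter
    (17, 13, 30, 25, 10, 11, 13, 11, 14),  # hypothetical: DYS385a one repeat longer
    (17, 13, 31, 26, 10, 11, 13, 10, 14),  # hypothetical: two loci differ
    western_modal,
]
print(one_step_neighbours(eastern_modal, candidates))
# [(17, 13, 30, 24, 10, 11, 13, 10, 14), (17, 13, 30, 25, 10, 11, 13, 11, 14)]
```

In a database holding both marker types, such neighbour sets could additionally be grouped by the Y-SNP haplogroup on which they occur.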
To use any type of genetic marker to identify the geographical origin of an individual, large reference databases are required to establish the marker's geographical distribution. For Y-STR haplotypes such a database exists in the YHRD; unfortunately, however, no similar resource is yet available for Y-SNP data. Ideally, for Y-chromosome-based geographical origin identification, Y-SNPs and Y-STRs should be combined in a single reference database in order to maximize male-specific ancestry information. Efforts to include Y-SNP data in the YHRD are currently underway.
There is one severe problem with using Y-chromosomal markers for genetic ancestry identification, namely in individuals with mixed genetic heritage (genetic admixture). For example, a Y-chromosome DNA analysis of the son of a European man and an African woman will reveal a European geographical origin, in spite of his most likely African appearance. The same analysis will also reveal European (Y-chromosomal) genetic ancestry in all of his paternally related male relatives, even if all of them have children with African women. In such cases, Y-chromosomal test results will be completely misleading if conclusions about a person's appearance are drawn from them. Continental, population-wide genetic admixture is known from North and South America (Alves-Silva et al., 2000; Kayser et al., 2003) but can in principle be expected in individuals anywhere, with increased probability in regions with a known history of contact between people of different continental origins, e.g. as a result of the European expeditions to the Americas and the Pacific, or of the African slave trade to the Americas. Therefore, to reveal a person's geographical origin with a high degree of accuracy, ancestry-informative markers from the Y chromosome need to be combined with those from both mitochondrial DNA (Sigurdsson et al., 2006) and autosomal DNA (Lao et al., 2006).
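The admixture problem can be illustrated with expected ancestry fractions along a single paternal line. The Python sketch below assumes a European male founder, an African mother in every subsequent generation, and that each child inherits on average half of its autosomal DNA from each parent (actual fractions vary because of recombination). The Y chromosome is passed unchanged from father to son, while mitochondrial DNA comes from the mother; the function name `paternal_line` is hypothetical.

```python
def paternal_line(generations: int) -> list[dict]:
    """Expected ancestry signals in the sons of a European male founder's line,
    assuming every mother in the line is African."""
    rows = []
    autosomal_african = 0.0  # founder: no African autosomal ancestry
    for g in range(1, generations + 1):
        # Half of the autosomes come from the father (this male line),
        # half from the African mother.
        autosomal_african = 0.5 * autosomal_african + 0.5 * 1.0
        rows.append({
            "generation": g,
            "Y": "European",       # inherited unchanged along the male line
            "mtDNA": "African",    # inherited from the mother
            "autosomal_african": autosomal_african,
        })
    return rows


for row in paternal_line(4):
    print(row)
```

After only a few generations the expected autosomal ancestry is overwhelmingly African while the Y chromosome still points to Europe, which is why the three marker systems need to be interpreted together.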