23. genetyków, antropologów, biochemików z całego świata, z tak
renomowanych ośrodków jak Stanford i Cambridge, w potężnym badaniu
uwzględniło ponad 16 tys. genotypów reprezentujących 126 populacji z
całej Euroazji. Projekt był rozwinięciem badania przeprowadzonego w 2009
r. przez zespół pod kierownictwem Petera A. Underhilla z uniwersytetu
Stanford w USA.
Przypomnijmy, że w roku 2009 wnioski zespołu Underhilla zatrzęsły
dogmatami o historii Słowian, w tym szczególnie Polaków. Naukowcy
uznali, że nasi przodkowie mieli takie geny, jak większość dzisiejszych
obywateli Rzeczpospolitej, a nad Odrą i Wisłą jesteśmy od skromnych
około 11 tys. lat. Oznaczało to, że dotychczasowe teorie historyków o
pobycie Słowian na dzisiejszych polskich ziemiach dopiero od V wieku
n.e. to bzdura. Dalszą konsekwencją tego wniosku było zakwestionowanie
tezy niemieckich nacjonalistów, że wszelkie kultury przed tym okresem
tworzyli Germanie. Dopóki nie wkroczyły nauki ścisłe łatwo było zbijać
twierdzenia części polskich archeologów i historyków, że Słowianie byli
twórcami wszystkich kultur na naszych ziemiach co najmniej od łużyckiej.
Łatwo było wyśmiewać się z prymitywizmu Polaczków w porównaniu z wysoko
cywilizowanymi Niemcami. Niestety po 2009 główny nurt polskiej
historiografii nie wykorzystał daru podanego na tacy zza Atlantyku i
podkulił ogon pod siebie.
Tak, jak każdy naukowiec z krwi i kości Peter A. Underhill śledził
jednak postępujące osiągnięcia genetyki, w których zresztą od lat
uczestniczy. Miał wątpliwości co do precyzji pierwszych wniosków. Po
pięciu latach wykorzystał nowe metody i uzupełnił materiał badawczy o
dane z odkopanych przez archeologów szczątków ludzkich sprzed
tysiącleci. W 2014 r. okazało się, że pomylił się, bo określenie czasu
powstania „prapolskiego genu” R1a-M458 w oparciu o analizę STR i metodę
koalescencji z roku 2009 było błędne.
W roku 2014 jego zespół obliczył czas w oparciu o SNP-y i
sekwencjonowanie całego chromosomu Y oraz dane ze wspomnianych kopalnych
DNA. W oparciu o tę metodę oszacowano wiek „naszej” haplogrupy na około
5800 lat.
Wyniki Underhilla z roku 2014 oznaczały też, że indoirańska hg
R1a-Z93 i bez wątpienia słowiańska R1a-Z282 rozeszły się około 6 tys.
lat temu oraz że nigdy się już potem nie spotkały, bo w Indiach i Iranie
nie ma R1a-Z282, a w Europie Środkowo-Wschodniej nie ma R1a-Z93. Czas
rozejścia się tych rodów można łączyć z czasem ostatecznego rozpadu
wspólnoty praindoeuropejskiej i z początkiem kształtowania się języków
praindoirańskiego i
Oznaczało to, że mógł pomylić się o 5 tys. lat, ale nie podważało to
teorii o nadal kilkutysiącletnim nieprzerwanym pobycie Prapolaków na ich
dzisiejszych ziemiach. Przy okazji znacznie ograniczyło tolerancję
możliwego błędu datowania – teraz to około tysiąc lat, a nie kilka
tysięcy. Uboczne efekty znalezienia tej korekty okazały sie jednak
zaskakujące. W publikacji podsumowującej badanie z 2014 roku możemy
przeczytać: „Zauważmy, że najwcześniejsze linie R1a znalezione do tej
pory w starożytnym europejskim DNA, datowane na 4 600 lat temu,
korespondują z kulturą ceramiki wstęgowej rytej. Natomiast trzy sample
DNA uzyskano z wcześniejszej kultury ceramiki sznurowej (7600-6500 lat
temu). Daje to możliwy obraz szerokiego i gwałtownego rozprzestrzeniania
R1a-Z282, linii związanych ze
Najnowsze ustalenia naukowców wychodzą więc ewidentnie na spotkanie z
teoriami pasjonatów, którzy w oparciu o źródła, podważane przez
zawodowych historyków, mówią o starożytnej Wielkiej Lechii rozciągającej
się przez tysiące lat od… Renu do Wołgi. O federacji rodów z jednego
pnia, która była w stanie oprzeć się imperiom perskiemu, rzymskiemu,
macedońskiemu, bizantyńskiemu i frankońskiemu. O ludach różnie zwanych
(Lechici, Wenedowie, Scytowie, Sarmaci), które były przodkami
dzisiejszych Polaków.
Niemieckich również. I im akurat nie dziwię się. W zespole Underhilla
byli za to dociekliwi Amerykanie, Hindusi, Chorwaci, Brytyjczycy,
Francuzi, Rosjanie, Włosi. Stąd być może prace zespołu Underhilla mało
obchodzą oficjalnych doktrynerów polskiej historiografii. Dlatego też
pewnie bardziej wierzę specjalistom z najlepszych ośrodków akademickich
współczesnego świata niż spekulacjom naszych „luminarzy”. I wstyd mi, bo
tak jak nadwiślańska profesura też jestem Polakiem – tylko może jakimś
innym. Chyba, że to taka nasza szczególnie kreatywna „polityka
historyczna” – niech inni zabiorą nam nasze dziedzictwo, ale my cieszmy
się z późnego chrztu mieszkowych dzikusów z bagien.
NA ZDJĘCIU: Ulubiona przez „turbosłowian” rzekoma mapa Lechii z 6
w.n.e. według brytyjskich historyków. Najpewniej falsyfikat, ale czy
rzeczywiście?
Top of page
Introduction
High-throughput
resequencing efforts have uncovered thousands of Y-chromosome variants
that have enhanced our understanding of this most informative locus’
phylogeny, both through the resolution of topological ambiguities and by
enabling unbiased estimation of branch lengths, which, in turn, permit
timing estimates.
1, 2, 3, 4, 5 The International Society of Genetic Genealogy
10
has aggregated these variants and those discovered with previous
technologies into a public resource that population surveys can leverage
to further elucidate the geographic origins of and structure within
haplogroups.
6, 7, 8, 9, 10, 11, 12, 13
Y-chromosome haplogroup R (hg R) is one of 20 that comprise the standardized global phylogeny.
14 It consists of two main components: R1-M173 and R2-M479
15 (
Figure 1). Within R1-M173, most variation extant in Eurasia is confined to R1a-M420 and R1b-M343.
16 In Europe, R1a is most frequent in the east, and R1b predominates in the west.
17
It has been suggested that this division reflects episodic population
expansions during the post-glacial period, including those associated
with the establishment of agricultural
/pastoral economies.
3, 18, 19, 20, 21
More than 10
%
of men in a region extending from South Asia to Scandinavia share a
common ancestor in hg R1a-M420, and the vast majority fall within the
R1a1-M17
/M198 subclade.
22 Although the phylogeography of R1b-M343 has been described, especially in Western and Central Europe,
15, 23, 24, 25
R1a1 has remained poorly characterized. Previous work has been limited
to a European-specific subgroup defined by the single-nucleotide
polymorphism (SNP) called M458.
22, 26, 27, 28, 29, 30 However, with the discovery of the Z280 and Z93 substitutions within Phase 1 1000 Genomes Project data
1 and subsequent genotyping of these SNPs in ~200 samples, a schism between European and Asian R1a chromosomes has emerged.
31
We have evaluated this division in a larger panel of populations,
estimated the split time, and mapped the distributions of downstream
sub-hgs within seven regions: Western
/Northern Europe, Eastern Europe, Central
/South Europe, the Near
/Middle East, the Caucasus, South Asia, and Central Asia
/Southern Siberia.
Top of page
Materials and methods
Population samples
We assembled a genotyping panel of 16
244
males from 126 Eurasian populations, some of which we report upon for
the first time herein and others that we have combined from earlier
studies,
22, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45
and updated to a higher level of phylogenetic resolution. All samples
were obtained using locally approved informed consent and were
de-identified.
Whole Y-chromosome sequencing
We analyzed 13 whole R1 Y-chromosome sequences: 8 novel, 2 previously published,
4 and 3 from the 1000 Genomes Project.
2
All sequences were generated on the Illumina HiSeq platform (Illumina,
San Diego, CA, USA) using libraries prepared either from genomic DNA or
from flow sorted Y-chromosomes drawn from lymphoblast cell line cultures
induced to the metaphase cell division stage (
Supplementary Table 1). We used Bowtie
46 to map 101-bp sequencing reads to the hg19 human reference, and we called genotypes across 9.99
Mb and estimated coalescence times using a rate of 1 SNP accumulating per 122 years as described in Poznik
et al.4
SNP analysis
We selected binary markers (
Supplementary Table 2) from the International Society of Genetic Genealogy
10
database and from whole Y-chromosome sequencing and genotyped samples
either by direct Sanger sequencing or RFLP assays. Within the full
panel, 2923 individuals were found to be members of hg R1a-M420. These
M420 carriers were then genotyped in a hierarchical manner (
Figure 1) for the following downstream markers with known placement on the tree: SRY10831.2, M17, M198, M417, Page7, Z282
/S198, Z284
/S221, M458, M558
/CTS3607, Z93
/S202,
Z95, Z2125, M434, M560, M780, M582, and for three SNPs whose placement
within the R1a topology was previously unknown: M746, M204, and L657.
We generated spatial frequency maps for the R1a subgroups that we determined to occur at 10
% frequency or greater within a studied region. To do so, we used Surfer (v.8, Golden Software, Inc, Golden, CO, USA) with the
kriging
algorithm and the option to use bodies of water as break-lines. We
carried out spatial autocorrelation analysis to detect clines by
calculating Moran’s I coefficient using PASSAGE v.1.1 (
www.passagesoftware.net)
with a binary weight matrix, 10 distance classes, and the assumption of
a random distribution. Haplogroup diversities were calculated using the
method of Nei.
47 To investigate the genetic affinities among populations, we used the freeware popSTR program (
http://harpending.humanevo.utah.edu/popstr/) to perform a principal component analysis (PCA) based on R1a subgroup frequencies.
Short tandem repeat (STR) analysis
We genotyped 1355 samples for 10–19 STRs (Y-STRs;
Supplementary Table 3) and calculated haplotype diversities, also using the method of Nei.
47 Coalescence times (
Td) of R1a subhaplogroups were estimated using the ASD
0 methodology described by Zhivotovsky
et al48 and modified according to Sengupta
et al.41 Given the uncertainty associated with Y-STR mutation rates,
24 we used both the evolutionary effective mutation rate of 6.9 × 10
−4 per 25 years
48 and, for comparison, the pedigree mutation rate of 2.5 × 10
−3 per generation.
49
Top of page
Results and Discussion
Refinement of hg R1a topology
Figure 1
shows, in context, the phylogenetic relationships of the markers we
genotyped in this study. These include three for which phylogenetic
placement was previously unknown: M746, M204, and L657. We localized the
rare M204 SNP based on a single Iranian sample confirmed by Sanger
sequencing to carry the derived allele.
37, 50
Phylogeography
We measured R1a haplogroup frequency by population (
Supplementary Table 4). Of the 2923 hg R1a-M420 samples, 2893 were derived for the M417
/Page7
mutations (1693 non-Roma Europeans and 1200 pan-Asians), whereas the
more basal subgroups were rare. We observed just 24 R1a
*-M420(xSRY10831.2), 6 R1a1
*-SRY10831.2(xM198), and 12 R1a1a1-M417
/Page7
*(xZ282,Z93). We did not observe a single instance of R1a1a-M198
*(xM417,Page7), but we cannot exclude the possibility of its existence.
Of the 1693 European R1a-M417
/Page7 samples, more than 96
% were assigned to R1a-Z282 (
Figure 2), whereas 98.4
% of the 490 Central and South Asian R1a lineages belonged to hg R1a-Z93 (
Figure 3), consistent with the previously proposed trend.
31 Both of these haplogroups were found among Near
/Middle East and Caucasus populations comprising 560 samples.
Subgroups of both R1a-Z282 and R1a-Z93 exhibit geographic localization within the broad distribution zone of R1a-M417
/Page7. Among R1a-Z282 subgroups (
Figure 2), the highest frequencies (~20
%) of paragroup R1a-Z282
* chromosomes occur in northern Ukraine, Belarus, and Russia (
Figure 2b). The R1a-Z284 subgroup (
Figure 2c) is confined to Northwest Europe and peaks at ~20
% in Norway, where the majority of R1a chromosomes (24
/26) belong to this clade. We found R1a-Z284 to be extremely rare outside Scandinavia. R1a-M458 (
Figure 2d) and R1a-M558 (
Figure 2e) have similar distributions, with the highest frequencies observed in Central and Eastern Europe. R1a-M458 exceeds 20
% in the Czech Republic, Slovakia, Poland, and Western Belarus. The lineage averages 11–15
% across Russia and Ukraine and occurs at 7
% or less elsewhere (
Figure 2d). Unlike hg R1a-M458, the R1a-M558 clade is also common in the Volga-Uralic populations. R1a-M558 occurs at 10–33
% in parts of Russia, exceeds 26
% in Poland and Western Belarus, and varies between 10 and 23
%
in the Ukraine, whereas it drops 10-fold lower in Western Europe. In
general, both R1a-M458 and R1a-M558 occur at low but informative
frequencies in Balkan populations with known Slavonic heritage. The
rarity of R1a-M458 and R1a-M558 among Central Asian and South Siberian
R1a samples (4
/301;
Supplementary Table 4) suggests low levels of historic Slavic gene flow.
In the complementary R1a-Z93 haplogroup, the paragroup R1a-Z93
* (
Figure 3b) is most common (>30
%) in the South Siberian Altai region of Russia, but it also occurs in Kyrgyzstan (6
%) and in all Iranian populations (1–8
%). R1a-Z2125 (
Figure 3c) occurs at highest frequencies in Kyrgyzstan and in Afghan Pashtuns (>40
%). We also observed it at greater than 10
% frequency in other Afghan ethnic groups and in some populations in the Caucasus and Iran. Notably, R1a-M780 (
Figure 3d) occurs at high frequency in South Asia: India, Pakistan, Afghanistan, and the Himalayas. The group also occurs at >3
% in some Iranian populations and is present at >30
% in Roma from Croatia and Hungary, consistent with previous studies reporting the presence of R1a-Z93 in Roma.
31, 51
Finally, the rare R1a-M560 was only observed in four samples: two
Burushaski speakers from north Pakistan, one Hazara from Afghanistan,
and one Iranian Azeri.
Y-STR haplotype networks and diversity
We genotyped a subset of 1355 R1a samples for 10–19 Y-chromosome STR loci (
Supplementary Table 3) and constructed networks for both hg R1a-Z282 and hg R1a-Z93 (
Supplementary Figure 1 and
Supplementary Figure 2).
Although we could assign haplotypes to various haplogroups, power to
identify substructure within hg R1a-M198 was limited, consistent with
previous work.
22, 52 Although haplotype diversity is generally very high (
H>0.95) in all haplogroups (
Supplementary Table 3), lower diversities occur in south Siberian paragroup R1a-Z93
* (
H=0.921), in Jewish R1a-M582 (
H=0.844) and in Roma R1a-M780 (
H=0.759), consistent with founder effects that are evident in the network patterns for these populations (
Supplementary Figure 2).
Origin of hg R1a
To
infer the geographic origin of hg R1a-M420, we identified populations
harboring at least one of the two most basal haplogroups and possessing
high haplogroup diversity. Among the 120 populations with sample sizes
of at least 50 individuals and with at least 10
%
occurrence of R1a, just 6 met these criteria, and 5 of these 6
populations reside in modern-day Iran. Haplogroup diversities among the
six populations ranged from 0.78 to 0.86 (
Supplementary Table 4). Of the 24 R1a-M420
*(xSRY10831.2)
chromosomes in our data set, 18 were sampled in Iran and 3 were from
eastern Turkey. Similarly, five of the six observed R1a1-SRY10831.2
*(xM417
/Page7)
chromosomes were also from Iran, with the sixth occurring in a Kabardin
individual from the Caucasus. Owing to the prevalence of basal lineages
and the high levels of haplogroup diversities in the region, we find a
compelling case for the Middle East, possibly near present-day Iran, as
the geographic origin of hg R1a.
Spatial dynamics of R1a lineage frequencies
We
conducted a spatial autocorrelation analysis of the two primary
subgroups of R1a (Z282 and Z93) and of each of their subgroups
independently (
Supplementary Figure 3).
Each correlogram was statistically significant. We observed clinal
distributions (continually decreasing frequency with increasing
geographic distance) across a large geographic area in the two
macrogroups and in M558 and M780 as well. One group (Z2125) did not
reveal any discernible pattern, and the analysis of four groups (Z282
*, Z284, M458, and Z93
*)
indicated potential clinal distributions that do not extend across the
full geographic range under study. Therefore, we also analyzed partial
ranges for Z282
* and M458 in Europe, the
Caucasus, and the Middle East, and for Z284 in Europe, but these partial
range analyses also failed to yield evidence of clinal distributions.
We also conducted PCA of R1a subgroups (
Figure 4). The first principal component explains 21
%
of the variation and separates European populations at one extreme from
those of South Asia at the other. The second explains 14.7
%
of the variation and is driven almost exclusively by the high presence
of M582 among some Jewish populations, particularly the Ashkenazi Jews.
PC2 separates them from all other populations. When we consider
haplogroups rather than populations (
Supplementary Figure 4), we see that the clustering of European populations is due to their high frequencies of M558, M458, and Z282
*, whereas the M780 and Z2125
* lineages account for the South Asian character of the other extreme.
To
put our frequency distribution maps, PCA analyses, and autocorrelation
results in archaeological context, we note that the earliest R1a
lineages (genotyped at just SRY10381.2) found thus far in European
ancient DNA date to 4600 years before present (YBP), a time
corresponding to the Corded Ware Culture,
53
whereas three DNA sample extracts from the earlier Neolithic Linear
Pottery Culture (7500–6500 YBP) period were reported as G2a-P15 and
F-M89(xP-M45) lineages.
54
This raises the possibility of a wide and rapid spread of
R1a-Z282-related lineages being associated with prevalent Copper and
Early Bronze Age societies that ranged from the Rhine River in the west
to the Volga River in the east
55 including the Bronze Age Proto-Slavic culture that arose in Central Europe near the Vistula River.
56
It may have been in this cultural context that hg R1a-Z282 diversified
in Central and Eastern Europe. The corresponding diversification in the
Middle East and South Asia is more obscure. However, early urbanization
within the Indus Valley also occurred at this time
57 and the geographic distribution of R1a-M780 (
Figure 3d) may reflect this.
To
evaluate the potential role of R1a diversification in these
post-Neolithic events, we took two approaches toward estimating the time
to the most recent common ancestor (TMRCA). The first was a Y-STR-based
coalescent time estimation, the results of which (
Supplementary Table 5) demonstrate the unsuitability of the pedigree mutation rate, as supported also by the evidence in Wei
et al,
3 the ages being severely underestimated. Alternatively times based on the evolutionary mutation rate,
48
which is prone to overestimation, should be regarded as the upper
bounds on the sub-hg dispersals. The second approach was TMRCA
estimation based on whole Y-chromosome sequencing data.
Whole Y-chromosome sequences from R1a and R1b: TMRCA estimates
The
SNPs that we genotyped across 126 populations reveal considerable
information about the topology of the haplogroup tree, but they were
ascertained in a biased manner, and they are too few in number to convey
any meaningful branch-length information. Hence, our SNP genotyping
results are devoid of temporal information. To obtain unbiased branch
lengths to estimate TMRCA, we analyzed whole Y-chromosome sequences
(9.99
Mb of which were usable) of 13 individuals: 8 R1a and 5 R1b. We used MEGA
57 to construct a bootstrap consensus maximum likelihood tree (
Figure 5) based on 928 R1 SNPs (
Supplementary Data File 1), of which 462 were previously named.
10
To define the ancestral and derived states of SNPs corresponding to the
roots of the R1a and R1b subtrees (branches 23 and 8 in
Figure 5, respectively), we called genotypes and constructed the tree jointly with previously published hg E sequences,
4 which constituted an outgroup.
A consensus has not yet been reached on the rate at which Y-chromosome SNPs accumulate within this 9.99
Mb sequence. Recent estimates include one SNP per: ~100 years,
58 122 years,
4 151 years
5 (deep sequencing reanalysis rate), and 162 years.
59
Using a rate of one SNP per 122 years, and based on an average branch
length of 206 SNPs from the common ancestor of the 13 sequences, we
estimate the bifurcation of R1 into R1a and R1b to have occurred ~25
100 ago (95
% CI: 21
300–29
000).
Using the 8 R1a lineages, with an average length of 48 SNPs accumulated
since the common ancestor, we estimate the splintering of R1a-M417 to
have occurred rather recently, ~5800 years ago (95
%
CI: 4800–6800). The slowest mutation rate estimate would inflate these
time estimates by one-third, and the fastest would deflate them by 17
%.
With reference to
Figure 1, all fully sequenced R1a individuals share SNPs from M420 to M417. Below branch 23 in
Figure 5,
we see a split between Europeans, defined by Z282 (branch 22), and
Asians, defined by Z93 and M746 (branch 19; Z95, which was used in the
population survey, would also map to branch 19, but it falls just
outside an inclusion boundary for the sequencing data
4).
Star-like branching near the root of the Asian subtree suggests rapid
growth and dispersal. The four subhaplogroups of Z93 (branches 9-M582,
10-M560, 12-Z2125, and 17-M780, L657) constitute a multifurcation
unresolved by 10
Mb of
sequencing; it is likely that no further resolution of this part of the
tree will be possible with current technology. Similarly, the shared
European branch has just three SNPs.
We caution
against ascribing findings from a contemporary phylogenetic cluster of a
single genetic locus to a particular pre-historic demographic event,
population migration, or cultural transformation. The R1a TMRCA
estimates we report have wide confidence intervals and should be viewed
as preliminary; one must sequence tens of additional R1a samples to high
coverage to uncover additional informative substructure and to bolster
the accuracy of the branch lengths associated with the more terminal
portions of the phylogeny. Although some of the SNPs on the lineages we
have defined by single SNPs are undoubtedly rare (eg, the Z2125 sub-hg
M434,
Figure 1,
Supplementary Table 4),
it remains possible that future genotyping effort using the SNPs in
Supplementary Data File 1 may expose other substructure at substantial
frequency, commensurate with more recent episodes of population growth
and movement. In addition, high coverage sequences using multiple male
pedigrees sampled across various haplogroups in the global Y phylogeny
will be needed to more accurately estimate the Y-chromosome mutation
rate. Nonetheless, despite the limitations of our small sample of R1a
sequences, the relative shortness of the branches and their geographic
distributions are consistent with a model of recent R1a diversification
coincident with range expansions and population growth across Eurasia.