Wednesday, March 14, 2012

Sri Lankan Population DNA Genetics 01

This is the first article on Sri Lankan genetics. The data and analysis are from the HarappaDNA project run by Zack Ajmal.
a) Please read the previous blog post to understand the jargon and for a simple introduction to genetics.
b) There isn't enough Sri Lankan data. If you live in the US/Canada, and especially if you are over the age of 40, please consider getting your autosomal DNA tested. It's $99 (and I think a compulsory 12-month subscription at $9/month) for the autosomal test at 23andMe. I had my Y-DNA and mtDNA tested at FTDNA a few years back and have just sent in the saliva sample for the autosomal test. The autosomal results should be available in about two months and I will publish the results online.
Anyway, here we go. What I have presented is a subset of the data and analysis reported by Zack on HarappaDNA in Jan 2012. This particular data set has 220 South Asian participants. All I have done is extract a total of 17 individuals: 6 Sri Lankan participants, some nearby South Indian participants and one Sindhi as a North-West Indian comparison.

The Sri Lankan contingent of 6 is as follows:
  • Sri Lankan (ethnicity unknown)
  • Sinhalese Govigama
  • Sri Lankan Vellala (2 individuals)
  • Sri Lankan Vellala (1/2) and Telugu (1/2)
  • Sri Lankan (1/2) and German (1/2)
Note: Charts and tables are interactive. Click on the legend or column headers to sort. The number in brackets is the number of individuals in the group.

First an explanation of the Components/Legend.
  • S. Asian = South Asian and is roughly equivalent to the Ancestral North Indian (ANI) + Ancestral South Indian (ASI) of Reich et al. (more on that later)
  • Onge = The Onge are Andaman Islanders, who have no ANI component.
  • E. Asian = East Asian; represents the component typical of Chinese and similar populations.
  • SW Asian = South-West Asian. It's a bit of a misnomer; it represents NW Asian populations, such as Iranians.
  • The Harappa analysis is an extended analysis of the Reich et al paper, but separates the ANI and ASI into S Asian and SW Asian components.
Reich et al. abstract (David Reich's other publications):
provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the ‘Ancestral North Indians’ (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the ‘Ancestral South Indians’ (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39–71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy.
Figure 4. A model relating the history of Indian and non-Indian groups. Modeling the Pathan, Vaish, Meghawal and Bhil as mixtures of ANI and ASI, and relating them to non-Indians by the phylogenetic tree (YRI,((CEU,ANI),(ASI,Onge))), provides an excellent fit to the data. While the model is precise about tree topology and ordering of splits, it provides no information about population size changes or the timings of events. We estimate genetic drift on each lineage in the sense of variance in allele frequencies, which we rescale to be comparable to FST (standard errors are typically ±0.001 but are not shown).
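
As a rough illustration of what "mixtures of ANI and ASI" means at a single genetic marker, here is a minimal Python sketch of simple two-way linear mixing. This is not the method used by Reich et al., who estimate ancestry without accurate ancestral populations; the sketch assumes the ancestral allele frequencies are known, and the function names and numbers are hypothetical placeholders, not values from the paper or from HarappaDNA.

```python
def mixed_allele_frequency(ani_fraction, p_ani, p_asi):
    """Expected allele frequency in a group that is a simple mixture of
    two ancestral populations (illustrative linear mixing only)."""
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    return ani_fraction * p_ani + (1.0 - ani_fraction) * p_asi


def estimate_ani_fraction(p_mixed, p_ani, p_asi):
    """Invert the mixing formula to recover the ANI fraction, assuming
    the two ancestral frequencies are known and differ."""
    if p_ani == p_asi:
        raise ValueError("marker is uninformative: ancestral frequencies are equal")
    return (p_mixed - p_asi) / (p_ani - p_asi)


# Hypothetical placeholder frequencies for one marker (not real data).
P_ANI = 0.8
P_ASI = 0.2

# 0.39 and 0.71 are the ends of the ANI range quoted in the abstract above.
for ani_fraction in (0.39, 0.71):
    p_mixed = mixed_allele_frequency(ani_fraction, P_ANI, P_ASI)
    recovered = estimate_ani_fraction(p_mixed, P_ANI, P_ASI)
    print(f"ANI fraction {ani_fraction:.2f} -> mixed frequency {p_mixed:.3f} "
          f"-> recovered {recovered:.2f}")
```

Real estimates combine many markers and have to cope with genetic drift and the absence of unmixed ancestral samples, which is why the paper needs more elaborate methods than this.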

So, a couple of initial observations. I'll revisit the charts and tables again if there is sufficient interest.
  • Regardless of how the data is sorted (by Onge, S Asian or European), the Sri Lankan contingent groups together (except for the 1/2 German).
  • The Sri Lankan participants have a Y-DNA haplogroup of H. To quote from the Wiki entry for Y-DNA haplogroup H, it "seems to represent the main Y-haplogroup of the indigenous paleolithic inhabitants of India, because it is the most frequent Y-haplogroup of tribal populations (25-35%). H-M69 presence in upper castes is quite rare (ca. 10%)". Maybe the Sri Lankans are the Ravanas.
  • The mtDNA of one of the Sri Lankan participants (Sinhalese) is W3a. The Wiki quote for mtDNA haplogroup W is "Haplogroup W appears in Europe, West and South Asia. It is everywhere found as minority clade, with the highest concentration being in Northern Pakistan. A related unnamed N* clade is found among Australian Aborigines".
  • The mtDNA for the two Sri Lankan Tamils is M36. The Wiki quote for mtDNA haplogroup M is, "There is an ongoing debate concerning geographical origins of Haplogroup M and its sibling haplogroup N. Both these lineages are thought to have been the main surviving lineages involved in the out of Africa migration (or migrations) because all indigenous lineages found outside Africa belong to either haplogroup M or haplogroup N".
I need to find out what the Y-DNA and mtDNA haplogroups of the Onge participants are.
Reich et al. quote:
These genomic analyses revealed two ancestral populations. "Different Indian groups have inherited forty to eighty percent of their ancestry from a population that we call the Ancestral North Indians who are related to western Eurasians, and the rest from the Ancestral South Indians, who are not related to any group outside India," said co-author David Reich.
The one exception to the finding that all Indian groups are mixed is the indigenous people of the Andaman Islands, an archipelago in the Indian Ocean with a census of only a few hundred today. The Andamanese appear to be related exclusively to the Ancestral South Indian lineage and therefore lack Ancestral North Indian ancestry.

Reich et al. divergence:
4,000 gens (100,000 yrs) ago: Split of West African and Eurasian ancestors
2,000 gens (50,000 yrs) ago: Split of ANI and ASI ancestors
1,700 gens (42,500 yrs) ago: Split of Asian populations (‘proto-East Asia’, ASI, and Onge)
600 gens (15,000 yrs) ago: Gene flow from ‘proto-East Asia’ into the ancestral population of ANI and West Eurasians, so that the proto-West Eurasian/ANI mixture proportion is mP. Most of our simulations assume mP=100% (no gene flow), but we vary this parameter to test the robustness of our procedure if the ancestors of ANI and West Eurasians were mixed.
400 gens (10,000 yrs) ago: Split of CEU and Adygei
200 gens (5,000 yrs) ago: Age of the ancient mixture event that formed the Indian Cline.
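
Every pair of figures in the list above works out to 25 years per generation (for example 4,000 × 25 = 100,000). Here is a small Python sketch that redoes the conversion; the constant and function names are mine, and the 25-year figure is inferred from the listed numbers rather than quoted from the paper.

```python
# Assumption: 25 years per generation, inferred from the figures listed above.
YEARS_PER_GENERATION = 25

# (generations ago, event), with event labels abbreviated from the list above.
EVENTS = [
    (4000, "Split of West African and Eurasian ancestors"),
    (2000, "Split of ANI and ASI ancestors"),
    (1700, "Split of Asian populations (proto-East Asia, ASI, Onge)"),
    (600, "Gene flow from proto-East Asia into ancestors of ANI and West Eurasians"),
    (400, "Split of CEU and Adygei"),
    (200, "Ancient mixture event that formed the Indian Cline"),
]


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a count of generations into years."""
    return generations * years_per_generation


def format_timeline(events):
    """Return one formatted line per event, oldest first as given."""
    return [
        f"{gens:,} gens ({generations_to_years(gens):,} yrs) ago: {label}"
        for gens, label in events
    ]


for line in format_timeline(EVENTS):
    print(line)
```

Passing a different `years_per_generation` shows how sensitive the dates are to that one assumption.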


  1. New research came out in 2013 looking at the mitochondrial genetics of Sri Lankans.

    According to this analysis:
    1. A considerable number of the maternal lineages of Sri Lanka are shared with India, more precisely with the southern part of India.
    2. The maternal genetic structuring is shaped by both ethnicity and geography.
    3. The Vedda are not likely a genetic isolate and share their lineages with their neighbours (i.e. south India).

    1. Thanks, Ken, for the link. One thing that should be noted is that this is just the maternal mtDNA, which is 1/24th of the genetic make-up. See the second box in the DNA primer. Also, Chaubey 2014 says that "The mtDNA (mitochondrial DNA) data suggest deep autochthonous diversity with minor sharing with East and West Eurasians, whereas, in contrast with this, the recent autosomal data showed substantial similarities of their genome with Caucasus and West Asians. However, at the current resolution, it is unclear that this sharing is extremely ancient or arisen with the arrival of new languages and farming."

      Quotes from Ranaweera et al 2014

      "indicated that almost 50% of the individuals from all the studied populations belonged to haplogroup M lineages (including haplogroup M, D and G)"

      "Three haplogroups, M2, U2i (U2a, U2b and U2c) and R5, recognized as a package of Indian-specific mtDNA clades harboring an equally deep coalescent age of about 50000–70000 years, were present in the ethnic populations of Sri Lanka"

      Pretty much the same as in India, i.e. the maternal mtDNA unites the country and the paternal Y-DNA (yet to be done for Sri Lanka) divides by caste, group, etc.
      See Vijaya Kuveni: Paradigm for M mtDNA

      It is quite astonishing to see such a lower frequency of M haplogroup in the Vedda population when compared with southern Indian tribal groups (70–80%) as well as southern Indian caste populations (65%).
      I agree.

      I am surprised that the Ranaweera paper did not reference the Chandrasekar et al. 2009 paper, which is a similar study on a bigger scale in India.

      The Vedda mtDNA, especially Vedda-Rathugala (VA-Rat) and Vedda-Pollebadda (VA-Pol), seems to be a genetic isolate. See Figure 5 in Ranaweera et al.