Showing posts with label Genetics. Show all posts

Friday, June 12, 2015

mtDNA of Sri Lankans

mtDNA Tree
from http://www.phylotree.org/tree/main.htm
Summary of Ranaweera et al. 2014 (see the paper for the complete figures and tables). Note that this covers only the maternal line (mtDNA), which is 1/24th of the genetic make-up. See the second box in the DNA primer.
  • The mtDNA of the Vedda groups differs markedly from that of the rest of Sri Lankans (higher frequency of haplogroup R30b/R8a1a3 in all Vedda subgroups).
  • There is no clear genetic separation based on the PCA map between Sinhalese and Tamils, and between Up- and Low-country Sinhalese of Sri Lanka.
  • However, the closer association of the Up-country Sinhalese with the Sri Lankan
    Tamils than with the Indian Tamils is not in agreement with the geographic distances among them.
     
Pretty much the same as in India, i.e. the maternal mtDNA unites the country.
Maybe paternal Y-DNA (yet to be done for Sri Lanka) divides by caste, group, etc.
See Vijaya Kuveni: Paradigm for M mtDNA


Haplogroup frequency in Sri Lankan population
No. of samples (%)

Haplogroup      Vedda        Sinhalese    Sinhalese    Sri Lankan   Indian       Total
                             Up-country   Low-country  Tamils       Tamils
Haplogroup M    13 (17.33)   25 (41.67)   17 (42.5)    17 (43.59)   40 (70.18)   112 (41.33)
Haplogroup D    2 (2.67)     1 (1.67)     0 (0)        2 (5.13)     0 (0)        5 (1.85)
Haplogroup HV   0 (0)        1 (1.67)     1 (2.5)      7 (17.95)    0 (0)        9 (3.32)
Haplogroup N    0 (0)        2 (3.33)     0 (0)        0 (0)        1 (1.75)     3 (1.11)
Haplogroup R/U  0 (0)        1 (1.67)     0 (0)        0 (0)        3 (5.26)     4 (1.48)
Haplogroup R    34 (45.33)   10 (16.67)   10 (25)      3 (7.69)     5 (8.77)     62 (22.88)
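
Each percentage in the table is a haplogroup's count divided by the number of samples in that population. The sample sizes are not listed here, so the minimal Python sketch below back-calculates them from the Haplogroup M row and then recomputes the Haplogroup R percentages as a check. The function names are illustrative, and the inferred sample sizes are my own derivation from the table, not figures quoted from the paper.

```python
# Counts and percentages for Haplogroup M, copied from the table above.
populations = ["Vedda", "Sinhalese Up-country", "Sinhalese Low-country",
               "Sri Lankan Tamils", "Indian Tamils"]
m_counts = [13, 25, 17, 17, 40]
m_percent = [17.33, 41.67, 42.5, 43.59, 70.18]

# Counts for Haplogroup R, in the same column order.
r_counts = [34, 10, 10, 3, 5]


def infer_sample_size(count, percent):
    """Back-calculate the sample size n from count = n * percent / 100."""
    return round(count * 100 / percent)


def frequency_percent(count, sample_size):
    """Haplogroup frequency as a percentage, rounded to two decimals."""
    return round(100 * count / sample_size, 2)


# Inferred (not quoted) sample sizes, one per population.
sample_sizes = [infer_sample_size(c, p) for c, p in zip(m_counts, m_percent)]
r_percent = [frequency_percent(c, n) for c, n in zip(r_counts, sample_sizes)]

for name, n, pct in zip(populations, sample_sizes, r_percent):
    print(f"{name}: inferred n = {n}, Haplogroup R = {pct}%")
```

The recomputed Haplogroup R percentages agree with the table, and the inferred sample sizes add up to 271, which is consistent with the Total column (112 of 271 is 41.33%).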

                                                Among groups        Among populations   Within populations
                                                                    within groups
Model                                           Variance  P-value   Variance  P-value   Variance  P-value
Ethnic criteria (b)                             1.72      0.039     8.61      0.001     89.66     0.001
Linguistic criteria (c)                         2.57      0.002     8.2       0.001     89.23     0.001
Geographic criteria (d)                         0.55      0.677     10.56     0.001     89.99     0.001
Vedda vs others                                 4         0.002     8.15      0.001     87.85     0.001
Up-country Sinhalese vs Low-country Sinhalese   1.19      0.814     9.82      0.001     91.37     0.001
Sri Lankan Tamils vs Indian Tamils              0.73      0.027     2.19      0.028     97.08     0.003
b) Five groups (Vedda people, Up-country Sinhalese, Low-country Sinhalese, Sri Lankan Tamils and Indian Tamils).
c) Three groups (Vedda dialect, Indo-European language and Dravidian language).
d) Seven provinces (North, North-Central, Central, Eastern, Uva, Sabaragamuwa and South).
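NOTE: AMOVA (analysis of molecular variance) partitions the genetic variation into the share found among groups, among populations within groups, and within populations.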
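
If the three variance figures in each row are read as percentages of total variation (an assumption on my part; the table does not label them), each row should add up to roughly 100. Here is a minimal Python sketch of that sanity check, using the values exactly as they appear in the table. The names `amova_rows`, `TOLERANCE` and `rows_off_100` are illustrative.

```python
# Variance components as transcribed in the table above:
# (among groups, among populations within groups, within populations).
amova_rows = {
    "Ethnic criteria": (1.72, 8.61, 89.66),
    "Linguistic criteria": (2.57, 8.2, 89.23),
    "Geographic criteria": (0.55, 10.56, 89.99),
    "Vedda vs others": (4.0, 8.15, 87.85),
    "Up-country vs Low-country Sinhalese": (1.19, 9.82, 91.37),
    "Sri Lankan Tamils vs Indian Tamils": (0.73, 2.19, 97.08),
}

TOLERANCE = 0.05  # allow for rounding in the published figures


def row_total(components):
    """Sum of the three components, rounded to two decimals."""
    return round(sum(components), 2)


def rows_off_100(rows, tolerance=TOLERANCE):
    """Names of rows whose components do not sum to about 100."""
    return [name for name, comps in rows.items()
            if abs(row_total(comps) - 100) > tolerance]


for name, comps in amova_rows.items():
    print(f"{name}: total = {row_total(comps)}")
print("Rows not summing to ~100:", rows_off_100(amova_rows))
```

Two rows overshoot 100, each by exactly twice its among-groups value. That would be consistent with a minus sign lost in copying (AMOVA can produce small negative components), but that is a guess, not something the table states.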

"... indicated that almost 50% of the individuals from all the studied populations belonged to haplogroup M lineages (including haplogroup M, D and G)"

"Three haplogroups, M2, U2i (U2a, U2b and U2c) and R5, recognized as a package of Indian-specific mtDNA clades harboring an equally deep coalescent age of about 50000–70000 years, were present in the ethnic populations of Sri Lanka."


It is quite astonishing to see such a low frequency of the M haplogroup in the Vedda population when compared with southern Indian tribal groups (70–80%) as well as southern Indian caste populations (65%).

The Vedda mtDNA, especially that of the Vedda-Rathugala (VA-Rat) and Vedda-Pollebadda (VA-Pol) subgroups, appears to be a genetic isolate. See Figure 5 in Ranaweera et al.


A similar mtDNA study on a bigger scale in India: Chandrasekar et al. 2009

--------------
sbarrkum

Friday, May 23, 2014

US ran fake vaccinations to get bin Laden family DNA

One wonders why places like Pakistan are violently resistant to having vaccinations done.  The normal argument trotted out is Islamic fundamentalism.  Maybe there is a different fire to that smoke.

In 2011 the US organized a vaccination program in Abbottabad to find the DNA of Osama bin Laden's family and confirm that he was hiding in the compound.
The CIA organised a fake vaccination programme in the town where it believed Osama bin Laden was hiding in an elaborate attempt to obtain DNA from the fugitive al-Qaida leader’s family, a Guardian investigation has found.

As part of extensive preparations for the raid that killed Bin Laden in May, CIA agents recruited a senior Pakistani doctor to organise the vaccine drive in Abbottabad, even starting the “project” in a poorer part of town to make it look more authentic, according to Pakistani and US officials and local residents.

The doctor, Shakil Afridi, has since been arrested by the Inter-Services Intelligence agency (ISI) for co-operating with American intelligence agents.

http://www.zerohedge.com/news/2014-05-20/us-govt-admits-using-fake-vaccination-programs-gather-intelligence-swears-it-wont-do

http://www.theguardian.com/world/2011/jul/11/cia-fake-vaccinations-osama-bin-ladens-dna

Friday, December 27, 2013

Am I a Seyyid, a descendant of Prophet Muhammad?

A recent blog at Harappadna discusses the DNA of Sayyid (plural Sadah), who are considered to be descendants of Prophet Muhammad. One of the comments links to a study on the Iranian Sadat (Sayyid) population. The study found the most common haplotype values for seven markers, and I too have most of the common values.  One small issue: my paternal ancestry is Jaffna Tamil, and oral history says our paternal line originated from Kalinga (Orissa) in the 12th century.  The Y-DNA haplogroup is J2b2* (not a, b, c, d or e).


Marker      Most Common DYS Value   My Value
DYS 393     12, 13                  12
DYS 390     23, 24                  23
DYS 394     14, 15                  not tested
DYS 385a    13, 12                  13
DYS 385b    15, 13, 17              17
DYS 392     11                      11
DYS 389-2   29, 30                  28
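
To make "most of the common markers" concrete, here is a minimal Python sketch that counts how many of my tested values fall within the most common values from the table. The dictionary layout and the function name `compare_markers` are just illustrative choices.

```python
# Most common values per marker in the Sadat study, and my own values,
# copied from the table above. None marks a marker that was not tested.
most_common = {
    "DYS393": {12, 13},
    "DYS390": {23, 24},
    "DYS394": {14, 15},
    "DYS385a": {13, 12},
    "DYS385b": {15, 13, 17},
    "DYS392": {11},
    "DYS389-2": {29, 30},
}
my_values = {
    "DYS393": 12, "DYS390": 23, "DYS394": None, "DYS385a": 13,
    "DYS385b": 17, "DYS392": 11, "DYS389-2": 28,
}


def compare_markers(common, mine):
    """Return (matching, non-matching, untested) marker names."""
    matches, mismatches, untested = [], [], []
    for marker, allowed in common.items():
        value = mine.get(marker)
        if value is None:
            untested.append(marker)
        elif value in allowed:
            matches.append(marker)
        else:
            mismatches.append(marker)
    return matches, mismatches, untested


matches, mismatches, untested = compare_markers(most_common, my_values)
tested = len(matches) + len(mismatches)
print(f"{len(matches)} of {tested} tested markers match; "
      f"mismatch: {mismatches}; untested: {untested}")
```

Five of the six tested markers match; only DYS 389-2 falls outside the common values.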

As Razib Khan states
The Syed lineages don't exhibit a "Syed modal haplotype." What you should see is a Syed haplotype of ~50%, and then a range of other lineages which introgressed through people lying about their origins or women being unfaithful to their husbands. Instead there are a wide range of haplotypes. Being Syed is an honorific.
I guess there is a cautionary tale somewhere, along the lines of not being too hung up on lineages. Also, if you are wedded to being of a particular lineage/race, it is possibly better not to get your DNA tested.

Anyway, you can read the article and comments at the link below
http://www.harappadna.org/2011/06/every-south-asian-arab-a-descendant-of-muhammad/
 

Tuesday, November 12, 2013

DNA and Skin Color

A recent study shows an association between the rs1426654 SNP and skin pigmentation, with the SNP explaining about 27% of total phenotypic* variation.  Previous studies have shown that the rs1426654 SNP accounts for lighter skin in Europeans but not in East Asians.

Basically, for South Asians, if the rs1426654 SNP is AA then skin is light, if AG then medium dark, and if GG then dark (see graph).  My rs1426654 is AG and I am medium dark. Also see this post on my DNA and heroin addiction etc.
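
The rule of thumb above can be written as a small lookup. This is a minimal Python sketch of my simplification only. Since the SNP explains about 27% of the variation, it is not a predictor, and the names `SKIN_TONE_BY_GENOTYPE` and `describe_rs1426654` are made up for illustration.

```python
# Rule of thumb from the paragraph above (South Asians, rs1426654).
SKIN_TONE_BY_GENOTYPE = {
    "AA": "light",
    "AG": "medium dark",
    "GG": "dark",
}


def describe_rs1426654(genotype):
    """Look up the rule-of-thumb description for a two-letter genotype."""
    # Sort the two letters so that "GA" and "AG" are treated alike.
    key = "".join(sorted(genotype.upper()))
    return SKIN_TONE_BY_GENOTYPE.get(key, "unknown")


print(describe_rs1426654("AG"))
```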

Excerpts
We date the coalescence of the light skin associated allele at 22–28 KYA. Both our sequence and genome-wide genotype data confirm that this gene has been a target for positive selection among Europeans.


One of the key pigmentation genes in humans is SLC24A5.  A non-synonymous variant (ref SNP ID: rs1426654) in the third exon of this gene explains 25–38% of the skin color variation between Europeans and West Africans. The ancestral (G) allele of the SNP predominates in African and East Asian populations (93–100%), whereas the derived (A) allele is almost fixed in Europe (98.7–100%).

In India there is a general trend of rs1426654-A allele frequency being higher in the Northern (0.70±0.18) and Northwestern regions (0.87±0.13), moderate in the Southern (0.55±0.22), and very low or virtually absent in Northeastern populations of the Indian subcontinent (Figure 2, Table S6). Notably, the Onge and the Great Andamanese populations of Andaman Islands also showed absence of the derived-A allele.

Loss of pigmentation in eastern and western Eurasia seems to be a case of convergent evolution (different mutations in overlapping sets of genes), the H. sapiens sapiens ancestral condition of darker skin is well conserved from Melanesia to Africa.

More at
http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003912

Via
http://blogs.discovermagazine.com/gnxp/2013/11/big-sweeps-happen/#.UoHuzxC-X1U


*Genotype vs. Phenotype:
Very important concept. Even if the genes (genotype) are identical, the outward expression / looks (phenotype) can be different. An example is identical twins, who have the same genes but still differ; their fingerprints, for instance, are different. Another example is the children of short parents, who (even if they also carry the short genes) can be taller because of better nutrition.

The opposite is also true in that just because outward appearance is similar (phenotype) the genes (genotype) do not have to be similar. Example: Africans and Papua New Guineans though superficially similar are about the furthest apart genetically.

Friday, May 31, 2013

Vijaya Kuveni: Paradigm for M mtDNA in South Asia

PCA (left) and Admixture Bar (right)  plot
All Sri Lankans know the story of Vijaya and Kuveni from the Mahavamsa and Rajavali.  The basic crux of the story is that invaders, predominantly male, married local women.  The gist of what DNA research is saying is that in India (and for Sri Lanka*) our mothers' ancestry (mtDNA) is the same, but our paternal lines (Y-DNA) can be different.  One paper found that 70% of India, including 26 tribal populations, carried the M mtDNA haplogroup.  In the Harappa DNA project, 50% of the few Sri Lankan participants (8) had M mtDNA: two of the four self-identified Tamils and two of the four self-identified Sinhalese.
(* my extrapolation)

Before I get to excerpts of the research articles, a few words on the PCA (Principal Component Analysis) plot on this page.  It's one of the few plots I have seen where PCA captures the geographic distribution without any geo-position data going into the analysis.  What I am trying to say is that the V shape of India's genetic cline is evident in the PCA plot.

 From Chandrasekar et al
Macrohaplogroup M is ubiquitous in India and covers more than 70 per cent of the Indian mtDNA lineages. The lineages M2, M3, M4, M5, M6, M18 and M25 are exclusive to South Asia, with M2 reported to be the oldest lineage on the Indian sub-continent.

The deep rooted lineages of macrohaplogroup ‘M’ suggest in-situ origin of these haplogroups in India. Most of these deep rooting lineages are represented by multiple ethnic/linguist groups including tribals of India
From Discover Magazine
An interesting point though is that the mtDNA, the female lineage, does not seem to diverge from other South Asians much at all. I find it intriguing that this is the same pattern we see along the major NW-SE axis of variation. It seems that mtDNA lineages unite South Asians, while the Y lineages separate them (by caste and region). The generality has many exceptions, but it points to a peculiar sex mediated admixture process from both the northwest and northeast. Men on the move have reshaped the genetics and culture of South Asia, but the mtDNA lineages still point to an ancient Eurasian group with distant but stronger affinities to the east than the west. The mtDNA are likely the purest distillation of ASI (Ancestral South Indian)
From  Witas et al in PLOSone
Ancient DNA methodology was applied to analyse freshly unearthed remains (teeth) of 4 individuals. Dated to the period between 2.5 Kyrs BC and 0.5 Kyrs AD the studied individuals carried mtDNA haplotypes corresponding to the M4b1, M49 and/or M61 haplogroups, which are believed to have arisen in the area of the Indian subcontinent during the Upper Paleolithic and are absent in people living today in Syria.
Studied remains were excavated at two archaeological sites in the middle Euphrates valley and dated between the Early Bronze Age and the Late Roman period. The obtained data enrich the as yet modest database of Mesopotamian ancient DNA and suggest a possible genetic link of the region with the Indian subcontinent in the past leaving no traces in the modern population.


Update
This means the genes of prehistoric people are still prevalent among modern Sri Lankans

We report here the first complete mitochondrial sequences for Mesolithic hunter-gatherers from two cave sites. The mitochondrial haplogroups of pre-historic individuals were M18a and M35a. Pre-historic mitochondrial lineage M18a was found at a low prevalence among Sinhalese, Sri Lankan Tamils, and Sri Lankan Indian Tamil in the Sri Lankan population, whereas M35a lineage was observed across all Sri Lankan populations with a comparatively higher frequency among the Sinhalese.

First AASI mtDNA genomes from Sri Lanka (2500 and 5500 BC)
Please read the comments as well; they are knowledgeable.
https://www.brownpundits.com/2022/11/30/first-aasi-mtdna-genomes-from-sri-lanka-2500-and-5500-bc/

Also see:
Sinhalese and Tamil DNA Admixture Analysis
My DNA 01: Heroin Addiction, Smoking etc
Sri Lankan Population DNA Genetics 01
Basic Primer on Population DNA Genetics
List of reference and excerpts

Thursday, April 18, 2013

Sinhalese and Tamil DNA Admixture Analysis

Updated analysis of the DNA admixture of Sri Lankan participants at HarappaDNA. There are 7 Sri Lankans (3 Sinhalese, 4 Tamils). I have not included the part Sri Lankans whose immediate parents are not from Sri Lanka.
For comparison of Sri Lankan DNA with neighboring populations I have included six other populations: TN Tamil (7), TN Tamil Brahmins (14), Kerala (10), Bengali (7), Punjabi (18) and Iranian (8). (TN = Tamil Nadu; (#) = the number of individuals.) So before the results,
  *please consider getting a DNA test. It's USD 99 at 23andMe.
  *All charts are interactive. Clicking on them will sort the chart or table and show info on the data point.

Average Component Admixture for Populations

Components are based on reference population peaks. Please see National Geographic Reference Populations Overview and Regions overview for lucid description of similar analysis. Do also have a look at the complete Harappa World Admixture. This analysis is a subset of Harappa World Admixture.

South Indian Component
1) Decreases from TN Tamil (60%) > SL Tamil (58%) > Sinhalese (55.5%)
2) Bengalis, Kerala and TN Tamil Brahmins have approx the same (48%)
3) As expected, Iranians have the least (3%). Europeans (not in this data) have 0%.

Baloch Component
Ranges from 40% (Punjabis) to 29.1% (TN Tamil) for South Asian populations. Sinhalese and SL Tamils are in the mid range with approx 31%.

Caucasian Component
Iranians have 41% of this component. The Punjabis have 9.8%. Southern and Eastern South Asians have less than 5%, with Sinhalese 3.3% and Tamil 2.1%.

South East (SE) Asian Component
Bengalis have the highest percentage (4.7%) of this component, reflecting the proximity of their borders to SE Asian populations. Sinhalese have 1.6% of the SE Asian component, while TN Tamils and SL Tamils both have 1.3%.
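
The ordering described for the South Indian component can be reproduced with a few lines of Python. This is a minimal sketch using only the averages quoted above. The "approx 48%" figures are entered as 48.0, and the names `south_indian_pct` and `rank_by_component` are illustrative.

```python
# Average South Indian component (%) per population, as quoted in the text.
south_indian_pct = {
    "TN Tamil": 60.0,
    "SL Tamil": 58.0,
    "Sinhalese": 55.5,
    "Bengali": 48.0,           # "approx 48%" in the text
    "Kerala": 48.0,            # "approx 48%" in the text
    "TN Tamil Brahmin": 48.0,  # "approx 48%" in the text
    "Iranian": 3.0,
}


def rank_by_component(component_pct):
    """Populations sorted from highest to lowest component percentage."""
    return sorted(component_pct.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_component(south_indian_pct)
for population, pct in ranking:
    print(f"{population}: {pct}%")

# Gap between the two Sri Lankan groups, in percentage points.
gap = south_indian_pct["SL Tamil"] - south_indian_pct["Sinhalese"]
print(f"SL Tamil minus Sinhalese: {gap} points")
```

On these averages the two Sri Lankan groups differ by only 2.5 percentage points.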

Chart of Component Admixture for each Individual

The South Indian component is close to 60% for both Sinhalese and Tamils except for individuals HRP0122 (49%) and HRP0232 (53%). HRP0232 is me, this blog's author. The low (for a Sri Lankan) 53% South Indian component is probably because of a maternal great-grandmother who was probably European. That is reflected in the elevated NE European component of 4.7%, compared with less than 2% for other Sri Lankans. HRP0122 too has an elevated NE European component, of 4%. Maybe HRP0122 would care to comment on it. HRP0122 and I have corresponded by email and know each other's identities.

Table of HarappaID's etc

Table has the Sri Lankan (Sinhalese and Tamil) and few subset individuals with their Harappa ID, Self ID'd Ethnicity and Assigned Group Population for this Analysis.
Resources:
a) Excel Spreadsheet of Data used in this analysis.
b) The complete World Admixture results at HarappaDNA.

Also See:
1) My DNA 01: Heroin Addiction, Smoking etc
2) Sri Lankan Population DNA Genetics 01
3) Basic Primer on Population DNA Genetics

Friday, February 15, 2013

East Asian Small Breasts Linked to 35,000-Year-Old DNA Mutation

Here are the excerpts from the NY Times article
The traits — thicker hair shafts, more sweat glands, characteristically identified teeth and smaller breasts — are the result of a gene mutation that occurred about 35,000 years ago, the researchers have concluded. 

The discovery explains a crucial juncture in the evolution of East Asians. But the method can also be applied to some 400 other sites on the human genome. The DNA changes at these sites, researchers believe, mark the turning points in recent human evolution as the populations on each continent diverged from one another. 
The first of those sites to be studied contains the gene known as EDAR. Africans and Europeans carry the standard version of the gene, but in most East Asians, one of the DNA units has mutated.
Mice already have EDAR, an ancient mammalian gene that plays a leading role in the embryo in shaping hair, skin and teeth. The Broad team engineered a strain of mice whose EDAR gene had the same DNA change as the East Asian version of EDAR. When the mice grew up, the researchers found they did indeed have thicker hair shafts, confirming that the changed gene was the cause of East Asians’ thicker hair. 

A series of selections on different traits thus made the variant version so common among East Asians. About 93 percent of Han Chinese carry the variant, as do about 70 percent of people in Japan and Thailand, and 60 to 90 percent of American Indians, a population descended from East Asians
Journal Reference:
Yana G. Kamberov, Sijia Wang, Jingze Tan, Pascale Gerbault, Abigail Wark, Longzhi Tan, Yajun Yang, Shilin Li, Kun Tang, Hua Chen et al. Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. Cell, 14 February 2013 DOI: 10.1016/j.cell.2013.01.016

Sunday, April 22, 2012

'Eggless' chick laid by hen in Sri Lanka

from the BBC via finance blog Naked Capitalism

Only in Sri Lanka
A Sri Lanka hen has given birth to a chick without an egg, in a new twist on the age-old question of whether the chicken or the egg came first. Instead of passing out of the hen's body and being incubated outside, the egg was incubated in the hen for 21 days and then hatched inside the hen.
The chick is fully formed and healthy, although the mother has died.
The government veterinary officer in the area said he had never seen anything like it before.
PR Yapa, the chief veterinary officer of Welimada, where it took place, examined the hen's carcass.
He found that the fertilised egg had developed within the hen's reproductive system, but stayed inside the hen's body until it hatched.
A post-mortem conducted on the hen's body concluded that it died of internal wounds.
The BBC's Charles Haviland in Colombo says that the story has made headlines in Sri Lanka, with the Sri Lankan Daily Mirror's concluding: "The chicken came first; not the egg."

Friday, April 20, 2012

Eating Meat helped Humans have a shorter Breast Feeding period

New research appears to explain why humans breastfeed for a shorter period than great apes: eating meat.
Below excerpt from the  Daily Mail
The research compared 67 species of mammals, including humans, apes, mice and killer whales, and found a clear correlation between eating meat and earlier weaning.
They found young of all species stop suckling when their brains have developed to a particular stage, but that carnivores reached this point more quickly than herbivores or omnivores. 
'Eating meat enabled the breast-feeding periods and thereby the time between births to be shortened,' said Elia Psouni, lead author of the study. 'This must have had a crucial impact on human evolution.'
Among natural fertility societies, the average duration of breast-feeding is 2 years and 4 months. This is not much in relation to the maximum lifespan of our species, around 120 years.
It is even less if compared to our closest relatives: female chimpanzees suckle their young for 4 to 5 years, whereas the maximum lifespan for chimpanzees is only 60 years.
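
The comparison in the excerpt comes down to breast-feeding duration as a share of maximum lifespan. Here is a minimal Python sketch of that arithmetic, using only the figures quoted above. The function name is illustrative.

```python
def weaning_share_of_lifespan(weaning_years, max_lifespan_years):
    """Breast-feeding duration as a percentage of maximum lifespan."""
    return 100 * weaning_years / max_lifespan_years


# Humans: 2 years 4 months out of about 120 years.
human = weaning_share_of_lifespan(2 + 4 / 12, 120)

# Chimpanzees: 4 to 5 years out of about 60 years.
chimp_low = weaning_share_of_lifespan(4, 60)
chimp_high = weaning_share_of_lifespan(5, 60)

print(f"Humans: {human:.1f}% of maximum lifespan")
print(f"Chimpanzees: {chimp_low:.1f}% to {chimp_high:.1f}% of maximum lifespan")
```

On these figures humans breast-feed for a little under 2% of the maximum lifespan, against roughly 7% to 8% for chimpanzees.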
Below excerpts from the  Psouni, Elia et al abstract at PLoS

Our large brain, long life span and high fertility are key elements of human evolutionary success and are often thought to have evolved in interplay with tool use, carnivory and hunting. 
Crucially, carnivory predicted the time point of early weaning in humans with remarkable precision, yielding a prediction error of less than 5% with a sample of forty-six human natural fertility societies as reference. Hence, carnivory appears to provide both a necessary and sufficient explanation as to why humans wean so much earlier than the great apes.
While early weaning is regarded as essentially differentiating the genus Homo from the great apes, its timing seems to be determined by the same limited set of factors in humans as in mammals in general, despite some 90 million years of evolution.
Our analysis emphasizes the high degree of similarity of relative time scales in mammalian development and life history across 67 genera from 12 mammalian orders and shows that the impact of carnivory on time to weaning in humans is quantifiable, and critical.
Since early weaning yields shorter interbirth intervals and higher rates of reproduction, with profound effects on population dynamics, our findings highlight the emergence of carnivory as a process fundamentally determining human evolution.

Monday, March 26, 2012

My DNA 01: Heroin Addiction, Smoking etc

Got my DNA results from 23andMe. Very quick, in almost 3 weeks.
First off, the 23andMe autosomal, Y-DNA and mtDNA (mitochondrial) data can be downloaded here.

So until I get Zack's analysis of ancestral groups, here are a very few of the large number of health and trait indicators I got with my DNA results.  Please note that most of this type of research has been done on people with European ancestry, and its applicability to South Asians is yet to be determined.
rsid       chromosome position  genotype       Trait
rs1799971     6       154402490    AG       Heroin Addiction
rs17822931   16        46815699    TT       Ear Wax Type
rs762551     15        72828970    AA       Caffeine Metabolism
rs1051730    15        76681394    AG       Smoking Behavior
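
For anyone who wants to look up the same rsIDs in their own download, here is a minimal Python sketch. It assumes the raw-data file has the four columns shown above, tab-separated, with comment lines starting with '#'. That layout is my assumption, so check it against your own file. The function name `load_genotypes` is illustrative, and the sample rows are the ones from the table.

```python
import io

# A few rows in the layout shown above. The tab separator and the '#'
# comment prefix are assumptions about the raw-data file format.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1799971\t6\t154402490\tAG\n"
    "rs17822931\t16\t46815699\tTT\n"
    "rs762551\t15\t72828970\tAA\n"
    "rs1051730\t15\t76681394\tAG\n"
)


def load_genotypes(handle):
    """Map rsid -> genotype, skipping comment and blank lines."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes


genotypes = load_genotypes(io.StringIO(SAMPLE))
for rsid in ("rs1799971", "rs17822931", "rs762551", "rs1051730"):
    print(rsid, genotypes.get(rsid, "not in file"))
```

For a real file, pass an opened file object (for example `open(path)`) instead of the `io.StringIO` sample.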

Heroin Addiction: rs1799971  AG: Substantially higher odds
This study of 139 heroin addicts (primarily Swedes) and 170 non-addicts found that people with at least one G at rs1799971 have almost 2.9 times the odds of being a heroin addict.
I was dead scared of the horse because of all the stuff I had read about teeth falling out etc. Good thing I am no longer a young adult and that I read all the warnings when I was one.
  • Zhang H et al. (2006) . “Association between two mu-opioid receptor gene (OPRM1) haplotype blocks and drug or alcohol dependence.” Hum Mol Genet 15(6):807-19.
  • Bart G et al. (2004) . “Substantial attributable risk related to a functional mu-opioid receptor gene polymorphism in association with heroin addiction in central Sweden.” Mol Psychiatry 9(6):547-9.
Ear Wax: rs17822931  TT: Dry Ear wax
Earwax type is highly heritable. This means that this trait is controlled almost entirely by your genes; environmental factors play little or no role. Because of this, simply knowing your genotype is enough to know your earwax type.
I have dry earwax and my body sweat does not smell all that much (according to others) even though I sweat profusely (overweight). The sweat does tend to smell when I eat steak over two or three days (this happens only during Christmas/year end) and when I eat Ethiopian/Indian food. I think this is because of the larger amounts of fenugreek (Sinhala: uluhal) in Berbere and Indian curry powder.  I guess that is environment overshadowing genes.
Caffeine Metabolism: rs762551  AA: Substantially higher odds
The form of the SNP rs762551 a person has determines how fast CYP1A2 metabolizes caffeine. In this study, people with the slower version of the CYP1A2 enzyme who also drank at least two to three cups of coffee per day had a significantly increased risk of a non-fatal heart attack. The study found that fast metabolizers, on the other hand, may have actually reduced their heart attack risk by drinking coffee.
I drink more than 6 cups of black coffee at work (US), especially because it's free. I drink the stuff even before I go to sleep.
Smoking Behavior: rs1051730  AG: More on Average if Smoker
Genes vs. Environment: Not all smokers are created equal—some light up just a few times a day, while others go through multiple packs. There are many social and environmental factors that affect whether people start smoking, but once they do, research based on Dutch Twins has shown that genetic factors play a large part in how dependent on nicotine they'll become and how much they'll smoke.
I used to smoke almost 3 packs (packs of 20) as a teenager and young adult. Then I stopped for about 13 years, started again and was smoking about 30 cigarettes. The only reason it was not more was cost and the restrictions on smoking in many locations. I stopped again and it's been more than 6 years, with the main motivation for stopping being cost and that I am a little too old to be bumming cigarettes.
  • Lerman C., Berrettini W. Elucidating the role of genetic factors in smoking behavior and nicotine dependence. Am. J. Med. Genet. B Neuropsychiatr. Genet. 2003;118B:48-54.
Also see
Sinhalese and Tamil DNA Admixture Analysis
Vijaya Kuveni: Paradigm for M mtDNA in South Asia

Monday, March 19, 2012

Wet or Dry Ear wax and Armpit Odor

Excerpt from a Nature Genetics Article.
They write that earwax type and armpit odor are correlated, since populations with dry earwax, such as those of East Asia, tend to sweat less and have little or no body odor, whereas the wet earwax populations of Africa and Europe sweat more and so may have greater body odor.
They show that a SNP, 538G→A (rs17822931), in the ABCC11 gene is responsible for determination of earwax type. The AA genotype corresponds to dry earwax, and GA and GG to wet type. A 27-bp deletion in ABCC11 exon 29 was also found in a few individuals of Asian ancestry.
I think the excerpt from the article is self-explanatory. When I have my autosomal DNA results, I too will check rsID rs17822931 and see the genotype. I already know my ear wax type. (Please see here on reading an rsID from a typical autosomal file.)

According to the ALFRED database, this SNP is the first example of a DNA polymorphism determining a visible genetic trait.


From:  Gene Expression: Wet or Dry Ear Wax 
Original article: Nature Genetics 38, 324 - 330 (2006), "A SNP in the ABCC11 gene is the determinant of human earwax type", Koh-ichiro Yoshiura et al.
Other related articles: "The impact of natural selection on an ABCC11 SNP determining earwax type", Ohashi J et al. 2011 (a blog post discussing the article)

Wednesday, March 14, 2012

Sri Lankan Population DNA Genetics 01

This is a first article on Sri Lankan genetics. The data and analysis are from the HarappaDNA project run by Zack Ajmal.
First:
a) Please read the previous blog post to understand the jargon and for a simple introduction to genetics.
b) There isn't enough Sri Lankan data. If you live in the US/Canada, and especially if you are over the age of 40, please consider getting your autosomal DNA tested. It's $99 (and I think a $9/month compulsory 12 month subscription) for the autosomal test at 23andMe. I had my Y-DNA and mtDNA tested at FTDNA a few years back and have just sent in the saliva sample for the autosomal test. The autosomal results should be available in about two months and I will publish the results online.
Anyway, here we go. What I have presented is a subset of the data and analysis reported by Zack on HarappaDNA in Jan 2012. This particular data set has 220 South Asian participants. All I have done is extract a total of 17 individuals: 6 Sri Lankan participants, some nearby South Indian participants and one Sindhi as a North-West Indian comparison.

The Sri Lankan contingent of 6 is as follows
  • Sri Lankan (ethnicity unknown)
  • Sinhalese Govigama
  • Sri Lankan Vellala (2 individuals)
  • Sri Lankan Vellala 1/2 and Telugu (1/2)
  • Sri Lankan (1/2) and German (1/2)
Note: Charts and Tables are interactive. Click on Legend or Column headers to Sort. (In brackets number of individuals in Group)

First an explanation of the Components/Legend.
  • S. Asian = South Asian and is roughly equivalent to the Ancestral North Indian (ANI) + Ancestral South Indian (ASI) of Reich et al. (more on that later)
  • Onge= The Onge are Andaman Islanders, who have no ANI component
  • E. Asian = Represents the Chinese etc. component.
  • SW Asian = It's a bit of a misnomer. It represents NW Asian, such as Iranian etc.
  • The Harappa analysis is an extended analysis of the Reich et al paper, but separates the ANI and ASI into S Asian and SW Asian components.
Reich et al. abstract (David Reich's other publications)
provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the ‘Ancestral North Indians' (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the ‘Ancestral South Indians' (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39–71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy.
Figure 4. A model relating the history of Indian and non-Indian groups. Modeling the Pathan, Vaish, Meghawal and Bhil as mixtures of ANI and ASI, and relating them to non-Indians by the phylogenetic tree (YRI,((CEU,ANI),(ASI,Onge))), provides an excellent fit to the data. While the model is precise about tree topology and ordering of splits, it provides no information about population size changes or the timings of events. We estimate genetic drift on each lineage in the sense of variance in allele frequencies, which we rescale to be comparable to FST (standard errors are typically ±0.001 but are not shown).

So a couple of Initial observations. I'll revisit the charts and tables again if there is sufficient interest.
  • Regardless of how the data is sorted, by Onge or S Asian or European the Sri Lankan contingent groups together (except for the 1/2 German).
  • The Sri Lankan participants have a Y-DNA haplogroup of H. To quote from the Wiki, the Y-DNA H haplogroup "seems to represent the main Y-haplogroup of the indigenous paleolithic inhabitants of India, because it is the most frequent Y-haplogroup of tribal populations (25-35%). H-M69 presence in upper castes is quite rare (ca. 10%)". Maybe the Sri Lankans are the Ravanas.
  • The mtDNA of one of the Sri Lankan participants (Sinhalese) is W3a. The Wiki quote for the W mtDNA haplogroup is "Haplogroup W appears in Europe, West and South Asia. It is everywhere found as minority clade, with the highest concentration being in Northern Pakistan. A related unnamed N* clade is found among Australian Aborigines".
  • The mtDNA for the two Sri Lankan Tamils is M36. The Wiki quote for the M mtDNA haplogroup is, "There is an ongoing debate concerning geographical origins of Haplogroup M and its sibling haplogroup N. Both these lineages are thought to have been the main surviving lineages involved in the out of Africa migration (or migrations) because all indigenous lineages found outside Africa belong to either haplogroup M or haplogroup N".
I need to find out what the Y-DNA and mtDNA of the Onge participants are.
Reich et al quote.
These genomic analyses revealed two ancestral populations. "Different Indian groups have inherited forty to eighty percent of their ancestry from a population that we call the Ancestral North Indians who are related to western Eurasians, and the rest from the Ancestral South Indians, who are not related to any group outside India," said co-author David Reich.
The one exception to the finding that all Indian groups are mixed is the indigenous people of the Andaman Islands, an archipelago in the Indian Ocean with a census of only a few hundred today. The Andamanese appear to be related exclusively to the Ancestral South Indian lineage and therefore lack Ancestral North Indian ancestry.

Reich et al divergence.
4,000 gens (100,000 yrs) ago: Split of West African and Eurasian ancestors
2,000 gens (50,000 yrs) ago: Split of ANI and ASI ancestors
1,700 gens (42,500 yrs) ago: Split of Asian populations (‘proto-East Asia', ASI, and Onge)
600 gens (15,000 yrs) ago: Gene flow from ‘proto-East Asia' into the ancestral population of ANI and West Eurasians, so that the proto-West Eurasian/ANI mixture proportion is mP. Most of our simulations assume mP=100% (no gene flow), but we vary this parameter to test the robustness of our procedure if the ancestors of ANI and West Eurasians were mixed.
400 gens (10,000 yrs) ago: Split of CEU and Adygei
200 gens (5,000 yrs) ago: Age of the ancient mixture event that formed the Indian Cline.
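
Every row in the list above uses the same conversion of 25 years per generation (for example 4,000 gens = 100,000 yrs). A minimal sketch of that arithmetic follows; the constant and function names are made up for illustration and are not from Reich et al.

```python
# Years per generation implied by the list above (4,000 gens = 100,000 yrs).
YEARS_PER_GENERATION = 25


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into years."""
    return generations * years_per_generation


# Generation counts copied from the list above.
events = {
    "West African / Eurasian split": 4000,
    "ANI / ASI split": 2000,
    "Split of Asian populations": 1700,
    "Gene flow from proto-East Asia": 600,
    "CEU / Adygei split": 400,
    "Mixture forming the Indian Cline": 200,
}

for name, gens in events.items():
    print(f"{name}: {gens:,} gens = {generations_to_years(gens):,} yrs")
```

Changing the assumed generation length rescales every date in the list by the same factor.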

Basic Primer on Population DNA Genetics

This is a primer meant to help understand a few blog posts on Sri Lankan Population Genetics that I plan to write in the near future. Click here for the latest list of Sri Lankan Population Genetics posts.



Genome
The genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA. The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion base pairs long and has about 20,000–25,000 distinct genes.

A human genome is written out in two chains of about 3 billion chemical blocks (6 billion in all) that can be thought of as letters of an alphabet: A (Adenine), C (Cytosine), G (Guanine), and T (Thymine).

Any person’s genome is derived from 47 stretches of DNA corresponding to 46 chromosomes and the mitochondrial DNA. (A human cell has 23 pairs of chromosomes: 23 are paternal and 23 are maternal.)
Two of these chromosomes are the sex chromosomes, identified as X and Y. A father passes down either an X or a Y chromosome, but the mother always passes down an X and never a Y. Therefore, the paternal lineage is often traced using the Y chromosome. Mitochondrial DNA, which is about one 200,000th of the genome, is passed down along the maternal line only.

Gene

A gene is a tiny fragment of the genome, typically around 1000 letters long. Genes are used as templates to assemble the proteins that do most of the work in cells. In between genes is noncoding DNA, sometimes referred to as junk DNA.

The gene is the molecular unit of heredity of a living organism, made up of stretches of DNA or RNA. A modern working definition of a gene is "a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and or other functional sequence regions". Each gene has a specific location (locus) on a chromosome and may come in several forms (alleles). In simpler language, the DNA that gives brown hair and the DNA that gives black hair are both alleles of the hair color gene (more info here). A more detailed non technical introduction to genomes and genes is Introduction to genetics.


Genotype vs. Phenotype:
A very important concept. Even if the genes are identical (genotype), the outward expression / looks (phenotype) could be different. An example would be identical twins, who have the same genes but will still have differences, including different fingerprints. Another example would be children of short parents who (despite also having the short genes) could be taller because of better nutrition.

The opposite is also true in that just because outward appearance is similar (phenotype) the genes (genotype) do not have to be similar. Example: Africans and Papua New Guineans though superficially similar are about the furthest apart genetically.
African
Papua New Guinean

What Kind of Genetic Tests: (see here for more non technical info)
Humans have 22 pairs of autosomal (non sex related) chromosomes. The remaining pair is the sex chromosomes: XX in the case of a female, XY in the case of a male. These are the tests currently available.
  • Y-DNA: This test is only for males (Y chromosome) and gives your direct male ancestry, i.e. the genes that were passed down from your father's father's father etc. Currently about 67 markers are tested.
  • mt-DNA: Males and females can be tested. This gives the genes in the mitochondria that were passed down from your mother's mother's mother etc.
  • Autosomal: Males and females can be tested. This tests the 22 pairs of autosomal (non sex related) chromosomes. Currently about 0.024% of the roughly 3 billion base pairs are tested.
Assume there are genomes that are Pure Sri Lankan and Pure European. Three generations ago, the direct maternal great-grandmother (DF = Direct Female) and the direct paternal great-grandfather (DM = Direct Male) were Pure Sri Lankan. The maternal side's daughters and the paternal side's sons always married Pure Europeans.
When the person gets tested, the Y-DNA and mt-DNA tests will show that they are Pure Sri Lankan. However, the autosomal test will show 25% Sri Lankan and 75% European. That is because 3 generations back there were 8 great-grandparents (2^(number of generations) = 2^3 = 8): 2 were Sri Lankan (2/8 = 25%) and 6 were European (6/8 = 75%).
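
The great-grandparent arithmetic above can be written out as a short sketch. The function name is hypothetical, and the result is only the expected average share: the amount of DNA actually inherited from each ancestor varies because of recombination.

```python
from fractions import Fraction


def autosomal_share(matching_ancestors, generations):
    """Expected autosomal share contributed by `matching_ancestors` ancestors
    who lived `generations` generations back (2**generations ancestors in all)."""
    total_ancestors = 2 ** generations
    return Fraction(matching_ancestors, total_ancestors)


# Example from the text: 2 of the 8 great-grandparents were Sri Lankan.
sri_lankan = autosomal_share(2, 3)
european = 1 - sri_lankan
print(f"Sri Lankan: {float(sri_lankan):.0%}, European: {float(european):.0%}")
```

The Y-DNA and mt-DNA results do not dilute in this way, because each follows a single unbroken line of ancestors.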

Testing and Results
I have almost no clue as to the steps between sending your saliva and getting your genome data. There is DNA amplification, which makes more copies of the same DNA from the small sample sent. Then there is probably analysis through machines like the Illumina, which are like 10th generation HPLCs (High Pressure Liquid Chromatographs).
Depending on the kind of machine (chip), different parts of the genome (~0.024% of 3.2 billion base pairs) get tested. That means that when results from different machines need to be compared and analysed, the commonly tested locations need to be extracted first. Say, for example, you got your autosomal tests done at FTDNA and you are submitting the results to a research group that has mainly 23andMe results. Then the researcher will have to extract the data common to both FTDNA and 23andMe before any analysis can be done.
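
Extracting the commonly tested locations amounts to intersecting the two sets of SNP identifiers. Below is a minimal sketch with made-up miniature datasets; the variable names and genotypes are hypothetical. A real merge also has to check that both files use the same genome build and strand, which this ignores.

```python
def common_snps(genotypes_a, genotypes_b):
    """Return the rsids typed in both datasets, in sorted order."""
    return sorted(set(genotypes_a) & set(genotypes_b))


# Hypothetical miniature results from two different chips (rsid -> genotype).
ftdna = {"rs3094315": "AG", "rs12562034": "AG", "rs3934834": "CC"}
twenty_three = {"rs3094315": "AA", "rs3934834": "CT", "rs9442372": "AG"}

shared = common_snps(ftdna, twenty_three)

# Keep only the shared positions from each dataset before comparing them.
ftdna_shared = {rsid: ftdna[rsid] for rsid in shared}
twenty_three_shared = {rsid: twenty_three[rsid] for rsid in shared}

print(shared)
```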

Anyway, once the analysis is done you will get a whole lot of results, ranging from health to ancestry. Other than that, you also get a data file of approx 5 MB in text format.
What Can be done with the Raw DNA data
a) There is SNPTips, a free Firefox extension that will automatically match your genotype SNPs with others.
b) You could analyse it at http://snpinfo.niehs.nih.gov/snpfunc.htm. Take an rsid from the raw data example and paste it into SNP Function Prediction or SNP Information in DNA Sequence to see results. For SNP Function Prediction you need to click some of the boxes, like "Based on Genotype Data from dbSNP", say Asian. I have no clue as to what the results mean; it's going to be a learning curve.
d) Do more research yourself or participate (anonymously if you wish) in many of the projects, such as HarappaDNA (for South Asian analysis), Dodecad Ancestry Project and Eurogenes Ancestry Project.

Data and Analysis
This section gives a general outline of data preparation and analysis of the raw autosomal data.
The 5 MB raw autosomal data file (from 23andMe) will look like the sample below.
rsid       chromosome position genotype
rs3094315  1          742429    AG
rs12562034 1          758311    AG
rs3934834  1          995669    CC
rs9442372  1          1008567   AG
rs3737728  1          1011278   AG
rsid: The identifier of the SNP tested. Typically only about 0.024% of the roughly 3 billion base pairs (still hundreds of thousands of SNPs) are tested, at locations (positions) known for genetic diversity.
chromosome: The number of the chromosome, one of the 22 pairs of autosomal (non sex related) chromosomes.
position: Position (also called locus or marker) of the place tested.
genotype: The pair of bases (alleles) at that position, one from each chromosome of the pair. Each will be one of the four bases that make up DNA: A (adenine), C (cytosine), G (guanine), and T (thymine).
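
The four columns described above can be read with a few lines of Python. This is a minimal sketch that assumes the file is whitespace-delimited and that any comment lines start with "#"; the function name is made up for illustration.

```python
import io


def parse_raw_data(handle):
    """Yield (rsid, chromosome, position, genotype) tuples from raw data lines."""
    for line in handle:
        line = line.strip()
        # Skip blank lines, '#' comment lines and the column header.
        if not line or line.startswith("#") or line.startswith("rsid"):
            continue
        rsid, chromosome, position, genotype = line.split()
        yield rsid, chromosome, int(position), genotype


# Two rows copied from the sample above, wrapped in a file-like object.
sample = io.StringIO(
    "rsid       chromosome position genotype\n"
    "rs3094315  1          742429    AG\n"
    "rs3934834  1          995669    CC\n"
)

records = list(parse_raw_data(sample))

# A genotype is heterozygous when its two alleles differ (AG but not CC).
heterozygous = [rsid for rsid, _, _, genotype in records if len(set(genotype)) == 2]

print(records)
print(heterozygous)
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample.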

The steps below outline how analysis and comparison for genetic affinities is done. (See here for an in depth description of using ADMIXTURE at Razib Khan's Gene Expression, and Anderson et al (2010) Data Control (complete pdf).)
Note: ADMIXTURE (in capitals) is a software program, while admixture is the process of mixing of genes.

  1. Your raw data is combined with thousands of other genomes, some available freely from studies and others handed over by people who have had their genomes tested.
  2. Software like plink is used to do genome association analysis. Additionally, it is used to create standard file formats that are used as inputs to other genome analysis software such as ADMIXTURE.
  3. ADMIXTURE's input is binary PLINK (.bed) or ordinary PLINK (.ped and .map). The plink .ped file contains the genotype information (which SNP variants are where) and the .map file is essentially a list of the SNP names.
  4. To use ADMIXTURE, you need an idea of K, your belief of the number of ancestral populations.
  5. You can run ADMIXTURE in regular mode or supervised mode. Supervised mode is essentially anchoring the output to some reference populations. The reference population can either be autosomal data of real individuals or zombies. Zombies are recreated data to represent a hypothetical genetically pure individual or population (genomes created using the --simulate option of plink from allele frequencies).
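
To make step 3 more concrete, here is a minimal sketch of what the ordinary PLINK .map and .ped text files contain for one person. The helper names and the sample IDs ("FAM1", "ME") are hypothetical. It is only meant to show the layout of the two files; it is not how plink or ADMIXTURE work internally.

```python
def to_map_lines(records):
    """One .map line per SNP: chromosome, SNP name, genetic distance, position."""
    # Genetic distance is written as 0, meaning unknown.
    return [f"{chrom}\t{rsid}\t0\t{pos}" for rsid, chrom, pos, _ in records]


def to_ped_line(family_id, individual_id, records):
    """One .ped line per person: six leading columns, then two alleles per SNP."""
    # Father, mother and sex are set to 0 (unknown), phenotype to -9 (missing).
    fields = [family_id, individual_id, "0", "0", "0", "-9"]
    for _, _, _, genotype in records:
        if len(genotype) == 2 and genotype.isalpha():
            fields.extend(genotype)      # "AG" becomes the two columns "A", "G"
        else:
            fields.extend(["0", "0"])    # no-call or unexpected value: missing
    return "\t".join(fields)


# (rsid, chromosome, position, genotype) rows from the raw data sample.
records = [
    ("rs3094315", "1", 742429, "AG"),
    ("rs3934834", "1", 995669, "CC"),
]

print("\n".join(to_map_lines(records)))
print(to_ped_line("FAM1", "ME", records))
```

The SNP order must be the same in both files, since the .ped file holds only alleles and relies on the .map file for the SNP names.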
Important Caveats
  • Admixture analysis cannot distinguish between recent and ancient gene flow, or the directionality of flow.
  • It is important to recognize that regions of highest haplo group frequency are not necessarily representative of origin. An obvious example is haplo group C, which displays its highest frequency in Polynesia (Kayser et al. 2000), but Polynesia is one of the last regions known to be colonized by modern humans (Sengupta et al, 2005).
  • Linguistic and cultural groups (possible proxies for "Race") may not be the same as the genetic grouping.