
Experiments with genetics data and software.

DIY Genetics: First Experiment Results

Posted 27th December 2011 at 10:27 AM by nathanm

After months of experimenting, here are the first results I'm comfortable sharing. Keep in mind that much of the process is by necessity subjective: which populations and SNPs to include, how many components (K) to analyze, how to group and sort the populations, and what names to assign to the components. What is objective is the percentage of each component in each population. Those are calculated by the algorithm in Admixture, based on genetic similarity. The Admixture runs took from 24 minutes for K=4 up to 4 hours and 20 minutes for K=15. And that's on top of learning how to use the software, preparing the data, importing the results into a spreadsheet, creating bar graphs to visualize the data, and then analyzing and interpreting those results.
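
For anyone who would rather script the spreadsheet step, here is a minimal sketch in Python. It assumes the per-individual fractions come from Admixture's .Q output (one row per individual, K whitespace-separated columns) and that a list of population labels in the same row order is available separately. The function names, population names, and numbers are made up for illustration; they are not the actual data behind the charts.

```python
from collections import defaultdict

def parse_q(text):
    """Parse .Q-style text: one row per individual, K fractions per row."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]

def population_means(q_rows, pop_labels):
    """Average the individual fractions within each population, as percentages."""
    grouped = defaultdict(list)
    for row, pop in zip(q_rows, pop_labels):
        grouped[pop].append(row)
    return {
        pop: [100.0 * sum(col) / len(rows) for col in zip(*rows)]
        for pop, rows in grouped.items()
    }

# Invented K=2 example: three individuals from two hypothetical populations.
sample_q = """0.90 0.10
0.70 0.30
0.20 0.80
"""
labels = ["PopA", "PopA", "PopB"]
means = population_means(parse_q(sample_q), labels)
```

Each entry of `means` is one row of the population-level table that the bar graphs are drawn from.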

I reused many of the same component names as Dienekes. However, mine are in no way equivalent to the Dodecad Project components. I merely sorted the results by each component, then observed which populations that component peaked in. Usually, it's fairly easy to name the components, but sometimes it's difficult to choose a label that's all-encompassing for the seemingly disparate populations who share it.
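
The "sort by a component and see where it peaks" step can be sketched in a few lines of Python. This assumes a table of population-level percentages keyed by population name; `peak_populations` is a hypothetical helper, and the populations and percentages are invented placeholders rather than real results.

```python
def peak_populations(pop_means, component, n=3):
    """Return the n populations with the highest share of one component."""
    ranked = sorted(pop_means, key=lambda pop: pop_means[pop][component],
                    reverse=True)
    return ranked[:n]

# Invented percentages for a two-component toy table (not real results).
toy = {
    "PopA": [95.0, 5.0],
    "PopB": [2.0, 98.0],
    "PopC": [93.0, 7.0],
    "PopD": [90.0, 10.0],
}
top = peak_populations(toy, component=0)
```

Here `top` lists the populations where component 0 is strongest; picking a label that fits them all is still a judgment call.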

The linked bar graphs are first grouped by geography, then sorted within those groupings by whichever component is modal. However, I kept the highly admixed populations of the US separate, including the ubiquitous Utah samples, Mexican-Americans from Los Angeles, and my parents. Here's a play-by-play of how the components split out in Admixture runs from K=4 to K=15 (the attached diagram is a visual companion to the commentary).
  • At K=4, the components are quite obviously centered on African, American, Asian, and European populations.
  • At K=5, the Asian component splits into two components, clearly peaking in populations centered in East and South Asia.
  • At K=6, the African component splits into two; I've followed Dienekes' naming convention. Paleo-African peaks in the San, !Kung, and Pygmy populations. Neo-African peaks in West Africans, such as the Bambara, Yoruba, and Dogon.
  • At K=7, a few Oceanian/Pacific Islander populations that were predominantly South or East Asian before exhibit their own component, which I've called Austronesian. Papuan was about two-thirds South Asian and one-third East Asian; Melanesian was about half each. Now they're both mostly Austronesian. Tongan was mostly East Asian; at this level it's about half East Asian and half Austronesian. Samoan was also mostly East Asian; now it's about two-thirds East Asian and one-third Austronesian. A few other populations from Southeast and South Asia also have significant amounts of the Austronesian component. Specifically, the Paniya population of South India is predominantly Austronesian.
  • At K=8, the East Asian component splits further into Northeast and Southeast Asian components. The Oceanian populations are composed almost entirely of Austronesian and Southeast Asian from this point on. The South Asian component begins to predominate among Central Asian populations instead of European.
  • At K=9, a Middle Eastern component breaks out from European. However, the predominant component among most North African and Middle Eastern populations is still European.
  • At K=10, an East Asian component reappears, distinct from the Northeast and Southeast Asian components.
  • At K=11, an East African component splits from the Neo-African. The East Asian component disappears again, but a West Asian component appears. It's the predominant component among populations of the Caucasus and Central Asia. It's also the second highest component for South Asian populations. The Middle Eastern component is now predominant among North African populations. However, it's either first or second among actual Middle Eastern populations, with West Asian vying for the top spot.
  • At K=12, a strong North African component emerges from the Middle Eastern, which also shows up in some East African populations, and to a lesser degree in Middle Eastern populations. The Middle Eastern component is now predominant among actual Middle Eastern populations.
  • At K=13, a separate East Asian component again reappears, while the North African component is gone. A strong component emerges in just one population: Hadza. A few other populations have almost 10% of this component. The one thing in common between them is they all live in Tanzania today. This was the first and only instance of such a component in all these Admixture runs.
  • At K=14, the American component splits into two, somewhat graduated between North and South America, except for Greenlanders, who are strangely in the middle. The Tanzania component is gone, but a new component has split off from Paleo-African, predominant in Pygmies. It's taken over as the second highest African component, although Paleo-African is still present in most African populations in lesser quantities.
  • At K=15, a separate North African component reappears. It could alternatively be labeled as Mediterranean, since it also shows up significantly in many European populations--much higher than the North African component at K=12. The two American components are somewhat shuffled; there isn't a clear delineation between North and South America. The second American component appears to be North American, as it's still clinal from north to south, with a major exception: Greenlanders don't have this component. The other American component is no longer clinal at all, but still appears in the Greenlanders.
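
The splits above were traced by eye. One rough way to get a first guess automatically is to match each component in the higher-K run to the lower-K component it correlates with most strongly across populations. The sketch below is only a heuristic under that assumption, not a feature of Admixture; the helper names and the toy tables are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def closest_parents(lower, higher):
    """For each component (column) of the higher-K table, return the index of
    the best-correlated column in the lower-K table.
    Both tables are lists of rows with populations in the same order."""
    lower_cols = list(zip(*lower))
    higher_cols = list(zip(*higher))
    return [
        max(range(len(lower_cols)), key=lambda j: pearson(col, lower_cols[j]))
        for col in higher_cols
    ]

# Invented percentages for four hypothetical populations at K=2 and K=3.
k2 = [[90, 10], [80, 20], [10, 90], [5, 95]]
k3 = [[90, 8, 2], [80, 5, 15], [10, 85, 5], [5, 15, 80]]
parents = closest_parents(k2, k3)
```

In this toy case the second and third K=3 components both point back to the second K=2 component, which is what a split looks like. Components that vanish and reappear between runs would still need a manual check.
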
At the finest-grained resolution, the component tally is seven Eurasian, five African, two American, and one Oceanian. There is still only a single component that can be identified as European, which is one reason it's difficult to distinguish between different groups within Europe. Europeans are far more homogeneous than people from other, similarly sized areas.
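
The chart ordering described earlier (grouped by geography, then sorted within each group by the modal component) can be sketched as follows. This reads "modal" as the component with the largest total share across a region's populations, which is one interpretation; the function name, region assignments, and percentages are illustrative assumptions.

```python
def chart_order(pop_means, region_of, region_order):
    """Order populations by region, then by the region's modal component, descending."""
    ordered = []
    for region in region_order:
        pops = [p for p in pop_means if region_of[p] == region]
        if not pops:
            continue
        k = len(pop_means[pops[0]])
        # Modal component: the one with the largest total across this region.
        totals = [sum(pop_means[p][i] for p in pops) for i in range(k)]
        modal = totals.index(max(totals))
        ordered.extend(
            sorted(pops, key=lambda p: pop_means[p][modal], reverse=True)
        )
    return ordered

# Invented two-component table for four hypothetical populations.
toy = {
    "PopA": [60, 40],
    "PopB": [90, 10],
    "PopC": [20, 80],
    "PopD": [35, 65],
}
region_of = {"PopA": "Africa", "PopB": "Africa", "PopC": "Europe", "PopD": "Europe"}
order = chart_order(toy, region_of, ["Africa", "Europe"])
```

Populations that should stay separate, such as the admixed US samples, could simply be given their own region label at the end of `region_order`.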

One possible improvement to the charts would be to rearrange all the columns in the spreadsheet so the components are always in the same order. Then I could plot adjacent graphs that would be coherent as a whole (see these nice examples from a few academic papers). Then I could pick a color scheme at K=15 that would consolidate gracefully back to K=4. For example, if I made all the African components shades of green, the Asian components shades of blue, etc., it would probably be easier to read the charts without constantly referring back to the legend. I might try something like that in the future.
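
A minimal sketch of that idea in Python: put the component columns into one fixed order, then give each component a shade of its family's base hue. The helper names, the hue numbers (120 for green, 240 for blue, and so on), and the sample row are all assumptions for illustration; the colors come out as CSS-style HSL strings that a plotting tool would need to accept or convert.

```python
def reorder_columns(names, rows, canonical):
    """Reorder component columns to follow one canonical order of names."""
    order = [names.index(c) for c in canonical if c in names]
    return [names[i] for i in order], [[row[i] for i in order] for row in rows]

def family_shades(components, family_of, base_hues):
    """Give each component an HSL color: hue from its family, lightness
    spread from 30% to 70% across the members of that family."""
    by_family = {}
    for c in components:
        by_family.setdefault(family_of[c], []).append(c)
    colors = {}
    for fam, members in by_family.items():
        for i, c in enumerate(members):
            lightness = 30 + (40 * i) // max(len(members) - 1, 1)
            colors[c] = f"hsl({base_hues[fam]}, 70%, {lightness}%)"
    return colors

# Column order as it might come out of one K=6 run (invented row of percentages).
names = ["European", "Neo-African", "American", "South Asian",
         "Paleo-African", "East Asian"]
rows = [[5, 80, 1, 2, 10, 2]]
canonical = ["Paleo-African", "Neo-African", "East Asian", "South Asian",
             "European", "American"]
new_names, new_rows = reorder_columns(names, rows, canonical)

family_of = {"Paleo-African": "African", "Neo-African": "African",
             "East Asian": "Asian", "South Asian": "Asian",
             "European": "European", "American": "American"}
base_hues = {"African": 120, "Asian": 240, "European": 0, "American": 30}
colors = family_shades(new_names, family_of, base_hues)
```

Because the hue depends only on the family, the two African greens at K=6 would merge back into a single green at K=4 without changing the overall look of the chart.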
Attached Images
File Type: pdf exp-diag.pdf (19.5 KB, 1355 views)

Comments

  1. Old Comment

    Missing examples

    Oops! I mentioned, but forgot to link, some good examples where they effectively lay out adjacent Admixture charts. Here are three examples:
    Posted 27th December 2011 at 12:57 PM by nathanm
 
