Reference Populations Methodology

  • Reference Populations Methodology

    FTDNA's Razib Khan has tried to explain the methodology in choosing the 33 reference populations for myOrigins. I still can't figure out how an Armenian, for instance, is representative of the whole of Asia Minor, or a Pathan of the whole of Central Asia, or a Gujarati of the vast South Asian sub-continent. Could someone please explain the following geno-jargon in plain English?

    "Model-based methods of admixture inference are sensitive to the input data which is evaluated to establish population substructure. To maintain consistency, it is important to fix a reference set that can be used across runs of individual ancestry predictions. In other words, though individuals whose ancestries are being computed may vary, the reference genotypes against which they are compared to infer those ancestries will remain constant. To construct a reference set, we collected populations from multiple sources: GeneByGene DNA customer database, Human Genome Diversity Project, International HapMap Project & Estonian Biocentre.

    We assembled a large number of candidate reference populations which were relatively unadmixed and sampled widely in terms of geography. From these, we removed related or outlier individuals with the Plink software, utilizing identity-by-descent (IBD) analysis and visually inspecting multi-dimensional scaling plots (MDS). Further visualization established that the reference population sets were indeed genetically distinct from each other. We also ran Admixture and MDS with specific populations to assess if any individuals were outliers or exhibited notable gene flow from other reference groups, removing these. Admixture was run on an inter and intra-continental scale to establish a plausible number of K values utilizing the cross-validation method [Alexander2011]. After removing markers that were missing in more than 5 percent of loci and those with minor allele frequencies below 1 percent, the total intersection of SNPs across the pooled data set was 290,874. The final number of individuals was 1,353.

    To validate our Reference Population set, we tested them against a list of well studied benchmark groups whose ancestral background in the literature has been well attested. Additionally, we also cross-checked against individuals with attested provenance within the GeneByGene DNA database."
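To make the marker-filtering step in the quoted text more concrete, here is a minimal sketch in Python. It is not FTDNA's pipeline (the quote says Plink was used). It assumes that "missing in more than 5 percent" means a SNP has no call in more than 5 percent of individuals, and that genotypes are coded as 0, 1 or 2 copies of one allele. The function name `filter_markers` and the toy data are hypothetical.

```python
from typing import List, Optional

# 0, 1 or 2 copies of the alternate allele; None marks a missing call (assumed coding).
Genotype = Optional[int]


def filter_markers(genotypes: List[List[Genotype]],
                   max_missing: float = 0.05,
                   min_maf: float = 0.01) -> List[int]:
    """Return the column indices of SNPs that pass both filters.

    genotypes[i][j] is the call for individual i at SNP j.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0]) if genotypes else 0
    kept = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        missing_rate = 1 - len(calls) / n_individuals
        if not calls or missing_rate > max_missing:
            continue  # too many individuals lack a call at this SNP
        alt_freq = sum(calls) / (2 * len(calls))  # two alleles per individual
        maf = min(alt_freq, 1 - alt_freq)         # frequency of the rarer allele
        if maf < min_maf:
            continue  # nearly monomorphic, so it carries little information
        kept.append(j)
    return kept


# Toy data: 4 individuals (rows) by 4 SNPs (columns).
demo = [
    [0, 1, None, 0],
    [1, 2, 0, 0],
    [2, 1, 1, 0],
    [0, 0, 2, 0],
]
print(filter_markers(demo))  # SNP 2 has a missing call, SNP 3 never varies
```

In this toy set only the first two SNPs survive. The real data set would apply the same two thresholds to the pooled reference samples before counting the markers that remain.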

  • #2
    Originally posted by KaiserT
    I still can't figure out how an Armenian, for instance, is representative of the whole of Asia Minor, or a Pathan of the whole of Central Asia, or a Gujarati of the vast South Asian sub-continent. Could someone please explain the following geno-jargon in plain English?
    It's very simple. Those are the populations he had samples for. Seriously, all the other words are scientific psychobabble. All the tool does is use statistical analysis to match your DNA to the closest sample populations, nothing more, nothing less.

    Doesn't sound like such a great selling point for the tool anymore, does it?
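As a rough illustration of what "matching your DNA to the closest sample populations" can mean in a model-based method, the sketch below estimates ancestry proportions for one person against fixed reference allele frequencies, using a simple EM (expectation-maximization) update. This is an assumption-laden toy, not the myOrigins algorithm: the population names, the frequencies, and the function `estimate_ancestry` are all hypothetical.

```python
from typing import Dict, List


def estimate_ancestry(genotype: List[int],
                      ref_freqs: Dict[str, List[float]],
                      n_iter: int = 200) -> Dict[str, float]:
    """Estimate mixture proportions q over fixed reference populations.

    genotype[j] is 0, 1 or 2 copies of the alternate allele at SNP j.
    ref_freqs[pop][j] is that allele's frequency in reference population pop.
    """
    pops = list(ref_freqs)
    eps = 1e-6  # keep frequencies away from 0 and 1 to avoid division by zero
    freqs = {k: [min(max(p, eps), 1 - eps) for p in ref_freqs[k]] for k in pops}
    q = {k: 1.0 / len(pops) for k in pops}  # start from equal proportions
    n_alleles = 2 * len(genotype)
    for _ in range(n_iter):
        new_q = {k: 0.0 for k in pops}
        for j, g in enumerate(genotype):
            p_alt = sum(q[k] * freqs[k][j] for k in pops)
            p_ref = 1.0 - p_alt
            for k in pops:
                # Expected number of this person's alleles at SNP j that came from pop k.
                new_q[k] += g * q[k] * freqs[k][j] / p_alt
                new_q[k] += (2 - g) * q[k] * (1 - freqs[k][j]) / p_ref
        q = {k: new_q[k] / n_alleles for k in pops}
    return q


# Hypothetical reference panel: two populations that differ sharply at 20 SNPs.
reference = {
    "PopA": [1.0] * 10 + [0.0] * 10,
    "PopB": [0.0] * 10 + [1.0] * 10,
}
person = [2] * 10 + [0] * 10  # carries the PopA-typical allele everywhere
print(estimate_ancestry(person, reference))
```

The point of the toy is that the answer can only ever be expressed in terms of whichever populations are in `reference`; someone whose real ancestry is not in the panel is still assigned to the nearest available labels.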

    “If you can't dazzle them with brilliance, baffle them with *******.” -- attributed to W.C. Fields
    Last edited by thetick; 18 January 2016, 11:26 PM.


    • #3
      Many of the users here agree that additional population samples, including even from isolated regions, could help refine the MyOrigins results - to a certain extent.

      The methodology explains that the selected populations must be reasonably "pure" representatives of their region. For example, an Anatolian Turk is too mixed with Siberian and Central Asian ancestry to represent Asia Minor, in comparison to an Armenian.

      In the case of South Central Asia, it's true that a Burusho or some other appropriate ethnic group could be added, and some other ethnicity calculators do incorporate a larger number of populations.


      • #4
        Thanks khazaria. So if I understand the process, it goes something like this:

        1) A large number of 'relatively unadmixed' and 'geographically widely sampled' candidate reference populations were selected.

        2) 'Related' and 'outlier' individuals were removed.

        3) 'Genetically distinct' population sets were chosen.

        Voila!

        While I agree that adding more reference population sets would help, increasing the number of population clusters would also help in covering large swathes that are unrepresented at present. Perhaps some current clusters could be broken up into two (e.g., Asia Minor into Anatolia and Caucasus, South Asia into North India and South India, Middle East into Levant and Peninsular Arabia), and some new ones could be added (e.g., Oceania).

        Keenly awaiting the next myOrigins update.


        • #5
          The last step, validation, is a stumbling block for me. As genealogists, many of us are keenly aware of how extensively the populations of many regions, such as Europe, have moved around. Significant migrations dating back to Roman times are well documented, and have continued right down to the present, when large numbers of refugees are again moving through Europe. As genealogists, too, we know that many people, quite possibly a large majority, have no idea where their ancestors really came from — yet the models depend on finding relatively "pure" representatives of ethnic groups. These are facts that must temper our enthusiasm about measuring admixture.

          Therefore, we have to inquire, how good is any current admixture model at predicting ethnic ancestry? How can we measure the predictive value of these models? Is there an intrinsic limit in how well they can predict ethnic ancestry, and if so, what is that limit?

          Another way of thinking about the problem of ethnic origins is to ask whether there is any real genetic difference between, say, Germans and the French, or are they really just ordinary people who happen to speak different dialects? Yes, there are cultural differences going back many centuries, but I'm not sure we are permitted to leap to the conclusion that cultural differences imply also a different set of genes. It may well be that our expectations about ethnic distinctiveness (that those people are different from us) are not realistic. I'm not suggesting that the people of the world lack genetic diversity, only that we may be more alike, more mixed up, than we believed.


          • #6
            Originally posted by KaiserT
            FTDNA's Razib Khan has tried to explain the methodology in choosing the 33 reference populations for myOrigins. I still can't figure out how an Armenian, for instance, is representative of the whole of Asia Minor, or a Pathan of the whole of Central Asia, or a Gujarati of the vast South Asian sub-continent. Could someone please explain the following geno-jargon in plain English?

            "Model-based methods of admixture inference are sensitive to the input data which is evaluated to establish population substructure. To maintain consistency, it is important to fix a reference set that can be used across runs of individual ancestry predictions. In other words, though individuals whose ancestries are being computed may vary, the reference genotypes against which they are compared to infer those ancestries will remain constant. To construct a reference set, we collected populations from multiple sources: GeneByGene DNA customer database, Human Genome Diversity Project, International HapMap Project & Estonian Biocentre.

            We assembled a large number of candidate reference populations which were relatively unadmixed and sampled widely in terms of geography. From these, we removed related or outlier individuals with the Plink software, utilizing identity-by-descent (IBD) analysis and visually inspecting multi-dimensional scaling plots (MDS). Further visualization established that the reference population sets were indeed genetically distinct from each other. We also ran Admixture and MDS with specific populations to assess if any individuals were outliers or exhibited notable gene flow from other reference groups, removing these. Admixture was run on an inter and intra-continental scale to establish a plausible number of K values utilizing the cross-validation method [Alexander2011]. After removing markers that were missing in more than 5 percent of loci and those with minor allele frequencies below 1 percent, the total intersection of SNPs across the pooled data set was 290,874. The final number of individuals was 1,353.

            To validate our Reference Population set, we tested them against a list of well studied benchmark groups whose ancestral background in the literature has been well attested. Additionally, we also cross-checked against individuals with attested provenance within the GeneByGene DNA database."
            For that reason, GEDmatch is essential. Its admixture analyses cover a much broader range of reference groups, as listed in the spreadsheets. Each FTDNA member has their own particular area(s) of interest. There will always be people disappointed with myOrigins' choices. That is why multiple admixture programs are required.


            • #7
              Originally posted by KaiserT
              Perhaps some current clusters could be broken up into two ... South Asia into North India and South India
              These changes have been promised to arrive by March:

              Southern Europe will be split into Italy + Sardinian + Spain + Balkans.

              North India and South India will be separate categories apparently under those names. It looks like North India will be combining the existing Pashtun samples with some from the northern reaches of India proper and that is why they are renaming it from "Central Asia". There already is a "South Asia" category separate from "Central Asia".

              They will have Northwest Asia and Siberian categories distinct from Northeast Asia.

              New World will be split into North Amerindian and South Amerindian after they add at least one North American native population, apparently the Pima.

              Mbuti Pygmies will be separate from West Africa.

              Originally posted by KaiserT
              some new ones could be added (eg, Oceania).
              A Papuan reference population representing Oceania will be added in March.


              • #8
                Originally posted by khazaria
                These changes have been promised to arrive by March:......
                Thank you for the update on the upcoming changes. One more question: Any idea if FTDNA would upload new myOrigins maps for those who have already tested in the past?


                • #9
                  I find it a little ridiculous that on myOrigins I have absolutely 0% Western and Central Europe, while according to every other testing company I am around 50%, which is also confirmed by genealogical research. They definitely need to step their game up on their autosomal results.


                  • #10
                    Originally posted by daragon24
                    I find it a little ridiculous that on myOrigins I have absolutely 0% Western and Central Europe, while according to every other testing company I am around 50%, which is also confirmed by genealogical research. They definitely need to step their game up on their autosomal results.
                    Me too! myOrigins shows Southern, Western and Central Europe, North Africa, Western Africa and Native American for me, but I have no idea where the Asia Minor (8%) comes from. I have never heard of Armenian ancestry in my family or in the region my ancestors came from.


                    • #11
                      This thread illustrates that what we hear about admixture tools is often anecdotal. Some people find the results very much at odds with what they know from the paper trail (often well researched), some people try all of the admixture tools until they find one that appears to fit what they already know, and others report that the admixture results are accurate, informative, and useful. I have no reason to doubt the validity of these individual experiences.

                      What I would really like to see is a large-scale analysis that compares the results of a single, well-developed admixture model with the expectations of people who, for example, have traced all of their ancestors back to about 1800, in order to measure the predictive value of the model. If such a study exists, I haven't heard about it. FTDNA would be in a good position to conduct such a study, but it wouldn't be easy, and it might not be in the interest of any of the vendors to discover, for example, that their proprietary admixture model generates accurate, informative, or useful results for only 50% of the participants. But without such a study, I don't see how the models can be improved, or what the inherent limitations of those models might be.
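One way such a study could score a model is sketched below, purely as an illustration. It assumes each participant has both a predicted breakdown and a paper-trail breakdown over the same region labels, measures their disagreement with total variation distance, and reports the share of participants within a chosen tolerance. The function names, the 10-point tolerance, and the sample figures are hypothetical, not taken from any vendor.

```python
from typing import Dict, List, Tuple

Proportions = Dict[str, float]  # region label -> fraction of ancestry, summing to 1


def total_variation(predicted: Proportions, expected: Proportions) -> float:
    """Half the sum of absolute differences: 0 = identical, 1 = no overlap."""
    regions = set(predicted) | set(expected)
    return 0.5 * sum(abs(predicted.get(r, 0.0) - expected.get(r, 0.0))
                     for r in regions)


def share_within_tolerance(pairs: List[Tuple[Proportions, Proportions]],
                           tolerance: float = 0.10) -> float:
    """Fraction of participants whose prediction is within `tolerance` of the paper trail."""
    if not pairs:
        raise ValueError("need at least one participant")
    hits = sum(1 for pred, exp in pairs if total_variation(pred, exp) <= tolerance)
    return hits / len(pairs)


# Two made-up participants: (model output, genealogical expectation).
participants = [
    ({"Western Europe": 0.50, "Southern Europe": 0.50},
     {"Western Europe": 0.50, "Southern Europe": 0.50}),
    ({"Asia Minor": 1.00},
     {"Western Europe": 1.00}),
]
print(share_within_tolerance(participants))  # 0.5
```

A real study would also have to decide how to map a pedigree of birthplaces onto the vendor's clusters, which is probably the harder part.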


                      • #12
                        Originally posted by khazaria
                        These changes have been promised to arrive by March:

                        Southern Europe will be split into Italy + Sardinian + Spain + Balkans.

                        North India and South India will be separate categories apparently under those names. It looks like North India will be combining the existing Pashtun samples with some from the northern reaches of India proper and that is why they are renaming it from "Central Asia". There already is a "South Asia" category separate from "Central Asia".

                        They will have Northwest Asia and Siberian categories distinct from Northeast Asia.

                        New World will be split into North Amerindian and South Amerindian after they add at least one North American native population, apparently the Pima.

                        Mbuti Pygmies will be separate from West Africa.



                        A Papuan reference population representing Oceania will be added in March.
                        Will they refine their admixture calculators to reduce the overestimated East Asian percentage for people with known Amerindian/New World ancestry?


                        • #13
                          Originally posted by crossover
                          Will they refine their admixture calculators to reduce the overestimated East Asian percentage for people with known Amerindian/New World ancestry?
                          I'm sure that will be the result of the addition of whatever North Amerindian samples they're adding. It should then look more like what you see on AncestryDNA's admixture reports and in Eurogenes K36.


                          • #14
                            myOrigins Update

                            I had recently written to FTDNA regarding the forthcoming myOrigins update:

                            "I understand that in the next couple of months 'myOrigins' shall be updated, with some new population clusters likely to be added, besides other improvements. I wish to know if those who have already undergone the autosomal test would be able to get their myOrigins map and Ethnic Makeup updated automatically?"

                            Here is a prompt and reassuring reply I received the next day:

                            "If we make any additions to the myOrigins page, we will update everyone in our database regardless of when they have tested.

                            Best Regards,

                            Thomas H
                            Information Specialist
                            Family Tree DNA"


                            • #15
                              Originally posted by KaiserT
                              I had recently written to FTDNA regarding the forthcoming myOrigins update:

                              "I understand that in the next couple of months 'myOrigins' shall be updated, with some new population clusters likely to be added, besides other improvements. I wish to know if those who have already undergone the autosomal test would be able to get their myOrigins map and Ethnic Makeup updated automatically?"

                              Here is a prompt and reassuring reply I received the next day:

                              "If we make any additions to the myOrigins page, we will update everyone in our database regardless of when they have tested.

                              Best Regards,

                              Thomas H
                              Information Specialist
                              Family Tree DNA"
                              It wouldn't have made any sense whatsoever not to re-run everyone's autosomal file through the new calculator. They don't have to retest anyone at all to do that.
