
Ancestry's "Ethnicity Estimate 2020 White Paper"


  • Ancestry's "Ethnicity Estimate 2020 White Paper"

    This may not exactly fit the definition of a scientific paper, but if not, I think it's close enough to fit under this forum rather than in another one.

    A .pdf of Ancestry's "Ethnicity Estimate 2020 White Paper" is available at lIx9JePmTj5mNN5nnrvserl8oCsqw4UUKwmhoKpp0-Bk3o1Nf0 . The opening Summary section says:
    The AncestryDNA® science team has developed a fast, sophisticated, and accurate method for estimating the historical origins of customers’ DNA going back several hundred to over 1,000 years. Our newest approach improves upon our previous version in the number of possible regions that a customer might be assigned (from 61 to 70) as well as an increase in accuracy to both regions assigned and the percentage assigned to each region. We have added nine new regions as well as made improvements to the composition of our reference panel and inference algorithm, resulting in more accurate estimates overall. Given the cutting-edge nature of this type of science, we will continue to refine our approach and improve estimates.

    The basic idea behind ethnicity estimation involves comparing a customer’s DNA to the DNA of people with long family histories in a particular region or group, what we call the reference panel, and looking for segments of DNA that are most similar. If, for example, a section of a customer’s DNA looks most similar to DNA in the reference panel from people from Sweden, that section of the customer’s DNA is said to be from Sweden, and so on. The end result is a portrait of a customer’s DNA made up of percentages of the 70 ethnicities contained in the reference panel.
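    To make the idea in that paragraph concrete, here is a toy sketch in Python. It is not Ancestry's algorithm: the nearest-match rule (fewest mismatching letters), the tiny two-region panel, and every function name are assumptions made up for illustration. It only shows the general shape of "label each segment with the most similar reference region, then report percentages."

```python
from collections import Counter

def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_segment(segment, panel):
    """Return the region whose closest reference sequence best matches the segment.

    `panel` maps a region name to a list of reference sequences (hypothetical data).
    Ties go to whichever region appears first in the dict.
    """
    return min(
        panel,
        key=lambda region: min(mismatches(segment, ref) for ref in panel[region]),
    )

def estimate(segments, panel):
    """Turn per-segment labels into a percentage for each region."""
    counts = Counter(assign_segment(seg, panel) for seg in segments)
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}

# Made-up reference panel and customer segments, for illustration only.
panel = {
    "Sweden": ["AACG", "AACT"],
    "Ireland": ["GGTA", "GGTT"],
}
customer_segments = ["AACG", "AACT", "GGTA", "AACC"]

result = estimate(customer_segments, panel)
print(result)
```

    Real estimates work on far longer stretches of DNA and many more reference samples, so treat this purely as a picture of the counting step.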

    That is a short version of how AncestryDNA determines a customer’s ethnicity estimate. The rest of the white paper will delve more deeply into

    1. How the reference panel samples are chosen, their makeup, and how the panel is validated
    2. How the algorithm that determines a customer’s genetic ethnicity works and how it is validated
    This update is being applied to all new AncestryDNA kits now being processed, and will be phased in to existing accounts by mid-September 2020. So, for those who already have DNA test results at Ancestry, it would be a good idea to save your "DNA Origins" current results, and perhaps your "DNA Story," in order to compare with the upcoming update.

    I must admit that my eyes glazed over reading this white paper, but perhaps it will be worthwhile to review, if only to compare against FTDNA's upcoming ethnicity update to myOrigins (version 3.0). FTDNA has released their own white papers for earlier updates to myOrigins, and will no doubt do so for the 3.0 update.

    Note that in Roberta Estes' blog post, "Sneak Preview: FamilyTreeDNA's myOrigins Version 3.0," she announced that myOrigins 3.0 will use 90 populations. Compare that to the number Ancestry uses: 61 currently, increasing to 70 with this update. myOrigins 3.0 will also feature Chromosome Painting, apparently as a separate tool/feature from the existing Chromosome Browser (Ancestry has no Chromosome Browser, and has made no announcement of any new tool such as Chromosome Painting). FTDNA's myOrigins algorithm will also be improved, per an answer Roberta gave in the comments to her blog post.

  • #2
    Something to watch for in these documents is how the reference panels are validated. Even with all the statistical tools available for assuring that a panel is statistically homogeneous and at the same time represents enough independent pedigrees to be potentially meaningful, there is still the problem that a homogeneous reference group could end up representing only a narrow slice of a historical population that was more diverse. In spite of the many conceptual pitfalls, there are some performance tests that can help to evaluate whether the results of the algorithm make sense, such as how often the algorithm gives children ethnic components that neither of their parents have. I would love to see some numbers!
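    For anyone who wants to run that parent/child check on their own family's results, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data layout (a dict of region to percentage per person), the function names, and the 1% cutoff for ignoring trace amounts. The example numbers are made up, not taken from any white paper.

```python
def unexpected_regions(child, mother, father, threshold=1.0):
    """Regions the child shows at >= threshold percent that neither parent shows at all."""
    return {
        region
        for region, pct in child.items()
        if pct >= threshold
        and mother.get(region, 0.0) == 0.0
        and father.get(region, 0.0) == 0.0
    }

def trio_inconsistency_rate(trios, threshold=1.0):
    """Fraction of (child, mother, father) trios with at least one unexpected region."""
    if not trios:
        return 0.0
    flagged = sum(1 for c, m, f in trios if unexpected_regions(c, m, f, threshold))
    return flagged / len(trios)

# Made-up estimates (percentages) for two families.
trios = [
    # Consistent: every region in the child appears in at least one parent.
    ({"Sweden": 60.0, "Norway": 40.0},
     {"Sweden": 100.0},
     {"Norway": 80.0, "Sweden": 20.0}),
    # Inconsistent: the child has Finland, which neither parent has.
    ({"Scotland": 90.0, "Finland": 10.0},
     {"Scotland": 100.0},
     {"Scotland": 100.0}),
]

rate = trio_inconsistency_rate(trios)
print(f"{rate:.0%} of trios show a region absent from both parents")
```

    The cutoff matters: a lower threshold flags more trios, so any published number of this kind is only meaningful alongside the threshold used.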


    • #3
      I always wondered if that was how you built an algorithm! Glad to see that's so. When one day I conduct my own population study, I'll be ready to search out at least several thousand examples from all over a country (and categorize them by region) to make sure I'm accurate! (LOLLOLLOL)