My Origins Results

  • josh w.
    replied
    Originally posted by robe3b
    The following has been extracted from the myOrigins paper by Razib Khan and Rui Hu:

    We assembled a large number of candidate reference populations which were relatively unadmixed and sampled widely in terms of geography. From these we removed related or outlier individuals with the Plink software, utilizing identity-by-descent (IBD) analysis and visually inspecting multi-dimensional scaling (MDS) plots. Further visualization established that the reference population sets were indeed genetically distinct from each other. We also ran Admixture and MDS with specific populations to assess whether any individuals were outliers or exhibited notable gene flow from other reference groups, removing these. Admixture was run on an inter- and intra-continental scale to establish a plausible number of K values utilizing the cross-validation method [Alexander2011]. After removing markers which were missing in more than 5 percent of loci and those with minor allele frequencies below 1 percent, the total intersection of SNPs across the pooled data set was 290,874. The final number of individuals was 1,353.



    I wonder how many "related or outlier individuals" were actually removed from the "large number of candidate reference populations which were relatively unadmixed and sampled widely in terms of geography".
    The unadmixed conclusion is surprising. Admixture studies have been done on many of the reference-group populations, and they all show admixture, e.g., diaspora Jews, Italians, and the Irish.
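    The marker filters in the quoted paragraph (drop SNPs missing in more than 5 percent of samples, drop SNPs with minor allele frequency below 1 percent) can be sketched in plain Python. This is only an illustration under assumptions: genotypes are coded as 0/1/2 copies of one allele with None for a no-call, the missingness threshold is read per SNP across individuals, and the function name `filter_snps` is hypothetical. The paper mentions Plink for the individual-level filtering, and Plink's `--geno` and `--maf` options apply comparable marker thresholds, but the paper does not say which tool did the marker filtering, and this is not the authors' code.

```python
from typing import List, Optional

# One genotype call: 0, 1 or 2 copies of the counted allele, or None for a no-call.
Genotype = Optional[int]


def filter_snps(genotypes: List[List[Genotype]],
                max_missing: float = 0.05,
                min_maf: float = 0.01) -> List[int]:
    """Return the column indices of SNPs that pass both marker filters.

    genotypes[i][j] is individual i's call at SNP j.
    """
    n_ind = len(genotypes)
    if n_ind == 0:
        return []
    kept = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        # Fraction of individuals with no call at this SNP.
        missing_rate = (n_ind - len(calls)) / n_ind
        if not calls or missing_rate > max_missing:
            continue
        # Allele frequency among called genotypes (two alleles per person).
        freq = sum(calls) / (2 * len(calls))
        maf = min(freq, 1 - freq)
        if maf < min_maf:
            continue
        kept.append(j)
    return kept
```

    For example, with 20 individuals who are all heterozygous at SNP 0, all homozygous at SNP 1, and uncalled for 2 of 20 at SNP 2, only SNP 0 survives: SNP 1 fails the frequency filter and SNP 2 fails the 5 percent missingness filter.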


  • robe3b
    replied
    Originally posted by josh w.
    There is clearly a need for more data from MO. The admixture programs at Gedmatch show geographical area profiles for each reference group, with new groups added all the time. 23&me is honest enough to post its false-positive and false-negative rates.
    The following has been extracted from the myOrigins paper by Razib Khan and Rui Hu:

    We assembled a large number of candidate reference populations which were relatively unadmixed and sampled widely in terms of geography. From these we removed related or outlier individuals with the Plink software, utilizing identity-by-descent (IBD) analysis and visually inspecting multi-dimensional scaling (MDS) plots. Further visualization established that the reference population sets were indeed genetically distinct from each other. We also ran Admixture and MDS with specific populations to assess whether any individuals were outliers or exhibited notable gene flow from other reference groups, removing these. Admixture was run on an inter- and intra-continental scale to establish a plausible number of K values utilizing the cross-validation method [Alexander2011]. After removing markers which were missing in more than 5 percent of loci and those with minor allele frequencies below 1 percent, the total intersection of SNPs across the pooled data set was 290,874. The final number of individuals was 1,353.

    I wonder how many "related or outlier individuals" were actually removed from the "large number of candidate reference populations which were relatively unadmixed and sampled widely in terms of geography".


  • josh w.
    replied
    Originally posted by josh w.
    Yes, I also hope that the algorithms respond to recent complaints about MO 'misses'.

    To amplify on my last post: judging from the White Paper authorship, it appears (I might be incorrect) that the admixture pattern is computed by the Harappa World program and then, in a second step, the best fit to that admixture pattern is estimated by an Oracle-like program.
    There is clearly a need for more data from MO. The admixture programs at Gedmatch show geographical area profiles for each reference group, with new groups added all the time. 23&me is honest enough to post its false-positive and false-negative rates.
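    The two-step idea (compute an admixture profile, then search for the reference mix that best matches it) can be sketched as below. This assumes, without confirmation from FTDNA or Gedmatch, that a "4-Ancestors" Oracle treats each of four ancestors as a 25% contributor, averages the four reference profiles, and ranks candidate combinations by Euclidean distance to the user's profile; the "@ 9.677"-style figures in Oracle output look like such distances. The function name `four_ancestor_fit` and any reference numbers are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def four_ancestor_fit(target: Sequence[float],
                      refs: Dict[str, Sequence[float]],
                      top_n: int = 10) -> List[Tuple[Tuple[str, ...], float]]:
    """Rank equal-weight mixes of four reference populations by distance to target.

    target and every refs value are admixture profiles over the same components
    (e.g. percentages per calculator component, in the same order).
    """
    results = []
    # Each slot is one ancestor; the same population may fill several slots.
    for combo in itertools.combinations_with_replacement(sorted(refs), 4):
        # Equal 25% weights: the modelled profile is the mean of the four references.
        mix = [sum(refs[name][k] for name in combo) / 4 for k in range(len(target))]
        results.append((combo, math.dist(target, mix)))
    # Smallest distance first; ties keep generation order (sort is stable).
    results.sort(key=lambda item: item[1])
    return results[:top_n]
```

    With made-up references `{"A": [100, 0], "B": [0, 100], "C": [50, 50]}` and a target of `[75, 25]`, both A + A + A + B and A + A + C + C fit at distance 0, which also shows why several different combinations can sit near the top of an Oracle list.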


  • josh w.
    replied
    Originally posted by josh w.
    Clarification: R. Khan has been involved with Harappa World, where his calculator is used. Since the information is private, I can only speculate.
    For example, does MO employ the calculator developed at Harappa World?


  • josh w.
    replied
    Originally posted by josh w.
    The lead author of the White Paper also developed Harappa World. I suspect that MO might be derived from the Harappa World program rather than be Harappa World per se. Of course, they had to do a new data analysis for MO, but within Harappa World there have been new data analyses with changes in geographical regions.
    Clarification: R. Khan has been involved with Harappa World, where his calculator is used. Since the information is private, I can only speculate.
    Last edited by josh w.; 15 May 2014, 04:37 PM.


  • hfp43
    replied
    Harappa World And Native American

    FWIW, Harappa World estimates 0.63% "American" and 0.70% "Beringian" for my Family Finder raw data. The sum of those is similar to my Native American results with Eurogenes K13 and MDLP World, and a bit higher than my NA results with Dodecad World9. MyOrigins shows no NA, even though my 33% French Canadian ancestry makes some detectable degree of NA highly likely.


  • josh w.
    replied
    Originally posted by vinnie
    Do you mean to say they may literally be using Harappa World? Is there a relationship between its developer and FTDNA?

    I find it odd that my mother has no European Coastal Plain in her MO. Harappa shows over 15% N. European and over 1% "American", which I believe is essentially N. Euro; every calculator at Gedmatch shows some N.E. for her.
    The lead author of the White Paper also developed Harappa World. I suspect that MO might be derived from the Harappa World program rather than be Harappa World per se. Of course, they had to do a new data analysis for MO, but within Harappa World there have been new data analyses with changes in geographical regions.
    Last edited by josh w.; 15 May 2014, 01:18 PM.


  • josh w.
    replied
    Originally posted by robe3b
    As far as I'm concerned, my MO results are consistent (with the exception of the NA component estimate) with those of Gedmatch ancestry calculators, namely Dodecad's World9 and Eurogenes' K13. The Italian (or Sardinian) component is always present; conversely, the Middle Eastern component almost never shows up.

    Eurogenes K13 4-Ancestors Oracle

    Using 4 populations approximation:
    1 Mayan + North_Amerindian + Sardinian + Spanish_Cataluna @ 9.677
    2 French + Mayan + North_Amerindian + Sardinian @ 9.686
    3 Mayan + North_Amerindian + Sardinian + Spanish_Castilla_Y_Leon @ 9.808
    4 Mayan + North_Amerindian + Sardinian + Spanish_Valencia @ 9.867
    5 Mayan + North_Amerindian + Sardinian + Spanish_Castilla_La_Mancha @ 9.922
    6 Mayan + North_Amerindian + Portuguese + Sardinian @ 9.940
    7 Irish + Karitiana + Sardinian + Tunisian @ 9.961
    8 Mayan + North_Amerindian + Sardinian + Spanish_Aragon @ 9.976
    9 Algerian + Irish + Karitiana + Sardinian @ 9.979
    10 Mayan + North_Amerindian + Sardinian + Spanish_Extremadura @ 9.979

    World9 4-Ancestors Oracle

    Using 4 populations approximation:
    1 S_Italian + French_Basque + Pima + MEX30 @ 2.184
    2 S_Italian_Sicilian + French_Basque + Pima + MEX30 @ 2.195
    3 S_Italian + Pima + MEX30 + Pais_Vasco @ 2.234
    4 S_Italian_Sicilian + Pima + MEX30 + Pais_Vasco @ 2.239
    5 S_Italian + Colombians + French_Basque + MEX30 @ 2.258
    6 S_Italian_Sicilian + Colombians + French_Basque + MEX30 @ 2.278
    7 C_Italian + Pima + MEX30 + Aragon @ 2.295
    8 S_Italian + Colombians + MEX30 + Pais_Vasco @ 2.303
    9 S_Italian_Sicilian + Colombians + MEX30 + Pais_Vasco @ 2.316
    10 C_Italian + Colombians + MEX30 + Aragon @ 2.353

    The Italian component was missing on PF; MO seems to have corrected this discrepancy.
    My comments about admixture referred to World or Eurogenes per se. Do they show any Middle Eastern component?


  • vinnie
    replied
    Originally posted by josh w.
    To amplify on my last post: judging from the White Paper authorship, it appears (I might be incorrect) that the admixture pattern is computed by the Harappa World program and then, in a second step, the best fit to that admixture pattern is estimated by an Oracle-like program.
    Do you mean to say they may literally be using Harappa World? Is there a relationship between its developer and FTDNA?

    I find it odd that my mother has no European Coastal Plain in her MO. Harappa shows over 15% N. European and over 1% "American", which I believe is essentially N. Euro; every calculator at Gedmatch shows some N.E. for her.


  • josh w.
    replied
    Originally posted by John McCoy
    I suspect the underlying problem, or one of them, is that MyOrigins is still using very small numbers of samples for individual "populations". That is, a small number of samples does not seem sufficient to capture and represent the gene frequencies of the entire population of, say, Italy. While the samples surely capture some of the existing variation in their target populations, it seems to me that the small sample sizes will inevitably introduce a very large component of statistical noise into the percentage figures. The changes reported here, from the Population Finder to MyOrigins, are telling us something about the magnitude of the noise component. Maybe MyOrigins is more accurate in some sense, maybe not, and the magnitude of the discrepancy between the two algorithms should not be expected to affect everybody in the same way.

    Is MyOrigins constructed in such a way that new reference samples can be added to the existing ones as they become available? An algorithm that adjusts itself as new data are added would seem to be the way forward.
    Yes, I also hope that the algorithms respond to recent complaints about MO 'misses'.

    To amplify on my last post: judging from the White Paper authorship, it appears (I might be incorrect) that the admixture pattern is computed by the Harappa World program and then, in a second step, the best fit to that admixture pattern is estimated by an Oracle-like program.


  • robe3b
    replied
    Originally posted by josh w.
    It may be a function of when the admixture took place, with earlier admixture being less apparent. Ashkenazis do not show a Northern Mediterranean component with MO even though it is quite evident in Dodecad and Eurogenes. Perhaps founder effects and genetic drift in Ashkenazis contributed to the MO oversimplification. MO and PF were not intended to give a complete picture; they are more like the #1 hunch from Oracle. The Oracle estimates derive from the full composites in Dodecad and Eurogenes.
    As far as I'm concerned, my MO results are consistent (with the exception of the NA component estimate) with those of Gedmatch ancestry calculators, namely Dodecad's World9 and Eurogenes' K13. The Italian (or Sardinian) component is always present; conversely, the Middle Eastern component almost never shows up.

    Eurogenes K13 4-Ancestors Oracle

    Using 4 populations approximation:
    1 Mayan + North_Amerindian + Sardinian + Spanish_Cataluna @ 9.677
    2 French + Mayan + North_Amerindian + Sardinian @ 9.686
    3 Mayan + North_Amerindian + Sardinian + Spanish_Castilla_Y_Leon @ 9.808
    4 Mayan + North_Amerindian + Sardinian + Spanish_Valencia @ 9.867
    5 Mayan + North_Amerindian + Sardinian + Spanish_Castilla_La_Mancha @ 9.922
    6 Mayan + North_Amerindian + Portuguese + Sardinian @ 9.940
    7 Irish + Karitiana + Sardinian + Tunisian @ 9.961
    8 Mayan + North_Amerindian + Sardinian + Spanish_Aragon @ 9.976
    9 Algerian + Irish + Karitiana + Sardinian @ 9.979
    10 Mayan + North_Amerindian + Sardinian + Spanish_Extremadura @ 9.979

    World9 4-Ancestors Oracle

    Using 4 populations approximation:
    1 S_Italian + French_Basque + Pima + MEX30 @ 2.184
    2 S_Italian_Sicilian + French_Basque + Pima + MEX30 @ 2.195
    3 S_Italian + Pima + MEX30 + Pais_Vasco @ 2.234
    4 S_Italian_Sicilian + Pima + MEX30 + Pais_Vasco @ 2.239
    5 S_Italian + Colombians + French_Basque + MEX30 @ 2.258
    6 S_Italian_Sicilian + Colombians + French_Basque + MEX30 @ 2.278
    7 C_Italian + Pima + MEX30 + Aragon @ 2.295
    8 S_Italian + Colombians + MEX30 + Pais_Vasco @ 2.303
    9 S_Italian_Sicilian + Colombians + MEX30 + Pais_Vasco @ 2.316
    10 C_Italian + Colombians + MEX30 + Aragon @ 2.353

    The Italian component was missing on PF; MO seems to have corrected this discrepancy.


  • John McCoy
    replied
    I suspect the underlying problem, or one of them, is that MyOrigins is still using very small numbers of samples for individual "populations". That is, a small number of samples does not seem sufficient to capture and represent the gene frequencies of the entire population of, say, Italy. While the samples surely capture some of the existing variation in their target populations, it seems to me that the small sample sizes will inevitably introduce a very large component of statistical noise into the percentage figures. The changes reported here, from the Population Finder to MyOrigins, are telling us something about the magnitude of the noise component. Maybe MyOrigins is more accurate in some sense, maybe not, and the magnitude of the discrepancy between the two algorithms should not be expected to affect everybody in the same way.

    Is MyOrigins constructed in such a way that new reference samples can be added to the existing ones as they become available? An algorithm that adjusts itself as new data are added would seem to be the way forward.
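    The sample-size point can be made concrete with the usual binomial sampling error of an allele-frequency estimate, SE = sqrt(p(1 - p) / 2n) for n diploid individuals. This sketch assumes unrelated individuals and ignores every other source of error (genotyping error, population structure, how MyOrigins actually turns frequencies into percentages); the simulation is only a sanity check on the formula, and the function names are hypothetical.

```python
import math
import random
import statistics


def allele_freq_se(p: float, n_individuals: int) -> float:
    """Binomial standard error of an allele-frequency estimate from n diploid people."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


def simulated_se(p: float, n_individuals: int, trials: int = 2000, seed: int = 1) -> float:
    """Spread of the frequency estimate across repeated random reference panels."""
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals
    estimates = []
    for _ in range(trials):
        # Draw each sampled allele independently with probability p.
        copies = sum(rng.random() < p for _ in range(n_alleles))
        estimates.append(copies / n_alleles)
    return statistics.pstdev(estimates)


for n in (10, 100):
    print(n, round(allele_freq_se(0.3, n), 3), round(simulated_se(0.3, n), 3))
```

    With p = 0.3, ten reference individuals give a standard error of about 0.10 and 100 individuals about 0.03: ten times the sample buys only a sqrt(10), roughly threefold, reduction in noise at each marker.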


  • josh w.
    replied
    Originally posted by josh w.
    Not sure if we fully understand the changes from PF to MO. Given the various migrations from the Near East and North Africa to southern Europe (Phoenicians, Moors; migrations might have begun before written history), there should be some sign of this migration among Mediterranean Europeans. Sephardic Jews might be subsumed under Northern Mediterranean with no mention of the Near East.
    It may be a function of when the admixture took place, with earlier admixture being less apparent. Ashkenazis do not show a Northern Mediterranean component with MO even though it is quite evident in Dodecad and Eurogenes. Perhaps founder effects and genetic drift in Ashkenazis contributed to the MO oversimplification. MO and PF were not intended to give a complete picture; they are more like the #1 hunch from Oracle. The Oracle estimates derive from the full composites in Dodecad and Eurogenes.
    Last edited by josh w.; 15 May 2014, 09:59 AM.


  • josh w.
    replied
    Originally posted by Rafael Fernandes
    As I said previously, the fact that your Middle Eastern ancestry has fallen from the older program to the new one may be due to the older one's more limited ability to measure Southern European ancestry. Population Finder was identifying many people from Southern Europe, especially those of Italian or Greek ancestry, as being as much as 50% Middle Eastern, because it lacked samples that accurately represented their ancestry. For people with Southern European ancestry, this may mean inflated results in both the North European and Middle Eastern clusters, because these are the ancestral components closest to the Southern European one. It was common to see a South Italian, for example, estimated as 50% Western European and 50% Middle Eastern, or a Portuguese person estimated as 80% Western European and 20% Mozabite. My results went through the same shift as yours, albeit less dramatically; my Middle Eastern percentage has fallen from 24% in PF to 16% in myOrigins, and I have little doubt that this is because of myOrigins' greater accuracy.
    Not sure if we fully understand the changes from PF to MO. Given the various migrations from the Near East and North Africa to southern Europe (Phoenicians, Moors; migrations might have begun before written history), there should be some sign of this migration among Mediterranean Europeans. Sephardic Jews might be subsumed under Northern Mediterranean with no mention of the Near East.
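    Rafael's explanation (a Southern European sample split between the North European and Middle Eastern clusters when no close Southern European reference exists) can be illustrated with a toy two-reference least-squares fit. The allele frequencies below are invented, the function name `two_way_fit` is hypothetical, and nothing in this thread says Population Finder fit proportions this way; the sketch only shows why an intermediate sample tends to land near 50/50 between the two references that bracket it.

```python
from typing import Sequence


def two_way_fit(target: Sequence[float],
                ref_a: Sequence[float],
                ref_b: Sequence[float]) -> float:
    """Least-squares share q of ref_a (and 1 - q of ref_b) that best matches target."""
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    offset = [t - b for t, b in zip(target, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two reference profiles are identical")
    # Unconstrained minimiser of |target - (q*ref_a + (1-q)*ref_b)|^2 ...
    q = sum(o * d for o, d in zip(offset, diff)) / denom
    # ... clipped to [0, 1]; for this one-variable quadratic that is the constrained optimum.
    return min(1.0, max(0.0, q))


north_euro = [0.2, 0.7, 0.4]     # toy allele frequencies, invented for illustration
middle_east = [0.6, 0.3, 0.8]
south_italian = [0.4, 0.5, 0.6]  # halfway between the two, with no matching reference
print(round(two_way_fit(south_italian, north_euro, middle_east), 3))  # prints 0.5
```

    The fit reports "50% North European, 50% Middle Eastern" even though neither reference describes the sample well; adding a reference that actually resembles the sample is what removes the artificial split.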


  • robe3b
    replied
    Originally posted by Rafael Fernandes
    As I said previously, the fact that your Middle Eastern ancestry has fallen from the older program to the new one may be due to the older one's more limited ability to measure Southern European ancestry. Population Finder was identifying many people from Southern Europe, especially those of Italian or Greek ancestry, as being as much as 50% Middle Eastern, because it lacked samples that accurately represented their ancestry. For people with Southern European ancestry, this may mean inflated results in both the North European and Middle Eastern clusters, because these are the ancestral components closest to the Southern European one. It was common to see a South Italian, for example, estimated as 50% Western European and 50% Middle Eastern, or a Portuguese person estimated as 80% Western European and 20% Mozabite. My results went through the same shift as yours, albeit less dramatically; my Middle Eastern percentage has fallen from 24% in PF to 16% in myOrigins, and I have little doubt that this is because of myOrigins' greater accuracy.
    Quite so, Rafael. The 14.38% Middle Eastern ancestry estimate from Population Finder was in fact the contribution of my only Italian great-grandfather, who was Northern Italian, from Genoa. By contrast, it seems PF correctly identified what it called my Spanish-Basque-French ancestry, which added up to 49.4% of my ethnic makeup. Thus, the only remaining issue to be resolved is the sharp decrease in my Native American estimate (36.4% according to PF versus 31% according to myOrigins). Thanks for your help.
