SNPs vs. groups of SNPs

  • SNPs vs. groups of SNPs

    I've spent some more time going over the SNP aging methodology for estimates published at YFull. I think I had a major misunderstanding earlier.

    Under their terminology, the "formed" date refers to the estimated age of the most immediate ancestor on the current version of the phylogenetic tree.

    "TMRCA" is just that, based on an estimated average mutation rate of once every 144.41 years for each base pair. I originally thought it had to do with the expansion date estimated from comparing STR haplotypes.

    But more than that, this age is based on the total number of mutations observed in a coherent group of SNPs--not just a single base pair. For the sake of convenience, this group is commonly referred to by the name of just one of these SNPs.

    These SNP groups are not necessarily located particularly near one another, nor is there a set, standard number of SNPs required to make a coherent set. They're just identified by network analysis over a large group of donors.


    So my questions are:

    # 1. Is this new understanding fundamentally correct? Any important additional information you'd care to share?

    # 2. When customers order an SNP to be tested from FTDNA's tree menu, are they testing all the individual SNPs in the block, or just the specific SNP from which the group referenced at YFull takes its name?

  • #2
    Originally posted by benowicz
    . . ."TMRCA" is just that, based on an estimated average mutation rate of once every 144.41 years for each base pair. . . .

    Kind of a typo there, or at least very poor wording. To clarify, the entire Y chromosome, consisting of 8,467,165 base pairs, will experience, on average, one mutation every 144.41 years. That seems to be why YFull estimates the age of an SNP by multiplying 144.41 by the number of individual mutations within that network node. Or at least that's how I interpreted this example.

    https://www.yfull.com/tree/R-Y9087/
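The arithmetic described above can be sketched in a few lines of Python. This is only the simplified reading given in this post (age = number of SNPs in the node multiplied by 144.41 years); the function name and the SNP count in the usage line are hypothetical, and YFull's published method may include further adjustments not shown here.

```python
# Figure quoted in this thread: on average one mutation every 144.41 years
# across the analysed region of the Y chromosome.
YEARS_PER_SNP = 144.41


def estimate_age_years(snp_count: int, years_per_snp: float = YEARS_PER_SNP) -> float:
    """Rough age of a node: SNP count times the average years per SNP.

    Hypothetical helper illustrating the post's interpretation only.
    """
    if snp_count < 0:
        raise ValueError("snp_count must not be negative")
    return snp_count * years_per_snp


if __name__ == "__main__":
    # Illustrative count only, not taken from any real tree node.
    print(f"{estimate_age_years(10):.1f} years")
```

Because the estimate is a simple product, every extra SNP assigned to a node moves the estimate by another 144.41 years under this reading.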


    So this also implies that a stable SNP, defined by good network analysis with an adequate sample size, should almost never experience a back mutation. Odds are like one in 1.2 or 1.3 billion.
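The "one in 1.2 or 1.3 billion" figure appears to follow from the two numbers quoted above. As a hedged sketch, assuming mutations are spread evenly over the 8,467,165 base pairs, the average wait for a mutation at any one specific position is the region-wide interval multiplied by the number of positions. The names below are hypothetical.

```python
# Both figures are the ones quoted in this thread.
COVERED_BASE_PAIRS = 8_467_165
YEARS_PER_MUTATION_IN_REGION = 144.41


def years_per_mutation_at_one_site(
    covered_bp: int = COVERED_BASE_PAIRS,
    years_per_region_mutation: float = YEARS_PER_MUTATION_IN_REGION,
) -> float:
    """Average years between mutations at a single base pair.

    Assumption: every position in the region is equally likely to mutate.
    """
    return covered_bp * years_per_region_mutation


def per_site_rate_per_year() -> float:
    """Chance per year that one specific base pair mutates (same assumption)."""
    return 1.0 / years_per_mutation_at_one_site()


if __name__ == "__main__":
    print(f"about {years_per_mutation_at_one_site():,.0f} years per site")
    print(f"about {per_site_rate_per_year():.2e} mutations per site per year")
```

Under that assumption the product comes to roughly 1.22 billion years, which is consistent with the odds mentioned above being per base pair per year.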

    That may answer my second question. With odds like these, for a la carte SNP tests, there would be no practical reason to test more than one SNP within the group.


    • #3
      For most SNP's, I think the prevailing theory is that "equivalent" SNP's (at the same node on the current haplotree) arose as independent mutations (and I don't immediately see how that idea could be disproved, since we weren't there to see the mutations happening).

      However, we see them today as "equivalent" only because nobody has yet been tested who has one mutation in the "equivalent" group without the others. When, eventually, someone turns up who has one or more mutations within the "equivalent" group but not all of them, then the group will have to be split, and the corresponding node in the haplotree will have to be split. The haplotree remains a work in progress, both for the terminal SNP's and for the upstream nodes.
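The splitting rule described above can be illustrated with a small Python sketch. It assumes the new tester was reliably read at every position in the block (no failed reads), and the SNP names are placeholders rather than real markers.

```python
from typing import Iterable, Optional, Tuple, List


def split_equivalent_block(
    block: Iterable[str], derived_in_new_sample: Iterable[str]
) -> Optional[Tuple[List[str], List[str]]]:
    """Split a block of "equivalent" SNPs using one new tester.

    Returns (upstream, downstream) when the tester carries some but not all
    of the block's SNPs, otherwise None (the block stays as it is).
    Assumption: every SNP in the block was reliably called for this tester.
    """
    block_set = set(block)
    shared = block_set & set(derived_in_new_sample)   # tester has these
    missing = block_set - shared                      # tester lacks these
    if not shared or not missing:
        return None  # all or none present: no evidence for a split
    # SNPs the tester shares must be older (upstream); the rest are younger.
    return sorted(shared), sorted(missing)


if __name__ == "__main__":
    # Placeholder names, purely for illustration.
    print(split_equivalent_block(["SNP_A", "SNP_B", "SNP_C"], {"SNP_A"}))
```

A tester who has all of the SNPs, or none of them, gives no reason to split the node, which is why the group looks "equivalent" until a partial match turns up.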

      There are many assumptions built into the "age" estimates, and the assumptions are not easily tested. Ancient, or at least medieval DNA may eventually help to establish some independent date ranges that will help calibrate the current age estimates that are based on the notion of a mutational clock that always, magically, runs at the same rate.


      • #4
        Originally posted by John McCoy
        For most SNP's, I think the prevailing theory is that "equivalent" SNP's (at the same node on the current haplotree) arose as independent mutations (and I don't immediately see how that idea could be disproved, since we weren't there to see the mutations happening).

        However, we see them today as "equivalent" only because nobody has yet been tested who has one mutation in the "equivalent" group without the others. When, eventually, someone turns up who has one or more mutations within the "equivalent" group but not all of them, then the group will have to be split, and the corresponding node in the haplotree will have to be split. The haplotree remains a work in progress, both for the terminal SNP's and for the upstream nodes.

        There are many assumptions built into the "age" estimates, and the assumptions are not easily tested. Ancient, or at least medieval DNA may eventually help to establish some independent date ranges that will help calibrate the current age estimates that are based on the notion of a mutational clock that always, magically, runs at the same rate.

        I had high hopes that testing historic DNA would bring significant benefits. Then I read an explanation that, for this to work, we need to know whether we are testing our ancestors, that is, a population that actually contributed to present-day DNA. Testing siblings or very close cousins of our ancestors might be good enough. But once we move past the genealogical timeframe, that is something we are unlikely to know, and further back in time it becomes impossible to know.

        That is exactly one of the scenarios with equivalent SNPs: the branches we do not see might be extinct.


        Mr. W.

        P.S.
        I know that ancient DNA, and not historic DNA, is the most used term regardless of the age.
        Last edited by dna; 10th May 2018, 04:43 PM.


        • #5
          Originally posted by benowicz
          Kind of a typo there, or at least very poor wording. To clarify, the entire Y chromosome, consisting of 8,467,165 base pairs, will experience, on average, one mutation every 144.41 years. That seems to be why YFull estimates the age of an SNP by multiplying 144.41 by the number of individual mutations within that network node. Or at least that's how I interpreted this example.

          https://www.yfull.com/tree/R-Y9087/

          The quoted mutation rate applies ONLY to the region on which YFull bases its analysis. They work from a subset of the Y chromosome so that results are consistent across current sequencing technologies. Look at the math described in some of the R-U106 age analysis efforts under http://www.jb.man.ac.uk/~mcdonald/genetics/build37.html to get a parallel understanding of how ages are calculated.
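To show why the rate is tied to the analysed region, here is a rough Python sketch. It assumes a uniform per-base-pair rate, so the average interval between SNPs scales inversely with the number of base pairs covered. The reference figures are the ones quoted in this thread; the function name and the example coverage are hypothetical.

```python
# Reference figures quoted earlier in this thread.
REFERENCE_BP = 8_467_165
REFERENCE_YEARS_PER_SNP = 144.41


def years_per_snp_for_coverage(covered_bp: int) -> float:
    """Average years between SNPs for a test covering `covered_bp` positions.

    Assumption: the per-base-pair mutation rate is the same everywhere, so
    halving the covered region doubles the average wait between SNPs.
    """
    if covered_bp <= 0:
        raise ValueError("covered_bp must be positive")
    return REFERENCE_YEARS_PER_SNP * REFERENCE_BP / covered_bp


if __name__ == "__main__":
    # Example coverage chosen only for illustration.
    print(f"{years_per_snp_for_coverage(4_000_000):.2f} years per SNP")
```

The point of the sketch is only that 144.41 is not a property of the whole chromosome: counting SNPs from a larger or smaller region with the same multiplier would skew the age.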
