YFull definition of "formed" & TMRCA


  • YFull definition of "formed" & TMRCA

    I need help to make sure I understand YFull’s definitions of “formed” age and TMRCA.

    The definition in the YFull FAQ is: Subclade "formed" age: The TMRCA (time to most recent common ancestor) of a subclade is used as the "formed" age of each branch of the subclade. Stated otherwise, the formed age of a branch is the same as the TMRCA of the "parent" subclade of that branch.

    My interest is in the 7 sub-clades of the Hg N-L550 phylogenetic tree, especially the N-FGC14542 sub-clade. In the YFull N-L550 tree, v5.08, the N-L550 TMRCA = 2900 ybp, and all 7 sub-clades have “formed” = 2900 ybp, as per the above definition. The TMRCAs of the sub-clades are, in date order: 2900, 2800, 2800, 2700, 2700, 2400 and 1450 ybp (my FGC14542 being 2800 ybp).

    I cannot believe, and it must be statistically impossible, that all 7 sub-branches were formed, i.e. N-L550 mutated 7 times, 2900 ybp, so I must be wrong in assuming “formed” = mutated. How do I interpret "formed"? And how do I get the age?

    My real quest is to find when the N-FGC14542 mutation happened - what, if anything, can I conclude on this from the YFull data? Are there any other sources of data that can provide this information?

  • #2
    The formed and TMRCA dates are just estimates. They are based on an average mutation rate of one SNP every 144.41 years and an assumed age of 60 years for the living providers of YFull samples. On average one male generation is 32.5 years, so there are about 4.44 generations per mutation. So if a person born 3044 years ago who had the N-L550 mutation had 4 male children, 16 male grandchildren, 64 male great-grandchildren, and 256 male great-great-grandchildren, then it isn't unreasonable that there were 7 separate SNP mutations by 2,900 years ago within those 4.44 generations, and that all 7 lineages have living descendants who have had BigY DNA testing.
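The arithmetic in this post can be checked with a short sketch. It assumes, purely for illustration, that every father-to-son transmission is an independent chance for a new SNP at the average rate (32.5 / 144.41 per generation). The variable names are made up here, and this is not YFull's own calculation.

```python
# Rough check of the numbers quoted in the post (illustrative assumptions only).
YEARS_PER_SNP = 144.41        # average years per SNP, as quoted in the post
YEARS_PER_GENERATION = 32.5   # average male generation length, as quoted

generations_per_snp = YEARS_PER_SNP / YEARS_PER_GENERATION
snps_per_transmission = YEARS_PER_GENERATION / YEARS_PER_SNP

# Male descendants per generation in the example: 4, 16, 64, 256.
male_descendants = [4 ** g for g in range(1, 5)]
total_transmissions = sum(male_descendants)

# Assumption: each father-to-son transmission is an independent chance for a
# new SNP, so the expected count is simply rate * number of transmissions.
expected_new_snps = snps_per_transmission * total_transmissions

print(f"generations per SNP: {generations_per_snp:.2f}")
print(f"father-to-son transmissions: {total_transmissions}")
print(f"expected new SNPs among them: {expected_new_snps:.1f}")
```

Under these assumptions the family in the example would be expected to pick up far more than 7 new SNPs in total, which is why 7 surviving, tested branches dated to the same rounded age is not implausible.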
    Last edited by Armando; 20th November 2017, 08:32 AM.



    • #3
      Well argued Armando.

      I knew the ages were estimates, but I had not followed through the way you did. However, I still believe the estimates, while theoretically possible, are wrong. Average mutation rates are perfectly applicable when used over a large timescale and many clades, but down at a small sub-clade, over a relatively small timescale, the "average mutation rate" methodology is not applicable.



      • #4
        The key question here, I think, is the range of variation of mutation rates and the accumulation of mutations observed in a population of Y chromosomes -- and I don't think there is enough data to say very much about that. In order to validate the TMRCA estimates, we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably accumulate at a constant rate, is an attractive one, and possibly the only available basis for estimating when genetic lineages diverged where the dates cannot be measured directly from, say, the fossil record, but it is only a hypothesis, based on untested assumptions that seem way too good to be true.



        • #5
          Originally posted by John McCoy
          The key question here, I think, is the range of variation of mutation rates and the accumulation of mutations observed in a population of Y chromosomes -- and I don't think there is enough data to say very much about that. In order to validate the TMRCA estimates, we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably accumulate at a constant rate, is an attractive one, and possibly the only available basis for estimating when genetic lineages diverged where the dates cannot be measured directly from, say, the fossil record, but it is only a hypothesis, based on untested assumptions that seem way too good to be true.
          Yes, ancient DNA from populations known to have living descendants would need to be analyzed. And preferably tracked over millennia.

          Mr. W

          P.S.
          myOrigins would also benefit from the ancient DNA analysis.



          • #6
            What do you all think about this suggestion?

            I don't have access to the much larger FTDNA database of results for Hg N-L550 sub-clades, but I counted all the YFull sub-clade results posted on their Hg N-L550 tree and found a wide range of results: 3, 5, 8, 25, 97, 9, and 4. 97 is the N-L1025 sub-clade.

            Can we propose that this could be a "proxy" for the formation/mutation of the sub-clade, i.e. N-L1025 being the oldest, by a wide margin? We cannot get an actual date, but at least relative dates for the sub-clades formation.



            • #7
              Originally posted by John McCoy
              ... we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably....
              Maybe we will have that some years from now. This project will run for 5 years: 1,000 prehistoric individuals to be genetically mapped



              • #8
                Originally posted by Svein Davidsen
                Well argued Armando.

                I knew the ages were estimates, but I had not followed through the way you did. However, I still believe the estimates, while theoretically possible, are wrong. Average mutation rates are perfectly applicable when used over a large timescale and many clades, but down at a small sub-clade, over a relatively small timescale, the "average mutation rate" methodology is not applicable.
                It's not the clades that matter. It's the number of mutations in each sample that matters. Of course, anything that is variable can have a much wider margin of error in smaller groups when an average calculated from a large dataset is applied as a constant in a formula, but that is implied with estimates and averages and should be understood without having to be said. However, the dataset isn't so small for N-L550, although it is for some of the downstream subclades.

                To see the number of mutations per subclade, go to an SNP such as L550 and click on info next to the TMRCA, or open it in a new page or tab, which takes you to https://www.yfull.com/branch-info/N-L550/. You will see a table of all kits that are downstream from L550, with the number of reliable SNPs next to each sample id. That is the number of SNPs each kit has downstream from the SNP you chose to look at. The number of SNPs varies for two reasons: the variability of test results and the variability of the number of mutations each lineage has. There are anywhere between 17 and 27 SNPs. YFull corrects the number of SNPs, probably based on assumed positives that didn't appear in the test result of the sample.

                Then each sample's SNP count is multiplied by the average number of years per SNP to get the estimated age of the common SNP (or group of SNPs in other cases), and the sample ages are added up and divided by the number of samples. The formula is shortened here since they had already averaged the subclades. The formula is then (3381+3344+2683+3041+2373+2732+3106)/7
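The averaging in the formula above can be checked directly. This is only a sketch of the final averaging step using the seven figures quoted in the post; the variable names are hypothetical, and it does not reproduce YFull's SNP corrections.

```python
# Subclade age estimates (years) taken from the formula quoted in the post.
subclade_ages = [3381, 3344, 2683, 3041, 2373, 2732, 3106]

# Plain arithmetic mean of the seven estimates.
mean_age = sum(subclade_ages) / len(subclade_ages)

print(f"mean of {len(subclade_ages)} subclade ages: {mean_age:.0f} years")
```

The mean comes out at roughly 2951 years, in line with the 2900 ybp shown on the tree; how YFull rounds the figure is not stated in this thread.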

                In a separate academic study in 2009, Yali Xue and Chris Tyler-Smith also found an average of about 1 SNP every 4 generations, using NGS testing similar to BigY on people with well documented genealogies. Two Y chromosomes from a deep-rooting pedigree were genotyped and resequenced. They showed zero Y-STR differences after typing 67 Y-STRs, but four base substitutions after comparing ~10 Mb of DNA sequence. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3312576/ for details. That rate is consistent with YFull's mutation rate, which is based on the study by Adamov et al. 2015.

                Poznik et al. 2016 used the mutation rate from Balanovsky et al. 2015, and the subclade ages they calculated are somewhat similar to those of YFull. You can download the supplementary file if you want to see whether a subclade of N is calculated in there.

                Dr. Iain McDonald has been calculating P312 subclade ages; see http://www.jb.man.ac.uk/~mcdonald/ge...312/table.html and compare them with YFull to see how they differ.

                The estimated age of a subclade that is several thousand years old isn't going to be miscalculated to be thousands of years older or younger than it actually is. It will be off by a few hundred years, and that is as close as we can get with current testing and participation rates.
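A back-of-the-envelope sketch of the "few hundred years" claim. It assumes SNP counts follow a Poisson distribution and that the averaged lineages are independent; neither assumption comes from YFull, real lineages share ancestry, and the SNP count of 20 is just an illustrative value inside the 17 to 27 range mentioned earlier.

```python
import math

YEARS_PER_SNP = 144.41   # average rate quoted earlier in the thread
snp_count = 20           # illustrative value within the 17-27 range for N-L550
lineages = 7             # number of subclade estimates being averaged

# Poisson assumption: the standard deviation of a count is its square root.
sd_single_years = math.sqrt(snp_count) * YEARS_PER_SNP

# Independence assumption: averaging k estimates shrinks the spread by sqrt(k).
sd_mean_years = sd_single_years / math.sqrt(lineages)

print(f"spread for one lineage: about {sd_single_years:.0f} years")
print(f"spread for the mean of {lineages} lineages: about {sd_mean_years:.0f} years")
```

Under these assumptions a single lineage is uncertain by several hundred years, and the averaged estimate by a couple of hundred, which is the order of magnitude described in the post.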

                Originally posted by Svein Davidsen
                What do you all think about this suggestion?

                I don't have access to the much larger FTDNA database of results for Hg N-L550 sub-clades, but I counted all the YFull sub-clade results posted on their Hg N-L550 tree and found a wide range of results: 3, 5, 8, 25, 97, 9, and 4. 97 is the N-L1025 sub-clade.

                Can we propose that this could be a "proxy" for the formation/mutation of the sub-clade, i.e. N-L1025 being the oldest, by a wide margin? We cannot get an actual date, but at least relative dates for the sub-clades formation.
                The subclades are based on how branches show up given the participation rate, which in turn depends on survivors. There is no way to tell if N-L1025 just happened to have more survivors, or coincidentally has a much higher participation rate, or is actually older. I have great-uncles who had a lot of male children and great-uncles who had very few. If all of the male children of my great-uncles have the same number of male children, and that continues for 16 generations, then over time the lineages from my great-uncles with more male children will have more male descendants, and therefore more branches, since on average they will all have about 1 new SNP mutation per 4 generations. If you apply that scenario to N-L550, but with one son having a lot of descendants and the other sons having very few, then, given an equal participation rate, there will be a lot more participants descended from the son with a lot of descendants.

                If the mutation rate were a true constant, and not an average, all of the descendants of N-L550 would have the exact same number of mutations, except in the extremely unlikely case of a lineage made up only of the youngest child of every single generation, meaning there would be more time between the marriage of Mr. N-L550 and that descendant. Since there are about 88 generations between now and Mr. N-L550, the average number of years per generation for N-L550 should be close to that of other lineages.

                Using the number of participants in a branch is even less scientific than calculating an average rate of SNP mutations in the Y-chromosome, since there are even more variables that can't be measured when using the number of participants in branches. We don't know how many children each generation of Mr. N-L550's descendants had, and we don't know the participation rate of all of the branches of all of his descendants.
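The point about survivors can be illustrated with a toy simulation. The sketch below starts seven branches at the same moment and lets each man have 0 to 3 sons at random; the offspring probabilities, the 20 generations, and the function name are all invented for illustration and are not estimates for N-L550.

```python
import random

def living_descendants(generations, rng,
                       sons=(0, 1, 2, 3), weights=(0.3, 0.4, 0.2, 0.1)):
    """Simulate one male line and return how many men are alive at the end."""
    count = 1  # the founder of the branch
    for _ in range(generations):
        if count == 0:
            break  # the line has died out
        # Each living man independently draws his number of sons.
        count = sum(rng.choices(sons, weights=weights, k=count))
    return count

rng = random.Random(2017)  # fixed seed so the run is repeatable
# Seven branches that all start in the same generation.
branch_sizes = [living_descendants(20, rng) for _ in range(7)]
print(branch_sizes)
```

Branches of identical age typically end up with very different sizes, and some die out entirely, so the number of testers in a branch says little about how old it is.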



                • #9
                  Originally posted by John McCoy
                  The key question here, I think, is the range of variation of mutation rates and the accumulation of mutations observed in a population of Y chromosomes -- and I don't think there is enough data to say very much about that. In order to validate the TMRCA estimates, we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably accumulate at a constant rate, is an attractive one, and possibly the only available basis for estimating when genetic lineages diverged where the dates cannot be measured directly from, say, the fossil record, but it is only a hypothesis, based on untested assumptions that seem way too good to be true.
                  No one is saying that the mutation rate is an exact constant. The estimates are just that, because the mutation rate is an average and not a constant. The number of SNPs each person has since any of the subclades appeared in the fossil record will always vary. So even if we have thousands of well sequenced fossil records with tight C14 dating, the descendant SNP variability will always cause the mutation rate to be questioned. Another problem is that fossils very commonly can't be fully sequenced, so we can only get the lower bound date of the appearance of specific SNPs and not necessarily the exact age. On top of that, the C14 dating is also variable, so the range of dates in which the specimen could have lived can be larger than a lot of people would like.
                  Last edited by Armando; 21st November 2017, 08:09 PM.



                  • #10
                    Thanks to everyone, especially Armando, for comprehensive and enlightening inputs.

                    I now need to follow up on all the references to see if I can come any nearer to my aim of locating "Where and When" for the birth of N-FGC14542! Ancient DNA where are you?!



                    • #11
                      It might be useful to remember that even when we have ancient DNA samples to work with, they only place an upper bound on the "birth dates" of SNP's, that is, a latest possible date. For any SNP, the oldest sample in which we find it only tells us that the mutation happened before that sample's date, but not by how much.
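A minimal sketch of that bound. The sample labels, ages, and results below are entirely hypothetical; they only show that the oldest sample carrying the SNP gives a minimum age for the mutation and nothing more.

```python
# Hypothetical ancient samples: (label, age in years before present, carries SNP?)
ancient_samples = [
    ("sample A", 1500, True),
    ("sample B", 2200, True),
    ("sample C", 3100, False),
]

# Ages of the samples that tested positive for the SNP.
positive_ages = [age for _, age, has_snp in ancient_samples if has_snp]

# The oldest positive sample sets the minimum age of the mutation.
minimum_snp_age = max(positive_ages)

print(f"the SNP is at least {minimum_snp_age} years old")
```

The older negative sample does not cap the age from the other side: a single man without the SNP does not show that the mutation was absent from everyone alive at that time.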

