The science behind Big Y 700 mutation rates


  • #16
    Originally posted by benowicz

    . . .1. I manually calculated the ancestral modal haplotype. It seems to me clear that the programming logic for those calculators is pretty crude and either not appropriate generally or at least not appropriate for the specific cases that I encountered in my analyses. For example, I think there is a much better case for assuming the ancestral modal value in ambiguous cases where there is, strictly speaking, no mathematical modal, to be the median allele count rather than the lowest allele count. . . .
    Forgot to mention a corollary of this: in my opinion, it's not reasonable to apply either the step-wise or the infinite allele method across the board to all markers when calculating TMRCA. So I basically had to perform this manually, outside of the standardized, automated calculators I found online. From what I can tell, nobody ever claimed that multi-step mutations were more than a small minority of total mutation events (maybe less than 10%), although they do appear to occur more frequently on multi-copy markers. This is a pretty important point.
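
A minimal sketch of the manual approach described in this post, not the poster's actual spreadsheet or any calculator's algorithm. It takes the mode as the ancestral value and falls back to the median when there is no single mode, then counts mutations step-wise on most markers and as a single event on a chosen set of markers. The marker values, the choice of `median_low`, and the decision to treat the multi-copy marker as infinite-allele are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median_low


def ancestral_allele(values):
    """Mode of the repeat counts, or the (low) median when the mode is ambiguous."""
    counts = Counter(values)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    if len(modes) == 1:
        return modes[0]
    # No single mode: use the median rather than the lowest value.
    return median_low(values)


def modal_haplotype(haplotypes):
    """Build the ancestral modal haplotype marker by marker."""
    markers = haplotypes[0].keys()
    return {m: ancestral_allele([h[m] for h in haplotypes]) for m in markers}


def genetic_distance(modal, haplotype, infinite_allele_markers):
    """Step-wise count by default; one event per differing marker in the given set."""
    total = 0
    for marker, ancestral in modal.items():
        diff = abs(haplotype[marker] - ancestral)
        if marker in infinite_allele_markers:
            total += min(diff, 1)  # any difference counts as a single mutation
        else:
            total += diff          # each repeat of difference counts as one step
    return total


# Made-up example values, not real kit data.
haplotypes = [
    {"DYS393": 13, "DYS390": 24, "CDYa": 36},
    {"DYS393": 13, "DYS390": 25, "CDYa": 38},
    {"DYS393": 14, "DYS390": 23, "CDYa": 36},
]
modal = modal_haplotype(haplotypes)
mixed = genetic_distance(modal, haplotypes[1], {"CDYa"})
stepwise_only = genetic_distance(modal, haplotypes[1], set())
print(modal, mixed, stepwise_only)
```

In this toy data DYS390 has no mode, so the median (24) is used instead of the lowest value (23), and the two-step difference on CDYa counts once under the mixed method but twice under a purely step-wise count.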


    • #17
      Originally posted by benowicz
      . . . The end results of my revisions are not too different--I figure the estimated birth year of the MRCA for FGC23343 at the 50% confidence level was about 718 A.D. But I had to make a few judgments and manual adjustments . . .

      3. I think it's possible to revise downward any rate adjustments due to the likelihood of convergent mutations by analyzing more than 2 donor haplotypes. My independent attempts to calculate these adjustments and compare them to the adjusted rates used by several other popular TMRCA calculators came pretty close--typically within 3% of the rates implied by those calculators' results. I consider those differences to be insignificant, and currently think they're due to some highly dubious algorithms weighting the impact of specific loci. Anyhow, confident that I can reasonably approximate the expected impact of convergent mutations for any observed genetic distance, it follows that the probability of those convergent mutations with respect to the constructed ancestral modal haplotype decreases proportionately to the number of independent donor haplotypes examined. . .
      Upon further reflection, I think the downward adjustment reflected in that original calculation was too high. The theory is that each additional haplotype analyzed should reduce the chance that any convergent mutations would affect the reconstructed ancestral modal haplotype, and that can only happen if there are at least 4 haplotypes being analyzed, since by definition convergent mutations appear in matching pairs.

      Unfortunately, there aren't enough independent donor haplotypes available to do this at any level other than the parent level of this subclade, FGC23343. Anyhow, now examining 5 rather than 3 haplotypes, I have to revise backward the estimated birth year at the 50% confidence level for the MRCA to 636 A.D., about 2 or 3 generations earlier than my last estimate. That's just slightly closer to the SNP-based estimate using what I call the Xue rates (i.e., 876 A.D.) than to the one using the Adamov rates (i.e., 333 A.D.).


      • #18
        Does Y-chromosome DNA only change during sperm cell production (i.e. from generation to generation)? It is my understanding that the Y chromosome does not recombine, seeing as women offer no Y chromosome to pair with, so I assume the mutation must occur in the testes. In this scenario each individual sperm cell has the potential to deviate from the norm. So maybe 2% out of the millions have this new SNP (or STR value).

        Or can there be mutations within a single individual's lifetime (i.e. born E1b1a7, mutated to E1b1a7a and then pass that on to all sperm cells)? Of course in that example I'm assuming that this person is the founder of the E-U174 lineage. I'm not implying that a person would mutate from an upstream haplotype into an already existing subclade.


        • #19
          Recombination (occurring only during meiosis) and mutation (occurring at any time, and not just during meiosis or mitosis) are distinct processes. In order to be passed on to the next generation, only a Y DNA mutation that occurs in the "germ line" matters, of course. Y DNA mutations can occur in any human cell with a nucleus, but unless the mutation ends up in a sperm cell, the mutations can't be passed on.


          • #20
            Originally posted by benowicz
            . . . I've also tried to cross-reference these results to other STR and SNP data sets available for descendants of other, better studied remote MRCAs--the descendants of Sir John Stewart of Bonkyll, d. 1298, MRCA for bearers of the SNP S781, in particular. I don't have direct evidence about some of the relevant variables, like test resolution, but pilot analyses are encouraging. You could also argue against direct comparability of the S781 results to FGC23343 because they're on a very different estimated genetic distance--I figure about 23 generations on average for S781 vs. about 39 generations for FGC23343.
            I think the SNP and STR data for S781 supports the method I used to derive a MRCA date for FGC23343 of around 636 A.D.

            Sir John Stewart of Bonkyll definitely died in 1298, and the information I've seen about his family leads me to believe he was born sometime around 1235. This is at about the 60% confidence level for my analysis of six 111-marker STR haplotypes, using the so-called 2017 Iain McDonald mutation rates and my method for estimating convergent mutations.

            The resolution data for S781 on the FTDNA haplotree is obviously not available at a granular level, but the typical number of private variants under S781 conforms to the expected pattern, with a handful of subclades reporting about 12, but most reporting about 8 (i.e., 8 × 150% = 12, which would be consistent with the typical coverage of 10 Mbp and 15 Mbp for BigY-500 and -700, respectively). This translates to about 61 years per SNP on the -700 platform, and 92 years per SNP on the -500 platform.
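
The per-SNP figures above can be reproduced with simple division, sketched here. The post does not say what present-day reference year was used; an average tester birth year of 1970 is an assumption chosen because it reproduces the quoted 61 and 92 years per SNP.

```python
# Birth year of the S781 MRCA as estimated in the post.
S781_MRCA_BIRTH_YEAR = 1235
# ASSUMPTION: average birth year of the testers; not stated in the post.
ASSUMED_TESTER_BIRTH_YEAR = 1970


def years_per_snp(private_variants):
    """Elapsed years divided by the typical number of private variants."""
    elapsed = ASSUMED_TESTER_BIRTH_YEAR - S781_MRCA_BIRTH_YEAR
    return elapsed / private_variants


rate_700 = years_per_snp(12)  # BigY-700, about 12 private variants
rate_500 = years_per_snp(8)   # BigY-500, about 8 private variants
print(round(rate_700), round(rate_500))
```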

            I don't have that granular resolution data for FGC23343, either. But based on data published on various open STR projects, I think there is good reason to believe that a large majority of the FGC23343 kits are -700. The typical private variant counts, about 21 SNPs arithmetic mean and 19 SNPs geometric mean, suggest a conservative MRCA estimate between 684 A.D. and 806 A.D., using the SNP mutation rates derived from the S781 exercise. As a reminder, my analysis of five 111-marker STR haplotypes for FGC23343 returned a MRCA around 636 A.D. at the 50% confidence level.
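
As a rough check on the 684 A.D. to 806 A.D. range, the calculation can be sketched as below. It assumes the unrounded BigY-700 rate of 61.25 years per SNP and an average tester birth year of 1970; neither number is stated in the post, but together they reproduce its figures.

```python
# ASSUMPTIONS: neither value is stated explicitly in the post.
ASSUMED_TESTER_BIRTH_YEAR = 1970
YEARS_PER_SNP_700 = (ASSUMED_TESTER_BIRTH_YEAR - 1235) / 12  # 61.25, from S781


def mrca_birth_year(mean_private_variants):
    """Count back from the testers' birth year at a fixed number of years per SNP."""
    return ASSUMED_TESTER_BIRTH_YEAR - mean_private_variants * YEARS_PER_SNP_700


earliest = mrca_birth_year(21)  # arithmetic mean of private variants
latest = mrca_birth_year(19)    # geometric mean of private variants
print(round(earliest), round(latest))
```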

            There aren't really enough currently identified FGC23343 donors to maximize precision under this convergence-adjusted STR analysis for any of its subclades. But maybe rough piggy-back estimates could be made by adding 103 years for BigY-700 kits or 155 years for the -500 kits for each SNP within the most recent shared block, based on the binomial distribution. It's very rough, given the wide variety of reported private variants, but it's somewhere to start until more data becomes available.
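
A sketch of how the piggy-back estimate could be applied. It assumes the per-SNP years are added to the 636 A.D. parent estimate for FGC23343, which is one reading of the post. The function name and the example SNP count are hypothetical, and the binomial reasoning behind the 103 and 155 year figures is not reproduced here.

```python
# Parent-level STR estimate for FGC23343 from the post.
PARENT_MRCA_BIRTH_YEAR = 636
# Per-SNP increments quoted in the post.
YEARS_PER_SNP = {"BigY-700": 103, "BigY-500": 155}


def piggyback_birth_year(snps_in_shared_block, platform):
    """Move forward from the parent MRCA by a fixed number of years per SNP."""
    return PARENT_MRCA_BIRTH_YEAR + snps_in_shared_block * YEARS_PER_SNP[platform]


# Hypothetical subclade with 2 SNPs in its most recent shared block.
estimate_700 = piggyback_birth_year(2, "BigY-700")
estimate_500 = piggyback_birth_year(2, "BigY-500")
print(estimate_700, estimate_500)
```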