Ian McDonald's 2017 STR marker mutation rate schedule

  • Ian McDonald's 2017 STR marker mutation rate schedule

    I'm interested in finding an authoritative source--like an academic paper--for what is purported to be a schedule of STR loci mutation rates originally published by Iain McDonald in 2017.

    http://collins.dnagen.org/dna/haplog...naldRates.html

    I've already seen a couple of DNA project websites that list the relevant values, but I'm looking for something specifically published by McDonald himself. Strangely, none of the sources I've seen for this data quote the title of any paper or journal. On average, the rates are quite a bit higher than other iterations I've seen, with a lower standard deviation. I'm not questioning the accuracy of the data I've seen, I'd just like to have some kind of formal citation, if not a copy of the original paper.

  • #2
    Following the link you cited, I ended up with this:
    • Iain McDonald, University of Manchester, Unpublished average of rates reported by Heinila (2012), Burgarella et al. (2011) and Willems et al. (2016) [Yahoo Group: R1b1c_U106-S21/2017-09]
    I hope this helps!

    • #3
      Thanks.

      I wanted to see the direct statement from McDonald, but if I understand the citation correctly, that may not be possible. They're saying that this was from a Yahoo Group posting in September 2017, right? If I understand correctly, the Yahoo stuff is no longer available online, so I guess I'll never get to see the original post.

      I also notice that the two lists don't agree with one another--for example, the Collins website gives DYS 435 as 0.001693333 and the Ferguson website gives it as 0.000910. That's a problem.

      The Ferguson version of the Iain McDonald figures is only a couple of hundred-thousandths of a percentage point off from the Heinila rates, rather than the roughly 3/100ths of a percentage point implied by the Collins website. That seems like way too big a difference.

      From what I can tell, Colin Ferguson adapted a whole new version of the McGee Y Utility around his rates, and very sensibly documented a detailed development chronology along with it. It conveys the impression that a lot of care was taken.

      http://dna.cfsna.net/HAP/Modified-yUtility.htm

      The "I. McDonald" option returns figures consistent with the figures published on the Ferguson website, so I'm going to go out on a limb and say that those are more likely to be correct.

      • #4
        Maybe another point that could be important--although I've seen these rate sets cited as "McDonald 2017" or some variant thereof, the language from that Ferguson website may indicate that they are really just an average of figures from academic papers by several other researchers. That is, these figures are not necessarily the results of direct studies by Iain McDonald, but just a simple averaging of results from other researchers. Something anybody could have done without any special techniques, provided they had access to the several academic papers.
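
A minimal sketch of what "a simple averaging" of several published rate sets could look like. The source labels, marker names and rate values below are made-up placeholders, not figures from Heinila, Burgarella et al., Willems et al. or McDonald, and the choice to average only over the sources that report a given marker is an assumption; the thread does not say how missing markers were handled.

```python
from statistics import mean

# Placeholder per-generation mutation rates keyed by source, then marker.
# These numbers are invented for illustration only.
sources = {
    "source_1": {"marker_a": 0.0010, "marker_b": 0.0030},
    "source_2": {"marker_a": 0.0014, "marker_b": 0.0026},
    "source_3": {"marker_a": 0.0012},  # marker_b not reported here
}

def average_rates(rate_sets):
    """Average each marker's rate over the sources that report it."""
    markers = sorted({m for rates in rate_sets.values() for m in rates})
    return {
        m: mean(rates[m] for rates in rate_sets.values() if m in rates)
        for m in markers
    }

for marker, rate in average_rates(sources).items():
    print(f"{marker}: {rate:.6f}")
```

If the three sources cover different marker panels, the averaged schedule depends on this missing-value rule, which is one way two websites could end up publishing different "McDonald 2017" values.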

        • #5
          Just happened to run across a photo from his presentation: https://www.facebook.com/SAPPtool/po...5042085682742/

          HTH

          • #6
            Originally posted by benowicz
            I'm interested in finding an authoritative source--like an academic paper--for what is purported to be a schedule of STR loci mutation rates originally published by Iain McDonald in 2017.

            http://collins.dnagen.org/dna/haplog...naldRates.html

            I've already seen a couple of DNA project websites that list the relevant values, but I'm looking for something specifically published by McDonald himself. Strangely, none of the sources I've seen for this data quote the title of any paper or journal. On average, the rates are quite a bit higher than other iterations I've seen, with a lower standard deviation. I'm not questioning the accuracy of the data I've seen, I'd just like to have some kind of formal citation, if not a copy of the original paper.
            That page has been updated with the Yahoo forum source, and with a history of corrections.
            Last edited by sb10; 27 March 2021, 03:59 PM.

            • #7
              Originally posted by sb10

              That page has been updated with the Yahoo forum source, and with a history of corrections.
              I don't know how to explain this, but when I cross-referenced Dr. McDonald's confidence tables in the SAPP Facebook post of 4 November 2019, as re-posted by PNBridgema, they agreed almost EXACTLY with the results of my own calculations using the original rates quoted on the Collins Y DNA site. I mean, I differed by only a couple of years, which I totally expected, since McDonald's confidence tables are clearly rounded for presentation purposes.

              I know that sounds weird, since the average of those old rates on the Collins Y DNA site was much higher than both those on the Ferguson site and the Heinila 2012 rates, but the Collins rates fit just about EXACTLY with Dr. McDonald's confidence schedule.

              I have faith in the Excel formulas I'm using myself because they agree so closely with the results of the Ferguson tweak of the McGee Y Utility when I use the mutation rates published on their site. So although normally I'd suspect there was an error in my own Excel workbook calculations, I'm pretty confident that I've ruled that out in this case. It has to be that either McDonald's 2019 confidence schedule is calculating his adjustments for convergence in a way that is wildly different from how Ferguson and/or I do it, or that the rates originally published on the Collins Y DNA site were much closer to what McDonald himself was actually using. As of November 2019, anyhow.

              Sorry. I thought about mentioning this earlier, but I didn't think anyone else cared.

              • #8
                I'll just point out one other thing that confuses me a little bit about the Ferguson utility's depiction of the confidence interval:

                "The TMRCA is expected to be in the range shown (5% to 95%)"

                http://dna.cfsna.net/HAP/Modified-yUtility.htm

                The standard definition of the 95% confidence interval describes a region whose volume covers 95% of the total distribution and is symmetrically centered at the 50% confidence level. That is, its lower bounds are at the 2.5% point (not 5% point) and its upper bounds are at the 97.5% point (not 95% point). 97.5% less 2.5%=95%. It's generally used where the risk of overstatement is considered as significant as understatement.

                https://en.wikipedia.org/wiki/Confidence_interval

                The one-sided 95% confidence level (i.e., 95% point) is often quoted as a statistic, but it's not often presented as an interval because its lower bound is obviously set at the 0% point, and in context the relevant risk is presumed to be understatement. In such a case it doesn't seem particularly useful to identify a non-zero lower bound specifically. So I'm not exactly sure what is meant by the Ferguson note, but I think it could be a typo.
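
To make the two readings concrete, here is a small sketch that converts a coverage level into the lower and upper percentile points of a symmetric (central) interval. The function name is just an illustrative label; nothing here is taken from the Ferguson utility or from McDonald's schedule.

```python
def central_interval_points(coverage):
    """Return (lower, upper) cumulative-probability points of a central interval."""
    tail = (1.0 - coverage) / 2.0  # probability left outside on each side
    return tail, 1.0 - tail

for coverage in (0.95, 0.90):
    lower, upper = central_interval_points(coverage)
    print(f"{coverage:.0%} central interval: {lower:.1%} to {upper:.1%}")
```

Read this way, a range described as "5% to 95%" is what the function returns for 90% coverage, while 95% coverage corresponds to the 2.5% and 97.5% points.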

                Using just one example: Assuming a 30 year generation span and an infinite allele GD of 11 at 111 markers, and what Ferguson calls the McDonald 2017 rates, the Ferguson calculator returns an upper bound of 990 years TMRCA. My Excel spreadsheet, using 30 year generations, GD of 11 and Ferguson's McDonald 2017 rate yields an upper bound of 968 years TMRCA at the 95% point, for a delta of -22 years.
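
For anyone who wants to try a comparison like this, below is a rough sketch of one common way to get TMRCA percentile points from a genetic distance. It assumes a plain Poisson (infinite alleles) model with a flat prior on time, so that the TMRCA in generations follows a Gamma(GD + 1) distribution with rate 2 x (sum of per-marker rates). That is an assumption for illustration; it is not necessarily what the Ferguson utility, McDonald's schedule or the spreadsheet described here actually does, and it includes no adjustment for convergence. The mean rate used in the example is a placeholder, not a value from any published schedule.

```python
import math

def poisson_cdf(k, mean):
    """P(N <= k) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total

def tmrca_cdf(generations, gd, total_rate):
    """P(TMRCA <= generations | gd mutations observed), flat prior on time.

    total_rate is the sum of per-marker rates per generation; the factor 2
    counts both lineages descending from the common ancestor.
    """
    return 1.0 - poisson_cdf(gd, 2.0 * total_rate * generations)

def tmrca_quantile(p, gd, total_rate):
    """Generations t such that tmrca_cdf(t) == p, found by bisection."""
    lo, hi = 0.0, 1.0
    while tmrca_cdf(hi, gd, total_rate) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if tmrca_cdf(mid, gd, total_rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example inputs: GD and generation span as in the post; the rate is a placeholder.
MARKERS = 111
PLACEHOLDER_MEAN_RATE = 0.0025  # hypothetical per-marker, per-generation rate
YEARS_PER_GENERATION = 30
GD = 11

total = MARKERS * PLACEHOLDER_MEAN_RATE
for p in (0.95, 0.975):
    years = tmrca_quantile(p, GD, total) * YEARS_PER_GENERATION
    print(f"{p:.1%} point: about {years:.0f} years")
```

Swapping in the per-marker sum from a specific rate schedule for `total` shows how sensitive the upper bound is to the choice of rates and of percentile point.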

                McDonald's schedule from 2019 does seem to be in increments of 30-year generations. McDonald gives an upper bound of 900 years at GD 11, which I assumed to be at the 97.5% point, and which corresponds to TMRCA of 1,044 years in my spreadsheet using Ferguson's version of McDonald's rates, for a delta of +144 years. Whereas substituting the old Collins Y DNA rates published under the title McDonald 2017 in my spreadsheet yielded 844 years TMRCA at the 95% point and 907 years at the 97.5% point, for delta of -56 years and +7 years, respectively, vs. McDonald's 2019 schedule.

                Assuming McDonald himself is using the standard definition of the 95% confidence interval, bound at the 2.5% and 97.5% points, the old rates formerly published on the Collins website return a more consistent result--delta of only +7 years compared to +144 years for the Ferguson rates. I don't know the details of how McDonald himself is calculating his adjustments for convergence, but unless there is some incredible coincidence in my spreadsheet, the rates implicit in McDonald's own confidence schedule are much closer to the so-called old Collins version.
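
The deltas quoted above can be rechecked with a few lines of arithmetic. All figures are the ones given in this post; the variable names are only labels for illustration.

```python
# Upper bounds in years at GD 11, 111 markers, 30-year generations.
FERGUSON_CALCULATOR = 990      # Ferguson utility, "5% to 95%" range
MCDONALD_2019_SCHEDULE = 900   # McDonald's 2019 schedule

spreadsheet = {
    ("Ferguson rates", "95%"): 968,
    ("Ferguson rates", "97.5%"): 1044,
    ("Collins rates", "95%"): 844,
    ("Collins rates", "97.5%"): 907,
}

# Spreadsheet vs. the Ferguson calculator, same rates, 95% point.
delta_vs_ferguson = spreadsheet[("Ferguson rates", "95%")] - FERGUSON_CALCULATOR

# Spreadsheet vs. McDonald's 2019 schedule under each rate set and point.
deltas_vs_mcdonald = {
    key: value - MCDONALD_2019_SCHEDULE for key, value in spreadsheet.items()
}

print("vs. Ferguson calculator:", delta_vs_ferguson)
for (rates, point), delta in deltas_vs_mcdonald.items():
    print(f"vs. McDonald 2019, {rates} at {point}: {delta:+d}")
```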

                I don't think my delta of -22 years vs. Ferguson, using his rates at the 95% point, is super troubling. It is almost one generation, so I would like to get a better idea of where I differ. But I'm not going to lose any sleep over it. I got a little boost of confidence (no pun intended) in matching McDonald's own schedule so closely. Still, it's odd that the underlying rates are so much higher than the Ferguson version or Heinila 2012. Maybe there is a coincidence there that will come back to bite me later.

                I guess I'm saying that there are some significant undefined parameters in these schedules, but I think there is good reason to believe that the Collins version of McDonald's 2017 rates may be more accurate.
                Last edited by benowicz; 28 March 2021, 01:14 AM.

                • #9
                  Originally posted by benowicz
                  I'll just point out one other thing that confuses me a little bit about the Ferguson utility's depiction of the confidence interval:

                  "The TMRCA is expected to be in the range shown (5% to 95%)"
                  Okay, just to be clear, I know this could be a correctly constructed 90% confidence interval (i.e., 95% upper bound less 5% lower bound=90%). It's just that all the papers I've been reading so far consistently publish only the 95% confidence interval, including the McDonald 2019 schedule in the SAPP post mentioned by PBridgema. It seems like a well-understood convention that everyone conforms to. I'm only emphasizing this point as a qualification to my observation that what I've been calling the old Collins version of the McDonald rates seems to be a much better fit than the Ferguson version.
