Big Y SNP mutation rate


  • #16
    Originally posted by JeffWexler
    For the reasons discussed earlier in this thread, the actual mutation rate per SNP may vary broadly, so we're dealing with approximations, which become less reliable with smaller sample sizes and shorter periods of time to an MRCA.
    In fact, there are two separate concepts here. First, even if the actual mutation rate really is 1 SNP per 150 years, we don't expect to see exactly that figure in every sample, because mutation is a random event. Real samples of a biological process in which the mutation rate is 1 per 150 years will vary just because it is a random event, like tossing a coin. The coin toss is a good analogy, in fact, because almost every coin has 50:50 odds, but if you try the experiment of tossing the coin 10 times, you are likely to obtain different outcomes rather than exactly 5 heads and 5 tails.

    The second source of variation, however, is that mutation (or more precisely, the occurrence of a mutation that was not corrected by the various DNA repair enzymes) is a biological process that is subject to some degree of genetic control, and therefore we expect different individuals to have different SNP mutation rates, assuming we had a way to measure them. It is reasonable to suppose that the rate at which SNPs are produced may differ from one individual to another, or from one family to another.

    This situation would be like tossing coins where some of the coins happen to be irregular in some way, enough to make them favor one side or the other. It is easy to see that you would have to toss a particular coin an enormous number of times before you could be sure the underlying odds were 50:50 and not, say, 45:55. For the SNP rate, we usually don't get to try the experiment enough times to make that sort of distinction.
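Both points can be put in rough numbers. The sketch below is only illustrative: it computes the chance of exactly 5 heads in 10 fair tosses, then (assuming, for this sketch only, that SNPs arrive as a Poisson process at 1 per 150 years) how often a line would show exactly the expected count after a hypothetical 1,500 years, and finally a rough normal-approximation estimate of how many tosses it takes to tell a 45:55 coin from a fair one. The function names and the 1,500-year span are made up for the example.

```python
from math import ceil, comb, exp, factorial


def prob_exact_heads(n_tosses, n_heads, p_heads=0.5):
    """Binomial probability of seeing exactly n_heads in n_tosses."""
    return comb(n_tosses, n_heads) * p_heads**n_heads * (1 - p_heads) ** (n_tosses - n_heads)


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    return exp(-mean) * mean**k / factorial(k)


def tosses_to_detect_bias(p_biased, p_fair=0.5, z=1.96):
    """Rough toss count at which a 95% interval around p_fair excludes p_biased."""
    std_dev_single_toss = (p_fair * (1 - p_fair)) ** 0.5
    return ceil((z * std_dev_single_toss / abs(p_biased - p_fair)) ** 2)


# Exactly 5 heads in 10 fair tosses: 252/1024, a little under 1 time in 4.
p_five_heads = prob_exact_heads(10, 5)

# Assumed Poisson model: 1 SNP per 150 years over a hypothetical 1,500 years.
expected_snps = 1500 / 150
p_exactly_expected = poisson_pmf(10, expected_snps)
p_within_three = sum(poisson_pmf(k, expected_snps) for k in range(7, 14))

# Telling a 45:55 coin from a 50:50 coin takes several hundred tosses at least.
tosses_needed = tosses_to_detect_bias(0.45)

print(f"P(exactly 5 heads in 10 tosses): {p_five_heads:.3f}")
print(f"P(exactly 10 SNPs when 10 are expected): {p_exactly_expected:.3f}")
print(f"P(7 to 13 SNPs when 10 are expected): {p_within_three:.3f}")
print(f"Tosses needed to separate 45:55 from 50:50: {tosses_needed}")
```

Under these assumptions, even a line with a perfectly "average" mutation rate lands on the expected SNP count only a minority of the time, before any real difference between individuals is considered.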


    • #17
      John--

      In addition to the possibility that different individuals may have different mutation rates, it would seem that mutations are more likely in the children of older fathers.

      The paper available at http://rspb.royalsocietypublishing.o...42898.full.pdf finds some increase in the germline mutation rate for older fathers (although not as great an increase as one might have expected).

      That paper discusses the frequency of STR mutations on autosomal chromosomes, but I imagine that it provides some insight into the frequency of SNP mutations on Y-DNA.


      • #18
        Yeah, I am noticing some definite outliers. For example, I have a person that has 3 private combBED SNPs under an SNP that is about 1,800 years old. Perhaps his SNPs exist outside the area Big Y covers or in areas that are deemed unreliable? Perhaps his family line just did not mutate much over this time period? On the other side of the spectrum, you see outliers with way too many SNPs.

        In any case, especially for the small sample sizes we are working with, these outliers skew results tremendously. I would say, until sample sizes are high enough that under-mutators cancel out over-mutators, you almost need to exclude these outliers from age calculations.


        • #19
          The same issue is presented (and exacerbated) when trying to calculate the time to an MRCA for a cluster based upon results from men belonging to a number of subclusters, some of which include more men than others.

          Assume, for example, that: (1) a cluster is comprised of two subclusters, A and B; (2) there are six men in subcluster A and two men in subcluster B; (3) the six men in subcluster A have an average of three private SNPs, and share four SNPs up to the cluster-defining SNP; and (4) the two men in subcluster B have an average of two private SNPs, and share two SNPs up to the cluster-defining SNP.

          The men in subclusters A and B would have a combined 50 SNPs up to the cluster-defining SNP level (for an average of 6.25 SNPs).

          The men in subcluster A would have a combined 42 SNPs up to the cluster-defining SNP level (for an average of 7 SNPs), while the men in subcluster B would have a combined 8 SNPs up to the cluster-defining SNP level (for an average of 4 SNPs). Weighting subclusters A and B equally, they would have an average of 5.5 SNPs to the cluster-defining SNP level.

          Thus, our calculations will be distorted not only by a small sample size (which will result in the calculations being unduly affected by a disproportionately large (or small) number of mutations on a single line), but also by a sample set that is disproportionately weighted towards a particular subcluster.
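The arithmetic in this example can be checked with a short sketch. The figures are the ones given above; the dictionary layout and variable names are just one hypothetical way to lay them out.

```python
# Figures from the example: men tested, average private SNPs per man, and
# SNPs shared within the subcluster up to the cluster-defining SNP.
subclusters = {
    "A": {"men": 6, "avg_private": 3, "shared": 4},
    "B": {"men": 2, "avg_private": 2, "shared": 2},
}

# Average SNPs per man in each subcluster (private + shared).
snps_per_man = {
    name: info["avg_private"] + info["shared"] for name, info in subclusters.items()
}

# Pooled average: every tested man counts once, so subcluster A dominates.
total_men = sum(info["men"] for info in subclusters.values())
total_snps = sum(info["men"] * snps_per_man[name] for name, info in subclusters.items())
pooled_average = total_snps / total_men

# Equal-weight average: each subcluster counts once, however many men tested.
equal_weight_average = sum(snps_per_man.values()) / len(snps_per_man)

print(snps_per_man)          # {'A': 7, 'B': 4}
print(total_snps)            # 50
print(pooled_average)        # 6.25
print(equal_weight_average)  # 5.5
```

The gap between 6.25 and 5.5 comes entirely from how the two subclusters are weighted, not from any difference in the underlying SNP counts.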

          Because a handful of Y-DNA lines have flourished while others have dwindled or disappeared altogether, and because of the numerous factors influencing the decision whether to do Big Y testing (one of which, I believe, is the fact that close matches have done or are doing Big Y testing), it's likely not possible to use a fully representative sample set (especially in a genealogical time frame).


          • #20
            Resurrecting an old topic (I'm a relative noob so bear with me).
            So in reading through this I've come to the conclusion that, for genealogical research purposes, if I assume that each mutation on a branch represents 200 (or 150) years, then the number of mutations between branches times 200 (or 150) gives me a reasonably safe time period in which to look for common ancestors?

            From what I read here (and can comprehend, lol) it sounds like a CA could be more recent but probably not more distant? Thanks for bearing with me and I hope this makes sense!


            • #21
              Re "it sounds like a CA could be more recent but probably not more distant?": yes, I agree, at least somewhat.
              To give you one example: with Big-Y 700, the number of SNPs that differ between me and my brother is ten (10).
              So we are much closer than the calculation will estimate.
              My confirmed haplogroup is R-Y41600.

              Here is the list of SNPs where my brother and I differ:
              M367, L362, BY23717, BY26105, BY26109, BY26110, BY26111, RS79412108, BY42594, BY227927

              It made me wonder whether the test was correct and whether some of these were false positives.
              To test that possibility, I paid Yseq.org to test both M367 and L362, and they confirmed my values were not the expected ones.

              John


              • #22
                John, your brother has the same haplogroup designation as you, correct? Perhaps we are talking about different things? These 10 SNP differences between you and your brother must not be the same as the "mutation" SNPs that delineate the haplogroups; otherwise the two of you would be in 2 different haplogroups. These 10 are your "Private Variants", yes? Granted, I have no understanding of exactly how these are derived; that's just my (tentative) understanding. I suppose I'd assume that not all private variants become haplogroup-defining mutation variants?

                For example, looking at R-Y41600 on the Big Y Block Tree, according to the left side (mutation "blocks", AFAIK), I see that there are 3 mutations between R-Y38430 and R-Y41600, which would theoretically mean these two haplogroups share a CA 450 to 600 years ago or less.
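As a minimal sketch of the rule of thumb being discussed here (SNP count times an assumed 150 to 200 years per SNP), the arithmetic looks like this. The function name is hypothetical, and the 150 and 200 figures are this thread's working assumptions rather than established rates.

```python
def tmrca_range_years(snp_count, years_per_snp_low=150, years_per_snp_high=200):
    """Return (low, high) age estimates in years for a given number of SNPs."""
    return snp_count * years_per_snp_low, snp_count * years_per_snp_high


# 3 mutations between R-Y38430 and R-Y41600 on the block tree.
block_estimate = tmrca_range_years(3)

# 10 reported SNP differences between two brothers (from the post above).
brothers_estimate = tmrca_range_years(10)

print(block_estimate)     # (450, 600)
print(brothers_estimate)  # (1500, 2000)
```

The second result shows why a single pairwise comparison can mislead: applied blindly, the rule would place the common ancestor of two brothers 1,500 to 2,000 years back.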


                • #23
                  Thanks to whoever for bringing this thread back to the top of the queue again. It seems like this is going to be a perennial topic.

                  I just got BigY700 results for a kit I administer, and although his branch of the phylogenetic tree is not well developed, I did notice that one sub-branch has nearly twice as many SNPs since the MRCA as any of the other six (i.e., average of 35 vs. an average of 18). I have to assume that the six are more indicative of the true TMRCA than the one outlier--although that branch by far has the largest number of participants as well.

                  I've searched Google over and over again for the consensus mutation rates, and I keep getting something in the neighborhood of 81 to 85 years, although obviously, as this experience shows, the differences among individual observations can be extreme.
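To see how much the outlier sub-branch matters, here is a rough sketch. It assumes the 81 to 85 figure means years per SNP, and it uses a stand-in data set (six sub-branches at exactly 18 SNPs and one at 35) because the individual counts are not given; both are assumptions made for illustration, as are the names used.

```python
from statistics import mean, median

# Assumption: the quoted "81 to 85 years" is read as years per SNP.
YEARS_PER_SNP_RANGE = (81, 85)


def age_range_years(snp_count, years_per_snp_range=YEARS_PER_SNP_RANGE):
    """Return (low, high) age estimates in years for a given SNP count."""
    low, high = years_per_snp_range
    return snp_count * low, snp_count * high


typical_age = age_range_years(18)
outlier_age = age_range_years(35)

# Stand-in counts: the six ordinary sub-branches set to their average of 18,
# plus the one outlier sub-branch at 35.
branch_snp_counts = [18] * 6 + [35]
mean_count = mean(branch_snp_counts)      # pulled upward by the outlier
median_count = median(branch_snp_counts)  # unaffected by a single outlier

print(typical_age)   # (1458, 1530)
print(outlier_age)   # (2835, 2975)
print(round(mean_count, 1), median_count)
```

With these stand-in numbers the median stays at 18 while the mean drifts above 20, which is one simple way to limit the pull of a single over-mutating line without discarding it outright.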

                  I've been poring over the browser details supporting some specific no-calls with my co-administrator, and the possibility has occurred to me that FTDNA has mis-allocated to a descendant block some SNPs that actually belong to the MRCA block. Maybe one or two are mis-allocated, but I highly doubt the whole 17 SNP disparity is mis-allocated.
                  Last edited by benowicz; 13 September 2020, 04:47 PM.


                  • #24
                    From first (genetic) principles, I see no reason to expect that all SNPs have the same mutation rate, nor even that the actual probability of mutation of a particular SNP will be the same in every individual. The simplifying assumption, that we can obtain reasonable or even realistic results by applying a derived "average" mutation rate, seems to be a good idea, but we still have to validate the results in some way by comparisons with evidence that has verifiable dates. It seems logical to try to make the age estimates as internally consistent as possible, but there is always the possibility that our estimates, or some part of them, are way off! I think it would be an endless task to rework the ages of all nodes of the Y haplotree every time the data change, so it may be that the inconsistencies we see on the current tree are just temporary anomalies waiting to be addressed. A tricky business, that gets more complicated with every Big Y result!


                    • #25
                      John, I agree with everything you stated. In my own case, I have a confirmed paternal line back to an ancestor from Romerike, Norway, a knight who lived 1037-1093, fought in the Battle of Hastings in 1066, and was awarded land for his service. I have been able to confirm time frames for many of my confirmed SNPs. My confirmed 13th cousin, 8th cousin, 4th cousin, and 3rd cousin 1x removed have all done SNP testing, which has enabled me to confirm which ancestors actually carried specific SNPs.

                      For instance, my great-nephew, whose paternal great-grandfather was my birth father, also did the Big Y 700 test. We match on all SNPs except for three unnamed SNPs that I do not have and one SNP that I have and he does not. There are 3 generations between us, so I believe this means a new SNP every generation.

                      Best regards, Doug


                      • #26
                        Yes, Doug, your family example is closer to the statistics used for dating SNPs.

                        Re some of the questions earlier:
                        "John, your brother has the same haplogroup designation as you, correct?"
                        Yes, FTDNA gives us the same terminal group. Thus the FTDNA software that does the matching must have the ability to ignore some number of SNP differences.
                        I have thought about asking for a second analysis from someone. Any recommendations on that thought?

                        "Perhaps we are talking about different things?" I do not think so. These are the same SNPs from the Big-Y 700 as used by everyone else.


                        "These 10 are your 'Private Variants', yes?" I do not think that is the case. Those are all named SNPs, and M367 and L362 in particular have been used for years.

                        So my take on it is that my Y-DNA is statistically different from what others see.
                        But that should throw a bit of caution to people trying to use just one set of SNP differences to determine the age of an MRCA.

                        I had 3 SNPs tested at YSEQ: in addition to the two above, I had CTS6519 tested as a control to check that I had the right data.
                        Here is what YSEQ reported:
                        ===============
                        CTS6519
                        [CTS6519]


                        hg38 Position: ChrY:14850633..14850633
                        Ancestral: T
                        Derived: C MY Value C
                        Reference: Chris Tyler-Smith (2011)
                        ISOGG Haplogroup: R1b1a1a2a1a2a7~
                        Comments: Extracted from 1000 genomes data.
                        ==================
                        M367
                        hg38 Position: ChrY:3020587..3020587
                        Ancestral: A
                        Derived: G MY Value G
                        Reference: Cinnioglu et al. 2004
                        ISOGG Haplogroup: J (Private)
                        Comments: No control in J-P58* but derived in a R-M124* WTY participant
                        ==================
                        L362
                        hg38 Position: ChrY:3020591..3020591
                        Ancestral: A
                        Derived: T MY Value C
                        Reference: Thomas Krahn (FTDNA)
                        ISOGG Haplogroup: R2a1a
                        Comments: Found in a hg R-L21/P314.2 person
                        =========================
                        I note for the above that I am in group R1b, whereas the M367 SNP is usually found in group J and L362 in R2a.
                        And my value of C for L362 is not the A->T mutation that was found before.

                        So my thought is that my results are "unusual".
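One minimal way to read the three YSEQ results above is to compare each observed value with the listed ancestral and derived alleles. The sketch below does only that; the function name and labels are invented for illustration, and the allele values are copied from the report.

```python
def classify_call(ancestral, derived, observed):
    """Label an observed allele relative to the SNP's documented alleles."""
    if observed == derived:
        return "derived"
    if observed == ancestral:
        return "ancestral"
    return "other allele"  # neither documented state, e.g. a different mutation


# (ancestral, derived, observed "MY Value") as listed in the YSEQ report.
yseq_results = {
    "CTS6519": ("T", "C", "C"),
    "M367": ("A", "G", "G"),
    "L362": ("A", "T", "C"),
}

calls = {name: classify_call(*alleles) for name, alleles in yseq_results.items()}

for name, call in calls.items():
    print(f"{name}: {call}")
```

On this reading, CTS6519 (the control) and M367 both show the documented derived allele, while L362 shows a third allele that is neither ancestral nor the documented A->T change.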


                        Re "I suppose I'd assume that not all private variants become haplogroup defining mutation variants?"
                        I do not think that is the case, because while any given SNP mutation is unusual, the same mutation can and does happen more than once in the history of mankind,
                        and my results are an example of that.

                        John


                        • #27
                          Yes, I agree with your YSEQ results. Thomas and Astrid Krahn are respected geneticists, and I have had much success testing paternal relatives in order to confirm time frames and the specific ancestors who carried specific SNPs. Best regards, Doug


                          • #28
                            John,

                            Actually, my great-nephew, Edward, has two, not three, named SNPs that I do not have, and I have one unnamed SNP that he does not have. I asked Edward to look at his Big Y700 matches to see if he has any matches closer than me, because these two named BY SNPs must have arisen with his paternal grandfather or birth father; his paternal great-grandfather was my birth father. I could be wrong about these two new BY SNPs; maybe there is a different explanation for them?

                            Best regards, Douglas W. Fisher(Wells)
