Evaluating a Big Y match


  • #31
    Originally posted by wkauffman View Post
    The bottom line is that one should not consider what FTDNA supplies in terms of Big-Y matches and information. The only viable information from the match list is whether there might be a new result in your general haplogroup region. FTDNA is comparing Big-Y raw results against about 40% of the known Y-SNPs. FTDNA does not have an up-to-date internal tree to correctly identify and place called SNPs where they belong. This is where specific haplogroup analysis efforts and/or FGC/YFULL analysis provides the real answer. The U106 project admins visited FTDNA last fall to specifically show FTDNA IT how our comparison was done to remove the upstream SNPs and to properly identify the inconsistent ones that are provided as part of the "novel" results. They got the picture that they had missed the boat on a number of items related to properly analyzing and comparing Big-Y files. But so far no change has occurred in what they are providing to us the customer.
    Why has the U106 project not updated their SNP tree from all of the Big Y results that they have? I don't see how the Big Y can be a success at the current price. I don't see many success stories at all from the Big-Y at present.

    Comment


    • #32
      Originally posted by 1798 View Post
      Why has the U106 project not updated their SNP tree from all of the Big Y results that they have?
      It seems that you haven't checked the project results pages for a long time - https://www.familytreedna.com/public...ction=yresults. We've incorporated new subclades, which are shared by two or more members, found in their Big Y results, as the members provide us with their bed and vcf files from Big Y. Those files are the raw data which are analyzed by the spreadsheet one of the project members developed to weed out false "novel variants" that FTDNA doesn't weed out. This is what wkauffman wrote about in his post, which you responded to.

      Originally posted by 1798 View Post
      I don't see how the Big Y can be a success at the current price. I don't see many success stories at all from the Big-Y at present.
      Read my post recently in another thread at http://forums.familytreedna.com/show...2&postcount=12. I wrote there what I regard as a success story: "I used FGC for analysis of my Big Y BAM file. I was given a list, based on their analysis, of my best quality novel variants, which they named - FGC13480-FGC13492. I then had 12 of the 13 made testable at YSEQ, which one of my semi-close matches at FTDNA tested. (We're an 83/111 match. My estimate of when our common ancestor lived is 1,200-1,500 years ago.) He was found to be FGC13492+, forming a new subclade of R-CTS2509." That new subclade is reflected in the project results page and our U106 haplotree.

      Comment


      • #33
        Originally posted by T E Peterman View Post
        What counts is unshared novel variants. A lot of the 78 are shared with everyone else in your haplogroup. To be clear, novel doesn't mean unique, it just means new, as in just discovered.
        Timothy Peterman
        I agree with Timothy. Luckily, FTDNA gives you the list of Novel Variants that you have and your match doesn't and vice versa. Unluckily, they don't tell you why they are unmatched.

        As an example, one of my matches and I share a surname, although our MRCA probably lived some time before 1650. We share 194 novel variants. I have 39 that he doesn't have and he has 28 that I don't have. I have spent many hours reconciling all of these. In a large number of cases, one or the other of us has been found to have "no-calls" for the variant in question.

        If you only look at the match reports or the exported .csv files for the two of us, you cannot determine whether it is a true mismatch or a no-call. For that, you need to look at the .bed or .vcf file. Unfortunately, you only get to see that if the other party is willing to send you a copy. Even a project admin cannot download the raw data for anybody but himself.

        After taking into account the no-calls, the novel variants that occur higher up the haplotree, and the otherwise flaky results, I have cut this down to five variants that I have and he doesn't and two that he has and I don't. That averages out to about 3.5. If you apply the 135-years value to that, it gives a TMRCA of 465 years, or about 1550. Of course with only two samples, the error bands are quite large.

        Both my match and I have tested to 111 STR markers. The FTDNA TiP report indicates about a 70% probability of the common ancestor being within the last 14 generations. If you assume a generation as 33 years, that would be about 462 years ago. That puts things in the same ballpark.
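        The two estimates in this post can be reproduced with a few lines of arithmetic. This is only a sketch: the function and constant names are made up for illustration, and the inputs are the figures quoted above (5 and 2 unshared variants, 135 years per SNP, 14 generations of 33 years). Note that 3.5 x 135 comes out at 472.5, a little above the 465 quoted, but still in the same ballpark as the STR figure.

```python
# Figures taken from the post above; all names are illustrative only.
YEARS_PER_SNP = 135         # years assigned to each unshared variant
YEARS_PER_GENERATION = 33   # assumed generation length


def snp_tmrca_years(unshared_a, unshared_b, years_per_snp=YEARS_PER_SNP):
    """Average the two unshared-variant counts and scale by the SNP rate."""
    mean_unshared = (unshared_a + unshared_b) / 2
    return mean_unshared * years_per_snp


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into years."""
    return generations * years_per_generation


snp_estimate = snp_tmrca_years(5, 2)      # (5 + 2) / 2 = 3.5 variants
str_estimate = generations_to_years(14)   # 14 generations from the TiP report

print(f"SNP-based estimate: {snp_estimate} years")
print(f"STR-based estimate: {str_estimate} years")
```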

        Comment


        • #34
          By the way, I have added the novel variants data for all 37 of my I-L813 matches to a spreadsheet for analysis. There are about 925 novel variants among the 38 of us. Of these, about 100 appear to be useful for defining subclades. A total of 375 are currently unique to individuals, for an average of about 10 unique variants per man. An additional 75 appear to be associated with haplogroups above I-L813 in the haplotree. This leaves about 375 of the 925 novel variants in the category I call flaky or inconsistent. This means that about 40% of the novel variants are what I call weeds that need to be removed from consideration.
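          As a quick check on the breakdown above, the four categories can be tallied in a short sketch. The category labels are shorthand invented here; the counts are the approximate figures from the post.

```python
# Approximate counts quoted in the post; the dictionary keys are
# shorthand labels invented for this sketch.
counts = {
    "subclade_defining": 100,
    "unique_to_individual": 375,
    "upstream_of_I_L813": 75,
    "flaky_or_inconsistent": 375,
}
men = 38  # the poster plus his 37 matches

total = sum(counts.values())
flaky_share = counts["flaky_or_inconsistent"] / total
unique_per_man = counts["unique_to_individual"] / men

print(f"total novel variants: {total}")
print(f"flaky share: {flaky_share:.1%}")
print(f"unique variants per man: {unique_per_man:.1f}")
```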

          Comment


          • #35
            Originally posted by MMaddi View Post
            It seems that you haven't checked the project results pages for a long time - https://www.familytreedna.com/public...ction=yresults. We've incorporated new subclades, which are shared by two or more members, found in their Big Y results, as the members provide us with their bed and vcf files from Big Y. Those files are the raw data which are analyzed by the spreadsheet one of the project members developed to weed out false "novel variants" that FTDNA doesn't weed out. This is what wkauffman wrote about in his post, which you responded to.



            Read my post recently in another thread at http://forums.familytreedna.com/show...2&postcount=12. I wrote there what I regard as a success story: "I used FGC for analysis of my Big Y BAM file. I was given a list, based on their analysis, of my best quality novel variants, which they named - FGC13480-FGC13492. I then had 12 of the 13 made testable at YSEQ, which one of my semi-close matches at FTDNA tested. (We're an 83/111 match. My estimate of when our common ancestor lived is 1,200-1,500 years ago.) He was found to be FGC13492+, forming a new subclade of R-CTS2509." That new subclade is reflected in the project results page and our U106 haplotree.
            I have checked the results but it is the U106 Y-tree that has not been updated with all the Big Y SNPs.

            A GD of 28 at 111 markers is more like 2,800 ybp.

            Comment


            • #36
              Originally posted by 1798 View Post

              A GD of 28 at 111 markers is more like 2,800 ybp.
              I see. So, you think that someone who shares one of the singletons from my Big Y results has a TMRCA of 2,800 years with me.

              The upstream subclade for both of us is CTS2509. According to Dr. McDonald, whom you seem to regard as knowledgeable (and he is), CTS2509 is about 2,100 years old. Can you explain to me how FGC13492 is 700 years older than its parent subclade?

              With all due respect, I'd have more interest in your commentary about Big Y results and the R1b-U106 Project if you'd had the Big Y test yourself and had not left the R1b-U106 Project. But I guess you'd rather speak from your personal opinion from the outside than have some involvement in the process.
              Last edited by MMaddi; 14 July 2015, 03:21 PM.

              Comment


              • #37
                Originally posted by MMaddi View Post
                I see. So, you think that someone who shares one of the singletons from my Big Y results has a TMRCA of 2,800 years with me.

                The upstream subclade for both of us is CTS2509. According to Dr. McDonald, whom you seem to regard as knowledgeable (and he is), CTS2509 is about 2,100 years old. Can you explain to me how FGC13492 is 700 years older than its parent subclade?

                With all due respect, I'd have more interest in your commentary about Big Y results and the R1b-U106 Project if you'd had the Big Y test yourself and had not left the R1b-U106 Project. But I guess you'd rather speak from your personal opinion from the outside than have some involvement in the process.
                I have a GD of 39 at 111 markers to other testers with the same terminal SNP, and its age is estimated at about 4,000 ybp (S5520, 2069 BC). I am in a U106 project at present.

                It might be a good idea to hold off on Big Y for a few months. There are hints that standard pricing will be reduced and more features added later this year.
                https://www.genomeweb.com/sequencing...e-y-chromosome
                Last edited by 1798; 15 July 2015, 01:00 AM.

                Comment


                • #38
                  Originally posted by 1798 View Post
                  I have a GD of 39 at 111 markers to other testers with the same terminal SNP, and its age is estimated at about 4,000 ybp (S5520, 2069 BC). I am in a U106 project at present.
                  This doesn't answer my question of how my novel variant from Big Y is shared by someone else. We are both CTS2509+, estimated by Iain McDonald to be 2,100 years old. We both agree (!) that Iain is very good at estimating subclade ages, based on Big Y results, which are more accurate than dating using STR results.

                  So, I'll repeat the question. How can my novel variant (found in no other Big Y result from other CTS2509+ men), which has been named FGC13492, be 2,800 years old when it's downstream from CTS2509 which is only 2,100 years old? Did Mr. CTS2509 have a time machine that allowed him to go 700 years into the past and father his son? Or did Mr. FGC13492 have the time machine, which he used to go 700 years into the future to father his father?
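                  The point I'm making is a simple ordering constraint: a downstream SNP can't be older than the subclade it sits under. A rough sketch of that check, with a made-up function name and the ages quoted in this thread:

```python
def is_consistent(parent_age_ybp, child_age_ybp):
    """A downstream (child) SNP cannot be older than its parent subclade."""
    return child_age_ybp <= parent_age_ybp


CTS2509_AGE_YBP = 2100  # Iain McDonald's estimate quoted in the thread

# 2,800 ybp is the figure proposed from GD alone; 1,500 ybp is the upper
# end of the 1,200-1,500 year range estimated earlier in the thread.
print(is_consistent(CTS2509_AGE_YBP, 2800))
print(is_consistent(CTS2509_AGE_YBP, 1500))
```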

                  You can see why I'm confused, can't you? Maybe you can educate me.

                  Comment


                  • #39
                    Originally posted by MMaddi View Post
                    This doesn't answer my question of how my novel variant from Big Y is shared by someone else. We are both CTS2509+, estimated by Iain McDonald to be 2,100 years old. We both agree (!) that Iain is very good at estimating subclade ages, based on Big Y results, which are more accurate than dating using STR results.

                    So, I'll repeat the question. How can my novel variant (found in no other Big Y result from other CTS2509+ men), which has been named FGC13492, be 2,800 years old when it's downstream from CTS2509 which is only 2,100 years old? Did Mr. CTS2509 have a time machine that allowed him to go 700 years into the past and father his son? Or did Mr. FGC13492 have the time machine, which he used to go 700 years into the future to father his father?

                    You can see why I'm confused, can't you? Maybe you can educate me.
                    That is just an estimate, not a fact. A GD of 28 seems a lot between two men who are recently related.

                    Comment


                    • #40
                      Originally posted by 1798 View Post
                      That is just an estimate, not a fact. A GD of 28 seems a lot between two men who are recently related.
                      The way I arrived at the figure of a TMRCA of 1,200-1,500 years is by comparing the 67 marker haplotypes of myself, the person who tested FGC13492+ and a closer match of 63/67 (now 104/111 with me). I used the McGee Y-DNA utility at http://www.mymcgee.com/tools/yutility.html. Many people regard the McGee utility as more accurate than FTDNA's TiP calculator.

                      Since all three of us now have 111 markers, I reran the comparison. The McGee utility doesn't use all 111 markers. It only used 94. However, I got similar results as the comparison I'd done in the past with 67 markers. In fact, at this comparison level my 104/111 match has a TMRCA with the FGC13492+ match of 1,200 years (same as before) and I have a TMRCA of 1,350 years (not the 1,500 years I got with 67 markers) with the FGC13492+ match.

                      I think you're using too simplistic a view of GD and TMRCA. I suspect that you're taking our GD of 28 at 111 markers and multiplying that by 100 to come up with a TMRCA of 2,800 years. That's simplistic because it doesn't take into account the different mutation rates of different markers, which is something a TMRCA calculator like FTDNA's TiP or the McGee utility does. Not all marker mismatches are the same, as you seem to assume. Plus, there is an element of probability involved and it may be that the TMRCA between the FGC13492+ match and me is an outlier on the close end.
                      Last edited by MMaddi; 16 July 2015, 09:37 AM.

                      Comment


                      • #41
                        Originally posted by MMaddi View Post
                        The way I arrived at the figure of a TMRCA of 1,200-1,500 years is by comparing the 67 marker haplotypes of myself, the person who tested FGC13492+ and a closer match of 63/67 (now 104/111 with me). I used the McGee Y-DNA utility at http://www.mymcgee.com/tools/yutility.html. Many people regard the McGee utility as more accurate than FTDNA's TiP calculator.

                        Since all three of us now have 111 markers, I reran the comparison. The McGee utility doesn't use all 111 markers. It only used 94. However, I got similar results as the comparison I'd done in the past with 67 markers. In fact, at this comparison level my 104/111 match has a TMRCA with the FGC13492+ match of 1,200 years (same as before) and I have a TMRCA of 1,350 years (not the 1,500 years I got with 67 markers) with the FGC13492+ match.

                        I think you're using too simplistic a view of GD and TMRCA. I suspect that you're taking our GD of 28 at 111 markers and multiplying that by 100 to come up with a TMRCA of 2,800 years. That's simplistic because it doesn't take into account the different mutation rates of different markers, which is something a TMRCA calculator like FTDNA's TiP or the McGee utility does. Not all marker mismatches are the same, as you seem to assume. Plus, there is an element of probability involved and it may be that the TMRCA between the FGC13492+ match and me is an outlier on the close end.
                        I am not saying that I am right. Here is what Michal posted at anthrogenica.
                        “The major conclusion remains unchanged, which means that U106 diverged most likely between 6500 and 5000 years ago, probably within the 6000-5500 BP time frame.”

                        Comment


                        • #42
                          Originally posted by 1798 View Post
                          I am not saying that I am right. Here is what Michal posted at anthrogenica.
                          “The major conclusion remains unchanged, which means that U106 diverged most likely between 6500 and 5000 years ago, probably within the 6000-5500 BP time frame.”
                          Well, that's wonderful, but what does it have to do with you disagreeing with my estimate that my 83/111 match who shares one of my Big Y singletons with me has a TMRCA with me of 1,500 years or maybe less? You could have left it at just saying that you may not be right. The rest has nothing to do with the question we're discussing.

                          Comment


                          • #43
                            Originally posted by MMaddi View Post
                            Well, that's wonderful, but what does it have to do with you disagreeing with my estimate that my 83/111 match who shares one of my Big Y singletons with me has a TMRCA with me of 1,500 years or maybe less? You could have left it at just saying that you may not be right. The rest has nothing to do with the question we're discussing.
                            YFull gives a TMRCA of 4,800 years for Z9, and your SNP is 12 downstream of Z9. At 150 years per SNP, that would put your CTS2509 at 3,000 ybp, and you and your match are one below it.
                            Last edited by 1798; 16 July 2015, 12:47 PM.

                            Comment


                            • #44
                              Originally posted by 1798 View Post
                              YFull gives a TMRCA of 4,800 years for Z9, and your SNP is 12 downstream of Z9. At 150 years per SNP, that would put your CTS2509 at 3,000 ybp, and you and your match are one below it.
                              Iain McDonald, whom you cite approvingly all the time, disagrees. Based on comparison of a dozen or more Big Y results from CTS2509+ men, he estimates CTS2509 is about 2,100 years. I would take Iain's estimate over YFull's estimate, since Iain has access to a larger dataset for CTS2509. Does YFull even make an official estimate for CTS2509, based on analysis of BAM files from CTS2509+ men, or are you just extrapolating, based on your SNP counting?

                              Plus, you can't seem to decide whether you prefer a SNP mutation rate of 150 or 136 years. You don't seem to want to even consider FGC's rate of 90 years, which I pointed out is based on significantly larger coverage of the Y chromosome. If you pick and choose your SNP mutation rate to come up with the answer you want to get, that's not very scientific.
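                              To show how much the answer moves with the chosen rate, here is a rough sketch of the SNP-counting extrapolation using the figures quoted above (Z9 at 4,800 ybp, 12 SNPs downstream). The function name is made up for illustration, and holding the SNP count at 12 for every rate is an assumption of this sketch: a rate tied to larger coverage, such as the 90-year figure, would normally go with a different SNP count.

```python
Z9_TMRCA_YBP = 4800   # YFull figure quoted in the thread
SNPS_BELOW_Z9 = 12    # SNP count quoted in the thread


def age_below_ancestor(ancestor_age_ybp, snp_count, years_per_snp):
    """Subtract snp_count * years_per_snp from the ancestor's age."""
    return ancestor_age_ybp - snp_count * years_per_snp


# The three years-per-SNP rates mentioned in the thread.
estimates = {
    rate: age_below_ancestor(Z9_TMRCA_YBP, SNPS_BELOW_Z9, rate)
    for rate in (150, 136, 90)
}

for rate, age in estimates.items():
    print(f"{rate} years/SNP -> CTS2509 at about {age} ybp")
```

                              The spread between those results is the reason the choice of rate has to be justified up front rather than picked after the fact.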

                              Comment


                              • #45
                                Originally posted by MMaddi View Post
                                Iain McDonald, whom you cite approvingly all the time, disagrees. Based on comparison of a dozen or more Big Y results from CTS2509+ men, he estimates CTS2509 is about 2,100 years. I would take Iain's estimate over YFull's estimate, since Iain has access to a larger dataset for CTS2509. Does YFull even make an official estimate for CTS2509, based on analysis of BAM files from CTS2509+ men, or are you just extrapolating, based on your SNP counting?

                                Plus, you can't seem to decide whether you prefer a SNP mutation rate of 150 or 136 years. You don't seem to want to even consider FGC's rate of 90 years, which I pointed out is based on significantly larger coverage of the Y chromosome. If you pick and choose your SNP mutation rate to come up with the answer you want to get, that's not very scientific.
                                I didn't write the script for YFull. R-Z325 CTS2509/S1734 * Z324 * FGC362/Y1407/Z8172... 3 SNPs, formed 3300 ybp, TMRCA 2600 ybp

                                Comment
