Sudden, large increase in private variant counts


  • Sudden, large increase in private variant counts

    I know that private variants are occasionally not added to the counts in the Block Tree until well after the original test is completed, maybe years after, probably because ambiguous scan data is clarified by the test of a closely related donor.

    But how common is this? Can any users provide anecdotal data?

    And were there any tipoffs that such an event was coming? I've heard that the kit's profile page might sometimes list private variants that are not reflected in the Block Tree in such cases. Is that true?

    In theory it makes sense to me that such things might happen. But some data make me wonder whether this could be a pervasive phenomenon affecting some Discover TMRCA date estimates, which I don't think I like. It suggests that there is so much doubt about the identification of SNPs that anyone without extremely close matches might have almost useless data.
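    A minimal sketch of why late-added private variants would matter for dating, assuming a deliberately naive model in which the age of a clade is the average private variant count times a fixed number of years per variant. This is not Discover's actual method; `naive_tmrca`, the rate, and the kit counts are all hypothetical and only illustrate the size of the effect.

```python
from statistics import mean

# Hypothetical rate for illustration only; not a figure from FTDNA or Discover.
YEARS_PER_VARIANT = 80.0


def naive_tmrca(private_variant_counts, years_per_variant=YEARS_PER_VARIANT):
    """Naive age estimate: mean private variants per kit times an assumed rate."""
    return mean(private_variant_counts) * years_per_variant


# Two hypothetical kits under one terminal clade.
before = [4, 6]  # private variant counts as first reported
after = [4, 9]   # same kits after three variants are added late to the second kit

shift = naive_tmrca(after) - naive_tmrca(before)
print(f"before: {naive_tmrca(before):.0f} years")
print(f"after:  {naive_tmrca(after):.0f} years")
print(f"shift:  {shift:.0f} years")
```

    Under these made-up numbers, three late additions to one kit move the estimate by more than a century, which is why the frequency of such events matters.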

  • #2
    Just to clarify, the situation described above has NOT happened to me. I'm just hypothesizing a scenario that seems to be implicit in Discover's very weird dating of R-FGC23343 and subclades. I don't think--or at least don't WANT to think--this will actually happen to FGC23343. This thread is just soliciting data to assess, even if only based on anecdotal data, how likely the scenario is.

    After reperforming a TMRCA calculation for an historical benchmark clade under my new, slightly-tweaked algorithm I had an insight about Discover's (probable) methodology that I think very likely explains their very weird, shifting estimates for R-FGC23343 and subclades. My guess is that their fundamental orientation is


    • #3
      Hello again. I may not even be speaking to the same issue, but I thought I would respond anyway just in case. I have two examples of SNPs that I found by accident in kit 967421.

      I was looking in the SNP tool for BY227622 - a listed variant on my kit. When I got there and clicked on the link, it pulled up the graphical representation with location - I'll post it below. I DID speak with them about it - they told me that BY227622 is 'garbage' and that the determination of its 'garbageness' made the ones close by it also 'garbage.'

      My issue, which I am calling about tomorrow, (sent an email a week ago and no response so far....) is that they gave me specific regions that are 'noisy' and said that BY227622 is in one of them. After the call, I realized that more than HALF the variants on MY branch of the Y tree are also in "noisy" regions - it seems that 'noisy' means 'usable' not garbage.

      See next shot


      • #4
        You can see BY227622 on the right - and the edge of another on the far left - and then one in the middle. These are already 'named' but not yet on the tree. At least one of them I can pull up by name but not the other. That means that other testers have them but they don't have a 'place' yet. The other is still just a number - which means, as I believe you fear, that yes, there's a 'private variant' in my kit that is obviously there and just as obviously ignored. Had I not stumbled upon it I would never ever ever know.


        • #5

          So you found the other two just by chance when you pulled up the scan for the first one? The other two don't appear in your list of private variants? When you say that they're ignored, you don't simply mean that they are left out of the Block Tree's calculation for the summary statistic of 'average private variants' for that terminal clade?

          Sorry to keep repeating myself. This seems as if it may be a slightly different situation than other people's explanations had led me to believe. I thought that there were variants included in a kit's profile for private variants, but just not included in the calculation of 'average private variants' listed for their terminal clade in the Block Tree. But the situation you describe gives me the impression that sometimes there are private variants in existence which aren't even listed in the kit's profile.

          I can hypothesize why they may have decided not to include 2 of these 3 in the Block Tree. Because they are relatively close together, it makes me wonder whether this is evidence of some rare event other than a straightforward point mutation--like maybe a gene conversion, in which a single event could result in several "non-consensus" values at once.

          "A different type of recombination Using sequence data generated by new technology that reads long strands of individual DNA molecules, Chang and Larracuente developed a strategy to assemble a large part of the Y chromosome and other repeat-dense regions. By assembling a large portion of the Y chromosome, they discovered

          If something like that were the case, it would be a bit redundant to place more than one of these within the Block Tree. The statistic of interest in calculating TMRCA would be the number of events, and not necessarily the number of specific locations affected by these events. Most often only a single location is affected, but sometimes more than one is. It is suspicious to see three all within relatively close physical proximity. If that is indeed what is going on, I could understand their rationale for excluding the other two from the kit profile's list of private variants as well.
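          The "count events, not locations" idea can be sketched as a simple proximity grouping: variants that sit within some small distance of each other are treated as one candidate event. This is only an illustration of the reasoning, not how FTDNA filters variants; `count_events`, the `max_gap` threshold, and the positions are all hypothetical.

```python
def count_events(positions, max_gap=100):
    """Group positions into clusters; a new cluster starts when the gap
    to the previous position exceeds max_gap (an assumed threshold, in bp)."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close to the previous variant: same event
        else:
            clusters.append([pos])    # far from the previous variant: new event
    return clusters


# Hypothetical positions: three variants packed together plus one far away.
positions = [1000, 1030, 1075, 500000]
clusters = count_events(positions)

print(len(positions), "variants ->", len(clusters), "candidate events")
for cluster in clusters:
    print(cluster)
```

          With these made-up positions, four variants collapse to two candidate events, so only two of them would add to an event-based count.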

          Thanks again for sharing. I haven't seen anything like that in the kits I manage, so on the surface it doesn't seem to have a role in the weird TMRCA figures I'm seeing. But then again, I wasn't looking for it. I'll have another gander.


          • #6
            Hello. Ok, so it seems I AM on the same page. More details....and a couple of simple basic questions.

            Question: I was told by a Big Y expert that certain regions of the Y were "noisy" and that my BY227622 was "garbage" because of its location - ergo the other ones nearby were also garbage, or, like your comment fears, humanly interpreted as garbage.

            I double-checked the numbers to be certain I had them correct. BY227622 is at location 56838312 - clearly on the q12 arm. The centered one in the graphic above is at location 56838275. I also learned that the cursor hovering over it will show 'derived' for BY227622 because it is the graph FOR that variant. Hovering over this other one brings up 'mismatch.'

            Step 2: Check Y-Browse to see if there is a name for 56838275. (When a SNP is 'discovered' for the first time, an alphanumeric 'name' is reserved for it - off books. When it is found a second time, FT 'names' it in their customer results tool and uploads the named SNP to Y-Browse at ISOGG.) And it DOES:


            to get details, click on the name: BY227574



            • #7
              Also, when I pull up BY227574 on my results page, as it happens, there are MORE + results for it than there are for the other one. This shot is scrolled down so you can see some more of the positive reads. You can see at the top of the page that for whatever reason (as yet unexplained, except that it's 'garbage') it is marked "?" instead of derived.



              • #8
                Here is the info for the same region of my cousin's kit, whose terminal clade is R-FT372222. So yes, this does seem to be a "garbage" region, or at least a region without any diagnostic significance. Maybe everybody has similar reads, meaning that even if they were "real", they would belong to a very ancient branch of the phylogenetic tree, providing no useful information on common ancestry for many hundreds of thousands of years.

                The company may have other reasons for specifically calling these "garbage"--maybe mechanically they are difficult to read, or maybe their physical location is more prone to rare events like gene conversion that make true point mutations difficult to distinguish. But at the very least, the fact that my cousin also shares them seems to prove they have no diagnostic utility.



                • #9
                  My kit in question has 3046 named variants and 28182 no calls - one of which should have been a call as seen above. The results tool is currently refusing to "show all" or show "no" after 4+ minutes of spinning blue wheel so I can't say as to the others. As for the 'noisy' centromere region - given to me as a 'garbage' section, my group of 14 men have 14 SNPs listed here; SIX of which have been placed on the Y tree. Apparently they are NOT GARBAGE - in fact one of them was used to define my new Haplogroup. I can expand further if you're interested - but I can say that this graphic is not the only one I've seen that pulls up unlisted/ignored/no-called SNPs. One is shared by others in our group of 14 - my tester got a healthy share of the SNP but all the reads were low quality.

                  I write this because FT is making a great big deal of us having those three new SNPs, but flat-out ignoring a 'soft-read' that ties to another group. I am suspicious because (I haven't done the logic 'proof' yet but we both could parse it out) additions to the Y-tree have rules dictated by logic. Did they ignore the 'soft +' because it would break the logic?

                  I am still waiting for the answer to my Feb 1 question, too.....from the experts.


                  • #10
                    Ok, so that BY226722 shows up with your cousin too! It shows up in several of my fellow branch testers - does it show up as a non-matching variant with your cousin's other matches?

                    Maybe it's way up the tree - maybe they are looking at another lightning strike like the R branch sustained recently....My concern - listed above regarding the centromere section - does "noisy" = "garbage?" And yes, I sent an email back the day of this conversation but haven't heard back yet.

                    The 'other' thing these could be is representations of the 'new' 'Dick Cheney' STRs - reported from an undisclosed location (they all start with FY). However (and this to me is like quantum physics - i.e. different rules), I read somewhere that an STR still has only one location. So it would just be in one spot, not 14, 22, or however many.....


                    • #11
                      Originally posted by ACLineberry
                      Ok, so that BY226722 shows up with your cousin too! It shows up in several of my fellow branch testers - does it show up as a non-matching variant with your cousin's other matches? . . .
                      No, it doesn't show up. I'm guessing that the consensus is so strong regarding the variants' lack of diagnostic utility that they don't consider them "proper SNPs". I'm thinking that if they included every variant without any diagnostic significance, the list would grow to a monstrous size that would only obscure the potential significance of the "true SNPs" on that list.

                      So my thinking distinguishes between a "variant", which is a specific location with an allele value different from the branch's ancestral value, and a "true SNP", which is a variant that meets certain additional criteria designed to detect individual point mutation events, which have practical utility for determining TMRCA.

                      Re: these other 3 variants, I find this kind of thing easier to follow when summarized in a graphical format. Sometimes narratives confuse me. Maybe FTDNA would find it easier to understand, too, if you constructed a little table for each variant, showing each donor whom you know to be positive, their currently designated terminal clade, and the specific block where the variant in question appears in that donor's path, if at all. It would make it easier for me to follow the adjustments to the current phylogeny needed to make it all make sense.
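                      A minimal sketch of the kind of per-variant table suggested above, assuming one row per donor with the donor's terminal clade, the call for the variant, and the block in the donor's path where the variant sits (if any). The donors, clade names, variant name, and the `Row` and `format_table` helpers are all hypothetical placeholders, not data from any kit.

```python
from dataclasses import dataclass


@dataclass
class Row:
    donor: str
    terminal_clade: str
    call: str           # "derived", "ancestral", or "no-call"
    block_in_path: str  # block in the donor's path holding the variant, or "-"


def format_table(variant, rows):
    """Return a plain-text table for one variant, one line per donor."""
    header = ("Donor", "Terminal clade", "Call", "Block in path")
    data = [header] + [
        (r.donor, r.terminal_clade, r.call, r.block_in_path) for r in rows
    ]
    # Pad each column to its widest cell so the columns line up.
    widths = [max(len(row[i]) for row in data) for i in range(len(header))]
    lines = [f"Variant: {variant}"]
    for row in data:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


# Placeholder rows; replace with the real donors and calls.
rows = [
    Row("Donor A", "R-EXAMPLE1", "derived", "-"),
    Row("Donor B", "R-EXAMPLE2", "ancestral", "-"),
    Row("Donor C", "R-EXAMPLE2", "no-call", "-"),
]
table = format_table("VARIANT-X", rows)
print(table)
```

                      One table per variant keeps "tested negative" separate from "no-call", which is the distinction that decides whether a variant can be placed on a branch.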



                      • #12
                        Ok, I am lost at the 'in-path location.' In fact, my current thought is that they are as well. BY227622 was actually listed (ie noted and given forth) on my fellow tester's results - which came in Dec 2022. Not even three months before I started up my questioning. If garbage, did they decide that AFTER Dec 2022? Seems unlikely, given that I was presented with sections of the Y known to be 'noisy.'

                        Because I don't understand the in-path location, what I CAN produce is a screen shot of my Gang of 14 on the block tree. Underneath you will see where my tester is different from the rest. It can only 'PROVE' that these testers actually registered a result - and that the result was a 'no' for BY226722. And they are so little....start on the left: Nicolas C NO; John W NO; Ken P NO; and Thomas John NO.

                        BY226722 appears in our 'old' subgroup. Before Ken's results, my tester was in the parent FGC76653 group and the FT146735 was the subgroup formed about 1850. There is no way to 'prove' any other tester does NOT have it - as it might be a no-call. But clearly, my tester has it - and the two members of FT146735 do not. Neither does Thomas John. Because it is listed as a non-matching variant by Family Tree - they have (at least at some point) accorded it significance.

                        Yes, our three shared variants (and a surname) are significant, but why isn't BY226722? I'm in contact with the manager of John Perkins's kit (he's in the parent group FGC76653) and he does NOT have the variant. But it's not listed. Otherwise I never would have known. My guy could have developed the mutation in utero - but the results leave a valid question deserving (IMO) of a satisfactory answer.

                        Nicholas C NO; John W NO; Ken P NO; Thomas John NO