
The day Discover just stopped trying


  • #16
    Thanks for the response.

    Page 3 of the white paper shows that about 4.5% of all identified SNPs were detectable under BY500 but not BY700, versus about 37% detectable under BY700 but not BY500. So the situation you describe ought not to happen very often.

    Yes, SNPs are supposed to be permanent. But not every variant is an SNP. I don't think I can add much useful info here, but I accept that there are some rare individual circumstances where variants are not permanent--maybe an actual although improbable back mutation has been observed or that variant is demonstrably due to something other than an individual point mutation. In which case it would not be classified as an SNP.

    To be honest, I generally don't like second guessing FTDNA on that kind of granular level. The whole point of my statistical aggregation methodology is to avoid getting into the weeds by depending on consistent, large identifiable patterns.

    My absolute worst nightmare scenario would be to be told that some bizarre, totally opaque technical distinction, completely invisible from my account data could legitimately result in a 1,000 year impact on my TMRCA calculation. That would cause me to entirely lose faith in the whole genetic genealogy industry. That's why I absolutely refuse to believe the Discover database's insistence that the R-FT372222 lineage experienced only one single SNP in the 800 year period immediately after diverging from R-FGC23343. I prefer to believe that the staff specifically responsible for that database are simply incompetent and/or don't care about the accuracy of their dates, rather than that FTDNA's testing staff are incapable of consistently producing meaningful results.
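For anyone who wants to put a rough number on how surprising "one SNP in 800 years" would be, here is a minimal sketch. It assumes SNPs arrive as a simple Poisson process at a constant average rate, which is a simplification, and the rates in the loop are illustrative placeholders, not FTDNA's published figures. The function name `prob_at_most` is just something made up for this example.

```python
import math

def prob_at_most(k: int, years: float, years_per_snp: float) -> float:
    """P(X <= k) for a Poisson count whose mean is years / years_per_snp."""
    lam = years / years_per_snp
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Hypothetical average rates (years per SNP); substitute whatever rate
# you consider appropriate for the test coverage in question.
for years_per_snp in (80, 120, 200):
    p = prob_at_most(1, 800, years_per_snp)
    print(f"1 SNP per {years_per_snp} yr -> P(at most 1 SNP in 800 yr) = {p:.4f}")
```

The slower the assumed rate, the less remarkable a single SNP in 800 years becomes, so the conclusion depends heavily on which rate applies to the coverage actually tested.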

    I'm not really sure I'm completely following your description re: the inconsistency of reported match results with current position in the phylogenetic tree. I don't want to add to the confusion by expressing a strong opinion when I possibly do not understand the situation. But I also can't understand why they won't simply update the tree if you can prove that you have legitimate matching results that conflict with it, for example with screen prints of the chromosome browser data. That's all they needed to update the tree for me.



    Y Chromosome browsing tool.png
    Attached Files
    Last edited by benowicz; 19 February 2023, 03:45 PM.

    Comment


    • #17
      Originally posted by benowicz View Post
      . . . My absolute worst nightmare scenario would be to be told that some bizarre, totally opaque technical distinction, completely invisible from my account data could legitimately result in a 1,000 year impact on my TMRCA calculation. That would cause me to entirely lose faith in the whole genetic genealogy industry. That's why I absolutely refuse to believe the Discover database's insistence that the R-FT372222 lineage experienced only one single SNP in the 800 year period immediately after diverging from R-FGC23343. I prefer to believe that the staff specifically responsible for that database are simply incompetent and/or don't care about the accuracy of their dates, rather than that FTDNA's testing staff are incapable of consistently producing meaningful results. . .
      It would be like you took your car in for an oil change and the mechanic told you you needed some kind of $1,000 repair. After you spend an hour debating with your wife whether to just buy a new car, in the end they just hand you a $2,000 bill because the mechanic forgot to mention that labor and whatever other charges weren't included in the original quote.

      "Yeah, sorry about that. Forgot to mention we're sometimes completely unreliable."

      Comment


      • #18
        Another exhibit supporting the hypothesis that the Discover team are just phoning it in.


        ZAP002.png

        Comment


        • #19
          Hello again! You are doing me a great service by aiding my understanding - for some reason, the graphic from your response #12: https://forums.familytreedna.com/for...910#post333910

          is not showing up - it is only a blank white square - could you repost it?

          Comment


          • #20
            I may be flat-out wrong in my concerns - and just unable to see my disconnect in the logic. I have tried to come up with an example for you from my data - and I'm unable thus far.....

            See if the following a: makes sense, and b: presents a real issue.

            My kit has 13 matching testers. One of them, along with mine, form a haplogroup. On the block tree, the avg. # of 'private' variants is 1. In fact, each kit has exactly one private variant. So far so good......

            ​​​​​​


            On the 'matches' page, here are the 'non-matching' variants my kit shows compared to Kenneth P: BY44213, BY44214, BY51073, BY49114, BY227622, FT92348, 12158439, TY92287, 11739854.

            The conclusion I have drawn is that the term 'private' variant cannot possibly refer to the 'non-matching' variants. The only way the term 'private variant' can be defined, consistent with the block tree, is for the term 'private variant' to refer ONLY to the variants that are identified by their location on the Y (rather than an alphabetical prefix). 12158439 is my kit's 'private' variant - 11739854 is Ken's. That can easily be confirmed by using the 'private variant' tab.
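To make that inference concrete, here is a small sketch using the nine variants listed above. It only encodes the working hypothesis from this post (that the block tree counts as 'private' just the variants labelled by position number); it is not FTDNA's actual rule, and `split_by_label` is a made-up helper name.

```python
# The nine non-matching variants between the two kits, as listed in the post.
non_matching = ["BY44213", "BY44214", "BY51073", "BY49114", "BY227622",
                "FT92348", "12158439", "TY92287", "11739854"]

def split_by_label(variants):
    """Separate position-only labels (all digits) from alphanumeric names."""
    positional = [v for v in variants if v.isdigit()]
    named = [v for v in variants if not v.isdigit()]
    return positional, named

positional, named = split_by_label(non_matching)
print(positional)  # ['12158439', '11739854']
print(len(named))  # 7
```

Under that hypothesis the two position-only labels account for the one private variant per kit shown on the block tree, and the remaining seven named variants are the ones left unexplained.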

            Here is my issue: both of those 'private' variants actually DO have alpha prefixes. 11739854 was discovered by the Chinese and has been named TY133388. Meanwhile Family Tree has ALSO labeled it and included it in Y-Browse as FTD37694.

            12158439, my kit's variant, has an official name - FTA74215. It was named a year before my kit was run and it was named by Family Tree DNA. Screenshot....

            ​​​



            So, by no logic I can discern, FT has chosen two - out of the nine - to treat differently.

            My kit was tested before Ken's. It had four numbered (i.e. 'private') variants, which were added to the 'non-matching' variants box of every other tester on the matches page. When Ken's test came in, he had three of those four in common with Jim (so his 'list' did not include them as non-matching). After review, Family Tree switched out the numerical IDs for the alphanumerical ones, both on the block tree and on the matches page, listing them as the defining SNPs of our subbranch.

            I have no problem at all with the re-naming and new block. That makes sense. What I cannot understand is why other variants (i.e. ones already discovered) are ignored.

            Jim and Ken have NINE nonmatching variants, not 2.

            In trying to figure out what's so different about those two variants vs. the others, I have considered, as you mentioned, instability. If true then why list them at all?

            I considered the fact that the numbered ones may just not yet have a place on the tree - but there are more than a few, take BY44213, BY44214, BY51073, BY49114, BY227622 for examples; which have the lettered prefix but are not placed on the tree either, so that's not it.

            Further, inversely, ACT876 was discovered early on and is on the Y tree on a different branch. Only one of our 14 testers carries it. Literally none of these variants are 'private' yet some are so called and others not.

            In particular, consider BY227622. According to Ken's results, the only man he differs with is Jim. In fact Jim does have this variant. Jim's results vs Ken's show that Ken doesn't have it. Also, neither do Thomas John, Nicholas or John W. That accounts for about 1/3 of our group - all close together. Had BY227622 been chosen (and it could have been) instead of Nicholas and John W's FT146735, then Thomas John would be in their group and FT146735 would be the ignored straggler. What of the other 11 testers - almost 2/3 of our group? All no-calls? I have no way to ascertain that information.

            Any insight you have would be gratefully appreciated!

            Anne

            I just want someone to explain to me the 'rules' for segregating private variants.

            Comment


            • #21
              Originally posted by ACLineberry View Post
              Hello again! You are doing me a great service by aiding my understanding - for some reason, the graphic from your response #12: https://forums.familytreedna.com/for...910#post333910

              is not showing up - it is only a blank white square - could you repost it?
              Hope this works.



              Calculation of I-FGC76653.png

              Comment


              • #22
                Yes! I can see it now. I get it - Thanks! And, I got a few answers today to my semantics questions - Getting my wisdom teeth pulled was easier....and I'm still somewhat confused. I was told some of the variants are 'garbage' including my BY227622, which is supposed to be in a 'noisy' region. That arose over a NEW problem I found - on the SNP browser graphic for that variant, there are three more nearby. Hovering over them gives 'mismatched' instead of 'derived.' On Y-browse they have alpha numeric names, but when I'm back on FT those names aren't in there. In other words, had they not been next to the one I was looking at, I never would have known they were there at all (they aren't on the tree, aren't listed in the SNP results and aren't subjected to a match/nonmatch search).

                I've also found one of the defining tree SNPs in my kit with only eight weak reads. They were consistent, and the SNP was registered as nonmutant, which is where interpretation comes in: I would have erred on the side of no-call (but I don't have expertise). I've found another with about 20 reads, all very weak, and about half and half. It too registered as a called nonmutation.

                Ignore if I'm pedantic, but I learned to an extent how the 'system' goes:
                1. If you are the first tester with the variant, FT reserves an alphanumeric ID for it. It stays a number on their system and on Y-Browse.
                2. When a second tester has it, FT claims the name over on Y-Browse, so it appears on the Y-Browse tool with its name.
                3. They do NOT change anything on the FTDNA site unless and until two men in the same part of the tree test positive.
                4. At that point, FT puts it, with its new ID, in the block tree, in the match results, and in the FT SNP browser, and then uploads the info to Y-Browse.
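The steps above, as described in this post, could be modelled roughly like this. It is only a sketch of the poster's understanding, not a documented FTDNA procedure; the function `shown_labels` and its parameters are hypothetical, and the tester counts in the example are invented for illustration.

```python
def shown_labels(position: str, reserved_name: str,
                 positive_testers: int, positive_in_same_branch: int) -> dict:
    """Which label each site would display, per the steps described above."""
    # Y-Browse shows the name once a second tester anywhere has the variant.
    on_ybrowse = reserved_name if positive_testers >= 2 else position
    # The FTDNA site switches only when two men in the same part of the tree test positive.
    on_ftdna = reserved_name if positive_in_same_branch >= 2 else position
    return {"y_browse": on_ybrowse, "ftdna": on_ftdna}

# Hypothetical counts: two positive testers overall, only one in this branch.
print(shown_labels("12158439", "FTA74215", 2, 1))
```

With those invented counts the variant would carry its name on Y-Browse while still appearing as a bare position number on the FTDNA pages, which is the kind of mismatch described earlier in the thread.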

                As I mentioned, BY227622 has other variants around it deemed 'garbage.' At some point they found it of interest - probably when two other testers in my group had it some time ago. We have another test on the way that should be closer to Jim's and equidistant from Ken's - the paper trail is solid. I also now understand where the Y can be affected by the matri-line, so I'm interested to see the five gen effect on the results....

                Anne

                Comment


                • #23
                  I just posted about the unnecessary replacement (with weird TMRCA estimates) for the Y-STR TiP in the Announcements and New Features section, in response to the original posting "Three New Features released." We are trying to edumacate our samplers as to why they should do Big Y, and those who do so cannot understand the "age estimates" in Tools offered at Y-STR matches vs. the Block Tree at Big Y. The TMRCA estimates at Y-STR "rank" men by genetic distance, with exact matches (0 GD) who are not from the same obvious paper-trail ancestor at the top of the match list, so that in Y-STR I am separated from my known cousins by the genetic distance calculated from our Y-STR mutations. I appear to be a "closer" match to my very distant cousins who have fewer mutations. But the Big Y matches correctly rank my known cousins with me as "closer" matches, and we close cousins have a different terminal SNP from our very distant ones. Explaining these concepts is not helped when the TiP and TMRCA estimates are made more complicated by "new" presentations which, as mentioned by Mr McCoy above, seem to be based on algorithms that each haplo administrator decides individually whether to attempt to explain. As an ex-programmer myself (but not a statistician) I am not trying to get proprietary information that I probably would not understand, but I need some help in explaining the reasons for encouraging others to spend the money to participate at the genomic level (or to let us sponsor them), and TMRCA vs. SNP-aging baffles many of our samplers, most of whom are within subgroups whose previous assignments as I-M253 and I-P37 bound them together visually at Y-STR, but with Big Y, upstream and downstream SNPs no longer give that visual binding. My co-admin and I try to explain at every Zoom meeting we hold, but these "new features" and shifting estimates are not clear to us.

                  We do not like saying "Trust us, because we will learn something that we cannot hope to explain to you, and our subgrouping has to stand in for any explanations" but that is usually the drift.

                  However, the Discover SNP aging for our I-M253 downstream SNPs is right on the money for what we were conjecturing from paper trail, and so are the I-P37 downstreams, so benowicz's concern about reliability of the Discover aging methodology re Time Tool is troubling, because we were quite pleased with those estimates. Have we been blinded by (non) science?
                  Last edited by clintonslayton76; 24 February 2023, 11:49 PM.

                  Comment


                  • #24
                    Well, at least we can say that people closer to us on the Block Tree are more closely related.

                    It's easy to get lost in the weeds when considering the MRCA dates implied by the various technologies.

                    Y-STRs provide the least precision of all the technologies because we can only infer the occurrence of a mutation indirectly--by looking at the differences between the signatures of two descendant donors. Depending on the length of time elapsed since the birth of their most recent common ancestor, there is a greater or lesser chance that the occurrence of mutations will be hidden from us by convergent mutations--only coincidentally resulting in identical allele values for the two descendants. The reason being the very fast rate at which STR markers mutate in general.
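A quick simulation may help show how mutations get hidden. This is a minimal sketch under a deliberately simple stepwise model: every mutation moves a marker up or down by one repeat, all markers share one rate, and the rate of 0.003 per marker per generation is an illustrative assumption rather than a measured value. The function names are made up for this example.

```python
import random

def simulate_pair(generations, markers=111, rate=0.003, rng=None):
    """Return (actual_mutations, observed_distance) for two lines that descend
    from one ancestor, where each mutation changes a marker by +1 or -1."""
    rng = rng or random.Random()
    a = [0] * markers
    b = [0] * markers
    actual = 0
    for _ in range(generations):
        for line in (a, b):
            for i in range(markers):
                if rng.random() < rate:
                    line[i] += rng.choice((-1, 1))
                    actual += 1
    # What a comparison of the two signatures can see: summed step differences.
    observed = sum(abs(x - y) for x, y in zip(a, b))
    return actual, observed

def hidden_fraction(generations, trials=30, seed=0):
    """Share of mutation events that leave no trace in the observed distance."""
    rng = random.Random(seed)
    total_actual = total_observed = 0
    for _ in range(trials):
        actual, observed = simulate_pair(generations, rng=rng)
        total_actual += actual
        total_observed += observed
    return 1 - total_observed / total_actual if total_actual else 0.0

for gens in (10, 50, 200):
    print(gens, "generations: hidden share =", round(hidden_fraction(gens), 3))
```

In this toy model the observed distance can never exceed the true number of mutations, and the hidden share grows as the lines get older, which is the effect described above.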

                    I would consider a TMRCA estimate based on Y-STRs primarily as an exclusionary tool--more useful for showing us who is NOT a recent patrilineal relative than who is or precisely how closely related they are. Yes, you do get a very loose idea of how closely related two cousins are, but it's not precise enough to make important decisions with. It's the matter of a "best" estimate in an extremely wide range.

                    Possibly, if you are very lucky and have a number of donors who are related at a similar degree of cousinship, you might be able to apply some aggregation techniques to improve precision of STR estimates. But there is an almost circular type of reasoning involved, in that a precondition for reliability is that you have already established the degree of cousinship, or at the very least verified that they all diverge from one another at the same point in the Block Tree. This is probably either not going to be available or provide no useful information since you already know the degree of cousinship of the donors.

                    SNP technology has one advantage in that at the very least it reliably indicates the relative degree of cousinship between three (3) or more people because the probability of a convergent mutation event is ridiculously small. If the donors' literal MRCA has not tested directly, these SNPs are maybe not, strictly speaking, a direct view of the MRCA's DNA signature, but functionally it is very close to it. The probability of obscuring mutation events is practically nil.

                    Like STRs, the precision of SNP dating techniques is still fuzzy. At the end of the day, you're still only looking at the "best" estimate in a wide range.

                    That said, like any other highly variable process, the "relative precision" available is significantly different based on how much data you have available. The more data you have, the more precision you can achieve in your estimate. So ironically, you should be able to expect a higher degree of "relative" precision for TMRCA estimates between very remotely related SNP donors than you could for very closely related donors--each year or generation lapsed since the birth of their MRCA is in effect an additional Bernoulli trial for analysis. If it weren't for the confounding factor of potentially convergent STR mutations, the same principle would apply--it's just that you can't observe STR mutations directly.
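The "relative precision" point can be checked with the standard binomial formulas. In this sketch each generation is treated as one Bernoulli trial with success probability p; the value p = 0.3 is a placeholder chosen for illustration, and `relative_sd` is a made-up helper name.

```python
import math

def relative_sd(trials: int, p: float) -> float:
    """Standard deviation of a binomial count divided by its expected value."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return sd / mean

p = 0.3  # hypothetical chance of a mutation event per generation
for generations in (5, 20, 80, 320):
    print(generations, "generations: SD as share of expected events =",
          round(relative_sd(generations, p), 3))
```

The ratio shrinks in proportion to one over the square root of the number of trials, so quadrupling the number of generations halves the relative spread. The absolute uncertainty in years still grows with age; it is only the spread relative to the total that narrows.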

                    St Dev as % of total mutation events - 111 STR marker array.png







                    Comment


                    • #25
                      Thanks for the detailed reply and the spreadsheet. We are fairly certain about the level of cousinship for most of our largest subgroup members, but each group had some "brick wall" members (by lack of genealogical evidence). yDNA has been the major tool for subgrouping those mysteries because of a distinctive surname (albeit with a dozen variations) and their (two separate) "I" haplotrees, obviously not genetically related to one another, or to the few "R" samplers, and some "J" samplers, whose ancestor adopted the surname from a non-genetic connection.

                      Your comment re the value of exclusionary tools is right on point, as the initial 12-marker Y-STR had value just for that: getting haplos was a major issue with our project, because of postings claiming a single ancestor with similar surname by someone who wanted that to be true, but I suspected that that could not be correct and Y-STR proved that. But I posted re problems with (highly recommended at the time) 37-marker tests giving incorrect Genetic Distance based on what I considered faulty weighting of mutations: counting "fast" mutated DYS locations at that level indicated no common ancestor with a sampler's subgrouping. But when the sample was expanded for Y-STR, the ratio of matches outweighed the non-matches. But the doubt was sown in the sampler's mind because the sampler's paper-trail could not be documented all the way back to the man shared by all of that subgroup. When the sampler went on to do Big Y his presence in our subgrouping assignment for this ancestor was obvious.

                      The lack of precision you mention with Y-STR is why we encourage Big Y, but because of the expense, most of our samplers in the two largest subgroups have done only Y-STR, and many of those are transferred from Ancestry, which disallows matching at all, and several of those are deceased. So we encouraged Big Y because downstream SNPs might adjust as more men test. This goes back to your original point: we reviewed the Discover tools in Beta with samplers at a Zoom meeting, and one of our men shows a SNP (R-FCG71179) with TMRCA est at ~750BCE. He has 0 Big Y matches, but the SNP in Discover shows 36 downstream descendants from his assignment. He is working to get a cousin to do Big Y, with the thought that perhaps that might yield a shared downstream SNP for each of them. The est gaps to five upstreams are 50 yrs, 1350 yrs, 650 yrs, 0 yrs, and 100 yrs. These look suspicious to me, because none of our "I" haplo SNPs suggest over 1000 yrs between mutations going back 5 levels. I realize that one day that 1350 yr estimate between SNPs might change, and also that these are averaged (mean) estimates based on the number of men who test (at FTDNA? or from a compiled database?), and thus very approximate. But when I read your original post I was reminded that one of our samplers was assigned a SNP that was uniquely "old" for TMRCA among 42 samplers sharing an upstream SNP of R-A1243 as of the date of this writing, most of whom show est TMRCA ranges within the Common Era. It seems remarkable to me that two or three men born this century, including our member, would be assigned SNPs with TMRCA that far before the Common Era, but then, I might be an idiot. I think that the Discover Tools on FTDNA should show "Beta" instead of "New" because of the concerns you mentioned in the first place.
                      Last edited by clintonslayton76; 26 February 2023, 05:57 PM.

                      Comment


                      • #26
                        To update my previous posting, our one R-FCG71179 offered to sponsor his 2nd cousin, and that match generated a "new" terminal SNP subclade shared only by them, so far, and the age estimate went from ~800BCE to ~1900CE on Discover, based on the +30 mutations shared, shown on the Block Tree. Mr. Ian Williamson, co-admin on the R-U106 project, patiently explained this and lifted the fog of confusion in my mind during the time lag when the Discover Tool and Block Tree were not yet updated to show this "new" terminal. I have spent almost all of my time analyzing "I" Y haplo trees, not nearly as daunting as the "R."
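As a back-of-envelope check on those figures, the shift in the estimate can be divided by the number of shared mutations. This is a rough sketch only: it reads "+30" as exactly 30, takes the two "~" dates at face value, and ignores whatever else the Discover calculation takes into account. `years_between` is a made-up helper.

```python
def years_between(start_year: int, end_year: int) -> int:
    """Elapsed years between two calendar years, with BCE written as negative
    numbers and no year zero."""
    elapsed = end_year - start_year
    if start_year < 0 < end_year:
        elapsed -= 1  # the calendar jumps from 1 BCE straight to 1 CE
    return elapsed

shift = years_between(-800, 1900)   # ~800 BCE to ~1900 CE
shared_mutations = 30               # "+30" read as 30 for this estimate
print(shift, "years /", shared_mutations, "mutations =",
      round(shift / shared_mutations), "years per mutation")
# about 90 years per mutation
```

So the jump of roughly 2,700 years is about what one would expect if those shared mutations accumulated at something like one per 90 years along that line.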
                        Last edited by clintonslayton76; 8 September 2023, 02:30 PM.

                        Comment
