Relationship Range estimates

  • Relationship Range estimates

    Relationship estimates on here don't seem to be as accurate as other sites. While estimates for actual relatives are good, it puts way too many matches into the Close Cousin (2nd-3rd) range. FTDNA doesn't filter out segments under 5 cM, so it often puts too much importance on total shared cM and not enough on how big those segments are. As someone with 100% Ashkenazi Jewish heritage, everyone comes up as a match and it's hard to filter out the noise. It would be great if we could set a lower limit on matching segment length when sorting our matches.
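The lower limit asked for here could look something like the sketch below. It is only an illustration, not how FTDNA sorts matches: the match names, the segment lengths, and the `rank_matches` helper are all hypothetical, and the 5 cM default is simply the figure mentioned in this post.

```python
from typing import Dict, List, Tuple


def rank_matches(
    matches: Dict[str, List[float]], min_cm: float = 5.0
) -> List[Tuple[str, float, float]]:
    """Rank matches by longest segment, then by total of segments >= min_cm.

    `matches` maps a match name to its list of segment lengths in cM.
    Matches with no segment at or above `min_cm` are dropped as noise.
    """
    ranked = []
    for name, segments in matches.items():
        kept = [cm for cm in segments if cm >= min_cm]
        if not kept:
            continue  # nothing above the limit, so leave it off the list
        ranked.append((name, max(kept), round(sum(kept), 2)))
    # Longest segment first; filtered total breaks ties.
    ranked.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranked


# Hypothetical segment data, for illustration only.
example = {
    "Match A": [22.4, 11.0, 3.1, 2.2],
    "Match B": [8.0, 4.9, 4.8, 4.7, 4.6, 4.5, 4.4, 4.3],
    "Match C": [4.0, 3.5, 3.0, 2.5, 2.0],
}

for row in rank_matches(example):
    print(row)
```

Sorting on longest segment first is a design choice for this sketch; a match made up of many tiny segments (like the hypothetical "Match B") then sinks below one with a couple of long blocks, even if the raw totals are similar.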

  • #2
    As a descendant of mainly Colonial families, with only a small fraction of later immigrant ancestry and a very small Jewish component, I too have found that FTDNA's estimates of relationship are wildly optimistic. The actual links, when they can be found at all, tend to be far more distant than predicted. However, even on GEDmatch, which appears to me to be far more transparent and neutral on this topic, predicted relationships beyond about 4 generations show the same bias. As the estimated connection becomes more remote, the probability curve quickly becomes less symmetrical. The uncertainty for close relationships is far less than the uncertainty on the other end of the curve. It is difficult to express that uncertainty in a simple "range" such as "3rd to 5th cousins", a formula that most people will likely interpret as "probably a 4th cousin, but possibly a 3rd or 5th cousin" (with the underlying assumption being that the uncertainty is symmetrical!). For me, the problem doesn't seem to be in the reported total shared cM, but rather in the nature of the uncertainty associated with the estimate.
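One way to see why the ranges crowd together at the distant end: the expected sharing between full nth cousins falls by a factor of four with each degree. The sketch below just tabulates that average. The 6800 cM genome total is an assumed round figure (the vendors differ), the `expected_shared_cm` helper is hypothetical, and these are averages only; actual sharing scatters widely around them, which is what the empirical charts capture.

```python
TOTAL_CM = 6800.0  # assumed approximate total, counting both parental copies


def expected_shared_cm(degree: int, total_cm: float = TOTAL_CM) -> float:
    """Average shared cM for full nth cousins: total * 2**-(2n + 1).

    degree=1 is first cousins (1/8 of the total), degree=2 is second
    cousins (1/32), and so on. This is the expectation, not a prediction
    for any particular pair.
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return total_cm * 2.0 ** -(2 * degree + 1)


for n in range(1, 7):
    print(f"cousin degree {n}: {expected_shared_cm(n):8.2f} cM on average")
```

Under this assumption the averages for 1st and 2nd cousins sit hundreds of cM apart, while those for 4th and 5th cousins differ by only about 10 cM, roughly the size of a single segment. A given total at the low end is therefore compatible with many more distant relationships than close ones.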

    There is another difficulty that could distort the process of estimating the degree of relationship. When we see a statement such as "50 percent of 4th cousins don't match", there is an assumption that we actually know who our 4th cousins are, among our Family Finder matches. But I'm fairly sure there must be many actual 4th cousins among our matches that we aren't aware of. Very few genealogists will actually know every living descendant of their ancestors that far back. To the extent that the statistics are based on reports from individuals who have researched their matches, there will be biases of some kind in the data due to incomplete reporting.

    In practice, then, I usually give little weight to the estimates of relationship beyond about 2nd cousins, and far more weight to the total shared cM. I keep a copy of the "DNA Detectives Autosomal Statistics Chart" next to my computer. Most useful genetic genealogy chart ever!


    • #3
      I've seen some online presentations by Blaine Bettinger, who blogs as The Genetic Genealogist. His take on relationship estimates by any of the DNA-testing companies is to ignore them. He uses the DNA range charts instead, such as the DNA Detectives Autosomal Statistics Chart (with explanatory image), and his own Shared cM Project chart. Further, at FTDNA, he advises subtracting all segments less than 5 cM, then using the remaining amount of cM with the charts.

      Right now I'm having trouble with the FTDNA website, but using the FTDNA Chromosome Browser, you need to get to the screen that shows the shared segment data with your match, and discard those segments under 5cM.
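If you copy the segment list for one match out of the Chromosome Browser, the subtraction is easy to script. This is a rough sketch with made-up numbers; the `(chromosome, cM)` tuple layout and the `filtered_total` helper are assumptions for illustration, not an FTDNA export format.

```python
from typing import Iterable, Tuple

Segment = Tuple[str, float]  # (chromosome label, segment length in cM)


def filtered_total(
    segments: Iterable[Segment], min_cm: float = 5.0
) -> Tuple[float, float]:
    """Return (raw total, total after discarding segments under min_cm)."""
    raw = 0.0
    kept = 0.0
    for _chromosome, cm in segments:
        raw += cm
        if cm >= min_cm:
            kept += cm
    return round(raw, 2), round(kept, 2)


# Hypothetical segments shared with one match.
shared = [("1", 12.3), ("4", 7.9), ("7", 4.2), ("11", 2.6), ("15", 1.8)]

raw_total, usable_total = filtered_total(shared)
print(f"reported total: {raw_total} cM, total to take to the charts: {usable_total} cM")
```

The second number is the one to look up in the DNA Detectives or Shared cM Project chart.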


      • #4
        My research is showing that Family Finder's 1.0 cM segment matches are not false positives.


        • #5
          Jack, I admire you and your knowledge greatly and always enjoy reading your posts; but my eyes roll when you talk about 1cMs being of any value. Most/many genetic genealogists say we are ALL related at that level and consider 1cM to be a "population match".

          So, okay, convince me you are correct. Please give some examples.


          • #6
            Originally posted by georgian1950
            My research is showing that Family Finder's 1.0 cM segment matches are not false positives.
            I will admit to seeing some 1 cM segment matches that I have reason to believe are valid, but I've also seen FTDNA, and even Ancestry, tack on additional "segment match length" for matches that belong to me or a sibling compared with the same matches for either of my parents (the total match usually declines, but not always). I'm not a closer relation to that person than either of my parents, so the only conclusion is that the matching algorithm is "being greedy" and throwing an erroneous result: it does an à la carte grab of my DNA sequence, stringing together very short sequences that came from each of my parents. (It has no way of knowing where those GATC letters came from, short of throwing a LOT more processing at it when a person has had their parents or other close family tested.)

            Yes, some of those results will be valid, but just as many, or more, of them can be invalid. Length is one of the primary ways of discriminating, as it is going to be rare (but not unheard of) for "invalid strings" to run for very long. That is why the "false positive" rate for autosomal matches is so high below 10 cM, but drops like a rock as the length increases.


            • #7
              The larger matching segment size with child rather than with a parent does not prove a segment false. The parents share a common ancestor somewhere back there and both contribute enough DNA of that common ancestor to get the matching segment size of the child larger than a parent's (or causes a match to appear with the child when neither parent meets the threshold to show a match).

              Jack


              • #8
                Originally posted by georgian1950
                The larger matching segment size with child rather than with a parent does not prove a segment false. The parents share a common ancestor somewhere back there and both contribute enough DNA of that common ancestor to get the matching segment size of the child larger than a parent's (or causes a match to appear with the child when neither parent meets the threshold to show a match).
                In some of the cases I've seen, there were "valid matches" in play, but that doesn't mean the "extra segment length" on the child is valid in any way shape or form.

                More likely than not, that extra length is IBC (identical by chance) rather than IBD (identical by descent). Particularly when you consider that the human genome consists of only 4 letters, expressed in 4-letter "words", on which the matching algorithms are only looking for a 2-letter match (within any given "word"). Granted, even a partial cM is longer than "1 word" in this context, but that is where the probabilities come into play: the odds of getting "2-letter matches" on the first or last portion of a series of sequential "words".

                In the most extreme case I've seen, where a largest-segment match gained +9 cM, what more likely happened was that an "IBC event" from the paternal line bridged the gap between small IBD segments on the maternal line and "glued" the segments back together. It was very much a dumb-luck event with zero relevance to the relatedness of the paternal line to that particular maternal-line match.
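When both parents have tested, the "bridged gap" scenario can at least be located. The sketch below takes the child's segment with a match and the parents' segments with that same match on the same chromosome, and reports the stretches of the child's segment that neither parent shows. The positions are invented, the `uncovered` helper is hypothetical, and a gap found this way is only a candidate for a closer look: it does not by itself settle whether the stretch is IBC or comes from parental sharing below the reporting threshold.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs on a single chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(child: Interval, parents: List[Interval]) -> List[Interval]:
    """Return the parts of the child's segment that no parental segment covers."""
    gaps: List[Interval] = []
    cursor, child_end = child
    for start, end in merge(parents):
        if end <= cursor:
            continue  # parental segment lies entirely before the current position
        if start >= child_end:
            break  # parental segment lies entirely after the child's segment
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
        if cursor >= child_end:
            break
    if cursor < child_end:
        gaps.append((cursor, child_end))
    return gaps


# Hypothetical positions: two maternal pieces, nothing paternal, one long child segment.
child_segment = (10_000_000, 30_000_000)
mother_segments = [(10_000_000, 17_000_000), (21_000_000, 30_000_000)]
father_segments: List[Interval] = []

print(uncovered(child_segment, mother_segments + father_segments))
```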


                • #9
                  I recently transferred my family's results to MyHeritage and was struck by how much better a job it did of estimating relationships and sorting important matches to the top of the list. They even let you review a list of common matches with each person, sorted by distance and number of matching segments to both of you, and highlight triangulated segments. After a week I already broke through one of my brick walls by tracing back a couple of those matches, because their interface did an excellent job of helping me figure out where I should be looking and who was related to who. Clearly FTDNA can do better.

                  I'm not saying 1 cM segments should be tossed out entirely, but we can certainly let the computers do more of the work of sorting important matches for us instead of having to review every single one in the chromosome browser. I normally have to do research for every one of my matches to figure out how we're related. That's a lot of work in and of itself. If we can get better sorting algorithms to tell us who we should be looking at it'll save huge amounts of time on this site.


                  • #10
                    Originally posted by honbadger
                    I recently transferred my family's results to MyHeritage and was struck by how much better a job it did of estimating relationships and sorting important matches to the top of the list.
                    I find their site to be all but worthless.


                    • #11
                      Originally posted by Jim Barrett

                      I find their site to be all but worthless.
                      What does FTDNA give you that MyHeritage doesn't? If anything I've found more useful features on there- a triangulation tool paired with the chromosome browser that FTDNA doesn't have, and more matches with family trees.


                      • #12
                        Originally posted by Jim Barrett

                        I find their site to be all but worthless.
                        Jim, is that due to a lack of matches for you, or because you don't like their tools or other features?


                        • #13
                          Of the four companies with my autosomal data, I have the most matches at Ancestry, with MyHeritage number 2, but I have more identifiable matches at 23andMe, with their lack of tools, than I have at MyHeritage.


                          • #14
                            Originally posted by honbadger

                            What does FTDNA give you that MyHeritage doesn't? If anything I've found more useful features on there- a triangulation tool paired with the chromosome browser that FTDNA doesn't have, and more matches with family trees.
                            I'm not a fan of MyHeritage's interface either. The inability to notate matches or otherwise leverage a lot of the information it provides tends to turn it into an exercise in frustration. The absence of matches closer than 3rd cousin also causes annoyance for other more obvious reasons. That said, my father's family lines being relatively recent immigrants to the US, MyHeritage does seem to have a better array of DNA tests for people from the parts of Europe his family comes from. His match list is bigger over there than on Ancestry, but next to nobody close in.

                            When you pair that with MyHeritage's change to their matching service this past November, it is hard to justify getting people to copy their Ancestry or FTDNA results over there. I'd just as soon direct them here to FTDNA.


                            • #15
                              FTDNA made a major change to how they classified folks a few years ago -- before then, practically everyone in the 2nd-3rd Cousin Range was a second cousin or second cousin once removed. It was great. Then they changed it, and lots and lots of folks got tossed into that category. Undoubtedly that means some folks who belong, and would not have been included before, are now included -- but so are many who don't belong.

                              Moreover, there must be something besides total cM that they are looking at, since two folks with the same shared cM may be given different Relationship Ranges, but I have no idea what -- clearly not simply the longest segment, since I have seen a match with 131 cM shared and an 8 cM longest segment included in the 2nd-3rd Cousin Range.

                              One of the kits that I manage has two matches that, based on the names and on how they match the kit in exactly the same places, are almost certainly the same person -- both with 149 cM shared, longest 9 cM. For one, the Relationship Range is 2nd-3rd; for the other it is 5th-Remote.

                              Also, did FTDNA just make a change? For each person I have checked today, the count in Close and Immediate has dropped, despite one or two new matches since I last looked.
