Total shared vs. Longest block

  • Total shared vs. Longest block

    I can't find any explanation of whether and how the "total shared cM" is used to determine the strength of a match. Can someone help me understand, please? There are several people on my FF results with total shared cM in the 50-60 range, which I gather would be a somewhat strong match if it were the "longest block" value, but the longest blocks are quite short in most of these cases, around 11-15 cM. And when I look at the Chromosome Browser, the longest block is the only one that shows up at all.

    If the longest block is 11 cM and the total shared is 60 cM, does that mean the real identical-by-descent match is only about 11 cM, and the other 49 cM of matches are just coincidental/noise? If so, what is the purpose of including this value on the FF results?

    Thank you!

  • #2
    FTDNA counts all secondary segments of over 1 cM toward the total shared. 23andMe only counts segments of over 5 cM, which most people in the field think is a more prudent approach. I would just recalculate the total based on segments of 5 cM or greater. The Chromosome Browser defaults to a viewing resolution of 5 cM, so if you're only seeing that longest block, it's a pretty remote match. That's usually true of single-segment matches like these, even when they're over 20 cM.
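
The recalculation suggested above can be sketched in a few lines. This is only an illustration: the function name `total_shared_cm` and the segment lengths are made up for the example, and the 1 cM and 5 cM cutoffs are the ones quoted in this post, not values taken from any vendor's documentation.

```python
def total_shared_cm(segments_cm, min_cm=5.0):
    """Sum only the segments that are at least min_cm long."""
    return sum(length for length in segments_cm if length >= min_cm)


# Invented segment lengths (in cM) for a single match, for illustration only.
segments = [11.0, 3.2, 2.8, 1.5, 4.1, 2.4]

# Total when every segment of 1 cM or more is counted.
raw_total = total_shared_cm(segments, min_cm=1.0)

# Total when only segments of 5 cM or more are counted.
filtered_total = total_shared_cm(segments, min_cm=5.0)

print(f"all segments >= 1 cM: {raw_total:.1f} cM")
print(f"segments >= 5 cM only: {filtered_total:.1f} cM")
```

With these invented numbers, only the 11 cM block survives the 5 cM filter, which is the same pattern as a match whose longest block is the only one visible in the Chromosome Browser.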


    • #3
      Originally posted by Wojciechowski View Post
      I can't find any explanation of whether and how the "total shared cM" is used to determine the strength of a match. Can someone help me understand, please?
      FTDNA prefers to keep that information proprietary at present.

      They use a multi-variable reading to make predictions: longest block, second block and sum. It works out that any match needs to meet a minimum sum (something near 20 cM) to be considered, and the other variables are used to determine the strength. The sum is also used to determine whether the match is close, like a 2nd cousin or closer, or distant, like a 3rd cousin or farther.
      Last edited by mkdexter; 9th September 2013, 10:50 AM.
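
Since the actual prediction method is proprietary, the following is only a sketch of the three inputs described above (longest block, second block, sum) and the minimum-sum gate. Everything here is hypothetical: the names `MatchFeatures`, `match_features` and `passes_minimum` are invented for the example, and the 20 cM default is this post's "something near 20 cM" estimate, not a confirmed figure. No close-versus-distant cutoff is given in the thread, so none is modeled.

```python
from typing import NamedTuple, Sequence


class MatchFeatures(NamedTuple):
    longest_cm: float  # longest shared block
    second_cm: float   # second-longest shared block (0 if there is none)
    total_cm: float    # sum of all shared blocks


def match_features(segments_cm: Sequence[float]) -> MatchFeatures:
    """Reduce a list of shared segment lengths to the three variables."""
    ordered = sorted(segments_cm, reverse=True)
    longest = ordered[0] if ordered else 0.0
    second = ordered[1] if len(ordered) > 1 else 0.0
    return MatchFeatures(longest, second, sum(ordered))


def passes_minimum(features: MatchFeatures, min_total_cm: float = 20.0) -> bool:
    """Gate on the total only; 20.0 is an assumed, approximate threshold."""
    return features.total_cm >= min_total_cm


# Invented example: one 11 cM block plus two smaller ones.
example = match_features([11.0, 3.0, 7.5])
print(example, passes_minimum(example))
```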


      • #4
        Both the 20 cM minimum and the counting of segments under 5 cM are serious flaws that do a disservice to the customer, imo. The former favors endogamous matches over non-endogamous ones and excludes potentially informative, albeit remote, matches; the latter frequently leads to badly inflated relationship estimates.

        To illustrate, I have 4 predicted 2nd-4th cousins on my non-Jewish side: 2 are predicted as 3rd cousins and 2 as 4th. One of the predicted 4th cousins shares a second segment of over 7 cM; none of the others has one over 5 cM. The longest segments range from 19.29 to 24.36 cM. I know with certainty that two of these 4, the predicted 3rd cousins, cannot be any closer than 6th cousins and are probably even more remote. There's nothing to suggest that the other 2 are as close as 4th, but that's as close as they could possibly be based on where the paper trail peters out.

        Last edited by NYMark; 9th September 2013, 11:04 AM.


        • #5
          I hope one day they have some ways for the user to play with the match characteristics, depending on the goal of the research.

          For my own use, I made a graph that plots total cM against closeness of relationship, with the longest shared segment shown as the size of each bubble. Total is the x axis and relationship is the y axis. I have drawn it at different scales to focus on different parts.
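
A bubble chart like the one described above could be set up roughly as follows. This is a sketch under assumptions: the relationship list, the function names, the bubble scale factor and the three sample matches are all invented for illustration, and the plotting step assumes matplotlib is installed (it is imported only when `plot_bubbles` is called).

```python
# Relationship categories in order of closeness; the index is the y position.
RELATIONSHIPS = ["parent", "1st cousin", "2nd cousin",
                 "3rd cousin", "4th cousin", "5th cousin"]


def bubble_points(matches, scale=2.0):
    """Turn (relationship, total_cM, longest_cM) rows into x, y, size lists."""
    xs = [total for _, total, _ in matches]
    ys = [RELATIONSHIPS.index(rel) for rel, _, _ in matches]
    sizes = [longest * scale for _, _, longest in matches]
    return xs, ys, sizes


def plot_bubbles(matches):
    """Draw the chart; requires matplotlib (an assumption of this sketch)."""
    import matplotlib.pyplot as plt

    xs, ys, sizes = bubble_points(matches)
    plt.scatter(xs, ys, s=sizes, alpha=0.5)
    plt.yticks(range(len(RELATIONSHIPS)), RELATIONSHIPS)
    plt.xlabel("Total shared cM")
    plt.ylabel("Relationship")
    plt.show()


# Invented sample rows, not real match data.
sample = [
    ("parent", 3384.0, 267.0),
    ("2nd cousin", 230.0, 45.0),
    ("4th cousin", 55.0, 12.0),
]
xs, ys, sizes = bubble_points(sample)
```

Calling `plot_bubbles(sample)` would draw the chart; switching the x axis to a log scale (`plt.xscale("log")`) is one way to get the "different scales" view of the crowded low-cM end.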

          The close relations (parent, siblings, uncle/aunt/niece/nephew, kids, grandkids, cousins, 2nd cousins) follow a shallow curve. Then it drops off and you get more widely scattered 3rd and 4th and 5th cousins around the steep part of the curve.

          If I add unknown relations, I have none that are closer than 3rd cousin prediction, and those closest ones are on that messy part of the steep part of the curve.

          Then when I get to 5th cousins and more remote, the curve starts to edge back to the right with more total cM! That is where the folks who are 8th cousins but share lots of common ancestry show up.

          I fish for distant cousins to confirm my paper trail or get past roadblocks in that bottom pool. I have confirmed relationships out to 10th cousin (but mostly 5th to 7th, I think). Some I have been able to cross check with Y results.

          I have found 4th and 5th cousins in the steep part of the curve, which has advanced some of my goals for finding missing great great great great grandparents (and thence further; there is always a new missing great great right after you resolve one).

          My biggest remaining research goals would be in the 3rd cousin and 4th cousin range, but not as many of those are out there and testing.

          I think different screening and parameters might help us look at those three parts of the curve: flat top, steep ascent, and messy pool.
