Maximum centimorgans


  • JOlson
    replied
    The biggest variable that I've seen is the cM conversion table being used. Everybody uses a different one, it seems. That means you may be comparing apples to oranges, to some extent. Below is a sampling I just took of parent/child values for 22 chromosomes:

    FTDNA 3384.3
    23andMe 3546
    GEDmatch 3587.1

    You may see minor variations in the values, but these represent a reasonable comparison between the different conversion tables.

    After the parent/child values, the total cM drops by approximately half with each generation. That's an exponential relationship. As you get further away from the first generation, you will see more and more variability (as demonstrated in John Walden's graph), due to random inheritance differences, noise, and even multiple inheritance paths.
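
    As a rough illustration of the halving described above, here is a minimal Python sketch. It assumes an idealized, exact halving per generation and models none of the variability mentioned in the same paragraph; the function names are made up for this example.

```python
# Parent/child totals sampled in the post above (cM, 22 autosomes).
PARENT_CHILD_CM = {"FTDNA": 3384.3, "23andMe": 3546.0, "GEDmatch": 3587.1}


def expected_cm(parent_child_cm: float, generations: int) -> float:
    """Idealized total after `generations` further halvings (no variance modeled)."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return parent_child_cm / (2 ** generations)


def expected_table(max_generations: int = 4) -> dict[str, list[float]]:
    """One row per conversion table: totals for 0..max_generations halvings."""
    return {
        name: [round(expected_cm(cm, g), 1) for g in range(max_generations + 1)]
        for name, cm in PARENT_CHILD_CM.items()
    }


if __name__ == "__main__":
    for name, row in expected_table().items():
        print(f"{name:9s}", row)
```

    The point of printing the three rows side by side is that the same relationship gives different idealized totals depending on the conversion table, before any real-world scatter is added.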
    Last edited by JOlson; 16 June 2012, 12:36 PM.



  • Lklundin
    replied
    Originally posted by mkdexter View Post
    I guess you guys really do know you can not use a shared sum cM for any relationship past a second cousin using FTDNA raw data right?
    Well, yes. One would at the latest know it from the plot kindly contributed by John S Walden earlier in the thread. It clearly shows that the observed spread of segment length for those more distant relations causes the intervals to have significant overlaps.

    What I would like is to be able to quantify what probability a given observed match has of being caused by a certain relationship.

    For smaller matches (taking into account all the shared segments, if someone can explain how) I would still consider it useful to be able to say something like: This match has a 50% probability of corresponding to a relationship of no more than X births (using Walden's terminology) - and the 90% probability lies within Y such births (i.e. Y > X).

    Statements like that would carry some value when assessing the likelihood of an NPE for a given match.

    Originally posted by mkdexter View Post

    I could write a whole chapter of a book on why this is an issue and still not cover it all.
    I promise that whatever you would write on this topic would find at least one interested reader.



  • blejerh
    replied
    I don't have a lot of data points yet, but they support the argument that the maximum number of centimorgans is not predictable except for very close relatives. Notice the difference between my 5th cousin 1R and 6th cousin 1R, the latter being larger in both the total cM and in the largest segment (31 cM versus 10 cM).

    3rd cousin 148
    3rd cousin 1R (his daughter) 104
    5th cousin 1R 40
    6th cousin 1R 87
    7th cousin 1R 38



  • mkdexter
    replied
    I guess you guys really do know you can not use a shared sum cM for any relationship past a second cousin using FTDNA raw data right?

    You also can not use a longest block length in certain situations either.

    This is why FTDNA uses multi-variable algorithms in their predictions and even then it is not always right.

    There are way too many variables in the mix

    I could write a whole chapter of a book on why this is an issue and still not cover it all.

    Matt.



  • Lklundin
    replied
    Originally posted by Lklundin View Post
    I have the following relationships (known prior to FF-testing):
    births cM
    1: 3380
    2: 2559
    3: 1722
    6: 269, 205, 196, 148, 137
    7: 99, 99
    There is a caveat here and it applies to anyone else doing statistics on segment lengths from FF-results from a group of related people:

    The 10 values I quoted are not independent.

    For example, the two 7-birth values are from a child of one of the persons that matches via two 6-birth relations.

    So those 6- and 7-birth values are not independent, but instead correlated. This makes the interpretation of what is a (a)typical segment length for a given relation more difficult.



  • Lklundin
    replied
    Originally posted by JSW View Post
    Attached is a graph of somewhat the same thing
    Would you be interested in more data points for your plot ?

    Using your definition I have the following relationships (known prior to FF-testing):
    births cM
    1: 3380
    2: 2559
    3: 1722
    6: 269, 205, 196, 148, 137
    7: 99, 99



  • T E Peterman
    replied
    All of the connections I show on the chart are confirmed connections & all are within the range that you suggest for each degree of kinship.

    Thanks again Matt & JSW for your help & suggestions with this.

    Timothy Peterman
    Administrator, FF_Peterman_Timothy project



  • mkdexter
    replied
    FF = 3387 cM; I was off by 2 going by memory.
    RF = 3586 cM

    you can also use some averages such as:

    siblings share 2600-2800cM
    first cousins share 700-1100cM
    second cousins share 250-400cM

    and these may not even be what other people see, just some I have seen from confirmed connections.

    Matt



  • T E Peterman
    replied
    Very cool chart. I'm going to try the gedmatch method to see if I can come up with a more precise number than 3385.

    Timothy Peterman



  • JSW
    replied
    Matching cM

    Tim
    Attached is a graph of somewhat the same thing you are charting
    Note that by 1st cousin once removed there is a wide range in
    real matches and by 3rd cousin it is really difficult to predict
    the relationship based on matches.
    The sum of the length of the cM of the 22 chromosomes is about
    3500 cM but some of that is excluded from FTDNA matching.
    You can get the number by going to gedmatch.com and using the
    match two people utility and just give it the same ID. That is match
    to yourself - there are no mutations in that answer.
    The few mismatches you see in a parent/child comparison are more likely due
    to errors in the data from the measurement process. They are mostly not
    mutations.
    Attached Files



  • T E Peterman
    replied
    I have attached a jpg image showing what I've been doing. The names of participants have been reduced to initials to protect their identities.

    Timothy Peterman
    Attached Files



  • T E Peterman
    replied
    Thank you Matt. I appreciate your response. You did give me a reasonable divisor to work with (3385) & I am recalculating kinships.

    The only kinships that I am applying this to are known kinships:

    My father, his brothers, their first cousins, my maternal uncle & aunt, two of my mother's first cousins, several of her second cousins, my quadruple second cousin, twice removed, etc.

    I already know the kinships for each. I think it is nice, in follow up, to show them how much DNA they share with each other. It sort of enhances the interest & may encourage more participants. Most second cousins, especially those who have known each other since childhood, are surprised to learn that, on average, they share only 3.125% DNA that is identical by descent.

    I think second cousins only add meaningful results if one has a thick core that is already tested (a set of siblings & all of their first cousins). They match against a far greater pool & might pull out 10 or 15% of total matches, instead of about 3% or so.

    Timothy Peterman



  • mkdexter
    replied
    ok so I just explained how it actually works, and you didn't really want to know that.. no problem. hope you have tons of luck in getting more recruits anyway.

    Your % estimates won't work after the 4th generation because FF doesn't use sum cM after that to calculate relationships.


    Matt
    Last edited by mkdexter; 16 May 2012, 06:14 PM.



  • T E Peterman
    replied
    In the chart that I'm creating, I have estimated percentages shared on one side & actual on the other. With the exception of parent child (which is always 50%), the actual always varies from the estimated. That is why I'm taking the time to put this together.

    I think I will work with a 3385 divisor & share those results. Yes, my intent is to use this data as a means of recruiting additional people into Family Finder.

    As I stated in my initial message, I am well aware of the fact that siblings have to be handled different from the rest. Of course, the estimate for parent/ child is going to be 50% & the estimate for grandparent/ grandchild is going to be 25%, etc.

    The estimate I use for half siblings, aunts, uncles is also 25%.
    The estimate for first cousins is 12.5% & for first cousins 1R is 6.25%

    The estimate for second cousins is 3.125%, for second cousins 1R is 1.5625%

    The estimate for 3rd cousins is .78125%

    And so forth. Of course, the further the degree of kinship, the more the actual varies from the estimated.
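
    To make the estimates above concrete, here is a small Python sketch that turns each estimated percentage into an expected shared total. It assumes the 3385 cM working divisor mentioned in this post, treats parent/child (50%) as equal to the full divisor, and leaves full siblings out because they are handled differently; the names in the code are illustrative only.

```python
DIVISOR_CM = 3385.0  # working parent/child total from this thread (an approximation)

# Estimated percentages as listed in the post (full siblings deliberately omitted).
EXPECTED_PERCENT = {
    "parent/child": 50.0,
    "grandparent/grandchild": 25.0,
    "half sibling, aunt/uncle": 25.0,
    "first cousin": 12.5,
    "first cousin 1R": 6.25,
    "second cousin": 3.125,
    "second cousin 1R": 1.5625,
    "third cousin": 0.78125,
}


def expected_shared_cm(percent: float, divisor: float = DIVISOR_CM) -> float:
    """Expected shared cM, scaled so that 50% corresponds to the full divisor."""
    return divisor * percent / 50.0


if __name__ == "__main__":
    for relationship, pct in EXPECTED_PERCENT.items():
        print(f"{relationship:26s} {pct:8.5f}%  ~{expected_shared_cm(pct):7.1f} cM")
```

    These are averages under the assumptions above, not predictions; the actual totals in a given family can sit well away from them, more so with distance.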

    Timothy Peterman



  • mkdexter
    replied
    Originally posted by T E Peterman View Post
    I am trying to determine the percentages of shared DNA between various relatives. I need to divide the total shared centimorgans for each into a divisor. Does anyone know the correct value for the divisor?
    There isn't one. Different comparisons will have different results, for example parent to child vs child to child is not the same comparison. One is based on matching alleles in a linear fashion and one is based on matching alleles in a shared pool fashion.

    Originally posted by T E Peterman View Post
    My father and I share 3380.25 centimorgans. However, we have tiny gaps in chromosomes 6, 8, 9, & 12, presumably caused by mutations in me.

    I know of another parent/child comparison in which the shared centimorgans are 3383.52, but they have tiny gaps in chromosomes 1 & 6.
    Those are just testing artifacts. The total cM in the test is near 3385cM however this is not a 50% inherited DNA quantity, it is a sum of the matching alleles, in this case a 100% match on the map. FTDNA takes out some results, ignores others so the amount of cM matching is not the exact amount the chip sees.

    Originally posted by T E Peterman View Post
    Once I know the divisor, for siblings, I will divide the total shared centimorgans by the divisor; take that result times 2/3 to establish the percentage.

    For aunts/uncles, first cousins & beyond, I will divide the total shared centimorgans by the divisor; take that result & divide it by 2.
    That works because an aunt, uncle, nephew or niece is simply about half the match of the siblings, meaning one recombination event occurred in the comparison.
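
    The conversion quoted above can be sketched in Python as follows. This is only an illustration of the arithmetic as described in the quote, assuming the roughly 3385 cM divisor discussed in this thread; the function name and the `full_siblings` flag are invented for the example.

```python
def shared_percent(total_cm: float, divisor: float = 3385.0,
                   full_siblings: bool = False) -> float:
    """Convert a shared cM total to a percentage using the quoted rule of thumb.

    Full siblings: (total / divisor) * 2/3.
    Everyone else: (total / divisor) / 2.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    ratio = total_cm / divisor
    factor = 2.0 / 3.0 if full_siblings else 0.5
    return 100.0 * ratio * factor


if __name__ == "__main__":
    # Totals reported elsewhere in this thread for known relationships.
    print(round(shared_percent(1722.0), 2))                      # aunt/uncle-type match
    print(round(shared_percent(2559.0, full_siblings=True), 2))  # full siblings
```

    As noted in the reply, the result measures matching DNA on the test, so it can land near the chart value without having to equal it.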

    Originally posted by T E Peterman View Post
    This seems to work fairly well. Using the 3380.25 value, my aunt & uncles are all about 25% plus or minus a percent or two; my first cousins are all about 12.5% plus or minus a percent or two.

    Cousins have two recombination events compared to the sibling ancestors but it's not an exact 12.5% as you are trying to make it match the amount inherited on the chart and that is not what the test is looking at. It is looking at the amount shared, not the amount inherited. It may end up being close but it doesn't always have to be.

    Timothy Peterman
    Normally you would expect this because of the fact that DNA is passed down 50% from the parents however recombination from parent to child affects the actual matching amounts passed down so they may sway to one side or the other.

    The real issue in what you are wanting to do is the fact that there are two types of matches in the FF test; the match going down a linear line such as grandchild to grandparent, and the match that started as a shared pool of DNA between two full siblings. In the linear match you can pretty much calculate most numbers but in the shared pool match the entire match is based on how two full siblings shared DNA in the first place. From there you see aunts, uncles, etc., cousins all sharing a portion of the original pool. Recombination at each stage will determine how much matches and how much doesn't. In this case it is not inherited alone, but shared DNA that gives you the match. Shared could be from many things, for example inherited DNA, DNA from inter-relationships, double cousins, compounded segments, and much more.

    The numbers can be close but won't be perfect like a chart because a chart looks at 50% losses on down the lines while the FF test looks at alleles influenced by variations in recombination that match between two tests

    Matt.
    Last edited by mkdexter; 16 May 2012, 12:07 PM.

