Family Finder Advanced Topics Advanced discussion about Family Tree DNA's Family Finder Product.

#1
7th July 2017, 04:02 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
The matching probability function

In my last thread, I asked about a specific point on the matching probability curve (i.e., the probability that any two descendants of MRCAs at X generations removed will register as matches to one another in Family Finder). That thread started to veer into a discussion of the overall probability function, which was interesting, but only tangentially related to my original question. I thought I'd like to establish a separate thread to discuss the overall function in more detail.

http://forums.familytreedna.com/show...587#post441587

I've tried to reverse-engineer a calculator based on FTDNA's published figures through the 5th cousin relationship.

https://www.familytreedna.com/learn/...finder-detect/

I've had a little success. I'm pretty sure I have the basic form of the equation (exponential) correct, and I have replicated FTDNA's results for the relationships presented.
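For readers who want something concrete to poke at, here is a minimal sketch of what an exponential-form matching model could look like. It is not FTDNA's formula and not necessarily the one behind the calculator described in this post. It assumes a simple Poisson crossover model: shared segment lengths are exponentially distributed with mean 100/d cM after d meioses, the expected number of shared segments is a(rd + c)/2^(d-1), and a match needs at least one segment of 7 cM or more. The function names and every default value are illustrative assumptions.

```python
import math


def match_probability(meioses, crossovers_per_generation=34.0,
                      chromosomes=22, ancestors=2, min_cm=7.0):
    """Rough chance that two relatives share at least one segment >= min_cm.

    Every default here is an illustrative assumption, not an FTDNA parameter.
    """
    # Expected number of shared segments of any length.
    expected_segments = (ancestors
                         * (crossovers_per_generation * meioses + chromosomes)
                         / 2 ** (meioses - 1))
    # Chance that one segment is long enough (exponential, mean 100/meioses cM).
    long_enough = math.exp(-meioses * min_cm / 100.0)
    # Poisson chance of at least one qualifying segment.
    return 1.0 - math.exp(-expected_segments * long_enough)


def cousin_meioses(degree):
    """Meioses separating two full n-th cousins: (n + 1) down each line."""
    return 2 * (degree + 1)


for degree in range(1, 7):
    p = match_probability(cousin_meioses(degree))
    print(f"cousin degree {degree}: {p:.3f}")
```

With these assumed defaults the sketch lands close to 50% for 4th cousins (10 meioses) and drops off quickly after that. Because the result is always between 0 and 1, it never needs to be capped.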

I want to customize my calculator to extend the probability curve indefinitely beyond the 6th cousin relationship, and tailor the prediction for the specific gender of ancestors in each donor's line of descent and the likely effects of endogamy.

One thing that would really help is to understand precisely how the gender-specific recombination rates feed into the overall function. I now know this from ISOGG:

-For each child, females typically experience 41 crossovers over the 22 autosomal chromosomes. I guess that's a rate of ~1.8636 crossovers per chromosome.

-Males typically experience 27 crossovers, for a rate of ~1.2273 per chromosome, or roughly 2/3 (~65.8537%) of the female rate.

https://isogg.org/wiki/Recombination
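The arithmetic behind those figures can be checked in a few lines. The crossover counts are the ISOGG numbers quoted above; treating the sex-averaged rate as the simple mean of the two is an assumption of this sketch, not something ISOGG or FTDNA states here.

```python
AUTOSOMES = 22
FEMALE_CROSSOVERS = 41  # per child, ISOGG figure quoted above
MALE_CROSSOVERS = 27    # per child, ISOGG figure quoted above

female_rate = FEMALE_CROSSOVERS / AUTOSOMES          # crossovers per chromosome
male_rate = MALE_CROSSOVERS / AUTOSOMES
male_to_female = MALE_CROSSOVERS / FEMALE_CROSSOVERS

# Assumption: the sex-averaged rate is the plain mean of the two rates.
sex_averaged_rate = (female_rate + male_rate) / 2

print(f"female rate:       {female_rate:.4f}")
print(f"male rate:         {male_rate:.4f}")
print(f"male/female ratio: {male_to_female:.4%}")
print(f"sex-averaged rate: {sex_averaged_rate:.4f}")
```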

Until today, I only knew the approximate ratio between the male and the female rates, not the specific female rate itself. That was good enough for my original purpose, which was limited to applying different gender assumptions to a single point on the curve, but not really good enough to work outside that point.

Before I knew this, and simply to get started with the project, I used a trial female rate of 2, and just backed into the male and intersex average rates proportionately. I was more preoccupied with working through the detailed logic of predicting the typical size of specific segments shared by both donors than the level of precision of the result.
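"Backing into" the other rates from a trial female rate can be sketched as below. The 27/41 ratio comes from the ISOGG counts; the helper name `rates_from_female` and the use of a simple mean for the averaged rate are assumptions made for illustration.

```python
ISOGG_MALE_TO_FEMALE = 27 / 41  # ratio of the ISOGG crossover counts


def rates_from_female(female_rate, male_to_female=ISOGG_MALE_TO_FEMALE):
    """Derive male and averaged rates proportionately from a female rate."""
    male_rate = female_rate * male_to_female
    # Assumption: the averaged rate is the plain mean of male and female.
    sex_averaged = (female_rate + male_rate) / 2
    return {"female": female_rate, "male": male_rate,
            "sex_averaged": sex_averaged}


trial = rates_from_female(2.0)      # the trial value described above
isogg = rates_from_female(41 / 22)  # the ISOGG-derived value

for name in ("female", "male", "sex_averaged"):
    print(f"{name:>12}: trial {trial[name]:.4f}  ISOGG {isogg[name]:.4f}")
```

Printing the two sets side by side shows how far each rate moves when the female rate changes from 2 to ~1.8636.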

Amazingly, for the few data points where FTDNA has published a matching probability, my working model was spot on. Well, as spot on as it is possible to be for the vague figures of ">50%" or ">10%".

But on the face of it, this shouldn't be. Using ISOGG's actual rate of ~1.8636 instead of 2 takes me a long way from FTDNA's numbers.

I have an intuition as to why this may be so, but I'm hoping that some reader can point me to some authoritative literature that can help me make sure and get more precise. My guess is that a female recombination rate of 2 is more appropriate to the Family Finder product specifically based on the particular array of SNPs tested.

I think it's pretty widely acknowledged that FTDNA doesn't test the whole genome because only a portion of it is diverse enough to be useful for genealogical applications. So it makes logical sense to me that a higher crossover rate may apply to the SNPs tested by FTDNA as compared to the entire genome as a whole.

If that is so, does anybody have an idea as to how I can get a more precise figure for the female recombination rate for the region tested by FTDNA? I'm sure that I can do better than the totally blind stab in the dark I took at the beginning of this process.

Last edited by Frederator; 7th July 2017 at 04:05 PM.

#2
7th July 2017, 10:39 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
I still would like some direct documentary background on the specific female crossover rate used in FTDNA's calculation, but I think I've found a way to cross-check my reverse engineering of it.

Back in the original post I examined one very specific data point for FTDNA's published matching probability table, 4th cousins.

http://forums.familytreedna.com/showthread.php?t=42035


Everything there was identical with a real-life case I encountered except that the real-life cousins were both direct male line descendants of the MRCAs. So because I had the ratio of the male rate to the female rate I could calculate the ratio of the male rate to the intersex average rate, and use Algebra to estimate the probability for my scenario. For that I didn't need to know the precise rates used by FTDNA.

I came up with ~122%, capped at 100%, which I believe is due to the expected largest segment size being greater than the 7 cM minimum needed to register as a match.

That was a bit of an understatement, actually. Because BOTH donors were direct male line descendants of the common ancestors, I should have squared my ratio. That's how you calculate the joint probability (the intersection) of independent probabilistic events.

I consciously didn't go into that during my original explanation because I wanted to make the discussion as simple as possible, plus it would only have reinforced my conclusion that two direct male line 4th cousins would have a 100% chance to match at the minimum 7cM level.

Anyhow, I went back to my calculator and looked at the un-limited probability of two 4th cousins matching, one being a direct male line descendant and the other using the intersex-average rates implied by the ratio between the rates and my naive 'default' of 2 for the female rate. I got ~125%.

So, ~122% using Algebra for this specific point from the FTDNA published data, vs. ~125% for a full-bore mock-up for the whole curve, using a completely naively selected female recombination rate of 2. Pretty darned good. It should be very easy for me now to arrive at a much closer estimate of the female recombination rate used by FTDNA in its calcs.

Still would like some confirmation as to why. Intuitively it makes sense that the portion of the genome deemed diverse enough for Family Finder testing would have a higher crossover rate than the genome as a whole, but it would be awesome to have some more direct authority for the specific number.

#3
8th July 2017, 09:48 AM
Ann Turner
FTDNA Customer
Join Date: Apr 2003
Posts: 1,117
All companies use the sex-averaged recombination rate for their calculations, since they have no knowledge of the proportion of males and females in the lines of descent.

SNPs are selected to provide sampling points distributed across the entire genome. It's not so much that SNPs are selected "for" a high recombination rate, but the recombination rate can be calculated more accurately if there are SNPs reasonably close to the actual cross-over points found in the sample used to create the look-up tables.

#4
8th July 2017, 10:14 AM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Thanks, Ann.

I guess I should be using the term 'sex-averaged' from now on.

Your points are well taken, but I'd like an opinion on one of the latent implications of this process, namely that in a limited number of circumstances it might be at least "close to possible" to prove a direct male line NPE using autosomal data alone.

I realize that this project of mine can't have the same authority as a company-issued calculator, but if the theory and execution are correct, some relationships as remote as 4th cousins should have a very nearly 100% chance of registering as Family Finder matches, provided both donors claim direct male line descent from the MRCAs.

#5
8th July 2017, 10:35 AM
Ann Turner
FTDNA Customer
Join Date: Apr 2003
Posts: 1,117
That's an interesting thought, but I think the lower recombination rate is a two-edged sword here. Male transmissions will result in longer -- but fewer -- segments. The smaller number of segments would be more subject to complete disappearance. See this blog post; it just discusses grandparents, but the logic can be extended to more generations.

https://gcbias.org/2013/10/20/how-mu...r-grandparent/
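The "fewer segments" side of that trade-off can be roughed out with a Poisson approximation. This is only a sketch under stated assumptions: the expected segment count a(rd + c)/2^(d-1), the per-generation crossover counts of 27, 34 and 41, and the function names are all illustrative, and the count covers segments of any length rather than only those above the 7 cM threshold.

```python
import math


def expected_shared_segments(meioses, crossovers_per_generation,
                             chromosomes=22, ancestors=2):
    """Expected number of shared segments of any length (assumed model)."""
    return (ancestors
            * (crossovers_per_generation * meioses + chromosomes)
            / 2 ** (meioses - 1))


def chance_of_no_segments(expected):
    """Poisson probability of sharing zero segments."""
    return math.exp(-expected)


MEIOSES_4TH_COUSINS = 10  # five generations down each line

for label, rate in (("all-male lines", 27.0),
                    ("sex-averaged", 34.0),
                    ("all-female lines", 41.0)):
    expected = expected_shared_segments(MEIOSES_4TH_COUSINS, rate)
    print(f"{label:>16}: expected segments {expected:.2f}, "
          f"chance of none {chance_of_no_segments(expected):.2f}")
```

Under these assumptions the all-male lines have the fewest expected segments and so the highest chance of sharing nothing at all, which is the second edge of the sword.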

#6
8th July 2017, 11:01 AM
georgian1950
FTDNA Customer
Join Date: Jun 2012
Posts: 621
I suspect the recombination rate is a whole lot higher than what most researchers believe. Who knows if it actually varies by sex since we have a major factor completely unknown to researchers. I have found a huge source of common ancestry for most of us within the past three hundred years. The large matching segments a long way back which some experts say are undoubtedly IBD are anything but. They actually are what I call reconstructed segments. They match a segment of the common ancestor's DNA, but they are formed from the kits being compared having multiple paths back to the common ancestor. The different paths fill in the gaps to make the matching segment appear as if it had been passed down intact.

Jack Wyatt

#7
8th July 2017, 11:21 AM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Interesting. I'll have to think about this before I come to any firm conclusions.

But here also is a thought: the distributions demonstrated are symmetrical. That implies that the paternal grandmother's contribution is just as likely to be short-changed as the paternal grandfather's.

If that is so, it doesn't change my expected value (i.e., average) contribution from the paternal grandfather; it only increases the deviation of paternal grandfathers' contribution among different individual observations. It would change the standard deviation statistic, but not the mean, which is the statistic returned by my calculation.
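A small simulation can illustrate the mean-versus-spread point. This is a toy model, not the method used in the linked article: it assumes 22 equal-length chromosomes, crossovers placed as a Poisson process along each one, and per-chromosome rates of 27/22 (father) and 41/22 (mother). The function names are made up for the example.

```python
import random
import statistics


def grandfather_fraction(crossovers_per_chromosome, rng, chromosomes=22):
    """Fraction of one transmitted genome copy that traces to the grandfather."""
    total = 0.0
    for _ in range(chromosomes):
        from_grandfather = rng.random() < 0.5  # which copy the gamete starts on
        position = 0.0
        while position < 1.0:
            # Distance to the next crossover, capped at the chromosome end.
            step = rng.expovariate(crossovers_per_chromosome)
            next_position = min(1.0, position + step)
            if from_grandfather:
                total += next_position - position
            position = next_position
            from_grandfather = not from_grandfather  # switch at the crossover
    return total / chromosomes


def simulate(crossovers_per_chromosome, trials=5000, seed=1):
    rng = random.Random(seed)
    draws = [grandfather_fraction(crossovers_per_chromosome, rng)
             for _ in range(trials)]
    return statistics.mean(draws), statistics.stdev(draws)


paternal_mean, paternal_sd = simulate(27 / 22)  # passed through the father
maternal_mean, maternal_sd = simulate(41 / 22)  # passed through the mother

print(f"paternal side: mean {paternal_mean:.3f}, sd {paternal_sd:.3f}")
print(f"maternal side: mean {maternal_mean:.3f}, sd {maternal_sd:.3f}")
```

In this toy model both means sit near one half while the paternal-side standard deviation comes out larger. Real chromosomes differ in length, so the actual spread will not match these numbers.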

Ideally, I would have liked to create a confidence interval for my calculator, but I highly doubted the required information would be available. Maybe the information lies in this article.


Quote:
Originally Posted by Ann Turner
That's an interesting thought, but I think the lower recombination rate is a two-edged sword here. Male transmissions will result in longer -- but fewer -- segments. The smaller number of segments would be more subject to complete disappearance. See this blog post; it just discusses grandparents, but the logic can be extended to more generations.

https://gcbias.org/2013/10/20/how-mu...r-grandparent/

#8
8th July 2017, 11:59 AM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Quote:
Originally Posted by Ann Turner
All companies use the sex-averaged recombination rate for their calculations, since they have no knowledge of the proportion of males and females in the lines of descent. . .
One possible interpretation of the article you showed me is that the companies don't care what proportion of males and females are in the line of descent because the difference in recombination rates has no effect on the average contribution from each grandparent, only on the variation between grandparents' contribution passed on to individual grandchildren.

I'm not sure I've reached a super firm conclusion yet, but my calculator's use of gender-specific recombination rates may only return the upper or lower ends of the confidence interval. An interesting statistic to be sure, but probably not one you can draw terribly sharp inferences from.

#9
8th July 2017, 12:26 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
At this point I can accept quite easily that there will be greater volatility in the total volume of DNA contributed by each paternal grandparent as compared to each maternal grandparent.

But that doesn't change the fact that paternal lines of descent still favor larger individual segments, and that's what triggers matching.

I may wonder about the precise degree to which this favors, on average, matching for paternal lines of descent. Probably not to the full extent of the differential between male and female or even sex-averaged recombination rates. But, on average, it does still seem to favor paternal lines.

#10
8th July 2017, 05:19 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
After a lot of thought, I don't think there is any reasonable adjustment that can be made to my calculator for the variation in paternal grandparents' contribution. At least based on this information.

True, the variation in paternal grandparents' contribution is noticeably higher than for the maternal grandparents, but not enough to reasonably cause significant change to my calculated likelihood of direct paternal cousins matching through the 7th cousin relationship.

The tails on that distribution give a less than 20% contribution from the paternal grandfather in only 0.5% of cases. The tails of 0.5% imply no occurrences in 199 of 200 cases, and even then there is an equal chance that the paternal grandfather's contribution could be higher than 80%. Compare this to the mere 8 recombination events in the direct male line of a 7th cousin relationship and any impact seems extremely unlikely.
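One way to put a rough number on that comparison: if each of the 8 transmissions independently had a 0.5% chance of landing in the low tail, the chance that at least one of them does is 1 - (1 - 0.005)^8. Treating the transmissions as independent, and applying the grandparent tail figure to every generation, are simplifying assumptions of this sketch.

```python
def chance_any_in_tail(tail_probability, transmissions):
    """Chance that at least one of several independent events hits the tail."""
    return 1.0 - (1.0 - tail_probability) ** transmissions


LOW_TAIL = 0.005   # the 0.5% figure quoted above
TRANSMISSIONS = 8  # direct male line of a 7th cousin relationship

at_least_one = chance_any_in_tail(LOW_TAIL, TRANSMISSIONS)
print(f"chance of at least one low-tail transmission: {at_least_one:.2%}")
```

Under those assumptions the figure comes out at a little under 4%, so a low-tail event somewhere in the line is uncommon but not negligible.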

So, based on what I know now, in the specific case I cited, where two direct paternal 4th cousins don't register as a Family Finder match, I think the correct conclusion is that there is indeed a heightened chance of NPE in one of the lines. I think people could argue whether the chance is very very high or merely very high, but at least it is high.

Last edited by Frederator; 8th July 2017 at 05:21 PM.