
Family Finder Advanced Topics Advanced discussion about Family Tree DNA's Family Finder Product. 

#1




The matching probability function
In my last thread, I asked about a specific point on the matching probability curve (i.e., the probability that any two descendants of MRCAs X generations removed will register as matches to one another in Family Finder). That discussion started to veer into the overall probability function, which was interesting but only tangentially related to my original question, so I'd like to start a separate thread to discuss the overall function in more detail.
http://forums.familytreedna.com/show...587#post441587

I've tried to reverse-engineer a calculator based on FTDNA's published figures through the 5th cousin relationship.

https://www.familytreedna.com/learn/...finderdetect/

I've had some success. I'm fairly sure I have the basic form of the equation (exponential) correct, and I have replicated FTDNA's results for the relationships presented. I want to customize my calculator to extend the probability curve indefinitely beyond the 6th cousin relationship, and to tailor the prediction to the specific gender of the ancestors in each donor's line of descent and to the likely effects of endogamy.

One thing that would really help is to understand precisely how the gender-specific recombination rates feed into the overall function. I now know this from ISOGG: for each child, females typically experience 41 crossovers over the 22 autosomal chromosomes, which I take to be a rate of ~1.8636 per chromosome. Males typically experience 27 crossovers, for a rate of ~1.2273 per chromosome, or roughly two-thirds (~65.85%) of the female rate.

https://isogg.org/wiki/Recombination

Until today, I only knew the approximate ratio between the male and female rates, not the female rate itself. That was good enough for my original purpose, which was limited to applying different gender assumptions to a single point on the curve, but not good enough to work anywhere else on it. Before I knew this, and simply to get started on the project, I used a trial female rate of 2 and backed into the male and intersex average rates proportionately. I was more preoccupied with working through the detailed logic of predicting the typical size of specific segments shared by both donors than with the precision of the result. Amazingly, for the few data points where FTDNA has published the probability of matching, my working model was spot on. Well, as spot on as it is possible to be for vague figures like ">50%" or ">10%". But on the face of it, this shouldn't be.
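The rate arithmetic above, as a quick Python check. The constant names are illustrative only, and "sex-averaged" here is simply the mean of the two ISOGG figures (what this post calls the intersex average):

```python
# Crossover counts per meiosis, as quoted from the ISOGG wiki above
FEMALE_CROSSOVERS = 41
MALE_CROSSOVERS = 27
AUTOSOMES = 22

female_rate = FEMALE_CROSSOVERS / AUTOSOMES           # ~1.8636 per chromosome
male_rate = MALE_CROSSOVERS / AUTOSOMES               # ~1.2273 per chromosome
male_to_female = MALE_CROSSOVERS / FEMALE_CROSSOVERS  # ~0.6585
sex_averaged_rate = (female_rate + male_rate) / 2     # ~1.5455 per chromosome

# The trial calibration described above: assume a female rate of 2 and
# scale the other rates by the same ISOGG male/female ratio.
trial_female = 2.0
trial_male = trial_female * male_to_female            # ~1.3171
trial_average = (trial_female + trial_male) / 2       # ~1.6585

print(f"female {female_rate:.4f}, male {male_rate:.4f}, ratio {male_to_female:.4%}")
print(f"sex-averaged {sex_averaged_rate:.4f}; trial male {trial_male:.4f}, trial avg {trial_average:.4f}")
```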
Using ISOGG's actual rate of ~1.8636 instead of 2 takes me far away from FTDNA's figures. I have an intuition as to why this may be so, but I'm hoping some reader can point me to authoritative literature that can help me confirm it and be more precise. My guess is that a female recombination rate of 2 is more appropriate to the Family Finder product specifically, because of the particular array of SNPs tested. I think it's widely acknowledged that FTDNA doesn't test the whole genome, because only a portion of it is diverse enough to be useful for genealogical applications. So it makes logical sense to me that a higher crossover rate may apply to the SNPs tested by FTDNA than to the genome as a whole. If that is so, does anybody have an idea how I can get a more precise figure for the female recombination rate in the region tested by FTDNA? I'm sure I can do better than the blind stab in the dark I took at the beginning of this process.

Last edited by Frederator; 7th July 2017 at 05:05 PM.
#2




I'd still like some direct documentary background on the specific female crossover rate used in FTDNA's calculation, but I think I've found a way to cross-check my reverse engineering of it.

Back in the original post I examined one very specific data point from FTDNA's published matching probability table: 4th cousins.

http://forums.familytreedna.com/showthread.php?t=42035

Everything there was identical to a real-life case I encountered, except that the real-life cousins were both direct male line descendants of the MRCAs. Because I had the ratio of the male rate to the female rate, I could calculate the ratio of the male rate to the intersex average rate and use algebra to estimate the probability for my scenario. For that I didn't need to know the precise rates used by FTDNA. I came up with ~122%, capped at 100%, which I believe reflects an expected largest segment size greater than the 7 cM minimum needed to register as a match.

That was actually a bit of an understatement. Because BOTH donors were direct male line descendants of the common ancestors, I should have squared my ratio. That's how independent effects combine: the probability that two independent events both occur is the product of their probabilities. I consciously didn't go into that in my original explanation because I wanted to keep the discussion as simple as possible, and it would only have reinforced my conclusion that two direct male line 4th cousins would have a 100% chance to match at the minimum 7 cM level.

Anyhow, I went back to my calculator and looked at the uncapped probability of two 4th cousins matching, one being a direct male line descendant and the other using the intersex average rate implied by the ratio between the rates and my naive 'default' of 2 for the female rate. I got ~125%. So: ~122% using algebra for this specific point from FTDNA's published data, vs. ~125% from a full-bore mockup of the whole curve using a naively selected female recombination rate of 2. Pretty darned good. It should now be easy for me to arrive at a much closer estimate of the female recombination rate used by FTDNA in its calculations. I'd still like some confirmation as to why.

Intuitively it makes sense that the portion of the genome deemed diverse enough for Family Finder testing would have a higher crossover rate than the genome as a whole, but it would be great to have more direct authority for the specific number.
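The single-point algebra in this post leans on the ratio of the male rate to the intersex average rate. If the male rate is always backed into as ISOGG's 27/41 of whatever female rate is assumed, that ratio reduces to 27/34 no matter which female rate is picked, which fits the observation that this step did not need FTDNA's precise rates. A minimal check, with a hypothetical function name; it does not reproduce the exact probability formula used in the post:

```python
MALE_TO_FEMALE = 27 / 41  # ISOGG ratio of male to female crossovers

def male_to_average_ratio(female_rate: float) -> float:
    """Male rate divided by the intersex (sex-averaged) rate for an assumed female rate."""
    male_rate = female_rate * MALE_TO_FEMALE
    average_rate = (female_rate + male_rate) / 2
    return male_rate / average_rate

for assumed_female_rate in (41 / 22, 2.0):
    r = male_to_average_ratio(assumed_female_rate)
    # With both donors on male-only lines, the per-line factor is applied twice (squared)
    print(f"female {assumed_female_rate:.4f}: male/avg {r:.4f}, squared {r * r:.4f}")
```

Both assumed female rates print the same ratio (27/34, about 0.794); the choice of female rate only starts to matter once the model covers the whole curve.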
#3




All companies use the sex-averaged recombination rate for their calculations, since they have no knowledge of the proportion of males and females in the lines of descent.
SNPs are selected to provide sampling points distributed across the entire genome. It's not so much that SNPs are selected "for" a high recombination rate, but the recombination rate can be calculated more accurately if there are SNPs reasonably close to the actual crossover points found in the sample used to create the lookup tables. 
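Ann's point about resolution can be illustrated with a hypothetical sketch: a genotyping array can only place a crossover somewhere in the gap between the nearest tested SNPs on either side, so denser coverage near the actual crossover points pins them down more precisely. The SNP positions below are invented for illustration and are not FTDNA's array:

```python
from bisect import bisect_left

def crossover_interval(snp_positions_cm, crossover_cm):
    """Return the pair of tested SNP positions (cM) that bracket a crossover.

    The array cannot see where inside this gap the crossover actually fell."""
    positions = sorted(snp_positions_cm)
    i = bisect_left(positions, crossover_cm)
    if i == 0 or i == len(positions):
        raise ValueError("crossover lies outside the tested region")
    return positions[i - 1], positions[i]

sparse = [0.0, 5.0, 10.0, 15.0, 20.0]  # hypothetical: a SNP every 5 cM
dense = [x * 0.5 for x in range(41)]   # hypothetical: a SNP every 0.5 cM, 0 to 20 cM
print(crossover_interval(sparse, 7.3))  # (5.0, 10.0)
print(crossover_interval(dense, 7.3))   # (7.0, 7.5)
```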
#4




Thanks, Ann.
I guess I should be using the term 'sex-averaged' from now on. Your points are well taken, but I'd like an opinion on one of the latent implications of this process, namely that in a limited number of circumstances it might be at least "close to possible" to prove a direct male line NPE using autosomal data alone. I realize this project of mine can't carry the same authority as a company-issued calculator, but if the theory and execution are correct, some relationships as remote as 4th cousins should have a very nearly 100% chance of registering as Family Finder matches, provided both donors claim direct male line descent from the MRCAs.
#5




That's an interesting thought, but I think the lower recombination rate is a two-edged sword here. Male transmissions will result in longer, but fewer, segments, and the smaller number of segments would be more subject to complete disappearance. See this blog post; it only discusses grandparents, but the logic can be extended to more generations.
https://gcbias.org/2013/10/20/howmu...rgrandparent/ 
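Both edges of that sword can be seen in a toy model. This is a minimal sketch under stated assumptions, not FTDNA's calculator or the method used earlier in this thread: ISOGG's 41/27 crossovers per meiosis, a sex-averaged map length in Morgans equal to the sex-averaged crossover count (34), a Poisson number of shared segments with each piece shared through a given ancestor with probability 2^-(d-1) for d meioses, exponentially distributed segment lengths, and a single 7 cM threshold (FTDNA's real matching rules may differ). All names are hypothetical.

```python
import math

AUTOSOMES = 22
# ISOGG typical crossovers per meiosis; "A" is the sex-averaged value (34)
CROSSOVERS = {"F": 41.0, "M": 27.0}
CROSSOVERS["A"] = (CROSSOVERS["F"] + CROSSOVERS["M"]) / 2
# Assumption: sex-averaged map length in Morgans equals the sex-averaged crossover count
MAP_MORGANS = CROSSOVERS["A"]

def p_match(paths, threshold_cm=7.0):
    """Toy probability that two testers share at least one segment >= threshold_cm.

    paths: one string per shared ancestor, with one letter per meiosis on the
    path linking the testers through that ancestor ("F", "M" or "A").
    """
    expected_long = 0.0
    for path in paths:
        d = len(path)
        # Breakpoints per sex-averaged Morgan, summed over every meiosis on the path
        rate = sum(CROSSOVERS[s] for s in path) / MAP_MORGANS
        # Expected pieces (one per chromosome plus breakpoints), each shared with prob 2**-(d-1)
        shared_pieces = (AUTOSOMES + MAP_MORGANS * rate) / 2 ** (d - 1)
        # Segment lengths ~ exponential with mean 100 / rate cM
        expected_long += shared_pieces * math.exp(-rate * threshold_cm / 100)
    # Poisson approximation for "at least one long segment"
    return 1 - math.exp(-expected_long)

def cousin_paths(k, line="A"):
    """Paths for kth cousins descending from an ancestral couple (2(k+1) meioses).

    line="M": both testers descend through male-only lines, so the husband's
    path is all male and the wife's has her two female meioses plus male ones."""
    d = 2 * (k + 1)
    if line == "M":
        return ["M" * d, "FF" + "M" * (d - 2)]
    return [line * d, line * d]

for k in (3, 4, 5):
    print(k, round(p_match(cousin_paths(k)), 3), round(p_match(cousin_paths(k, "M")), 3))
```

With these assumptions the sex-averaged 4th-cousin case comes out near 0.50 and the direct male line case near 0.49: fewer crossovers lengthen the segments but also leave fewer of them, and in this toy model the two effects roughly cancel. That is a property of the sketch, not a verdict on FTDNA's figures.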
#6




I suspect the recombination rate is a whole lot higher than most researchers believe. Who knows whether it actually varies by sex, since there is a major factor completely unknown to researchers. I have found a huge source of common ancestry for most of us within the past three hundred years. The large matching segments a long way back, which some experts say are undoubtedly IBD, are anything but. They are actually what I call reconstructed segments. They match a segment of the common ancestor's DNA, but they are formed because the kits being compared have multiple paths back to the common ancestor. The different paths fill in the gaps, making the matching segment appear as if it had been passed down intact.
Jack Wyatt 
#7




Interesting. I'll have to think about this before I come to any firm conclusions.
But here is another thought: the distributions shown are symmetrical. That implies that the paternal grandmother's contribution is just as likely to be short-changed as the paternal grandfather's. If so, it doesn't change the expected value (i.e., average) contribution from the paternal grandfather; it only increases the spread of paternal grandfathers' contributions across individual observations. It would change the standard deviation, but not the mean, which is the statistic my calculation returns. Ideally, I would have liked to build a confidence interval into my calculator, but I strongly doubted the required information would be available. Maybe the information lies in this article.
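A rough Monte Carlo sketch of the mean-versus-spread point, under loudly simplifying assumptions: 22 chromosomes of equal length, crossovers placed as a Poisson process with no interference, and ISOGG's 41/27 crossover counts. The share inherited from one grandparent averages 50% whether the parent is female or male, while the spread is wider when the parent is male. The absolute numbers will not match real data; only the comparison is meaningful, and all names are hypothetical.

```python
import random
import statistics

AUTOSOMES = 22
CROSSOVERS = {"female": 41.0, "male": 27.0}  # ISOGG typical counts per meiosis

def grandparent_fraction(crossovers_per_meiosis: float, rng: random.Random) -> float:
    """Share of one transmitted haploid genome inherited from a given grandparent.

    Simplifying assumptions: 22 chromosomes of equal (unit) length, crossovers
    placed as a Poisson process, no crossover interference."""
    rate = crossovers_per_meiosis / AUTOSOMES  # expected crossovers per chromosome
    total = 0.0
    for _ in range(AUTOSOMES):
        from_gp = rng.random() < 0.5  # which grandparent the chromosome starts with
        pos, share = 0.0, 0.0
        while True:
            nxt = pos + rng.expovariate(rate)  # position of the next crossover
            if from_gp:
                share += min(nxt, 1.0) - pos
            if nxt >= 1.0:
                break
            pos, from_gp = nxt, not from_gp  # a crossover switches grandparent
        total += share
    return total / AUTOSOMES  # equal lengths, so a plain average

rng = random.Random(2017)  # fixed seed for reproducibility
results = {}
for parent_sex, xo in CROSSOVERS.items():
    draws = [grandparent_fraction(xo, rng) for _ in range(5000)]
    results[parent_sex] = (statistics.mean(draws), statistics.stdev(draws))
    mean, sd = results[parent_sex]
    print(f"{parent_sex} parent: mean {mean:.3f}, sd {sd:.3f}")
```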

#8




I'm not sure I've reached a firm conclusion yet, but my calculator's use of gender-specific recombination rates may only return the upper or lower end of the confidence interval. An interesting statistic, to be sure, but probably not one you can draw terribly sharp inferences from.
#9




At this point I can accept quite easily that there will be greater volatility in the total volume of DNA contributed by each paternal grandparent as compared to each maternal grandparent.
But that doesn't change the fact that paternal lines of descent still favor larger individual segments, and that's what triggers matching. I may wonder about the precise degree to which this favors matching, on average, for paternal lines of descent. Probably not to the full extent of the differential between the male and female, or even the male and sex-averaged, recombination rates. But, on average, it still seems to favor paternal lines.
#10




After a lot of thought, I don't think there is any reasonable adjustment that can be made to my calculator for the variation in the paternal grandparents' contributions, at least based on this information.

True, the variation in paternal grandparents' contributions is noticeably higher than for maternal grandparents, but not enough to reasonably cause a significant change in my calculated likelihood of direct paternal cousins matching through the 7th cousin relationship. The tails on that distribution give a contribution from the paternal grandfather of less than 20% in only 0.5% of cases. A 0.5% tail implies no such occurrence in 199 of 200 cases, and even then there is an equal chance that the paternal grandfather's contribution is higher than 80%. Compare this to the mere 8 transmissions (meioses) in the direct male line of a 7th cousin relationship, and any impact seems extremely unlikely. So, based on what I know now, in the specific case I cited, where two direct paternal 4th cousins don't register as a Family Finder match, I think the correct conclusion is that there is indeed a heightened chance of an NPE in one of the lines. People could argue whether the chance is very, very high or merely very high, but at least it is high.

Last edited by Frederator; 8th July 2017 at 06:21 PM.
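The tail argument above can be put in numbers. A hedged sketch, assuming the 0.5% one-sided tail quoted from the blog post applies independently to each transmission (the function name is hypothetical). It only counts how often an extreme transmission happens at all; it says nothing about whether that transmission removes the particular segment two cousins share.

```python
# One-sided tail quoted above: a grandparental contribution below 20% in ~0.5% of cases
TAIL = 0.005

def p_any_extreme(transmissions: int, tail: float = TAIL) -> float:
    """Probability that at least one of `transmissions` independent transmissions
    lands in the tail (assumes independence between generations)."""
    return 1 - (1 - tail) ** transmissions

# 8 = one direct male line of a 7th-cousin pair; 16 = both lines together
for n in (8, 16):
    print(f"{n} transmissions: {p_any_extreme(n):.1%}")
```

That works out to roughly 3.9% for one line of 8 transmissions and 7.7% for both lines together.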