
Family Finder Advanced Topics Advanced discussion about Family Tree DNA's Family Finder Product. 

#11




And just as a reminder, per a model that has agreed completely with all the sex-averaged matching stats published by the company, the unlimited probability of two direct male line 4th cousins matching at the 7 cM level or higher is ~747%. That is in the context of a less than 2% chance that one of the ancestors in one donor's line of descent received 20% or less of his paternal contribution from his grandfather, and an equal chance that he received more than 80% from that same grandfather.
I would be a little less curious about the result if the unlimited probability were only slightly above 100% or we were talking about more remote cousins.
Last edited by Frederator; 8th July 2017 at 09:29 PM.
#12




Just out of curiosity, I ran my calculator assuming one of the donors experienced the 2% scenario where one of the people in the line of descent gets as little as 20% of their paternal contribution from their grandfather. Still got a limited probability of 100% to match the other direct paternal 4th cousin. No doubt this would have more impact at a more remote relationship.
Last edited by Frederator; 9th July 2017 at 02:50 PM. Reason: Clarification 
#13




How does your computation relate to the widely quoted rule of thumb (and I don't know whose thumb it was, or if the figure is correct or even close) that about 50% of 4th cousins are not detectable as autosomal matches? I've been wondering if the observations and the mathematical models are on the same page yet.

#14




That is almost the exact figure that I get using the sex-average recombination rate for both donor lines. Pretty much spot on for all the published figures.
The difference in results between the sex-average and male recombination rates is very striking, isn't it?
#15




The matching probability function seems to be based on the expected volume of "intact" DNA from the target ancestor as a % of the total DNA tested. From there it's more-or-less a standard union of probabilistic events calculation, conditioned by a target segment size.
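For readers unfamiliar with the term, a "union of probabilistic events" calculation can be sketched as below. This is only an illustration of the general technique, assuming the events are independent; the per-segment probabilities are made-up placeholders, and nothing here is FTDNA's actual formula or the calculator discussed in this thread.

```python
from math import prod

def union_probability(probs):
    """Probability that at least one event occurs, ASSUMING independence."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical chances that each of three inherited segments
# survives at or above the target size (illustrative values only).
segment_probs = [0.30, 0.25, 0.20]

p_any = union_probability(segment_probs)
print(f"chance of at least one qualifying segment: {p_any:.2%}")
```

The union is computed through the complement (the chance that every segment misses), which keeps the result between 0 and 1 however many segments are considered.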
The volume of intact DNA for a given ancestor is determined primarily by the gender of the ancestors in the direct line of descent. The typical recombination rate for men being ~2/3 that of women, expected segment size for primarily male lines of descent is larger than for mixed or female lines, and the differential increases exponentially over the span of generations. For each child there's more variability in the % of DNA inherited from each paternal grandparent as compared to each maternal grandparent, but over the length of a line of descent they all seem to average out to a standard 25%.

There is always the chance of an unlikely event having a significant impact on the volume of intact DNA inherited from a target ancestor. But for relatively recent target ancestors, the low odds of such an occurrence typically militate against a significant impact. So a match related to ancestors from whom you descend in a primarily male line seems very much more likely than one related to ancestors of a mixed or primarily female line. Never absolutely 100%, but sometimes pretty darn close.

The fact that these so-called male-line segments are larger means that there will be fewer of them than there would be mixed or female-line segments over a similar span of cM. But that doesn't reduce your chance for a match related to a specific male line ancestor. It just means that most of your ancestry will be reflected in small, hard-to-match segments related to mixed or female-line ancestors, segments that are much more likely to drop out than male-line segments just because they're smaller. Which you could have intuited just by looking at your pedigree. Barring any cousin marriages, your direct male line is 50% of all ancestors at the level of your parents, 25% at the level of your grandparents, 12.5% at the level of your great grandparents and so on and so forth up to ~0.1953% at the level of your 7th great grandparents. Of course there are fewer segments relating to them.
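The pedigree percentages quoted above follow from halving at each level. A quick check, assuming no cousin marriages (so every level has a full 2^n distinct ancestors); the function name and level numbering are just for illustration:

```python
def direct_male_line_share(level):
    """Share of the ancestors at one pedigree level who are on the direct male line.

    level 1 = parents, 2 = grandparents, 3 = great grandparents, ...
    Assumes no cousin marriages, i.e. 2**level distinct ancestors per level.
    """
    return 1 / 2 ** level

# 7th great grandparents are 9 levels back (great grandparents are level 3).
for level, label in [(1, "parents"), (2, "grandparents"),
                     (3, "great grandparents"), (9, "7th great grandparents")]:
    print(f"{label}: {direct_male_line_share(level):.4%}")
```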
I'm still wondering about the most appropriate way to reflect the fact that there is not an absolute 100% probability to match at any level of relationship past the parent level. But I'm also not terribly troubled by the specific case of two direct male line 4th cousins. It seems too recent to warrant significant concern. At least according to the sensitivity analysis I performed.
Last edited by Frederator; 10th July 2017 at 09:26 PM.
#16




Okay, I think I finally see the way in which the FTDNA formula, or at least my reverse-engineering of it, may be deficient. I mean with respect to assumptions surrounding the level of variation around the average grandparent's contribution.
Here is the vexing article again: https://gcbias.org/2013/10/20/howmu...rgrandparent/

First, I performed a rough calculation of the lower and upper bounds of the typical grandparent's contribution percentage at the 95% confidence level. I derived separate figures for paternal grandparents vs. maternal grandparents. I was only able to perform a very rough calculation due to the high level at which the data were summarized.

Then I spent some time performing alternate scenario analyses where the paternal grandfather's contribution % varied at each generation in the chain of descent, but strictly within the upper and lower bounds and always averaging to 50% of the paternal contribution. I noticed that in all cases the volume of the target ancestor's DNA inherited by the donor was LESS than if, by some miracle, every ancestor in the line of descent had gotten a perfect 50% of his paternal contribution from his grandfather.

I guess this has to do with the exponential downstream effect of deviation from the average at every stage. You can't just "make up" for getting only 40% of your paternal contribution from your grandfather by allocating 60% of your contribution to your son from your father's DNA. You'd have to pass on 62.5%. Which would violate the assumption that the average paternal contribution percentage value in our scenario should be 50%.

So it looks like there may have been something fundamentally wrong with FTDNA's calculation. Maybe "wrong" is too strong a word. Maybe "less than optimal" would be a better phrase. The scenario that each and every grandparent within the chain of descent should hit a perfect 25% contribution level with respect to all of their grandchildren is extremely improbable, although taken as a whole they will average 25%.
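The 40% / 62.5% arithmetic in this post can be checked with a few lines. This sketch assumes a purely multiplicative model, in which the share of the target ancestor's DNA that survives is the product of the grandfather's share at each generation; that model and the function name are illustrative assumptions, not the calculator itself.

```python
from math import prod

def retained_fraction(shares):
    """Product of the grandfather's share of the paternal contribution
    at each generation (assumed multiplicative model)."""
    return prod(shares)

even = retained_fraction([0.5, 0.5])    # every generation exactly 50%
uneven = retained_fraction([0.4, 0.6])  # same 50% average, but uneven

# Share the second generation would need in order to undo a 40% first step.
needed = even / 0.4

print(f"even: {even:.4f}  uneven: {uneven:.4f}  needed: {needed:.1%}")
```

The uneven chain comes out lower even though its shares average 50%, because a product of shares with a fixed average is largest when all the shares are equal.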
I think the traditional solution in cases like this, with extreme variation around the average, would be to present a confidence interval, with a minimum and a maximum expected value which should cover a large percentage, say 95%, of cases. I had originally thought of doing that from the very beginning, although I did not know then where I would be able to obtain the figures on variation that I needed. But I think the result would be very difficult for the typical user to interpret. "Chances about chances? What are you talking about?! Just give me your single best number!" I think that's why the FTDNA formula, as far as I can tell, settled on using a simple average.

However, I think I have a better answer. Run the calculation assuming that half the ancestors in the chain of descent are at the bottom of the confidence interval and half at the top of the confidence interval. The average of their contribution % to their grandchildren will still approximate 25%, but the resulting expected volume of the target ancestor's DNA at the donor's generation will be appropriately conservative. The final result is a single number, a predicted probability to match a cousin in this specific descent scenario. Maybe not as technically accurate as informing the user of the full range of possibilities within the confidence interval, but much easier to understand and I think fairly representative of the true probabilities. At least more representative than assuming each grandparent contributed a strict 25% at every point in the chain of descent.

Using these assumptions in my probability calculator, the specific case that I presented, of only one brother out of two matching a 4th cousin, all descending in direct male lines from the common ancestor, still appears to be very nearly impossible. The returned probability that we all match is still 100%.
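The half-low, half-high shortcut described in this post might look something like the sketch below. The bounds of 40% and 60% are placeholders chosen for illustration, not the interval derived from the blog data, and the multiplicative chain model is an assumption.

```python
from math import prod

def conservative_fraction(n_generations, low, high):
    """Alternate the low and high bound along the chain and multiply.

    With an even number of generations the shares average (low + high) / 2;
    with an odd number there is one extra low share, which errs on the
    conservative side.
    """
    shares = [low if i % 2 == 0 else high for i in range(n_generations)]
    return prod(shares)

n = 4                     # hypothetical chain length
low, high = 0.40, 0.60    # hypothetical bounds, averaging 50%

strict_average = 0.5 ** n
conservative = conservative_fraction(n, low, high)
print(f"strict 50% each generation: {strict_average:.4f}")
print(f"half low, half high:        {conservative:.4f}")
```

This returns a single number rather than a range, and that number sits below the strict-average figure, which is the sense in which the shortcut is conservative.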
But there are plausible scenarios within the 95% confidence interval where the paternal grandfathers do NOT average 25% contribution, and the final matching probability is less than 100%. Not much less than 100%, but enough that a miss here or there would not raise an eyebrow. So I guess my conclusion is still that there is a high likelihood of a non-paternity event in one of these lines. Maybe a little less high than I originally thought, but still pretty high.
Last edited by Frederator; 12th July 2017 at 12:21 AM.
#17




The degree to which the grandparental contribution varies from the average of 25% is an interesting question! I can imagine a future study in which statistics are gathered directly from a set of families consisting of grandparents and grandchildren from a large autosomal database.
There are enough odd observations from other species in the classical genetics literature to support the possibility of phenomena (if that is the right word) such as "meiotic drive" (in which one homologue ends up preferentially in the gametes), epigenetic features of chromosomes, and who knows what other oddities that, while not changing the average contribution, could easily change the probability distribution. 
#18




Agreed.
I think it should always be acknowledged that an absolute probability of 100% is never possible. Probability calculations by definition are based on only the most likely scenario within a range of scenarios. While not perfect, I think that's still a pretty useful analysis. I think that link that Ann Turner originally posted may meet a good many of your needs. The data is summarized at a pretty high level, but it allowed me to construct a rough confidence interval. 
#19




Quote:
http://thegeneticgenealogist.com/201...edcmproject/ 
#20




Here's another way of putting it . . .
I just added a feature to my calculator that returns the probability distribution for grandparental contribution in each specific descent scenario presented, based on the figures published in that blog post.
So it looks like there's at least a ~67% chance of an NPE in the specific scenario that I originally cited. As compared to a ~75% chance that some of the probabilities are no lower than those FTDNA published in its generic matching probability chart. Maybe room for doubt, but I think enough to cause one to carefully consider the rest of the available evidence. I mean, it would be great to have definitive Y chromosome data, but no qualified donors are willing, so what are you gonna do?
Last edited by Frederator; 14th July 2017 at 03:11 PM.