Family Tree DNA Forums > Universal Lineage Testing (Autosomal DNA) > Family Finder Advanced Topics

Family Finder Advanced Topics: Advanced discussion about Family Tree DNA's Family Finder product.

#11
8th July 2017, 08:22 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
And just as a reminder: per a model that has agreed completely with all the sex-averaged matching statistics published by the company, the un-limited (uncapped, i.e., not yet limited to 100%) probability of two direct male-line 4th cousins matching at the 7 cM level or higher is ~747%. That is in the context of a less than 2% chance that one of the ancestors in one donor's line of descent received 20% or less of his paternal contribution from his paternal grandfather, and an equal chance that he received more than 80% from that same grandfather.

I would be a little less curious about the result if the un-limited probability were only slightly above 100% or we were talking about more remote cousins.

Last edited by Frederator; 8th July 2017 at 08:29 PM.
#12
9th July 2017, 01:38 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Just out of curiosity, I ran my calculator assuming one of the donors experienced the 2% scenario, where one of the people in the line of descent gets as little as 20% of his paternal contribution from his grandfather. I still got a limited (capped) probability of 100% of matching the other direct paternal 4th cousin. No doubt this would have more impact for a more remote relationship.

Last edited by Frederator; 9th July 2017 at 01:50 PM. Reason: Clarification
#13
9th July 2017, 03:27 PM
John McCoy
FTDNA Customer
Join Date: Nov 2013
Posts: 516
How does your computation relate to the widely quoted rule of thumb (and I don't know whose thumb it was, or if the figure is correct or even close) that about 50% of 4th cousins are not detectable as autosomal matches? I've been wondering if the observations and the mathematical models are on the same page yet.
#14
9th July 2017, 03:41 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
That is almost exactly the figure I get using the sex-averaged recombination rate for both donor lines, and it agrees closely with all the published figures.

The difference in results between the sex-average and male recombination rates is very striking, isn't it?
#15
10th July 2017, 07:35 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
The matching probability function seems to be based on the expected volume of "intact" DNA from the target ancestor as a percentage of the total DNA tested. From there it's more or less a standard union-of-probabilistic-events calculation, conditioned on a target segment size.
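A minimal sketch of the "union of events" step, assuming (hypothetically) that the chance of a match can be broken into independent per-segment probabilities that an ancestral segment survives above the threshold in both cousins. It also shows one plausible reading of an "un-limited" figure such as ~747%: simply adding the per-segment probabilities gives an expected count, which can exceed 100%, whereas the union of independent events never can. The function names and inputs are illustrative, not taken from FTDNA's model or from the calculator discussed here.

```python
import math

def uncapped_probability(segment_probs):
    """Naive sum of per-segment match probabilities.

    This is really an expected number of matching segments, so it can exceed 1.
    """
    return sum(segment_probs)

def limited_probability(segment_probs):
    """The naive sum, limited (capped) at 100%."""
    return min(1.0, uncapped_probability(segment_probs))

def union_probability(segment_probs):
    """P(at least one segment matches), assuming the segments are independent."""
    return 1.0 - math.prod(1.0 - p for p in segment_probs)

# Hypothetical inputs: 15 segments, each with a 50% chance of matching.
probs = [0.5] * 15
print(f"uncapped: {uncapped_probability(probs):.0%}")
print(f"limited:  {limited_probability(probs):.0%}")
print(f"union:    {union_probability(probs):.3%}")
```

The uncapped sum here is 750% while the union is just under 100%, which is why an uncapped value well above 100% signals a near-certain match rather than an impossible probability.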

The volume of intact DNA for a given ancestor is determined primarily by the gender of the ancestors in the direct line of descent. Because the typical recombination rate for men is ~2/3 that of women, the expected segment size for primarily male lines of descent is larger than for mixed or female lines, and the differential increases exponentially over the span of generations.
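A rough sketch of this effect under a simple no-interference (Poisson crossover) model. Assumptions, not from the post: positions are measured on the sex-averaged map, a male meiosis produces 0.8 crossovers per sex-averaged Morgan and a female meiosis 1.2 (round numbers giving the ~2/3 ratio and averaging to 1.0), and a line of descent is written as a string of transmitting-parent sexes. Breakpoints from independent meioses superimpose, so their rates add and segment lengths are roughly exponential. In this model the male-line advantage shows up clearly in the chance that a segment stays above a fixed matching threshold: the male/female ratio of that chance grows exponentially with the number of meioses.

```python
import math

# Hypothetical crossover rates per sex-averaged Morgan ("M" = father, "F" = mother).
RATE = {"M": 0.8, "F": 1.2}

def chain_rate(line):
    """Total breakpoint rate after the meioses in `line`, e.g. "MMMM".

    Assumes no crossover interference, so independent Poisson processes add.
    """
    return sum(RATE[sex] for sex in line)

def mean_segment_cm(line):
    """Mean segment length in cM when segment lengths are exponential."""
    return 100.0 / chain_rate(line)

def p_exceeds(line, threshold_cm):
    """Chance that a segment is longer than `threshold_cm` (exponential tail)."""
    return math.exp(-chain_rate(line) * threshold_cm / 100.0)

for g in (2, 4, 6, 8):
    male, female = "M" * g, "F" * g
    ratio = p_exceeds(male, 7) / p_exceeds(female, 7)
    print(f"{g} meioses: mean {mean_segment_cm(male):.1f} vs "
          f"{mean_segment_cm(female):.1f} cM, P(>7 cM) ratio {ratio:.3f}")
```

Because the ratio equals exp((1.2 - 0.8) x g x 7 / 100), doubling the number of meioses squares it. Real recombination maps vary along each chromosome, so treat these numbers as illustrative only.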

For each child, there's more variability in the percentage of DNA inherited from each paternal grandparent than from each maternal grandparent, but over the length of a line of descent they all seem to average out to a standard 25%.
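A small Monte Carlo sketch of why the paternal share varies more: fewer crossovers in a male meiosis mean fewer, longer blocks, so the fraction of a gamete that comes from either grandparent swings more widely. Assumptions, not from the post: a hypothetical genome of 22 equal chromosomes of 1.6 sex-averaged Morgans each, crossovers as a Poisson process (ignoring interference and the obligate crossover), and rates of 0.8 and 1.2 per Morgan for male and female meioses. The function name `gamete_share` is illustrative.

```python
import random
import statistics

def gamete_share(rate, n_chrom=22, chrom_len=1.6, rng=random):
    """Fraction of one gamete inherited from the transmitting parent's father.

    Hypothetical simplified genome: `n_chrom` equal chromosomes of `chrom_len`
    Morgans; crossovers occur as a Poisson process with `rate` per Morgan.
    """
    total = 0.0
    for _ in range(n_chrom):
        from_father = rng.random() < 0.5    # which grandparent starts the chromosome
        pos, covered = 0.0, 0.0
        while True:
            nxt = pos + rng.expovariate(rate)   # distance to the next crossover
            end = min(nxt, chrom_len)
            if from_father:
                covered += end - pos
            if nxt >= chrom_len:
                break
            pos, from_father = nxt, not from_father
        total += covered / chrom_len
    return total / n_chrom

rng = random.Random(2017)
for label, rate in (("paternal (male meiosis)", 0.8), ("maternal (female meiosis)", 1.2)):
    shares = [gamete_share(rate, rng=rng) for _ in range(2000)]
    print(f"{label}: mean {statistics.mean(shares):.3f}, sd {statistics.stdev(shares):.3f}")
```

Both means land near 0.5 (i.e., 25% of the grandchild's genome per grandparent), while the standard deviation is larger for the male meiosis, matching the pattern described above.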

There is always the chance of an unlikely event having a significant impact on the volume of intact DNA inherited from a target ancestor. But for relatively recent target ancestors, the low odds of such an occurrence typically militate against a significant impact.

So a match through an ancestor from whom you descend in a primarily male line seems very much more likely than one through an ancestor in a mixed or primarily female line. Never absolutely 100%, but sometimes pretty darn close.

The fact that these so-called male-line segments are larger means that there will be fewer of them than there would be mixed or female-line segments over a similar span of cM. But that doesn't reduce your chance for a match related to a specific male line ancestor. It just means that most of your ancestry will be reflected in small, hard-to-match segments related to mixed or female-line ancestors, segments that are much more likely to drop out than male-line segments just because they're smaller.

Which you could have intuited just by looking at your pedigree. Barring any cousin marriages, your direct male-line ancestor is 50% of all ancestors at the level of your parents, 25% at the level of your grandparents, 12.5% at the level of your great grandparents, and so on, down to ~0.1953% at the level of your 7th great grandparents. Of course, there are fewer segments relating to him.
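The percentages above follow from halving at each generation: counting parents as generation 1, there are 2^g ancestors at generation g, of whom exactly one is the direct male-line ancestor (assuming no pedigree collapse). A quick check, with an illustrative helper name:

```python
def direct_line_share(generation):
    """Fraction of ancestors at `generation` (1 = parents) who are the direct
    male-line ancestor, assuming no cousin marriages (no pedigree collapse)."""
    return 1 / 2 ** generation

LABELS = ["parents", "grandparents", "great grandparents"] + [
    f"{n}{'nd' if n == 2 else 'rd' if n == 3 else 'th'} great grandparents"
    for n in range(2, 8)
]

for gen, label in enumerate(LABELS, start=1):
    print(f"{label:>24}: {2 ** gen:>3} ancestors, {direct_line_share(gen):.4%}")
```

The last row, 7th great grandparents, is generation 9: 512 ancestors and 0.1953%.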

I'm still wondering about the most appropriate way to reflect the fact that the probability to match is never an absolute 100% at any level of relationship past the parent level. But I'm also not terribly troubled by the specific case of two direct male-line 4th cousins. The relationship seems too recent to warrant significant concern, at least according to the sensitivity analysis I performed.

Last edited by Frederator; 10th July 2017 at 08:26 PM.
#16
11th July 2017, 11:12 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Okay, I think I finally see how the FTDNA formula, or at least my reverse-engineering of it, may be deficient: specifically, in its assumptions about the level of variation around the average grandparental contribution.

Here is the vexing article again:

https://gcbias.org/2013/10/20/how-mu...r-grandparent/

First, I performed a rough calculation of the lower and upper bounds of the typical grandparental contribution percentage at the 95% confidence level, deriving separate figures for paternal and maternal grandparents. The calculation could only be very rough because the data were summarized at such a high level.
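For readers who want to try this themselves, a 95% interval under a normal approximation is the mean plus or minus about 1.96 standard deviations. The standard deviations below are placeholders chosen only to illustrate the calculation; they are not the figures from the linked article, so substitute whatever values you derive from its summaries.

```python
from statistics import NormalDist

def interval(mean, sd, level=0.95):
    """Two-sided interval for a normally distributed quantity."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return mean - z * sd, mean + z * sd

# Grandparental contribution as a share of the grandchild's genome (mean 25%).
# Placeholder standard deviations, for illustration only.
for side, sd in (("paternal", 0.04), ("maternal", 0.03)):
    lo, hi = interval(0.25, sd)
    print(f"{side} grandparent: {lo:.1%} to {hi:.1%}")
```

The normal approximation is itself an assumption; with coarse summary data it is a reasonable first pass, not an exact description of the distribution.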

Then I spent some time performing alternate scenario analyses where the paternal grandfather's contribution % varied at each generation in the chain of descent, but strictly within the upper and lower bounds and always averaging to 50% of the paternal contribution.

I noticed that in all cases the volume of the target ancestor's DNA inherited by the donor was LESS than if, by some miracle, every ancestor in the line of descent had gotten a perfect 50% of his paternal contribution from his grandfather.

I guess this has to do with the compounding downstream effect of deviation from the average at every stage. You can't just "make up" for getting only 40% of your paternal contribution from your grandfather by making 60% of your contribution to your son come from your father's DNA. You'd have to pass on 62.5% (since 40% × 62.5% = 25%). That would violate the assumption that the average paternal contribution percentage in our scenario should be 50%.
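The arithmetic behind this observation can be checked directly, assuming, as the post does, that the target ancestor's share at the donor is the product of the per-generation fractions of the paternal contribution. By the inequality of arithmetic and geometric means, any set of fractions that averages 50% multiplies out to no more than 0.5^n, with equality only when every fraction is exactly 50%. The helper name is illustrative.

```python
import math

def chain_share(fractions):
    """Share of the target ancestor's DNA after a chain of transmissions,
    treating each generation's fraction as a simple multiplier."""
    return math.prod(fractions)

even = chain_share([0.5, 0.5])    # 0.25
uneven = chain_share([0.4, 0.6])  # 0.24: same 50% average, smaller product
needed = 0.25 / 0.4               # fraction the son must receive to restore 25%

print(f"even: {even:.2%}, uneven: {uneven:.2%}, needed to compensate: {needed:.1%}")
```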

So it looks like there may have been something fundamentally wrong with FTDNA's calculation. Maybe "wrong" is too strong a word; maybe "less than optimal" would be a better phrase. The scenario in which each and every grandparent within the chain of descent hits a perfect 25% contribution to all of their grandchildren is extremely improbable, although taken as a whole they will average 25%.

I think the traditional solution in cases like this, with extreme variation around the average, would be to present a confidence interval, with minimum and maximum expected values covering a large percentage of cases, say 95%. I had considered doing that from the very beginning, although I did not know then where I could obtain the figures on variation that I needed.

But I think the result would be very difficult for the typical user to interpret. "Chances about chances? What are you talking about?! Just give me your single best number!" I think that's why the FTDNA formula, as far as I can tell, settled on using a simple average.

However, I think I have a better answer: run the calculation assuming that half the ancestors in the chain of descent are at the bottom of the confidence interval and half are at the top. The average of their contribution percentages to their grandchildren will still approximate 25%, but the resulting expected volume of the target ancestor's DNA at the donor's generation will be appropriately conservative.
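A sketch of this conservative scenario, again treating the inherited share as the product of per-generation fractions of the paternal contribution. The bound `half_width` is a hypothetical placeholder for whatever interval is derived from the data; half the generations sit at the lower bound and half at the upper bound, so their arithmetic average stays at 50% (25% of the grandchild's genome). Each low/high pair multiplies to 0.25 - d^2 rather than 0.25, so the result falls below the flat figure by a factor of (1 - 4d^2)^(n/2).

```python
def conservative_share(n_generations, half_width):
    """Target ancestor's share after `n_generations` transmissions when the
    per-generation fraction alternates between 0.5 - half_width and
    0.5 + half_width (hypothetical bounds) instead of a flat 0.5.

    With an odd count, the extra generation sits at the lower bound.
    """
    share = 1.0
    for i in range(n_generations):
        fraction = 0.5 - half_width if i % 2 == 0 else 0.5 + half_width
        share *= fraction
    return share

n, d = 6, 0.1   # illustrative: six transmissions, bounds of 40% and 60%
flat = 0.5 ** n
cons = conservative_share(n, d)
print(f"flat 50%: {flat:.4%}  conservative: {cons:.4%}  ratio: {cons / flat:.3f}")
```

The output is still a single number per scenario, which keeps the result easy to read while building in the variation around the average.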

The final result is a single number, a predicted probability to match a cousin in this specific descent scenario. Maybe not as technically accurate as informing the user of the full range of possibilities within the confidence interval, but much easier to understand and I think fairly representative of the true probabilities. At least more representative than assuming each grandparent contributed a strict 25% at every point in the chain of descent.

Using these assumptions in my probability calculator, the specific case that I presented, of only one brother out of two matching a 4th cousin, all descending in direct male lines from the common ancestor, still appears to be very nearly impossible. The returned probability that we all match is still 100%.

But there are plausible scenarios within the 95% confidence interval where the paternal grandfathers do NOT average 25% contribution, and the final matching probability is less than 100%. Not much less than 100%, but enough that a miss here or there would not raise an eyebrow.

So I guess my conclusion is still that there is a high likelihood of a non-paternity event in one of these lines. Maybe a little less high than I originally thought, but still pretty high.

Last edited by Frederator; 11th July 2017 at 11:21 PM.
#17
12th July 2017, 09:49 AM
John McCoy
FTDNA Customer
Join Date: Nov 2013
Posts: 516
The degree to which the grandparental contribution varies from the average of 25% is an interesting question! I can imagine a future study in which statistics are gathered directly from a set of families consisting of grandparents and grandchildren from a large autosomal database.

There are enough odd observations from other species in the classical genetics literature to support the possibility of phenomena (if that is the right word) such as "meiotic drive" (in which one homologue ends up preferentially in the gametes), epigenetic features of chromosomes, and who knows what other oddities that, while not changing the average contribution, could easily change the probability distribution.
#18
12th July 2017, 09:55 AM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Agreed.

I think it should always be acknowledged that an absolute probability of 100% is never possible.

Probability calculations by definition are based on only the most likely scenario within a range of scenarios. While not perfect, I think that's still a pretty useful analysis.

I think the link Ann Turner originally posted may meet a good many of your needs. The data are summarized at a pretty high level, but they allowed me to construct a rough confidence interval.
#19
12th July 2017, 10:49 AM
Ann Turner
FTDNA Customer
Join Date: Apr 2003
Posts: 1,117
Quote:
Originally Posted by John McCoy
The degree to which the grandparental contribution varies from the average of 25% is an interesting question! I can imagine a future study in which statistics are gathered directly from a set of families consisting of grandparents and grandchildren from a large autosomal database.
There is already a crowd-sourcing project:

http://thegeneticgenealogist.com/201...ed-cm-project/
#20
14th July 2017, 01:57 PM
Frederator
FTDNA Customer
Join Date: Jul 2010
Posts: 754
Here's another way of putting it . . .

I just added a feature to my calculator that returns the probability distribution for grandparental contribution in each specific descent scenario presented, based on the figures published in that blog post.
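One way such a feature could work, sketched under stated assumptions and not the calculator described here: draw each generation's fraction of the paternal contribution from a normal distribution around 50% (the `sd` value is a placeholder, not the blog post's figure), clip it to [0, 1], multiply through the line of descent, and read off percentiles of the resulting share. Treating the share as a product of fractions is itself a simplifying assumption. In this sketch the mean share stays at the flat 0.5^n value while the median falls below it, so more than half of the simulated scenarios inherit less than the flat calculation suggests.

```python
import random
import statistics

def simulate_shares(n_generations, sd, trials=20000, seed=0):
    """Monte Carlo distribution of the target ancestor's share at the donor.

    Each generation's fraction is Normal(0.5, sd), clipped to [0, 1]; the share
    is the product of the fractions (a simplifying multiplicative model).
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(trials):
        share = 1.0
        for _ in range(n_generations):
            share *= min(1.0, max(0.0, rng.gauss(0.5, sd)))
        shares.append(share)
    return shares

shares = simulate_shares(n_generations=5, sd=0.07)   # sd is a placeholder
q = statistics.quantiles(shares, n=40)               # cut points every 2.5%
print(f"flat-50% value: {0.5 ** 5:.3%}")
print(f"median: {statistics.median(shares):.3%}, "
      f"95% range: {q[0]:.3%} to {q[-1]:.3%}")
```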

So it looks like there's at least a ~67% chance of an NPE (non-paternity event) in the specific scenario that I originally cited, compared with a ~75% chance that some of the probabilities are no lower than those FTDNA published in its generic matching probability chart.

Maybe room for doubt, but I think enough to cause one to carefully consider the rest of the available evidence. I mean, it would be great to have definitive Y chromosome data, but no qualified donors are willing, so what are you gonna do?

Last edited by Frederator; 14th July 2017 at 02:11 PM.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Surname probability | fostert | Paternal Lineage (Y-DNA STR) Advanced | 5 | 29th June 2016 09:07 PM
Matching Probability % for 3rd Cousins | Songbill | DNA and Genealogy for Beginners | 2 | 10th January 2015 02:18 PM
Understanding Y haplotype matching probability | PNGarrison | Scientific Papers | 2 | 15th November 2014 11:07 AM
A question on probability... | constant_d | Paternal Lineage (Y-DNA STR) Advanced | 5 | 16th July 2006 07:18 AM
Probability of 37/37 match between brothers | gpenner | DNA and Genealogy for Beginners | 3 | 16th April 2006 12:23 AM


All times are GMT -5.


Family Tree DNA - World Headquarters

1445 North Loop West, Suite 820
Houston, Texas 77008, USA

Phone: (713) 868-1438 | Fax: (832) 201-7147
Copyright 2001-2010 Genealogy by Genetics, Ltd.
Powered by vBulletin® Version 3.8.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.