  #1  
Old 20th November 2017, 08:05 AM
Svein Davidsen
FTDNA Customer
 
Join Date: Jun 2005
Location: Switzerland and France
Posts: 129
YFull definition of "formed" & TMRCA

I need help to make sure I understand YFull’s definitions of “formed” age and TMRCA.

The definition as per their YFull FAQ is: Subclade "formed" age: The TMRCA (time to most recent common ancestor) of a subclade is used as the "formed" age of each branch of the subclade. Stated otherwise, the formed age of a branch is the same as the TMRCA of the "parent" subclade of that branch.

My interest is in the 7 sub-clades of the Hg N-L550 phylogenetic tree, especially the N-FGC14542 sub-clade. In the YFull N-L550 tree, v5.08, the N-L550 TMRCA = 2900 ybp, and all 7 sub-clades have “formed” = 2900 ybp, as per the above definition. The TMRCAs of the sub-clades are, in date order: 2900, 2800, 2800, 2700, 2700, 2400 and 1450 ybp (my FGC14542 being 2800 ybp).
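To make the FAQ definition concrete, here is a minimal Python sketch using only the ages quoted above. The function name and the idea of reporting a (formed, TMRCA) pair per branch are illustrative assumptions, not anything YFull publishes in this form. Under the usual reading, the SNPs that define a branch arose at some unknown point between its "formed" age and its TMRCA.

```python
# Ages in years before present (ybp), as quoted above from the YFull N-L550 tree v5.08.
PARENT_TMRCA = 2900  # TMRCA of N-L550
SUBCLADE_TMRCA = [2900, 2800, 2800, 2700, 2700, 2400, 1450]


def formed_age(parent_tmrca_ybp):
    """YFull FAQ definition: a branch's "formed" age is its parent's TMRCA."""
    return parent_tmrca_ybp


# Hypothetical helper view: one (formed, TMRCA) pair per sub-clade.
# The defining mutation(s) of each branch fall somewhere inside this window.
windows = [(formed_age(PARENT_TMRCA), tmrca) for tmrca in SUBCLADE_TMRCA]

for formed, tmrca in windows:
    print(f"formed {formed} ybp, TMRCA {tmrca} ybp, window {formed - tmrca} years")
```

Read this way, every sub-clade shares "formed" = 2900 ybp simply because they share a parent, not because seven mutations all occurred in that year.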

I cannot believe, and it must be statistically impossible, that all 7 sub-branches were formed, i.e. N-L550 mutated 7 times, 2900 ybp, so I must be wrong in assuming “formed” = mutated. How do I interpret "formed"? And how do I get the age?

My real quest is to find when the N-FGC14542 mutation happened - what, if anything, can I conclude on this from the YFull data? Are there any other sources of data that can provide this information?
  #2  
Old 20th November 2017, 09:30 AM
Armando
FTDNA Customer
 
Join Date: Jun 2009
Posts: 1,680
The formed and TMRCA dates are just estimates. They are based on an average mutation rate of one SNP every 144.41 years and an assumed age of 60 years for living providers of YFull samples. On average one male generation is 32.5 years, which works out to about 4.44 generations per mutation. So if a man born 3044 years ago who had the N-L550 mutation had 4 male children, 16 male grandchildren, 64 male great-grandchildren, and 256 male great-great-grandchildren, then it isn't unreasonable that there were 7 separate SNP mutations by 2,900 years ago within those 4.44 generations, and that all 7 lines have living descendants who have also had BigY testing.
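The arithmetic in this post can be sketched in a few lines of Python. The two rates are the ones quoted above; the family size of 4 sons per man is the hypothetical from the post, and treating every father-to-son transmission as an independent chance of about 1 in 4.44 of a new SNP is a simplifying assumption made only for this illustration.

```python
YEARS_PER_SNP = 144.41        # average years per SNP, as quoted above
YEARS_PER_GENERATION = 32.5   # average male generation length, as quoted above

generations_per_snp = YEARS_PER_SNP / YEARS_PER_GENERATION  # about 4.44

# Hypothetical family from the post: every man has 4 sons, for 4 generations.
SONS_PER_MAN = 4
males_per_generation = [SONS_PER_MAN ** g for g in range(1, 5)]  # [4, 16, 64, 256]
total_births = sum(males_per_generation)  # 340 father-to-son transmissions

# Simplifying assumption: each transmission independently carries a new SNP
# with probability 1 / generations_per_snp.
expected_new_snps = total_births / generations_per_snp

print(f"{generations_per_snp:.2f} generations per SNP")
print(f"{total_births} male births, about {expected_new_snps:.0f} expected new SNPs")
```

With that many births in four generations, seven surviving, tested lines each carrying its own new SNP is well within what this toy model allows.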

Last edited by Armando; 20th November 2017 at 09:32 AM.
  #3  
Old 21st November 2017, 07:10 AM
Svein Davidsen
FTDNA Customer
 
Join Date: Jun 2005
Location: Switzerland and France
Posts: 129
Well argued Armando.

I knew the ages were estimates, but I had not followed through the way you did. However, I still believe the estimates, while theoretically possible, are wrong. Average mutation rates are perfectly applicable when used over a large timescale and many clades, but down at a small sub-clade, over a relatively small timescale, the "average mutation rate" methodology is not applicable.
  #4  
Old 21st November 2017, 11:01 AM
John McCoy
FTDNA Customer
 
Join Date: Nov 2013
Posts: 533
The key question here, I think, is the range of variation of mutation rates and the accumulation of mutations observed in a population of Y chromosomes -- and I don't think there is enough data to say very much about that. In order to validate the TMRCA estimates, we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably accumulate at a constant rate, is an attractive one, and possibly the only available basis for estimating when genetic lineages diverged where the dates cannot be measured directly from, say, the fossil record, but it is only a hypothesis, based on untested assumptions that seem way too good to be true.
  #5  
Old 21st November 2017, 12:18 PM
dna
FTDNA Customer
 
Join Date: Aug 2014
Posts: 2,673
Quote:
Originally Posted by John McCoy View Post
The key question here, I think, is the range of variation of mutation rates and the accumulation of mutations observed in a population of Y chromosomes -- and I don't think there is enough data to say very much about that. In order to validate the TMRCA estimates, we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably accumulate at a constant rate, is an attractive one, and possibly the only available basis for estimating when genetic lineages diverged where the dates cannot be measured directly from, say, the fossil record, but it is only a hypothesis, based on untested assumptions that seem way too good to be true.
Yes, ancient DNA from populations known to have living descendants would need to be analyzed. And preferably tracked over millennia.

Mr. W

P.S.
myOrigins would also benefit from the ancient DNA analysis.
  #6  
Old 21st November 2017, 03:52 PM
Svein Davidsen
FTDNA Customer
 
Join Date: Jun 2005
Location: Switzerland and France
Posts: 129
What do you all think about this suggestion?

I don't have access to the much larger FTDNA database of results for Hg N-L550 sub-clades, but I counted all the YFull sub-clade results posted on their Hg N-L550 tree and found a wide range of results: 3, 5, 8, 25, 97, 9, and 4. 97 is the N-L1025 sub-clade.

Can we propose that this could be a "proxy" for the formation/mutation of the sub-clade, i.e. N-L1025 being the oldest, by a wide margin? We cannot get an actual date, but at least relative dates for the sub-clades formation.
  #7  
Old 21st November 2017, 04:35 PM
JMAisHere
FTDNA Customer
 
Join Date: Oct 2017
Location: Sweden
Posts: 23
Quote:
Originally Posted by John McCoy View Post
... we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably....
Maybe we will have that some years from now. This project will run over 5 years: 1,000 prehistoric individuals to be genetically mapped.
  #8  
Old 21st November 2017, 08:52 PM
Armando
FTDNA Customer
 
Join Date: Jun 2009
Posts: 1,680
Quote:
Originally Posted by Svein Davidsen View Post
Well argued Armando.

I knew the ages were estimates, but I had not followed through the way you did. However, I still believe the estimates, while theoretically possible, are wrong. Average mutation rates are perfectly applicable when used over a large timescale and many clades, but down at a small sub-clade, over a relatively small timescale, the "average mutation rate" methodology is not applicable.
It's not the clades that matter. It's the number of mutations in each sample that matters. Of course, anything that is variable can have a much wider margin of error in smaller groups when an average calculated from a large dataset is applied as a constant in a formula, but that is implied with estimates and averages and should be understood without having to be said. However, the dataset isn't so small for N-L550, although it is for some of the downstream subclades. To see the number of mutations per subclade, go to an SNP such as L550 and click on "info" next to the TMRCA, or open it in a new page or tab, which takes you to https://www.yfull.com/branch-info/N-L550/. You will see a table of all kits that are downstream from L550, with the number of reliable SNPs next to each sample id. That is the number of SNPs each kit has downstream from the SNP you chose to look at. The number of SNPs varies for two reasons: the variability of test results and the variability of the number of mutations each lineage has. There are anywhere between 17 and 27 SNPs. YFull corrects the number of SNPs, probably based on assumed positives that didn't appear in the test result of the sample. Then the average mutation rate is multiplied by each sample's SNP count to get that sample's estimated age of the common SNP (or group of SNPs in other cases), and the sample ages are added up and divided by the number of samples. The formula is shortened here since they had already averaged the subclades. The formula is then (3381+3344+2683+3041+2373+2732+3106)/7
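As a rough sketch of the calculation described above, the Python below averages the seven ages from the formula and shows how a single age could be derived from an SNP count. The function `age_from_snps` is a hypothetical helper: it assumes the age is simply the corrected SNP count times 144.41 years plus the 60-year allowance for living testers mentioned earlier in the thread, which may not match YFull's method in every detail.

```python
YEARS_PER_SNP = 144.41   # average years per SNP, as quoted earlier in the thread
TESTER_AGE = 60          # assumed age of living sample providers, as quoted earlier


def age_from_snps(corrected_snp_count):
    """Hypothetical helper: estimated age (ybp) implied by one SNP count."""
    return corrected_snp_count * YEARS_PER_SNP + TESTER_AGE


# The seven ages (ybp) that appear in the formula above.
ages = [3381, 3344, 2683, 3041, 2373, 2732, 3106]
tmrca_estimate = sum(ages) / len(ages)

print(f"23 SNPs -> about {age_from_snps(23):.0f} ybp")
print(f"mean of the seven ages: about {tmrca_estimate:.0f} ybp")
```

The mean comes out a little above 2950 ybp, in line with the roughly 2900 ybp shown for N-L550 on the tree.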

In a separate academic study in 2009 by Yali Xue and Chris Tyler-Smith, the average number of SNPs was also found to be about 1 every 4 generations, using NGS testing similar to BigY in people with well-documented genealogies. Two Y chromosomes from a deep-rooting pedigree were genotyped and resequenced. They showed zero Y-STR differences after typing 67 Y-STRs, but four base substitutions after comparing ~10 Mb of DNA sequence. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3312576/ That rate agrees with YFull's mutation rate, which is based on the study by Adamov et al. 2015.

Poznik et al. 2016 used the mutation rate from Balanovsky et al. 2015, and the ages they calculated for the subclades are somewhat similar to those of YFull. You can download the supplementary file if you want to see whether a subclade of N is calculated in there.

Dr. Iain McDonald has been calculating P312 subclade ages and they are at http://www.jb.man.ac.uk/~mcdonald/ge...312/table.html You can compare them with YFull to see how they differ.

The estimated age of a subclade that is several thousands of years old isn't going to be miscalculated to be thousands of years older or younger than it actually is. It will be off by a few hundred and that is as close as we can get with current testing and participation rates.

Quote:
Originally Posted by Svein Davidsen View Post
What do you all think about this suggestion?

I don't have access to the much larger FTDNA database of results for Hg N-L550 sub-clades, but I counted all the YFull sub-clade results posted on their Hg N-L550 tree and found a wide range of results: 3, 5, 8, 25, 97, 9, and 4. 97 is the N-L1025 sub-clade.

Can we propose that this could be a "proxy" for the formation/mutation of the sub-clade, i.e. N-L1025 being the oldest, by a wide margin? We cannot get an actual date, but at least relative dates for the sub-clades formation.
Which subclades appear on the tree depends on the participation rate, which in turn depends on which lineages survived. There is no way to tell whether N-L1025 just happened to have more survivors, coincidentally has a much higher participation rate, or is actually older. I have great-uncles who had a lot of male children and great-uncles who had very few. If all of the male children of my great-uncles have the same number of male children and that continues for 16 generations, then over time the lineages from my great-uncles with more male children will have more male descendants, and therefore more branches, since on average they will all have about 1 new SNP mutation per 4 generations. If you apply that scenario to N-L550, with one son having a lot of descendants and the other sons having very few, then, given an equal participation rate, there will be a lot more participants descended from the son with a lot of descendants. If the mutation rate were a true constant, and not an average, all of the descendants of N-L550 would have exactly the same number of mutations, except in the extremely unlikely case of a lineage made up only of the youngest child of every single generation, meaning there would be more time between the marriage of Mr. N-L550 and that descendant. Since there are about 88 generations between now and Mr. N-L550, the average number of years per generation for N-L550 should be close to that of other lineages. Using the number of participants in a branch is even less scientific than calculating an average rate of SNP mutations in the Y-chromosome, since there are even more variables that can't be measured. We don't know how many children each generation of Mr. N-L550's descendants had, and we don't know the participation rate of all of the branches of all of his descendants.
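The great-uncle scenario can be put into numbers with a small Python sketch. All family sizes here are made up for illustration: one founder is assumed to have 5 sons and another 1 son, after which every man in both lines has 2 sons for 16 generations.

```python
def living_male_descendants(founder_sons, sons_per_man, generations):
    """Male descendants after `generations` further generations of equal family size."""
    return founder_sons * sons_per_man ** generations


GENERATIONS = 16     # as in the scenario above
SONS_PER_MAN = 2     # hypothetical, identical in both lines

prolific_line = living_male_descendants(5, SONS_PER_MAN, GENERATIONS)
sparse_line = living_male_descendants(1, SONS_PER_MAN, GENERATIONS)

# Both lines descend from men of the same generation, so they are equally old,
# yet with equal participation rates one yields five times as many testers.
print(prolific_line, sparse_line, prolific_line / sparse_line)
```

The two lines are exactly the same age, so a five-fold difference in the number of kits says nothing about which branch formed first.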
  #9  
Old 21st November 2017, 08:56 PM
Armando
FTDNA Customer
 
Join Date: Jun 2009
Posts: 1,680
Quote:
Originally Posted by John McCoy View Post
The key question here, I think, is the range of variation of mutation rates and the accumulation of mutations observed in a population of Y chromosomes -- and I don't think there is enough data to say very much about that. In order to validate the TMRCA estimates, we would need a large group of "ancient" samples of known ages from which to assess the antiquity of the SNP's that define major branches of the Y chromosome haplotree. The idea of a mutational clock, such that mutations inevitably accumulate at a constant rate, is an attractive one, and possibly the only available basis for estimating when genetic lineages diverged where the dates cannot be measured directly from, say, the fossil record, but it is only a hypothesis, based on untested assumptions that seem way too good to be true.
No one is saying that the mutation rate is an exact constant. The estimates are just that, because the mutation rate is an average and not a constant. The number of SNPs each person has accumulated since any of the subclades appeared in the fossil record will always vary. So even if we had thousands of well-sequenced fossil samples with tight C14 dating, the variability in descendant SNP counts would always cause the mutation rate to be questioned. Another problem with a lot of fossils is that very commonly they can't be fully sequenced, so we can only get the lower-bound date of the appearance of specific SNPs and not necessarily the exact age. On top of that, C14 dating is also variable, so the range of dates in which the specimen could have lived can be larger than a lot of people would like.

Last edited by Armando; 21st November 2017 at 09:09 PM.
  #10  
Old 22nd November 2017, 04:17 PM
Svein Davidsen
FTDNA Customer
 
Join Date: Jun 2005
Location: Switzerland and France
Posts: 129
Thanks to everyone, especially Armando, for comprehensive and enlightening inputs.

I now need to follow-up on all the references to see if I can come any nearer to my aim of locating "Where and When" for the birth of N-FGC14542! Ancient DNA where are you?!