Using Big Y for TMRCA

  • Adam
    started a topic Using Big Y for TMRCA

    Dear Researchers,
    I would like to know if FTDNA provides a TMRCA between two persons who have both done the Big Y and are in the same terminal hg. Please don't start feeding me formulas and strategies to figure this out. I am a novice to Big Y and do not have the gumption to get into the weeds. I need a computer to look at the data and do the math for me.
    Thanks, Adam

  • bartarl260
    replied
    Originally posted by wkauffman View Post
    Several items...
    1) FTDNA has had problems reporting "no-calls" and filtering them out of results. If you see Reference = Genotype, it is a bad call and should be ignored. You need to open the BAM files from both results and check the quality of any SNP results being compared. The "BED" region file is important in that it provides a measure of the length of good-quality call regions. Calls on your list from BED regions that are shorter than 2 reads (150 * 2 bases) most likely will not be confirmable using other sequencing technologies, nor will they come out the same if the raw data is completely reassembled.
    I know how to check for the no calls. I know how they display in FTDNA's system. The results weren't "no calls," for my father or myself. They made a call, my result is different.

  • wkauffman
    replied
    Several items...
    1) FTDNA has had problems reporting "no-calls" and filtering them out of results. If you see Reference = Genotype, it is a bad call and should be ignored. You need to open the BAM files from both results and check the quality of any SNP results being compared. The "BED" region file is important in that it provides a measure of the length of good-quality call regions. Calls on your list from BED regions that are shorter than 2 reads (150 * 2 bases) most likely will not be confirmable using other sequencing technologies, nor will they come out the same if the raw data is completely reassembled.
    2) FTDNA doesn't filter the SNPs it provides to you, or those in the haplogroup levels, down to the combBED region, which is the current basis of consistent age estimation. The region used for age estimation can technically be slightly larger in a Y700 result than in a Y500 result. The coverage differences between the tests affect how many SNPs are identified within the age estimation region. It's like mixing Gala and Golden Delicious apples: similar but different.
    3) You can read some of the statistical details underlying the calculations as done by McDonald, and seen in Ytree.net, at http://www.jb.man.ac.uk/~mcdonald/genetics/build37.html This is similar to what YFull uses. Within the statistical error range of the age estimates, the values provided by YFull and McDonald are in agreement. In the case of this older run data, the error estimates were tighter for McDonald than for YFull since significantly more samples were used in the process. If FTDNA provides time estimates, they should be decent, given that FTDNA has >95% of the available results as input. We will have to see if they can do an appropriate job of calibration and retention of parent-child levels in whatever they may deliver in the future.
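The two checks in item 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not FTDNA's or anyone else's actual pipeline: the `Call` record, the `usable_calls` helper, and the idea of passing BED regions as `(start, end)` tuples are all hypothetical, and positions are assumed to already be 0-based to match BED's half-open intervals.

```python
from dataclasses import dataclass

# "2 reads (150 * 2 bases)" from the post above
MIN_REGION_BASES = 150 * 2


@dataclass
class Call:
    position: int   # assumed 0-based, to line up with BED coordinates
    reference: str
    genotype: str


def usable_calls(calls, bed_regions, min_bases=MIN_REGION_BASES):
    """Keep calls that differ from the reference and fall inside a
    BED region at least `min_bases` long."""
    kept = []
    for call in calls:
        # Reference == Genotype is treated as a bad call and ignored
        if call.reference == call.genotype:
            continue
        for start, end in bed_regions:
            # BED intervals are half-open: start <= pos < end
            if start <= call.position < end and (end - start) >= min_bases:
                kept.append(call)
                break
    return kept


# Hypothetical data, for illustration only
regions = [(1000, 1400), (5000, 5120)]
calls = [
    Call(1100, "A", "G"),  # differs from reference, region is 400 bases long
    Call(1200, "C", "C"),  # Reference == Genotype
    Call(5050, "T", "G"),  # region is only 120 bases long
    Call(9000, "G", "A"),  # not inside any BED region
]
kept_positions = [c.position for c in usable_calls(calls, regions)]
```

Checking read quality in the BAM files themselves is a separate step that this sketch does not attempt.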

  • bartarl260
    replied
    They're not missing, they're straight up mismatches.

    Me:
    Marker 1 Reference: A Genotype: G
    Marker 2 Reference: G Genotype: C
    Marker 3 Reference: T Genotype: G
    My Father:
    Marker 1 Reference: A Genotype: A
    Marker 2 Reference: G Genotype: G
    Marker 3 Reference: T Genotype: T
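A comparison like the one above can be sketched in Python so that true mismatches are kept apart from no-calls. The `discordant_markers` helper and the dictionary layout are hypothetical; a missing genotype is represented as `None` purely for illustration.

```python
def discordant_markers(child, parent):
    """Return markers where both kits have a call and the genotypes differ.
    A genotype of None (or a missing marker) is treated as a no-call."""
    out = []
    for marker, child_gt in child.items():
        parent_gt = parent.get(marker)
        if child_gt is None or parent_gt is None:
            continue  # no-call in one kit: not a mismatch
        if child_gt != parent_gt:
            out.append(marker)
    return out


# Genotypes as listed in the post above
me = {"Marker 1": "G", "Marker 2": "C", "Marker 3": "G"}
father = {"Marker 1": "A", "Marker 2": "G", "Marker 3": "T"}

mismatches = discordant_markers(me, father)
```

With the values listed in the post, all three markers come back as mismatches rather than no-calls.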

  • The_Contemplator
    replied
    Originally posted by bartarl260 View Post

    Pretty much. I've been told that if my father sent in his Big Y results, we'd probably get our own subclade, but we'd also see an MRCA estimate by YFull in the 350-ish years timeframe (3 different SNPs and different STRs, although I don't think YFull honors 2 of the offending STRs in our case). Contacting them about the issue would in turn bring other issues into the mix. The obvious part of their formula is that they're going to bake in a minimum of 60 years per mutation; the question from there is what the multiplier is going to be (and their secret ingredients for determining that value), but it's a safe bet it will be a value greater than 2.
    I've been meaning to ask you something. You've mentioned that your and your father's Big Y results show a 3-SNP difference. Have you checked the VCF of the kit missing the SNPs, just in case they had low read counts?

  • bartarl260
    replied
    Originally posted by vinnie View Post
    It's my understanding that each Big Y 500 SNP, on average, represents about 130 years.
    Pretty much. I've been told that if my father sent in his Big Y results, we'd probably get our own subclade, but we'd also see an MRCA estimate by YFull in the 350-ish years timeframe (3 different SNPs and different STRs, although I don't think YFull honors 2 of the offending STRs in our case). Contacting them about the issue would in turn bring other issues into the mix. The obvious part of their formula is that they're going to bake in a minimum of 60 years per mutation; the question from there is what the multiplier is going to be (and their secret ingredients for determining that value), but it's a safe bet it will be a value greater than 2.

  • vinnie
    replied
    It's my understanding that each Big Y 500 SNP, on average, represents about 130 years.

  • bartarl260
    replied
    My advice regarding YFull's dating methodology, particularly as it relates to FTDNA test transfers and instances like my own (a mutation event between close family members), is to NOT alert them to the nature of the relationship between the two persons tested.

    You can expect them to over-estimate the MRCA in some cases, as they predict about 60 years per SNP, and some additional time per STR as well. So if my father ever uploaded his results to YFull (he refuses to after being alerted to this), we'd likely have an MRCA prediction in the 180+ years ago range (3 STRs and 3 SNPs different). Alerting their admins to a confirmed atDNA father/son relationship is just going to make a bigger mess, judging by YFull's reported/observed actions when they have been told of such things: they declare FTDNA's testing data to be invalid and "correct" it, then manually set the MRCA to 50 years.

    YFull is useful for many things; identifying very close family isn't one of them. They're not interested in anything that will challenge their age estimation models.
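The arithmetic behind the "180+ years" figure is simple enough to sketch in Python. This only multiplies the per-SNP rules of thumb quoted in this thread (about 60 years per SNP here, about 130 years per Big Y 500 SNP elsewhere in the thread); `naive_snp_age` is a hypothetical helper, and it is not a reconstruction of YFull's actual formula, which also involves STRs and other adjustments not described here.

```python
def naive_snp_age(snp_differences: int, years_per_snp: float) -> float:
    """Rule-of-thumb age: number of differing SNPs times years per SNP.
    The years_per_snp value is whatever rule of thumb you choose to trust."""
    if snp_differences < 0 or years_per_snp <= 0:
        raise ValueError("expected a non-negative count and a positive rate")
    return snp_differences * years_per_snp


# Figures quoted in this thread, applied to a 3-SNP difference
age_at_60 = naive_snp_age(3, 60)    # the "about 60 years per SNP" figure
age_at_130 = naive_snp_age(3, 130)  # the "about 130 years" Big Y 500 figure
```

Either figure is far from the one generation actually separating a father and son, which is the point being made above about averages applied to a single close pair.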

  • dna
    replied
    @Adam, your good questions can probably only be answered in a forum devoted to archaeogenetics. But did you try asking in the relevant J group at YFull? However, I expect that you are asking about YFull's proprietary know-how (and YFull [Russia] is not owned by FTDNA [USA]).


    Mr. W.

  • Adam
    replied
    Okay. Thanks for all of this, but I see I'm going to have to provide some background to my question. Instead of speaking about TMRCAs, how about if I rephrase to ask what the 'age estimate of the branch' is? There is a site called yfull.com that has its own set of haplotrees. When I go to my sub-branch of the J1 tree I see something like this:

    J-Y5400_yfull_13Mar2019.jpg
    If you look at J-Y89545 for example, you see next to that the words "formed 700ybp, TMRCA 375ybp info". When accessing 'info' I see:

    J-Y5400_yfull_Y89545detail_13Mar2019.jpg

    This is giving me an age estimate of 366 for the branch (I know that TMRCAs and age estimates are based on probabilities and % of certainty, so you needn't go into all of that).

    Unfortunately, the YFull J1 tree is not the same as the FTDNA tree, so there are branches from my FTDNA haplotree that do not appear on YFull. This means that I cannot obtain an age estimate like the one above for the terminal hg I am looking for (J-S15852), or for several of the clades upstream and downstream of S15852.

    When I go to "What is YFull's age estimation methodology?", I am taken to this page, which I will not reprint here: https://www.yfull.com/faq/what-yfull...n-methodology/

    What the methodology does not fully explain, and what I don't know how to do, is the "Formula to Correct SNPS Number". How do I obtain these figures from the Big Y results? I get the number of SNPs part. I don't get how they obtained the coverage BP figure, nor where they came up with the number 8467165 (is this just a constant?).
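For what it's worth, here is how the correction might look in Python, assuming the formula is read as "number of SNPs divided by coverage in bp, multiplied by 8467165", i.e. the observed SNP count scaled up to a fixed-length reference region. That reading, the `corrected_snp_count` name, and the example coverage figure are assumptions for illustration; where the coverage BP figure comes from in a Big Y result is exactly the open question above.

```python
# The constant quoted from the methodology page; treated here as the
# length (in bp) of a fixed reference region, which is an assumption.
REFERENCE_REGION_BP = 8_467_165


def corrected_snp_count(snp_count, coverage_bp, reference_bp=REFERENCE_REGION_BP):
    """Scale an observed SNP count to what would be expected if the
    whole reference region had been covered."""
    if coverage_bp <= 0:
        raise ValueError("coverage_bp must be positive")
    return snp_count / coverage_bp * reference_bp


# Hypothetical kit: 10 SNPs found in 8,000,000 covered bases
corrected = corrected_snp_count(10, 8_000_000)
```

Under this reading, a kit with less coverage gets its SNP count scaled up, so that kits with different coverage can be compared on the same footing.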

    I would rather FTDNA just did all of this automatically for each clade, but if I have to, I'll DIY, provided I can learn how to obtain those two figures in the 'Correct SNPS Number' formula.

    If you can illuminate me on that I thank you.




    Last edited by Adam; 13th March 2019, 06:59 PM.

  • bartarl260
    replied
    Originally posted by dna View Post
    @bartarl260 Did you mean to say three mutated STRs, since you mentioned Y500 ?


    Mr. W.
    Both, actually. I'm GD1 @Y25, GD2 @Y37, and have another STR mismatch on Y500.

    I also have 3 private SNP mutations that I do not share with my father's Y700 test.

  • dna
    replied
    Originally posted by Adam View Post
    [----] TMRCA between two persons who have both done the Big Y and are in the same terminal hg. [----]
    Are you asking about private SNPs ?

    Unlike STRs where all men (OK, most) have the same set of STRs just with different values, private SNPs of men at the end of one of the branches are very likely to be very rare, if not unique to those individuals. Consequently, there are seldom any good estimates for those particular private SNPs and only general rules (as outlined in the post by bartarl260) for SNPs and STRs can be applied.

    Originally posted by Adam View Post
    [----] I need a computer to look at the data and do the math for me. [----]
    Not all STR mutation rates have been published. Some rates are probably proprietary to FTDNA. Beyond TiP (as above in the post by bartarl260), there is no data to crunch yet. There might be something in the years to come, when STR mutation rates become available and it turns out that many SNPs can be found in multiple branches and can be proven to be good TMRCA predictors.


    Mr. W.


    P.S.
    Output from TiP should be more complicated. In particular, it should indicate intervals. But quite likely that would be too complicated for many users.

  • dna
    replied
    @bartarl260
    Originally posted by bartarl260 View Post
    [----]
    That said, cycling back to BigY, going to that "Do it yourself" option you don't like, in general, the long term average should be no more than 1 mutated SNP per generation. (although exceptions exist, on Y500, I have three mutated SNPs relative to my father) [----]
    Did you mean to say three mutated STRs, since you mentioned Y500 ?


    Mr. W.

  • bartarl260
    replied
    BigY itself? No.

    Y-12 through Y-111? Yes. It's called the TiP report. Not particularly useful for Y-12 through Y-37 IMO, unless you get a 0% chance. Y-67 and Y-111 TiPs are likely to be more realistic in their predictions, although those can be off the mark by more than a bit depending on various factors (like recent mutations, such as between myself and my father). If both people have tested Y-111 and they don't match at that level, the odds of a "genealogical time-frame" common ancestor are going to be pretty bad even without running a TiP on them.

    That said, cycling back to BigY, going to that "Do it yourself" option you don't like, in general, the long term average should be no more than 1 mutated SNP per generation. (although exceptions exist, on Y500, I have three mutated SNPs relative to my father) From there, depending on how much you know about mutations or the lack thereof on that family line (because you're in a successful surname project... or not), you can extrapolate which lines mutated, and how often.

    In a "high mutation scenario" you could consider that both lines are averaging 1 SNP mutation per generation, which means 1 for you, AND 1 for him. So a SNP difference of 6 could mean you have a common great-grandfather. Or, if your family line happens to be one of the more stable lines, the MRCA could end up being 15+ generations back, because some lines have now been documented as being that stable.
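The reasoning above can be written out as a small Python sketch. It assumes both lines mutate at the same average rate, so the SNP difference between two men is split evenly between them; `generations_to_mrca` is a hypothetical helper, and the 0.2 rate used for the "stable line" case is an illustrative number only, not a documented value.

```python
def generations_to_mrca(snp_difference, snps_per_generation_per_line):
    """Estimate generations back to the common ancestor, assuming the SNP
    difference accumulated equally along both lines at the given rate."""
    if snps_per_generation_per_line <= 0:
        raise ValueError("rate must be positive")
    return snp_difference / (2 * snps_per_generation_per_line)


# "High mutation scenario": 1 SNP per generation on each line
high_rate = generations_to_mrca(6, 1.0)   # 3 generations: a great-grandfather

# A hypothetical stable line: 1 SNP every 5 generations on each line
stable = generations_to_mrca(6, 0.2)      # roughly 15 generations
```

The same 6-SNP difference spans anything from 3 to 15 generations depending on the assumed rate, which is why the rate for the specific family line matters so much.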

    Until you, or somebody else, knows more about the specifics of what has been going on genetically within your own family (male) line, trying to predict proximity of a match to a specific ancestor by Y-DNA alone is like trying to use a toilet blindfolded while standing up. You know (generally) which direction to aim in, but not much more.
