FTDNATiP, Y-Matches, setup

  • JC399
    replied
    Originally posted by vineviz
    FTDNATiP has as inputs: 1) some algorithm for calculating genetic distance (that takes into account RecLOH events at the multi-copy markers); and 2) 38 mutation rate estimates (one each for the first 37 markers and a single estimate for the final panel).

    The GD algorithm would be easiest to crack, I think, but in order to reverse engineer the FTDNATiP mutation rate estimates you would need access to at least 39 haplotypes which varied from each other at exactly one marker each. Leaving aside the problem that these 39 haplotypes don't exist, you still have to hope that there is a single solution to the problem of trying to simultaneously solve for the 38 mutation rate estimates.

    Good luck.

    I really don't want to get into all that. I was planning on focusing more on r1b1 since it applies to me personally and there is more data on that than perhaps any of the others. If I can get closer to FTDNATiP than using the public calculators, I'll be very happy.
    Last edited by JC399; 12 January 2007, 05:26 PM.


  • JC399
    replied
    Originally posted by vineviz
    It would probably help if you were to clarify what you mean by "differential", I think. But I'll try to explain it anyway.

    Using more markers in a TMRCA calculation will change the TMRCA estimate produced by the calculation, even if there are no mutations in the additional markers tested.

    And unless two different calculators use the same mutation rate estimate for the additional markers, each will change the TMRCA estimate by a different amount when the additional markers are added.

    The TMRCA estimate is a non-linear function of the mutation rate and the number of mutations per marker. A change of zero is informative and impacts the estimate produced by the model described by Walsh.

    The process of actually building a calculator is highly informative, and will allow you to see how the inputs interact. The whole exercise becomes much more intuitive once you expend the energy to understand it.

    I think it's pretty clear we are talking about something altogether different. By "differential" I mean what's causing the difference in TMRCA from one individual to the next. Yes, the actual values are affected by the upper markers, but the difference between individuals is caused by the lower markers, because those are the ones that differ. If the lower markers were the same too, there would be no difference. That's all I meant.


  • vineviz
    replied
    Originally posted by JC399
    As for the impossibility of matching FTDNATiP, I guess I'll have to see that for myself.

    FTDNATiP has as inputs: 1) some algorithm for calculating genetic distance (that takes into account RecLOH events at the multi-copy markers); and 2) 38 mutation rate estimates (one each for the first 37 markers and a single estimate for the final panel).

    The GD algorithm would be easiest to crack, I think, but in order to reverse engineer the FTDNATiP mutation rate estimates you would need access to at least 39 haplotypes which varied from each other at exactly one marker each. Leaving aside the problem that these 39 haplotypes don't exist, you still have to hope that there is a single solution to the problem of trying to simultaneously solve for the 38 mutation rate estimates.

    Good luck.
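As a starting point for the genetic distance input described above, here is a minimal Python sketch of two common ways to count differences between haplotypes. It is not FTDNA's algorithm, which is not public: it ignores RecLOH events and treats every marker as single-copy. The function names and the marker values are made up for illustration.

```python
def stepwise_distance(hap_a, hap_b):
    """Sum of absolute repeat-count differences over the shared markers."""
    if hap_a.keys() != hap_b.keys():
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(hap_a[m] - hap_b[m]) for m in hap_a)


def mismatch_count(hap_a, hap_b):
    """Number of markers that differ, regardless of by how many repeats."""
    if hap_a.keys() != hap_b.keys():
        raise ValueError("haplotypes must cover the same markers")
    return sum(1 for m in hap_a if hap_a[m] != hap_b[m])


# Hypothetical four-marker haplotypes (values invented for the example).
hap_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
hap_b = {"DYS393": 13, "DYS390": 22, "DYS19": 14, "DYS391": 10}

print(stepwise_distance(hap_a, hap_b))  # 3
print(mismatch_count(hap_a, hap_b))     # 2
```

The two counts diverge as soon as one marker differs by more than one repeat, which is one reason an independent calculator may not agree with FTDNATiP even before mutation rates enter the picture.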


  • vineviz
    replied
    Originally posted by JC399
    With regard to markers that haven't exhibited a mutation playing a role, that's not the issue. It's whether they play a role in the "differential" TMRCA. If all markers are the same for each individual, they can't contribute to a "differential" calculation, regardless of whatever values are used.

    It would probably help if you were to clarify what you mean by "differential", I think. But I'll try to explain it anyway.

    Using more markers in a TMRCA calculation will change the TMRCA estimate produced by the calculation, even if there are no mutations in the additional markers tested.

    And unless two different calculators use the same mutation rate estimate for the additional markers, each will change the TMRCA estimate by a different amount when the additional markers are added.

    The TMRCA estimate is a non-linear function of the mutation rate and the number of mutations per marker. A change of zero is informative and impacts the estimate produced by the model described by Walsh.

    The process of actually building a calculator is highly informative, and will allow you to see how the inputs interact. The whole exercise becomes much more intuitive once you expend the energy to understand it.
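To make the point about zero-mutation markers concrete, here is a rough Python sketch of a TMRCA calculator with per-marker rates. It is a simplification, not Walsh's full method and not FTDNATiP: it assumes an infinite-alleles style likelihood, in which a marker with rate mu still matches after t generations with probability exp(-2*mu*t), and a flat prior on t over a fixed grid. All rates and mismatch patterns below are hypothetical.

```python
import math


def tmrca_posterior(rates, differs, max_generations=2000):
    """Posterior probability for t = 1..max_generations under a flat prior."""
    weights = []
    for t in range(1, max_generations + 1):
        log_like = 0.0
        for mu, is_different in zip(rates, differs):
            if is_different:
                # Marker has mutated at least once along the 2t meioses.
                log_like += math.log(1.0 - math.exp(-2.0 * mu * t))
            else:
                # Marker still matches: log of exp(-2 * mu * t).
                log_like += -2.0 * mu * t
        weights.append(math.exp(log_like))
    total = sum(weights)
    return [w / total for w in weights]


def quantile(posterior, p):
    """Smallest generation count whose cumulative probability reaches p."""
    cumulative = 0.0
    for t, weight in enumerate(posterior, start=1):
        cumulative += weight
        if cumulative >= p:
            return t
    return len(posterior)


# Hypothetical case: 37 markers, 4 mismatches, one assumed rate for all.
rates_37 = [0.004] * 37
differs_37 = [True] * 4 + [False] * 33

# Add 30 markers with no mismatches, under two different assumed rates.
differs_67 = differs_37 + [False] * 30
median_37 = quantile(tmrca_posterior(rates_37, differs_37), 0.5)
median_67_low = quantile(tmrca_posterior(rates_37 + [0.002] * 30, differs_67), 0.5)
median_67_high = quantile(tmrca_posterior(rates_37 + [0.004] * 30, differs_67), 0.5)

print(median_37, median_67_low, median_67_high)
```

With these made-up inputs the three medians differ even though the extra 30 markers contain no mutations, and the size of the shift depends on the rate assumed for them. Calling `quantile` with 0.05 and 0.95 gives a 90% interval from the same posterior.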
    Last edited by vineviz; 12 January 2007, 04:52 PM.


  • JC399
    replied
    Originally posted by vineviz
    The mutation rate for markers that haven't exhibited a mutation plays the same role in TMRCA calculations as the rate for markers that have exhibited a mutation.

    Unless you know exactly the inputs used by FTDNATiP, replicating the results in an independent calculator will be impossible.

    Also, it is relatively easy to construct a TMRCA calculator using Walsh's method with individual marker mutation rates in Excel.

    As I said before, I don't use Excel or any other Microsoft products for that matter, nor do I have any desire to. As for the impossibility of matching FTDNATiP, I guess I'll have to see that for myself.

    With regard to markers that haven't exhibited a mutation playing a role, that's not the issue. It's whether they play a role in the "differential" TMRCA. If all markers are the same for each individual, they can't contribute to a "differential" calculation, regardless of whatever values are used.
    Last edited by JC399; 12 January 2007, 03:48 PM.


  • vineviz
    replied
    Originally posted by JC399
    Those last 30 markers are the ones where there were no mutations, at least for the people I know about, so the variation in TMRCA wasn't coming from there. It was mainly from the lower 37. In particular, mostly from the 26-37 region.

    The mutation rate for markers that haven't exhibited a mutation plays the same role in TMRCA calculations as the rate for markers that have exhibited a mutation.

    Unless you know exactly the inputs used by FTDNATiP, replicating the results in an independent calculator will be impossible.

    Also, it is relatively easy to construct a TMRCA calculator using Walsh's method with individual marker mutation rates in Excel.
    Last edited by vineviz; 12 January 2007, 03:31 PM.


  • JC399
    replied
    Originally posted by vineviz
    The 90% confidence interval with those conditions is something like 8 to 32 generations, so a shift of one or two generations at the 50% point is probably not statistically significant. Certainly it isn't genealogically meaningful.

    Additionally, the TMRCA calculations are very dependent on the average mutation rate being used. Given that no one has a very accurate estimate of which rate should be used for markers 38-67, including FTDNA to the best of my knowledge, slight variations in the TMRCA estimates are not particularly informative.

    Those last 30 markers are the ones where there were no mutations, at least for the people I know about, so the variation in TMRCA wasn't coming from there. It was mainly from the lower 37. In particular, mostly from the 26-37 region.

    Furthermore, I never meant to imply I thought it was making a huge difference. I was just pointing out that if someone wanted to take that into account they could, and the method for doing this is described in that paper. When I get some free time I was thinking of writing a Mathematica program to see if I can match the FTDNATiP results, though that may be difficult given that the exact mutation rates FTDNATiP uses are not known. I have seen some estimates, however.
    Last edited by JC399; 12 January 2007, 02:50 PM.


  • vineviz
    replied
    Originally posted by JC399
    In my case I'm seeing as much as a 2 generation variation at 67 markers and 6 mutations. I can think of some rare cases where it might be more, though typically I am sure it's less.

    The 90% confidence interval with those conditions is something like 8 to 32 generations, so a shift of one or two generations at the 50% point is probably not statistically significant. Certainly it isn't genealogically meaningful.

    Additionally, the TMRCA calculations are very dependent on the average mutation rate being used. Given that no one has a very accurate estimate of which rate should be used for markers 38-67, including FTDNA to the best of my knowledge, slight variations in the TMRCA estimates are not particularly informative.


  • JC399
    replied
    Originally posted by vineviz
    Practically speaking, the impact of using individual marker mutation rates in the TMRCA is likely to be relatively minor. It's a nice refinement but it doesn't add a whole lot to the accuracy of the estimate, especially when you are comparing large numbers of markers (say, more than 30).

    In my case I'm seeing as much as a 2 generation variation at 67 markers and 6 mutations. I can think of some rare cases where it might be more, though typically I am sure it's less. Here is a paper that describes how to do this calculation with individual mutation rates.

    http://www.genetics.org/cgi/reprint/158/2/897.pdf

    Of course, it requires knowing what those mutation rates are, and in the case of the upper 30, that might be hard to do. But there's nothing preventing someone from using constant mutation rates in that region, while using individual mutation rates in the rest.
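The mixed approach in the last paragraph can be sketched in a few lines of Python: keep individual rates where they are known and fall back to one constant rate for the remaining markers. The function name and every rate value below are placeholders, not published estimates.

```python
def build_rates(known_rates, total_markers, fallback_rate):
    """Per-marker rates: known values first, then a constant for the rest."""
    if len(known_rates) > total_markers:
        raise ValueError("more known rates than markers")
    missing = total_markers - len(known_rates)
    return list(known_rates) + [fallback_rate] * missing


# Hypothetical: individual rates for the first 3 markers, a constant for 2 more.
known_rates = [0.002, 0.004, 0.003]
rates = build_rates(known_rates, total_markers=5, fallback_rate=0.0025)
print(rates)  # [0.002, 0.004, 0.003, 0.0025, 0.0025]
```

For a 67-marker comparison the same call would take 37 individual rates and pad the list to 67 with whatever constant one is willing to assume for the upper panel.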


  • vineviz
    replied
    Originally posted by JC399
    Thanks. I've seen those public domain calculators before. It kind of looks like they are not using individual mutation rates. It looks more like they are using a single rate that depends on which test you took. Is that true? I think Sorenson and a few other sites have those too. It's certainly an option if you don't mind entering the data manually.

    Practically speaking, the impact of using individual marker mutation rates in the TMRCA is likely to be relatively minor. It's a nice refinement but it doesn't add a whole lot to the accuracy of the estimate, especially when you are comparing large numbers of markers (say, more than 30).


  • JC399
    replied
    Originally posted by vineviz
    It is not possible, I think, to "download" the database. Most people who use it do the extraction manually, usually by haplogroup.

    You also asked about another TMRCA calculator. A good one is here:

    http://www.scs.uiuc.edu/~mcdonald/tmrca.htm

    Thanks. I've seen those public domain calculators before. It kind of looks like they are not using individual mutation rates. It looks more like they are using a single rate that depends on which test you took. Is that true? I think Sorenson and a few other sites have those too. It's certainly an option if you don't mind entering the data manually.

    With regard to the download, the other option would be to write a program to do it for you. I'm not sure if I have a good enough reason to go through all that trouble though. Maybe one day when I'm extremely bored I'll give it a shot.


  • vineviz
    replied
    Originally posted by JC399
    I have noticed there are some people who have downloaded the entire ysearch database and done calculations on it. How did they do that?

    It is not possible, I think, to "download" the database. Most people who use it do the extraction manually, usually by haplogroup.

    You also asked about another TMRCA calculator. A good one is here:

    http://www.scs.uiuc.edu/~mcdonald/tmrca.htm


  • JC399
    replied
    Originally posted by MMaddi
    You can do something like what you want to do with Internet Explorer and Excel.

    Do your search on ysearch and then do a haplotype comparison between the people. This will give you a table with the haplotypes for each on the screen. Right click on the page. You will see an option on the menu to "Export to Excel" or something like that. Click on that. As long as you have Excel on your computer, it will automatically open Excel and download the table to a new spreadsheet.

    Then you can save the spreadsheet and sort by whatever markers you want to sort by. And possibly write a program to manipulate the data further.

    Mike

    Thanks. I was talking more about downloading "everything", but what you describe would be useful. I normally don't use Internet Explorer, so there's no export to Excel for me. I don't use Excel anyway. But I can copy and paste the tables into a text editor, and it turns out to be tab-delimited text that can be read into an application like Excel or something else. You can do something similar with the project page data here, and I have done this too. I was just wondering how they downloaded the entire database. With that you get the haplogroup and a lot of other information you wouldn't get from a table, not to mention the possibility of doing your own custom searches that ysearch can't do.
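For anyone taking the copy-and-paste route without Excel, here is a small Python sketch that reads pasted tab-delimited text and sorts it by a marker. The column layout (an ID column followed by one column per marker) and the sample rows are assumptions for illustration; a real ysearch comparison table may be laid out differently, and the sketch expects every row to have the same number of fields as the header.

```python
import csv
import io


def parse_haplotype_table(text):
    """Read tab-delimited text into a list of dicts, one per haplotype row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        row = {}
        for column, value in record.items():
            value = (value or "").strip()
            # Plain repeat counts become ints; anything else stays as text.
            row[column] = int(value) if value.isdigit() else value
        rows.append(row)
    return rows


# Hypothetical pasted table (IDs and values invented for the example).
pasted = (
    "ID\tDYS393\tDYS390\tDYS19\n"
    "A1\t13\t24\t14\n"
    "B2\t13\t23\t14\n"
)

rows = parse_haplotype_table(pasted)
by_dys390 = sorted(rows, key=lambda r: r["DYS390"])
print([r["ID"] for r in by_dys390])  # ['B2', 'A1']
```

Values that are not plain integers, such as multi-copy markers written with a hyphen, are left as strings here and would need their own handling before any distance calculation.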


  • MMaddi
    replied
    Originally posted by JC399
    I have already done that. Even ysearch has restrictions on how you can search, but it's much better than here. One problem though is there is no FTDNATiP calculator there.

    I have noticed there are some people who have downloaded the entire ysearch database and done calculations on it. How did they do that? I can think of ways I would do it if I wanted to write a software program or a shell script to read and parse the data, but that would take a bit of work, and a long time running. Is there a way to FTP it or something?

    You can do something like what you want to do with Internet Explorer and Excel.

    Do your search on ysearch and then do a haplotype comparison between the people. This will give you a table with the haplotypes for each on the screen. Right click on the page. You will see an option on the menu to "Export to Excel" or something like that. Click on that. As long as you have Excel on your computer, it will automatically open Excel and download the table to a new spreadsheet.

    Then you can save the spreadsheet and sort by whatever markers you want to sort by. And possibly write a program to manipulate the data further.

    Mike


  • JC399
    replied
    Originally posted by efgen
    FTDNA doesn't provide the capability to do customized searches/comparisons in their database. However, they have provided you with names and email addresses of your matches, so you can always email your matches to discuss details.

    You can also upload your results to Ysearch (from your Y-DNA Matches tab) and do the type of searches/comparisons that you are interested in. The downside is that participation in Ysearch is voluntary, so you probably won't find all your matches there. The upside is that you may find matches who tested with companies other than FTDNA.

    I have already done that. Even ysearch has restrictions on how you can search, but it's much better than here. One problem though is there is no FTDNATiP calculator there.

    I have noticed there are some people who have downloaded the entire ysearch database and done calculations on it. How did they do that? I can think of ways I would do it if I wanted to write a software program or a shell script to read and parse the data, but that would take a bit of work, and a long time running. Is there a way to FTP it or something?
    Last edited by JC399; 11 January 2007, 06:31 PM.
