BIG Y Order Counts by Project


  • mwwalsh
    replied
    Originally posted by mwwalsh
    ...
    I've downloaded the ISOGG Ybrowse tool SNP database. Since I am only working on R1b today I filtered out everything that did not have the consecutive characters "R1b" or "M269" somewhere listed. If SNPs are properly entered that should get the R1b data set. I'm a little worried that some SNPs are not properly listed in Ybrowse. I thought Thomas Krahn said he entered all of the S series SNPs into Ybrowse, but I have noticed one or two from the Chromo 2 anonymized 2000 file that I couldn't find. Maybe he didn't enter all of them, just the ones he felt were most relevant... or I'm just misreading the data. Here is this YBrowse "R1b" subset reformatted.
    ....
    Oh boy, I think this is the case. It appears that many of the S series SNPs added to Ybrowse don't have any haplogroups designated. Here are a couple of examples. Unfortunately I think there are a whole ton of S series SNPs entered without any haplogroup information. I don't have time to worry about the Ybrowse tool being correct.

    ID=Sequence:S10014;allele_anc=G;allele_der=A;primer_f=TBD;primer_r=TBD;YCC_haplogroup=not+listed;ISOGG_haplogroup=not+listed;mutation=G+to+A;count_tested=0;count_derived=0;ref=Jim+Wilson+(2014)

    ID=Sequence:S10015;allele_anc=C;allele_der=T;primer_f=TBD;primer_r=TBD;YCC_haplogroup=not+listed;ISOGG_haplogroup=not+listed;mutation=C+to+T;count_tested=0;count_derived=0;ref=Jim+Wilson+(2014)

    Here is one I'm intimate with because it's in me.
    ID=Sequence:S5196;allele_anc=T;allele_der=C;primer_f=TBD;primer_r=TBD;YCC_haplogroup=not+listed;ISOGG_haplogroup=not+listed;mutation=T+to+C;count_tested=0;count_derived=0;ref=Jim+Wilson;comments=Aka.+CTS5396

    Here is the synonym:
    ID=Sequence:CTS5396;allele_anc=T;allele_der=C;primer_f=TBD;primer_r=TBD;YCC_haplogroup=not+listed;ISOGG_haplogroup=not+listed;mutation=T+to+C;count_tested=0;count_derived=0;ref=Chris+Tyler-Smith+(2011);comments=Extracted+from+1000+genomes+data.+Not+qualified.

    S5196/CTS5396 are clearly downstream of L21 and so far equivalent to R1b-L513. It looks like I'll have to weigh down the spreadsheet with all of these unknowns.
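    Records like these are easier to sift once each attribute string is split into fields. A minimal Python sketch, assuming one record per `key=value;` string exactly as quoted above, with `+` standing in for spaces; `parse_attributes` and `lacks_haplogroup` are hypothetical helper names, not part of YBrowse:

```python
from urllib.parse import unquote_plus

def parse_attributes(attr: str) -> dict:
    """Split a 'key=value;key=value;' string into a dict.

    Values are decoded with unquote_plus, so '+' becomes a space and any
    %XX escapes are decoded as well.
    """
    fields = {}
    for pair in attr.strip().strip(";").split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            fields[key] = unquote_plus(value)
    return fields

def lacks_haplogroup(fields: dict) -> bool:
    """True when neither the YCC nor the ISOGG haplogroup field is filled in."""
    return all(fields.get(key, "not listed") == "not listed"
               for key in ("YCC_haplogroup", "ISOGG_haplogroup"))

record = ("ID=Sequence:S5196;allele_anc=T;allele_der=C;primer_f=TBD;"
          "primer_r=TBD;YCC_haplogroup=not+listed;ISOGG_haplogroup=not+listed;"
          "mutation=T+to+C;count_tested=0;count_derived=0;ref=Jim+Wilson;"
          "comments=Aka.+CTS5396")
fields = parse_attributes(record)
print(fields["ID"], "|", fields["comments"], "|", lacks_haplogroup(fields))
# prints: Sequence:S5196 | Aka. CTS5396 | True
```

    Flagging the "not listed" entries this way gives a list of the unknowns to carry in the spreadsheet, rather than losing them to a haplogroup filter.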


  • mwwalsh
    replied
    Originally posted by Ann Turner
    Have you developed a systematic way to use the VCF files? The lack of data in the ID column (which should contain SNP names when known) and INFO column (which could contain tags for "novel", "on the tree", etc.) makes it more difficult than necessary.
    Ann, yes I have.

    I've downloaded the ISOGG Ybrowse tool SNP database. Since I am only working on R1b today I filtered out everything that did not have the consecutive characters "R1b" or "M269" somewhere listed. If SNPs are properly entered that should get the R1b data set. I'm a little worried that some SNPs are not properly listed in Ybrowse. I thought Thomas Krahn said he entered all of the S series SNPs into Ybrowse, but I have noticed one or two from the Chromo 2 anonymized 2000 file that I couldn't find. Maybe he didn't enter all of them, just the ones he felt were most relevant... or I'm just misreading the data. Here is this YBrowse "R1b" subset reformatted.
    https://dl.dropboxusercontent.com/u/...P_Cleanup.xlsm
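    The "R1b" or "M269" filter described above can be sketched in a few lines of Python. This assumes the downloaded database has one SNP record per line; the function name and the EXAMPLE records are hypothetical. Note that any record with no haplogroup listed (like the S5196 record quoted earlier in this thread) will be dropped by this kind of filter.

```python
def filter_by_tokens(lines, tokens=("R1b", "M269")):
    """Keep lines in which any token appears as a consecutive substring.

    The match is case-sensitive and looks at the whole line, so a record
    whose haplogroup fields only say 'not listed' is dropped.
    """
    return [line for line in lines if any(tok in line for tok in tokens)]

# Hypothetical records, for illustration only.
sample = [
    "ID=Sequence:EXAMPLE1;ISOGG_haplogroup=R1b;",
    "ID=Sequence:EXAMPLE2;ISOGG_haplogroup=not+listed;",
    "ID=Sequence:EXAMPLE3;YCC_haplogroup=R-M269;",
]
print(filter_by_tokens(sample))  # keeps EXAMPLE1 and EXAMPLE3
```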

    I took this R1b filtered SNP list and consolidated it so that one GRCh37 position/allele change variant will only appear once in a table with all of its labels beside it.
    Ex: 07340450C>T DF1/L513/S215
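    The consolidation step amounts to grouping SNP names by a position/allele key. A minimal Python sketch, assuming the key is the GRCh37 position zero-padded to 8 digits (inferred from the example above; the Y chromosome is under 60 Mb in GRCh37, so 8 digits is enough) and that names are joined in alphabetical order, which happens to match DF1/L513/S215. The function name is hypothetical.

```python
from collections import defaultdict

def consolidate(snps):
    """Group SNP names that share one GRCh37 position and allele change.

    snps: iterable of (name, position, ancestral, derived) tuples.
    Returns {'PPPPPPPPA>D': 'NAME1/NAME2/...'}, keys sorted by position.
    """
    groups = defaultdict(set)
    for name, pos, anc, der in snps:
        # Zero-padding to a fixed width keeps string order equal to numeric order.
        groups[f"{pos:08d}{anc}>{der}"].add(name)
    return {key: "/".join(sorted(names)) for key, names in sorted(groups.items())}

table = consolidate([
    ("L513", 7340450, "C", "T"),
    ("S215", 7340450, "C", "T"),
    ("DF1", 7340450, "C", "T"),
])
print(table)  # {'07340450C>T': 'DF1/L513/S215'}
```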

    For the Big Y .vcf files, I wrote two quick little macros that automatically reformat the derived and passed variants into the same GRCh37 position/allele format and filter them to the top. I can then copy/paste those into another spreadsheet where I'm doing comparisons.
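    The macros themselves aren't posted, but the same reformatting can be sketched in Python. This assumes standard tab-separated VCF columns (CHROM POS ID REF ALT QUAL FILTER INFO ...), that "passed" means FILTER is PASS, and that "derived" means ALT is not "."; multi-allelic ALT values are not split. The function name and sample records are hypothetical.

```python
def passed_derived_keys(vcf_lines):
    """Yield 'PPPPPPPPREF>ALT' keys for PASS records that carry an ALT allele.

    Uses the standard VCF column order:
    CHROM POS ID REF ALT QUAL FILTER INFO ...
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta/header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        pos, ref, alt, filt = cols[1], cols[3], cols[4], cols[6]
        if filt == "PASS" and alt != ".":
            yield f"{int(pos):08d}{ref}>{alt}"

# Hypothetical records, for illustration only.
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrY\t7340450\t.\tC\tT\t100\tPASS\t.",
    "chrY\t7400000\t.\tG\t.\t50\tPASS\t.",
    "chrY\t7500000\t.\tA\tG\t20\tLowQual\t.",
]
print(list(passed_derived_keys(vcf)))  # ['07340450C>T']
```

    Because the keys use the same format as the consolidated SNP list, a plain dictionary lookup is enough to attach labels later.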

    I prefix the FTDNA kit number to each derived mutation in the above format to accumulate a large database (a large number of rows but only a couple of columns) of derived test results by individual. In that spreadsheet, on another tab, I have a summary/comparison table with the SNPs down one side and the kit #s/surnames/Variety STR signatures across the top. Because I also have the R1b Ybrowse SNP reference in the same spreadsheet, I add the labels (Ex: DF1/L513/S215) in the comparison table. Most of the new Big Y variants are unlabeled, which is good. We are discovering new stuff.
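    The comparison tab is essentially a pivot of (kit, variant) rows. A minimal Python sketch, assuming each kit-prefixed entry has already been split into a (kit, variant_key) pair (for a space-separated entry, `entry.split(maxsplit=1)` would do it); the kit IDs, the second variant, and the function name are hypothetical.

```python
from collections import defaultdict

def comparison_table(records, labels):
    """Pivot (kit, variant_key) rows into a variant-by-kit table.

    records: one (kit, variant_key) pair per derived result.
    labels:  variant_key -> SNP names from the reference list.
    Returns (kits, rows); each row is [key, label, mark per kit],
    with 'X' where that kit is derived and '' otherwise.
    """
    kits = sorted({kit for kit, _ in records})
    derived = defaultdict(set)
    for kit, key in records:
        derived[key].add(kit)
    rows = []
    for key in sorted(derived):
        marks = ["X" if kit in derived[key] else "" for kit in kits]
        rows.append([key, labels.get(key, "")] + marks)  # '' label = unnamed
    return kits, rows

records = [("kitA", "07340450C>T"), ("kitB", "07340450C>T"),
           ("kitB", "15000000G>A")]
kits, rows = comparison_table(records, {"07340450C>T": "DF1/L513/S215"})
print(kits)  # ['kitA', 'kitB']
print(rows)  # [['07340450C>T', 'DF1/L513/S215', 'X', 'X'], ['15000000G>A', '', '', 'X']]
```

    Rows with an empty label are the unnamed variants, which is where the new discoveries show up.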

    I did have to compare two brother clades, L21 and U152, in my L21 comparative analysis to eliminate novel variants that must be upstream of either.
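    The brother-clade check can be written as a set difference: a variant derived in both L21 and U152 kits cannot be specific to L21 and most likely sits upstream of their split (a recurrent mutation would also be removed, which is the conservative choice). A minimal Python sketch with hypothetical kit and variant names:

```python
def drop_shared_with_brother(clade_kits, brother_kits):
    """Keep variants derived in the clade but never seen in the brother clade.

    Each argument maps kit -> set of derived variant keys. A variant found in
    both brother clades is not specific to either one, so it is removed.
    """
    clade = set().union(*clade_kits.values())
    brother = set().union(*brother_kits.values())
    return clade - brother

# Hypothetical kits and variants, for illustration only.
l21 = {"kitA": {"v1", "v2"}, "kitB": {"v2", "v3"}}
u152 = {"kitC": {"v1", "v4"}}
print(sorted(drop_shared_with_brother(l21, u152)))  # ['v2', 'v3']
```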

    I can now easily add .vcf files to this comparative analysis and sort and filter them, etc. I've got 39 L21 .vcf files in the comparative analysis, but I had to have a system so the next 300 won't sink me.

    I imagine the U106 guys are doing something very similar. However, this only gets you a comparison of derived results. If you want to prove something is ancestral versus no call, I think we'll have to use those .bam files. On the other hand, with a lot of individuals tested, you can overpower that need to some extent, other than for final proofs.
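    The ancestral versus no-call question comes down to coverage: a position missing from a derived-only comparison is either ancestral or uncalled, and only read-level data can tell the two apart. A minimal Python sketch of that logic, assuming a set of adequately covered positions has been worked out from the .bam file by some other means (what counts as "adequate" is left open); the function name is hypothetical.

```python
def call_status(position, derived_positions, covered_positions):
    """Classify one position for one kit.

    derived_positions: positions the kit's VCF reports as derived.
    covered_positions: positions with adequate read coverage (assumed to be
    worked out separately, e.g. from the BAM file).
    """
    if position in derived_positions:
        return "derived"
    if position in covered_positions:
        return "ancestral"   # read well, but no variant reported
    return "no call"         # absent from the VCF and not adequately covered
```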

    I'm trying to set this up so other project administrators can use it. I think it is inevitable this kind of stuff will be done at younger and younger subclade levels.
    Last edited by mwwalsh; 17 March 2014, 09:13 AM.


  • dtvmcdonald
    replied
    I have a working program that annotates the .VCF file
    for SNPs, not indels. I have a program for single-location
    dels and indels, but I've not tried it on the .vcf file.
    This uses Thomas Krahn's SNP database.

    However, I have no idea if each and every annotation is
    CORRECT.

    Ann, if you would send me a vcf file or two I can
    run it on them and send you the results. I don't have permission to share the (disgustingly few) Clan Donald VCF files.

    Doug McDonald


  • Ann Turner
    replied
    Originally posted by Rebekah Canada
    Hi,

    Are you using David Pike's utility?
    http://www.math.mun.ca/~dapike/FF23u...igYvcf2csv.php
    It is still in testing, but...
    Thanks for the pointer. I didn't realize David was working on anything. His utility appears to create a new column for genotype, filling in the REF value if there is no ALT value. It doesn't handle indels, though -- it just lists one base. That's useful, but I really want more annotations in the VCF file, or even better, a downloadable file of what the user sees on his results page.
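    The genotype column Ann describes (fill in REF when there is no ALT) is simple to reproduce. A minimal Python sketch, not David Pike's code, assuming tab-separated VCF input; the GENOTYPE column name is a hypothetical choice. REF and ALT are copied whole, so indel alleles keep all their bases.

```python
def add_genotype_column(vcf_line):
    """Append a genotype column to one VCF line for spreadsheet use.

    Data lines get ALT when an ALT allele is present, otherwise REF.
    The result is a tab-separated table row, no longer a valid VCF line.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("#CHROM"):
        return line + "\tGENOTYPE"   # name the new column in the header
    if line.startswith("#"):
        return line                  # leave ## meta lines untouched
    cols = line.split("\t")
    ref, alt = cols[3], cols[4]
    return line + "\t" + (ref if alt in ("", ".") else alt)
```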


  • dtvmcdonald
    replied
    Originally posted by Ann Turner
    Have you developed a systematic way to use the VCF files? The lack of data in the ID column (which should contain SNP names when known) and INFO column (which could contain tags for "novel", "on the tree", etc.) makes it more difficult than necessary.
    I have not done that for those VCF files but I have
    done so for the caller program I wrote. If you have
    a file in order of locations one simply has a database list
    of known SNPs in the same order and checks each entry
    in the new file against the database list. My list
    is only the current ISOGG database. So all are,
    of course, "on the tree". It would be easy to add the
    Chromo2 ones. I wrote this checker in such a way that,
    if the equivalent were used on these VCF files, those
    two columns would simply be filled in and a new VCF file
    written out. My program is about 140 lines of code.
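    The sorted-list check described above is essentially a merge walk over two position-ordered lists. A minimal Python sketch, not Doug's program, assuming both lists are already sorted by GRCh37 position and matching on position only (a fuller version would also compare alleles); names are hypothetical. Multiple names for one position are joined with ";", which is how the VCF ID column holds several identifiers.

```python
def annotate_ids(vcf_records, known_snps):
    """Fill the ID field by walking two position-sorted lists in step.

    vcf_records: (pos, ref, alt) tuples sorted by pos.
    known_snps:  (pos, name) tuples sorted by pos, e.g. from the ISOGG list.
    Returns (pos, ref, alt, id) tuples; id is '.' when nothing matches,
    otherwise all matching names joined with ';'.
    """
    out, j = [], 0
    for pos, ref, alt in vcf_records:
        # Skip known SNPs that lie before this position.
        while j < len(known_snps) and known_snps[j][0] < pos:
            j += 1
        names, k = [], j
        while k < len(known_snps) and known_snps[k][0] == pos:
            names.append(known_snps[k][1])
            k += 1
        out.append((pos, ref, alt, ";".join(names) if names else "."))
    return out
```

    Each list is read only once, so the cost grows with the combined length of the two lists rather than their product.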

    If you want, give me the lists of SNPs
    to be included and I can write a program to do individual VCF files.

    I intend to do something like that when the BigY R1a files FINALLY
    ACTUALLY ARRIVE. I intend to work from the .bam files, but will look
    at the .vcf ones too. With only 18 files it's not a huge job. This case
    is easier than most because all are CTS4179+ and Chromo2 has nothing
    downstream.

    Doug McDonald
    Last edited by dtvmcdonald; 11 March 2014, 10:01 AM.


  • Rebekah Canada
    replied
    Hi,

    Are you using David Pike's utility?
    http://www.math.mun.ca/~dapike/FF23u...igYvcf2csv.php
    It is still in testing, but...
    Originally posted by Ann Turner
    Have you developed a systematic way to use the VCF files? The lack of data in the ID column (which should contain SNP names when known) and INFO column (which could contain tags for "novel", "on the tree", etc.) makes it more difficult than necessary.


  • Ann Turner
    replied
    Originally posted by mwwalsh
    The R L21 and subclades project has 33 Big Y results in. There are 270 orders pending.

    ...

    This is not difficult, but it does take a lot of work to manage all of the detailed data. I encourage you to get help from your haplogroup project administrators.
    Have you developed a systematic way to use the VCF files? The lack of data in the ID column (which should contain SNP names when known) and INFO column (which could contain tags for "novel", "on the tree", etc.) makes it more difficult than necessary.


  • mwwalsh
    replied
    Originally posted by Rebekah Canada
    I am going to post BIG Y orders in my projects by project and order number. Please join in.
    The R L21 and subclades project has 33 Big Y results in. There are 270 orders pending.

    There are about 20 more haplogroup projects that are subclades of L21. Some of these people are not in the L21 project but I've had almost 50 additional L21 people of one type or another post that they have Big Y on order.

    Anyone who is L21+ or positive for some downstream SNP, please join the master project. This should not detract from any projects you are currently in. This is just in addition.
    http://www.familytreedna.com/public/R-L21/

    It is particularly important for you to join if you have Big Y on order as that will allow us to automatically see your results when they come in and do comparative analysis.

    This is critical. SNPs are just markers. To be useful, test results must be shared across groups of potentially related people. What is shared can be compared by phylogenetic comparative methods.
    http://en.wikipedia.org/wiki/Phyloge...rative_methods

    This is not difficult, but it does take a lot of work to manage all of the detailed data. I encourage you to get help from your haplogroup project administrators.


  • ikennedy
    replied
    S668

    Originally posted by maolalai
    Pending
    181933 HiYNGS 543 3/28/2014
    B1871 HiYNGS 545 3/28/2014

    Back
    none

    A small project with only two Big Y's, but an interesting pair in that they're father and son (mine and my father's). Since I'm M222+, DF85+, DF97-, from the vcf I'm most interested in whether S668 and S673 are covered now that the GRCh37 positions of both have been disclosed, although a couple of equivalents could prove just as good. What I'm really waiting for, though, is the BAM file for each kit; between SAMtools and lobSTR that should keep me busy for months.

    The surname family history has effectively been rewritten already by Y-DNA testing: no Lallys or Mullallys match the purported broader clann affiliation in modern genealogies, and the co-chiefs of the original family territory also appear to be R-M222+ (but with a high STR divergence and no common off-modal markers relative to M222). That prompted me to take a deeper dive into old surname references. A lenited F in the older references to my surname suggests to me that "alaidh" was not a correct interpretation of the latter part. Given the apparent Tir Chonaill connection for DF85, the use of Cenn instead of Maol to denote "head" in old Ulster Gaelic (at least from my gleanings) points to the name of a clann that lived in a neighboring territory. It will be interesting to see whether over the next few years NGS can support or refute a link.
    S668 is not sequenced, as you may have seen by now. This is confirmed by Dr. Mittelman.
    S673 is sequenced and appears to be working OK.


  • Kathleen Carrow
    replied
    Originally posted by Steve in OV
    I received my Big Y on 2/27; I am in the Townsend Posey subgroup of Lower Delmarva.

    Steve in Oro Valley
    Hi Steve
    Sending you an e-mail and a PM on here.
    Kathleen ( Lower Delmarva)


  • Steve in OV
    replied
    Big Y for Lower Delmarva

    Originally posted by Kathleen Carrow
    My Lower Delmarva group has one kit received last week and 2 more pending. We have 181 kits (+/-).
    We have an R1, R1b and an I2b1 ordered.
    I received my Big Y on 2/27; I am in the Townsend Posey subgroup of Lower Delmarva.

    Steve in Oro Valley


  • M.O'Connor
    replied
    There are 4 BigY orders in the Prince Edward Island Geographical Project. No results yet.


  • Itai
    replied
    In the G-PF3146 project we have:
    Pending:
    11125 542 12/11/2013
    45262 543 20/11/2013

    Back:
    None.


  • Subwoofer
    replied
    Originally posted by John McCoy
    Just a guess, but for the question of the order in which results are returned, I expect this is what happens: The samples enter the testing process in the order received. Then, it takes as long as it takes to complete the test. They come out when they are finished. Isn't that the best we can hope for?
    It always seems my results take a little longer than others'. If my BigY results come in within the next 10 weeks they'll beat my WTY, but I got my WTY and I'll get my BigY : )

    I definitely wouldn't insist that anybody who ordered after my batch 542 order not get their results until mine came in, and I might get a little miffed if FTDNA held up my processed results because they hadn't completed a prior order.

    I think some folks need to take a stress pill : )


  • John McCoy
    replied
    Just a guess, but for the question of the order in which results are returned, I expect this is what happens: The samples enter the testing process in the order received. Then, it takes as long as it takes to complete the test. They come out when they are finished. Isn't that the best we can hope for?
