Novel variants per generation = 1?


  • morrisondna
    replied
    Originally posted by JohnG View Post
    It also means there may be uncertainty in how a strand was read, which can lead to differences between which SNPs are read in a given kit, and to differences in data quality for parts of the read.
    JohnG,

    That's my understanding as well. So roughly the same percentage of Known SNPs and Novel Variants will be present but not read in any particular test.

    The "no calls" are reported for the "Known SNP" list, but for "Novel Variants" the basic FTDNA reports don't tell us whether an unreported variant is absent or a "no call". So, until the detailed results are investigated further, we need to keep in mind that either case could be true.



  • JohnG
    replied
    Originally posted by morrisondna View Post
    I appreciate all of the discussion around my question, and I agree with most of it, but what I am trying to ask is whether there was any extra effort during Big Y testing to obtain results for certain specified Known SNPs.

    Was there a list of important Known SNPs where extra efforts were made to make sure results for them were included in the Big Y results?

    Or was the Big Y test just run and if certain Known SNPs happened to be among the results, they were reported as such, and if not, so be it?

    Thanks for any further insights.
    The Big Y test used a different technology. Older tests, as I understand it, had a chip with tests for specific known SNPs. Every test using a specific chip looked at the same SNPs. Very standard for everyone. It was matching rather than sequencing.

    The Big Y uses next-generation sequencing and really reads DNA strands. That opens the door to discovering new SNPs, novel variants not in the list of known SNPs. It also means there may be uncertainty in how a strand was read, which can lead to differences between which SNPs are read in a given kit, and to differences in data quality for parts of the read.



  • Tourist
    replied
    Greetings, fellow Morrison,

    It might be worth considering that FTDNA, as a business, may need to anticipate global needs, where what applies to one region might not apply to another, and so may have set its focus broadly, worldwide. That could then have led to some local, regional, or specific confusion and consternation, since so many of us are focused on our local concerns.

    But it seems likely that FTDNA will work toward eventually getting those wrinkles ironed out, too.

    Yet the more people who test, the better the overall picture will become, and the sooner!

    Best,

    Doug



  • morrisondna
    replied
    I appreciate all of the discussion around my question, and I agree with most of it, but what I am trying to ask is whether there was any extra effort during Big Y testing to obtain results for certain specified Known SNPs.

    Was there a list of important Known SNPs where extra efforts were made to make sure results for them were included in the Big Y results?

    Or was the Big Y test just run and if certain Known SNPs happened to be among the results, they were reported as such, and if not, so be it?

    Thanks for any further insights.



  • GarethH
    replied
    There are also about 50 "novel" SNPs where the reference sequence has the derived variant - almost everyone tested will find each of these SNPs on their novel list, apart from the small minority who match the reference sequence at that point (or maybe no one at all, if the SNP is private or was a glitch in the process which established the reference). Hopefully, when these SNPs are removed from the novel category, it is done consistently for everyone.



  • efgen
    replied
    Originally posted by felix View Post
    The difference is in the frequency of occurrence in the population. A SNP is widely distributed, while a Novel Variant is restricted to you and/or your close relatives.
    Felix, that's not the case, at least not right now. Earl has the correct answer:

    Originally posted by Earl Davis View Post
    My understanding is that if the SNP was in the FTDNA SNP database then it appears in the SNP list. Otherwise it's CURRENTLY on the Novel Variants list.
    Further to Earl's answer:

    There are many SNPs currently on the Novel Variants list that are actually high up on the tree and will be moved to Known SNPs eventually. There are also Novel Variants that will be found to define new subclades, so they'll be assigned SNP names and will be moved to Known SNPs as well. Of course, there will also be SNPs in Novel Variants that are "private" or only found in a family or small group of people.

    Elise



  • Earl Davis
    replied
    My understanding is that if the SNP was in the FTDNA SNP database then it appears in the SNP list. Otherwise it's CURRENTLY on the Novel Variants list.

    Earl.



  • morrisondna
    replied
    Felix, so are you saying that the Big Y test is run and then whatever results are found are then separated into Known SNPs and Novel Variants?

    In other words, is there any special effort put forth to find the values of Known SNPs as part of Big Y testing, or are the Known SNPs just as likely to be missed as Novel Variants?

    Thanks...



  • felix
    replied
    Originally posted by morrisondna View Post
    What is the difference in Big Y testing between Known SNPs and Novel Variants in the testing that is done and in the way that the calls are made? Is it simply that some variants are on the known SNP list and are reported that way, or is there more to it?
    The difference is in the frequency of occurrence in the population. A SNP is widely distributed, while a Novel Variant is restricted to you and/or your close relatives.



  • morrisondna
    replied
    Known SNPs vs. Novel Variants

    What is the difference in Big Y testing between Known SNPs and Novel Variants in the testing that is done and in the way that the calls are made? Is it simply that some variants are on the known SNP list and are reported that way, or is there more to it?



  • JohnG
    replied
    Originally posted by felix View Post
    So, for the Y chromosome alone, it is approximately 130 * 59 mil / 3.2 bil = 2.396875. So, there should be around 2 to 3 novel variants from father to son.
    So my single data point and your calculation agree. Not bad for starters.

    If a mutation happened 2000 years ago it has had time to spread to lots of descendants and show up in a large region.

    If it happened 600 years ago it would not have spread so much but it might be a proto-genealogical link to location of ancestors in a time before good records and current surname systems.

    To me the Big Y and similar tests function on several levels - they can fill in the big picture of haplotypes over thousands of years, they can fill in some regional change and population movement, and they can help us identify and probe clans and families.

    Once I can work out the novel variant number for my more distant surname cousin, the rate will be a useful check. If it comes to, say, 4 variants per generation with the presumed common ancestor in the 1500s, it might mean the surname is older and the ancestors spread out maybe 200-400 years earlier, so the presumption would be unlikely. That might make for interesting genealogical research. If it comes out around 2, the estimate is more likely to be right.



  • Ann Turner
    replied
    Originally posted by felix View Post
    The below is based on my understanding:

    There should be 130 mutations per generation, that is, from father to son. So, for the Y chromosome alone, it is approximately 130 * 59 mil / 3.2 bil = 2.396875. So, there should be around 2 to 3 novel variants from father to son. Novel variants are not SNPs but mutations specific to each person or within a family. For a mutation to be considered a SNP, I think it must be around 0.05% in the population of 500k.

    Given that I have 450 Novel Variants while you have only around 106, many of my Novel Variants are potential SNPs yet to be discovered, as the database doesn't have a significant enough Asian/Indian population for them to reach the 0.5% frequency.
    There's been a trend away from using the term polymorphism, with its connotation of a certain frequency in some population. (For autosomal DNA, 1% was a typical number.) In fact, dbSNP now uses the term SNV (Single Nucleotide Variant). They haven't changed the name of the database, though, and we'll probably continue to use the term SNP indefinitely, too.

    The ISOGG criteria for adding a SNP to the tree are in the process of revision. The frequency criterion is very difficult to demonstrate. The focus will be on demonstrating a certain amount of variability within the new subclade, so you won't be able to add a SNP that is found only in you and your closest relatives.

    Your calculation about the number of variants arising between father/son is based on the entire length of the Y. Not all of the Y has even been mapped, and we won't be able to observe that number of mutations (at least for the foreseeable future).

    And yes, you're right -- you have more "novel" variants because the database doesn't contain a large enough sample of Asian/Indian populations.



  • felix
    replied
    The below is based on my understanding:

    There should be 130 mutations per generation, that is, from father to son. So, for the Y chromosome alone, it is approximately 130 * 59 mil / 3.2 bil = 2.396875. So, there should be around 2 to 3 novel variants from father to son. Novel variants are not SNPs but mutations specific to each person or within a family. For a mutation to be considered a SNP, I think it must be around 0.05% in the population of 500k.

    Given that I have 450 Novel Variants while you have only around 106, many of my Novel Variants are potential SNPs yet to be discovered, as the database doesn't have a significant enough Asian/Indian population for them to reach the 0.5% frequency.
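
For anyone who wants to replay the arithmetic in this post, here is a minimal Python sketch. The three constants are the round figures quoted in the post, not measured values. The function name and the `covered_bp` parameter are made up for illustration, and the 10-million-base figure is purely a hypothetical example of a test that reads only part of the Y, not a stated Big Y coverage.

```python
# Rough per-generation estimate, following the arithmetic in the post.
MUTATIONS_PER_GENERATION = 130       # whole-genome figure quoted in the post
Y_LENGTH_BP = 59_000_000             # approximate Y length used in the post
GENOME_LENGTH_BP = 3_200_000_000     # approximate genome length used in the post


def expected_y_variants(covered_bp: int = Y_LENGTH_BP) -> float:
    """Expected new variants per father-to-son step within `covered_bp` bases.

    Assumes mutations fall uniformly across the genome, which is the
    simplification the post makes.
    """
    return MUTATIONS_PER_GENERATION * covered_bp / GENOME_LENGTH_BP


# Whole Y, as in the post.
print(round(expected_y_variants(), 3))

# Hypothetical: a test that only reads 10 million bases of the Y.
print(round(expected_y_variants(10_000_000), 3))
```

The second call shows how quickly the expected count drops when only part of the chromosome is actually read, so the number of bases covered matters as much as the mutation rate.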



  • JohnG
    replied
    Originally posted by dtvmcdonald View Post
    Vcf and bed files are both plain text (tab-delimited) files that Excel can open. Open Excel and then load them in it. Note that comparing vcf files is not sufficient ... if two vcf files both call a particular location, that's it. But if one has a call and the other has no info on a location, you need to look in the bed file to see if it is a no-call or a call of the reference allele. The bed file has lines like

    7601335 7602044

    in it. This means that it successfully read locations 7601336 through 7602044. Note the difference in the first number. This means it did NOT read 7601335. If it successfully read a location and that location is not in the vcf file, it's the reference allele.

    Doug McDonald

    Of the 6 locations where I have data and my 5th cousin does not:

    1 is in a gap between BED file entries
    3 are the first position of a BED entry and therefore not read
    2 are in the middle of a BED segment but not in the VCF, therefore a reference allele value.

    So 2 'real' changes and 4 that might be related to the reading?

    Of the 9 locations my cousin has in the VCF and I do not have:

    5 are in my VCF as Rejected
    1 is in a gap between BED file entries
    1 is the first position of a BED file entry
    2 are in the middle of a BED segment but not in the VCF

    So again 2 'real' changes.

    I guess each 'real' change could be either gaining a novel variant or losing one.

    Maybe when I have the 11th and 12th cousin results that will be clearer?

    Right now, would it be true to say that each of the two descendant lines has had 2 changes since 1743, the birth of the common ancestor?
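
The sorting I did by hand above can be sketched in a few lines of Python. This is only an illustration of the rule in Doug's quoted explanation (the first number on a BED line is not read, everything after it up to and including the second number is). The function name and the sample VCF position 7601500 are made up, and the sketch assumes the BED entries are sorted and do not overlap.

```python
from bisect import bisect_left


def classify_position(pos, bed_intervals, vcf_positions):
    """Label a position as 'variant', 'reference' or 'no-call'.

    bed_intervals: sorted, non-overlapping (first, last) pairs as they
        appear on BED lines; `first` itself is NOT read, `last` is.
    vcf_positions: set of positions that have a call in the VCF.
    """
    if pos in vcf_positions:
        return "variant"
    starts = [first for first, _ in bed_intervals]
    # Index of the last BED entry whose first number is strictly below pos.
    i = bisect_left(starts, pos) - 1
    if i >= 0 and pos <= bed_intervals[i][1]:
        return "reference"   # read successfully, but nothing in the VCF
    return "no-call"         # in a gap, or the unread first number of an entry


bed = [(7601335, 7602044)]   # the example line from the quoted post
vcf = {7601500}              # hypothetical variant position
for p in (7601335, 7601336, 7601500, 7602044, 7602045):
    print(p, classify_position(p, bed, vcf))
```

Note that this does not look at whether a VCF entry was passed or rejected; a rejected entry would need to be treated as a separate case.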



  • JohnG
    replied
    Thanks, Doug.

    I note I can also read the vcf file with wordpad and with the Integrative Genomics Viewer, which is cool but not much handier for this task.

    http://www.broadinstitute.org/software/igv/download

    I have not looked at the bed file yet.

    What I have is 6 cases where I have a variant and my cousin does not, 4 cases where my cousin has a variant and I do not, and 5 cases where I have a rejected call and my cousin has a variant.

    I am puzzled by the quality value. Is there a cutoff for rejection? Most of the rejected values seem to be 500 or less, but some are as high as the pass values. Or I may be reading this wrong.
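
For what it's worth, the three-way tally above could be automated along these lines. This is a rough Python sketch, assuming the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and assuming that anything other than "PASS" in the FILTER column counts as rejected. The function names, the "REJECTED" label and the sample lines are all made up for illustration and are not taken from a real kit.

```python
def parse_vcf(lines):
    """Map position -> FILTER value for every data line of a VCF."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        calls[int(fields[1])] = fields[6]  # POS and FILTER columns
    return calls


def compare_kits(mine, cousin):
    """Sort positions into the three groups tallied in this post."""
    mine_pass = {p for p, f in mine.items() if f == "PASS"}
    cousin_pass = {p for p, f in cousin.items() if f == "PASS"}
    return {
        "only_mine": sorted(mine_pass - set(cousin)),
        "only_cousin": sorted(cousin_pass - set(mine)),
        "rejected_in_mine": sorted(
            p for p in cousin_pass if p in mine and mine[p] != "PASS"
        ),
    }


# Made-up sample data, three positions only.
mine = parse_vcf([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrY\t100\t.\tA\tG\t900\tPASS\t.",
    "chrY\t200\t.\tC\tT\t50\tREJECTED\t.",
])
cousin = parse_vcf([
    "chrY\t200\t.\tC\tT\t800\tPASS\t.",
    "chrY\t300\t.\tG\tA\t700\tPASS\t.",
])
print(compare_kits(mine, cousin))
```

Positions that land in "only_mine" or "only_cousin" would still need checking against the other kit's bed file before being counted as real differences.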

