How do we judge the stability of a novel SNP in Big Y?

  • How do we judge the stability of a novel SNP in Big Y?

    In the past, one of the major concerns when finding new SNPs was their stability. I don't know ISOGG's or FTDNA's current policies, but at one point I was told that if an SNP occurred in more than two lineages it might be subject to being dropped from a formal Y tree like the FTDNA haplotree.

    The webinar a week ago Friday described Big Y as targeting the "gold standard" parts of the Y chromosome. My understanding is that this relates to a 2013 paper, "Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females" by David Poznik et al.
    https://www.sciencemag.org/content/341/6145/562

    It's quite technical so I don't really understand it. Is it generally FTDNA's position that Big Y SNP/variant results are by default of high quality because of the areas of the Y chromosome targeted?

    I notice different ratings in the Big Y .vcf files. Under the heading "Qual" is a set of data values like 10.9469, 124.318 and 1484.13. What do those values mean?
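
    For reference, here is a minimal sketch of how such values are usually read. It assumes the "Qual" column follows the standard VCF convention of a Phred-scaled score, where Qual = -10 * log10(probability that the variant call is wrong). Whether FTDNA's pipeline fills the column exactly this way is an assumption, and the function name is only illustrative.

```python
def phred_to_error_prob(qual: float) -> float:
    """Convert a Phred-scaled quality score to the implied probability
    that the call is wrong (assumes the standard VCF QUAL convention)."""
    return 10 ** (-qual / 10)


if __name__ == "__main__":
    # The three example values quoted from the Big Y .vcf file.
    for qual in (10.9469, 124.318, 1484.13):
        prob = phred_to_error_prob(qual)
        print(f"Qual {qual}: implied error probability {prob:.3g}")
```

    Under that reading, 10.9469 corresponds to roughly an 8% chance that the call is wrong, while the two larger values imply error probabilities far below one in a billion. The score only reflects the variant caller's confidence in the read data; it says nothing about whether an SNP recurs in multiple lineages.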

    Under the "Filter" column it has either "PASSED" or "REJECTED". I assume we should ignore those that are "REJECTED". Is that right? Will those particular locations be re-read or will they always be rejected? Does rejected mean the machine couldn't read the location or does it mean something else?
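
    As a rough sketch of what ignoring the rejected calls could look like, the snippet below keeps only records whose "Filter" column equals a chosen label. It assumes the usual tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, ...) and uses the "PASSED" label seen in these files as the default; the sample records and positions are made up for illustration and are not real Big Y data.

```python
from typing import Iterable, Iterator


def parse_vcf_line(line: str) -> dict:
    """Split one tab-separated VCF data line into its first seven columns."""
    chrom, pos, vid, ref, alt, qual, flt = line.rstrip("\n").split("\t")[:7]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt,
        # "." marks a missing quality value in VCF.
        "qual": None if qual == "." else float(qual),
        "filter": flt,
    }


def passing_calls(lines: Iterable[str], pass_label: str = "PASSED") -> Iterator[dict]:
    """Yield parsed records whose Filter column equals pass_label,
    skipping header lines (starting with '#') and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = parse_vcf_line(line)
        if record["filter"] == pass_label:
            yield record


# Hypothetical records: positions and alleles are invented; only the
# Qual values and Filter labels mirror what the post describes.
SAMPLE_LINES = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrY\t1000001\t.\tA\tG\t10.9469\tREJECTED\t.",
    "chrY\t1000002\t.\tC\tT\t124.318\tPASSED\t.",
    "chrY\t1000003\t.\tG\tA\t1484.13\tPASSED\t.",
]

if __name__ == "__main__":
    for rec in passing_calls(SAMPLE_LINES):
        print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["qual"])
```

    Note that the VCF specification itself uses "PASS" for this column, so the label is left as a parameter. The sketch only filters on the label; it cannot tell why a given call was rejected.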

    FTDNA also provides a la carte SNPs through advanced orders and support for Geno 2. Is there any place I can find general guidance on how to ascertain stability and readability?

    We'll have a lot of novel SNPs now with little information about conflicts (irregularities) in the tree so the first cut has to be biological instability or machine readability.

  • #2
    Will we be able to tell more about quality from the .bam files? Are the .bam files actually being made available yet?

    Doug McDonald



    • #3
      I just noticed that some of the project administrators are marking Big Y novel SNPs, in some cases, as "coverage boundary" SNPs. The thinking is that SNPs found in coverage boundary areas are likely to be read less reliably.

      Does this have a significant impact on the reliability? Or is it accounted for in those values under "Qual" somehow? Or is it really not a concern?



      • #4
        Originally posted by mwwalsh
        I just noticed that some of the project administrators are marking Big Y novel SNPs, in some cases, as "coverage boundary" SNPs. The thinking is that SNPs found in coverage boundary areas are likely to be read less reliably.

        Does this have a significant impact on the reliability? Or is it accounted for in those values under "Qual" somehow? Or is it really not a concern?
        As we get access to the BAM files, we will be able to make better judgement calls about, and improve the definitions of, the various labels that are being applied to the released lists of novel SNPs.
