Ranking the value of variants in Big Y raw results

  • Ranking the value of variants in Big Y raw results

    Within the U106 project, David Carlisle's latest comparison of Big Y results seems to show some cross-sample contamination. The following SNP results seem to be OK under L47, but now they show up in a single L259+ sample. Barcoding failure?

    14598808 cb
    CTS6009
    22802901
    CTS2913 cb
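    One way to screen a comparison like this for possible contamination is to flag any derived call of a known branch-defining SNP in a sample whose assigned lineage does not include that branch. The Python sketch below assumes hypothetical inputs: each sample's lineage as a list of branch names, each sample's set of derived SNPs, and a SNP-to-branch map. The kit IDs, SNP names and branch names are placeholders, not real tree data. A flag only marks a call for review; it cannot by itself tell contamination apart from a real upstream or recurrent variant.

```python
from typing import Dict, List, Set, Tuple


def flag_out_of_branch_calls(
    sample_lineage: Dict[str, List[str]],
    sample_derived: Dict[str, Set[str]],
    snp_branch: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (sample, snp, branch) for each derived call of a known
    branch-defining SNP in a sample whose lineage lacks that branch."""
    flags = []
    for sample, derived in sample_derived.items():
        path = set(sample_lineage.get(sample, []))
        for snp in sorted(derived):
            branch = snp_branch.get(snp)  # None for SNPs not placed on the tree
            if branch is not None and branch not in path:
                flags.append((sample, snp, branch))
    return flags


# Hypothetical example data (placeholders; intermediate branches omitted).
lineage = {"kit1": ["U106", "BRANCH_X"], "kit2": ["U106", "BRANCH_Y"]}
derived = {"kit1": {"SNP_A", "SNP_B"}, "kit2": {"SNP_A", "SNP_C"}}
snp_to_branch = {"SNP_A": "BRANCH_X", "SNP_C": "BRANCH_Y"}

print(flag_out_of_branch_calls(lineage, derived, snp_to_branch))
# [('kit2', 'SNP_A', 'BRANCH_X')]
```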

  • #2
    Bonk - they could be upstream......

    They could be real and upstream of, say, U106, since the others are no-calls. We need to see how they are called in other haplogroup regions.
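    The reasoning here can be expressed as a simple check: a variant can only sit upstream of a group if no sample in that group is positively ancestral for it, so derived calls mixed with no-calls leave the question open, while an ancestral call argues against it. A minimal Python sketch, assuming each sample's call has already been reduced to one of the hypothetical labels "derived", "ancestral" or "nocall":

```python
from collections import Counter
from typing import Dict

VALID_CALLS = {"derived", "ancestral", "nocall"}


def consistent_with_upstream(calls: Dict[str, str]) -> bool:
    """True if at least one sample is derived and none is ancestral.

    This only says the pattern does not rule out an upstream origin;
    confirming it still needs calls from other haplogroups."""
    unknown = set(calls.values()) - VALID_CALLS
    if unknown:
        raise ValueError(f"unexpected call labels: {unknown}")
    counts = Counter(calls.values())
    return counts["derived"] > 0 and counts["ancestral"] == 0


# One derived sample, the rest no-calls: upstream is still possible.
print(consistent_with_upstream({"kit1": "derived", "kit2": "nocall", "kit3": "nocall"}))  # True
# An ancestral call in the same group argues against an upstream origin.
print(consistent_with_upstream({"kit1": "derived", "kit2": "ancestral"}))  # False
```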



    • #3
      Originally posted by wkauffman:
      They could be real and upstream of, say, U106, since the others are no-calls. We need to see how they are called in other haplogroup regions.

      Wayne, I have found that Big Y L21 .vcf files typically have around 300 derived variants. I cross-check them against known upstream SNPs and find that the reference used for the ancestral state must already take L21's tree position into account, because the known (official tree) upstream SNPs are not included in the .vcf files.

      However, there is a whole series of other derived variants that apparently were unknown until now (novel, some might say). I run a comparison against a U152 sample (U152 is a brother clade to L21) and find about 100 shared derived variants. I assume these are also upstream and throw them out for L21 tree purposes.

      Does that seem like a reasonable approach?
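      The filtering described in this post amounts to set operations on variant keys. The Python sketch below assumes each variant is identified by a hypothetical "POS:REF>ALT" string on the same reference build in every file; the positions in the example are made up. As in the post, variants shared with the brother clade are only assumed to be upstream.

```python
from typing import Set, Tuple


def split_l21_candidates(
    l21_derived: Set[str], u152_derived: Set[str], known_upstream: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split an L21 sample's derived variants into (assumed_upstream, candidates).

    Variants also derived in the U152 comparison sample, or already known to be
    upstream, are treated as upstream; the rest remain candidates for the L21 tree."""
    assumed_upstream = l21_derived & (u152_derived | known_upstream)
    candidates = l21_derived - assumed_upstream
    return assumed_upstream, candidates


# Made-up positions, for illustration only.
l21 = {"100:A>G", "200:C>T", "300:G>A"}
u152 = {"200:C>T", "999:T>C"}
known = {"300:G>A"}
upstream, candidates = split_l21_candidates(l21, u152, known)
print(sorted(upstream), sorted(candidates))
# ['200:C>T', '300:G>A'] ['100:A>G']
```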



      • #4
        Originally posted by mwwalsh:
        Wayne, I have found that Big Y L21 .vcf files typically have around 300 derived variants. I cross-check them against known upstream SNPs and find that the reference used for the ancestral state must already take L21's tree position into account, because the known (official tree) upstream SNPs are not included in the .vcf files.

        However, there is a whole series of other derived variants that apparently were unknown until now (novel, some might say). I run a comparison against a U152 sample (U152 is a brother clade to L21) and find about 100 shared derived variants. I assume these are also upstream and throw them out for L21 tree purposes.

        Does that seem like a reasonable approach?

        Yes. We are relying on David Carlisle's program to weed out the shared upstream SNPs and identify those which may not be reliable. We will have to be vigilant to properly identify any recurrent SNPs.



        • #5
          Ranking the value of variants in Big Y raw results

          I've been consolidating derived, passed variants from the .vcf files into a phylogenetic comparison spreadsheet, to separate variants shared by several individuals from those found in only one individual, and to look for patterns.
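          The consolidation step can be sketched in Python: read each sample's .vcf, keep records whose FILTER is PASS and whose genotype carries an ALT allele, then record which samples carry each variant. This assumes plain-text, single-sample VCF files containing only chrY records (so the chromosome is left out of the hypothetical "POS:REF>ALT" key); real Big Y files may differ in layout or filter labels, so treat it as a starting point rather than a parser for any specific file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set


def derived_passed_variants(vcf_lines: Iterable[str]) -> Set[str]:
    """Return 'POS:REF>ALT' keys for PASS records whose genotype carries an ALT allele.

    Assumes a single-sample, plain-text VCF containing only chrY records."""
    variants = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[6] != "PASS":
            continue  # no sample column, or not a passed call
        pos, ref, alts = fields[1], fields[3], fields[4].split(",")
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        for allele in re.split(r"[/|]", sample.get("GT", ".")):
            if allele.isdigit() and int(allele) > 0:  # 0 = REF, "." = no-call
                variants.add(f"{pos}:{ref}>{alts[int(allele) - 1]}")
    return variants


def carriers_by_variant(per_sample: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert {sample: variants} into {variant: samples carrying it}."""
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for sample, variants in per_sample.items():
        for variant in variants:
            carriers[variant].add(sample)
    return dict(carriers)


# Tiny made-up VCF fragments for two samples.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
vcf_a = [header,
         "chrY\t100\t.\tA\tG\t50\tPASS\t.\tGT\t1",
         "chrY\t200\t.\tC\tT\t50\tLowQual\t.\tGT\t1",
         "chrY\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0"]
vcf_b = [header,
         "chrY\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1:30",
         "chrY\t400\t.\tT\tC,G\t50\tPASS\t.\tGT\t2"]
table = carriers_by_variant({"A": derived_passed_variants(vcf_a),
                             "B": derived_passed_variants(vcf_b)})
shared = sorted(v for v, s in table.items() if len(s) > 1)
single = sorted(v for v, s in table.items() if len(s) == 1)
print(shared, single)  # ['100:A>G'] ['400:T>G']
```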

          I'm not able to adequately assess the true biological instability of these variants or their potential for recurrence (in multiple parallel lineages).

          However, for many of these tested individuals we have prior test results and know which current subclades they fit into. We also have Y-STR results.

          I'm making a first pass at ranking the phylogenetic value of the novel passed variants.

          I've found that terms such as private and novel are not granular enough to capture the range of statuses a variant can have in phylogenetic research. I've been working on this for a couple of weeks, and here are the statuses I'm currently using. Any comments?

          -3 upstream confirmed
          -2 upstream unsure
          -1 unstable
          0
          1 single individual
          2 single family
          3 multi-family
          4 public unsure
          7 public consistent
          8 tree-draft
          9 tree-official

          I can explain the classification criteria I'm using in more detail, but I do make some subjective judgements in the process.
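          For sorting in a spreadsheet or script, the scale above can be encoded as integer-valued statuses. In the Python sketch below the member names follow the labels in the list and 0 is omitted because it has no label there. The rule in the hypothetical `sharing_status` function for levels 1 to 3, based on the number of carriers and an assumed sample-to-family mapping, is only an illustration, not the classification criteria described in this post.

```python
from enum import IntEnum
from typing import Dict, Set


class VariantStatus(IntEnum):
    """Status scale from the list above; 0 is omitted because it has no label."""
    UPSTREAM_CONFIRMED = -3
    UPSTREAM_UNSURE = -2
    UNSTABLE = -1
    SINGLE_INDIVIDUAL = 1
    SINGLE_FAMILY = 2
    MULTI_FAMILY = 3
    PUBLIC_UNSURE = 4
    PUBLIC_CONSISTENT = 7
    TREE_DRAFT = 8
    TREE_OFFICIAL = 9


def sharing_status(carriers: Set[str], family_of: Dict[str, str]) -> VariantStatus:
    """Hypothetical rule for levels 1-3 only, from carrier and family counts.

    family_of maps each sample to a family label (an assumed input,
    e.g. taken from documented genealogy)."""
    if not carriers:
        raise ValueError("a variant needs at least one carrier")
    if len(carriers) == 1:
        return VariantStatus.SINGLE_INDIVIDUAL
    families = {family_of[sample] for sample in carriers}
    if len(families) == 1:
        return VariantStatus.SINGLE_FAMILY
    return VariantStatus.MULTI_FAMILY


family = {"kit1": "F1", "kit2": "F1", "kit3": "F2"}
print(sharing_status({"kit1"}, family).name)          # SINGLE_INDIVIDUAL
print(sharing_status({"kit1", "kit2"}, family).name)  # SINGLE_FAMILY
print(sharing_status({"kit1", "kit3"}, family).name)  # MULTI_FAMILY
```

Because `VariantStatus` is an `IntEnum`, variants tagged with these statuses can be sorted or compared numerically, matching the ordering of the original scale.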

