Some SNPs previously tested not available in Big Y Known SNPs

  • Some SNPs previously tested not available in Big Y Known SNPs

    Last year, through both Walk the Y and individual SNP testing, several new SNPs were discovered and reported for several E1a1 men.

    One E1a1 member, 1003, has received his Big Y results.

    Last year he was Walk the Y tested and showed positive for L631. Six other E1a1 men also tested positive for L631. Yet when we view Known SNPs/Show All, L631 shows as a question mark under “Derived”. Why is it a question mark when they tested positive for it last year?

    Last year SNP testing of other E1a1 men showed positive for L1241, yet Known SNPs/Show All shows “no matching records found”. Why is this SNP missing altogether?

    Last year SNP testing for several E1a1 men showed positive for L1238, yet Known SNPs/Show All shows L1238 with a question mark. Why is it a question mark when several men tested positive for it last year?

  • #2
    No call?

    Perhaps the "?" signifies a "no-call"?

    Apparently no-calls are abundant in BigY even with 50X coverage.

    The WTY result is more accurate than the BigY.

    • #3
      The Big Y results shown on the web page are not definitive.
      Even the results shown in the vcf/bed files are not definitive. You have to examine the bam file
      and use judgement to see the probabilities. This is unfortunate but true.

      • #4
        totally meaningless answer for most

        Except for those with considerable expertise, your answer tells little. Explain "definitive" as it applies here. Let's say one examines the bam files, uses "judgement" and finds the same situation I described. What does that say: that the WTY testing was "definitive" and the Big Y not? Or that I used poor "judgement"?

        • #5
          Unfortunately (there's that word again) none of these tests
          may be 100% reliable in a given case. Even the WTY, even
          the $39 tests. As to the latter ... until the BigY came
          in I thought I was the only L175+ person ever tested.
          BigY found another. A look by FTDNA showed that the
          $39 test had been read wrong. The bam file agreed.

          I'll try to explain, at least for the BigY. The BigY
          generates traces, similar to the $39 tests but these are
          read entirely by machine and are not available to humans,
          at least not normally. The bam file contains large numbers
          of "reads" at each spot that the test covers. For Full Genomes that's most of the Y that has been sequenced
          by any method, for BigY, about half of that. The reads
          come in strips about 100 bases long, which are assembled
          by the computer into a vast overlapping array. In many places there are one or more strips starting at each and every base.

          Each strip is assigned a number which tells how sure the
          computer is that the strip is in the right place. It also
          assigns a number which assesses the probability that
          a given base in the strip was read right. These are
          very different ideas. I should add that the single
          strip read by Sanger sequencing ($39 test) has the same
          problems but being longer the location is surer.
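
          To make the two numbers concrete, here is a minimal sketch. It assumes they are the standard Phred-scaled mapping quality (one per strip) and base quality (one per base) stored in the SAM/BAM format, where a score Q corresponds to an error probability of 10^(-Q/10). The example read and its values are made up for illustration.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred-scaled score."""
    return -10 * math.log10(p)


# Hypothetical strip: one mapping quality for the whole strip,
# one base quality for each base in it.
read = {
    "mapping_quality": 60,           # how sure the placement is
    "base_qualities": [30, 30, 12],  # how sure each base call is
}

# Chance the whole strip is in the wrong place.
p_misplaced = phred_to_error_prob(read["mapping_quality"])

# Chance each individual base was read wrong.
p_misread = [phred_to_error_prob(q) for q in read["base_qualities"]]

print(f"P(strip misplaced) = {p_misplaced:.2e}")
for q, p in zip(read["base_qualities"], p_misread):
    print(f"base quality {q}: P(base misread) = {p:.4f}")
```

          A strip can be placed with near certainty and still contain a poorly read base, or the reverse, which is why the two scores have to be looked at separately.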

          The computer looks at the pile of strips and sees that at
          one position all strips agree, all have both quality scores high, and assigns it confidently. It looks at another
          base and sees, say, 90% one allele and 10% another. It has
          to assess the quality scores to decide whether it's the
          90% call or a no-call. At 90-10 this should be easy, but apparently it's not, since I see differences between the
          bam and vcf files for similar cases.
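
          A toy version of that decision might look like the sketch below. It is not FTDNA's pipeline or any real variant caller: the function `call_site`, the pileup layout, and every threshold are assumptions chosen only to show how the same raw 90-10 split can end up as a call or a no-call once the quality scores are applied.

```python
from collections import Counter


def call_site(pileup, min_mapq=20, min_baseq=20, min_depth=4, min_fraction=0.9):
    """Return the called allele, or None for a no-call.

    pileup is a list of (base, mapping_quality, base_quality) tuples,
    one per strip covering the position. All thresholds are hypothetical.
    """
    # Keep only strips that are both well placed and well read here.
    usable = [
        base
        for base, mapq, baseq in pileup
        if mapq >= min_mapq and baseq >= min_baseq
    ]
    if len(usable) < min_depth:
        return None  # too few trustworthy strips

    allele, count = Counter(usable).most_common(1)[0]
    if count / len(usable) >= min_fraction:
        return allele
    return None  # majority allele is not dominant enough


# Nine strips say T, one says C, all with high quality scores.
good_pileup = [("T", 60, 35)] * 9 + [("C", 60, 35)]

# Same raw 90-10 split, but three of the T strips have poor base quality.
weak_pileup = [("T", 60, 35)] * 6 + [("T", 60, 10)] * 3 + [("C", 60, 35)]

print(call_site(good_pileup))
print(call_site(weak_pileup))
```

          In the second pileup only six T strips and one C strip survive the filter, so the T fraction drops below the cutoff and the position becomes a no-call even though the raw counts look the same.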

          Judgement of a human can also come in. For example,
          in the Clan Donald I found four mutations at
          22270062, 22270127, 22271724 and 22271726 that are
          calls in some people's vcfs and no-calls in other people's.
          In every case the ones that were called were either
          all ancestral or all derived in a given person. They appear in the genealogy tree at the same spot. I looked at them in
          all the bam files I have received so far. It turns out that
          all of the locations in all of the people with no-calls
          were very close to being calls of the allele I expected.
          I "judge" that in fact every person really IS either
          + or - for all four. This is using Bayesian statistics,
          done in my head. It's important to us because
          these occur at THE critical point in the genealogy,
          and verify it.
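
          The in-my-head reasoning can be written down roughly as follows. This is only a toy model, not a record of what I actually computed: it assumes the four positions always change together (so a person is derived at all of them or ancestral at all of them), that every strip is an independent observation, and that a strip shows the wrong allele with a fixed, made-up error rate. The function name and all numbers are hypothetical.

```python
import math


def posterior_derived(site_counts, error_rate=0.2, prior_derived=0.5):
    """Posterior probability that a person is derived at every linked site.

    site_counts is a list of (derived_reads, ancestral_reads) pairs,
    one pair per position. error_rate and prior_derived are assumptions.
    """
    log_like_derived = 0.0
    log_like_ancestral = 0.0
    for derived_reads, ancestral_reads in site_counts:
        # If the person is truly derived, ancestral reads are errors.
        log_like_derived += (
            derived_reads * math.log(1 - error_rate)
            + ancestral_reads * math.log(error_rate)
        )
        # If the person is truly ancestral, derived reads are errors.
        log_like_ancestral += (
            ancestral_reads * math.log(1 - error_rate)
            + derived_reads * math.log(error_rate)
        )

    log_post_derived = math.log(prior_derived) + log_like_derived
    log_post_ancestral = math.log(1 - prior_derived) + log_like_ancestral

    # Normalise in log space to avoid underflow.
    top = max(log_post_derived, log_post_ancestral)
    w_derived = math.exp(log_post_derived - top)
    w_ancestral = math.exp(log_post_ancestral - top)
    return w_derived / (w_derived + w_ancestral)


# One borderline position: 3 strips derived, 1 ancestral.
one_site = posterior_derived([(3, 1)])

# The same borderline evidence repeated at all four linked positions.
four_sites = posterior_derived([(3, 1)] * 4)

print(f"one site:   {one_site:.4f}")
print(f"four sites: {four_sites:.6f}")
```

          Each position alone is weak enough that a cautious caller might refuse to call it, but four weak positions pointing the same way add up to a much stronger conclusion.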

          • #6
            Thanks for the background information, Doug. This reminds me of a poster presentation about low concordance between different variant calling pipelines. It may be dated now, and it was particularly about exome sequencing, but it made it abundantly clear that the raw data isn't as clear-cut as we might envision.

            http://lyonlab.cshl.edu/presentation...ng_poster2.pdf

            • #7
              Originally posted by dtvmcdonald
              Unfortunately (there's that word again) none of these tests
              may be 100% reliable in a given case.

              I'll try to explain, at least for the BigY. The BigY
              generates traces, similar to the $39 tests but these are
              read entirely by machine and are not available to humans,
              at least not normally. The bam file contains large numbers
              of "reads" at each spot that the test covers. For Full Genomes that's most of the Y that has been sequenced
              by any method, for BigY, about half of that. The reads
              come in strips about 100 bases long, which are assembled
              by the computer into a vast overlapping array. In many places there are one or more strips starting at each and every base.

              Each strip is assigned a number which tells how sure the
              computer is that the strip is in the right place. It also
              assigns a number which assesses the probability that
              a given base in the strip was read right. These are
              very different ideas. I should add that the single
              strip read by Sanger sequencing ($39 test) has the same
              problems but being longer the location is surer.

              Doug McDonald

              Thank you for posting that. I think that was very important information.
