
1 base pair gap in sequence = unreported SNP?


  • 1 base pair gap in sequence = unreported SNP?

    I'm comparing Big Y results with FullGenomes results for two men who were previously R1b-DF27** but now have many new shared SNPs.

    There are 12,968 segments in the Big Y .bed file I'm looking at. Many of the segments are separated by 1 base gaps. In other words, there appears to be one unreported position in the middle of a longer sequence, which seems very strange. I wouldn't expect so many 1 base gaps in the data considering the read lengths that this technology uses. If the position was sequenced and it wasn't certain which base (C, T, A, or G) was at that position, I'd expect Big Y to report it in the .vcf file as "REJECTED", rather than reporting it as a position that was not sequenced.
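For anyone who wants to list these gaps rather than spot them by eye, here is a rough Python sketch. It assumes the Big Y .bed file follows the standard BED convention (0-based start, exclusive end), which I have not confirmed for FTDNA's files; the function names `parse_bed` and `one_base_gaps` are made up for illustration. Positions are returned 1-based so they can be compared directly with the POS column of a .vcf file.

```python
def parse_bed(lines):
    """Parse BED text lines into (chrom, start, end) tuples, skipping headers."""
    intervals = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and UCSC-style header lines.
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def one_base_gaps(intervals):
    """Return (chrom, pos) for every single uncovered base between segments.

    ASSUMPTION: standard BED coordinates (0-based start, exclusive end).
    Returned positions are 1-based, to match VCF.
    """
    gaps = []
    last_end = {}  # furthest segment end seen so far, per chromosome
    for chrom, start, end in sorted(intervals):
        prev = last_end.get(chrom)
        if prev is not None and start - prev == 1:
            # 0-based index `prev` is uncovered, i.e. 1-based position prev + 1.
            gaps.append((chrom, prev + 1))
        last_end[chrom] = end if prev is None else max(prev, end)
    return gaps
```

To run it on a real file, pass the open file object to `parse_bed` and the result to `one_base_gaps`. If the file turns out to use a different coordinate convention, the gap test and the offset would need adjusting.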

    I have found three cases so far where a new SNP reported by FullGenomes is at the same position as a 1 base gap in the Big Y results. This makes me suspect that the same SNP also exists in the person who had the Big Y test, but that there was something about that SNP that made the Big Y process not report that position. Two of the three cases involved transversion mutations, which are much more rare than transition mutations.
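The comparison itself is just a set intersection. A minimal sketch, assuming both inputs are lists of (chromosome, 1-based position) pairs on the same reference build; the function name and the positions in the example are invented placeholders, not real results.

```python
def gaps_with_known_snps(gap_positions, snp_positions):
    """Return sorted positions that are both 1-base gaps and SNP sites.

    Both arguments are iterables of (chrom, pos) tuples with 1-based positions.
    """
    return sorted(set(gap_positions) & set(snp_positions))


# Placeholder positions for illustration only (not real data).
example_gaps = [("chrY", 1000), ("chrY", 2500), ("chrY", 4000)]
example_snps = [("chrY", 2500), ("chrY", 3000), ("chrY", 4000)]
shared = gaps_with_known_snps(example_gaps, example_snps)
```

Any position in `shared` is a candidate for a SNP that the Big Y results did not report.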

    There are also 1000 Genomes results for two of these three Big Y gaps for a man who is in the same subclade of DF27. They match the newly discovered SNPs in the FullGenomes results.

    Has anyone else seen cases where 1 base gaps in Big Y data might correspond to unreported SNPs? What causes Big Y to report 1 base gaps in sequences?

    Jim Turner
    Last edited by dbl hlx; 14 April 2014, 12:15 PM.

  • #2
    I've now seen a second set of FullGenomes results for another man in the same subclade of R1b-DF27**. He has the same SNPs in the Big Y gaps as the first set of FullGenomes results and the 1000 Genomes Project results. Plus I've found a fourth new SNP in the two FullGenomes results that is at the position of a 1 base gap in the Big Y results.

    Does anyone have any thoughts as to why the Big Y results have so many 1 base gaps? And why some of the gaps appear to be the sites of new SNPs that should be found in that person's Big Y results? Is there anyone I can contact at FTDNA that could answer these questions?



    • #3
      Originally posted by dbl hlx
      I've now seen a second set of FullGenomes results for another man in the same subclade of R1b-DF27**. He has the same SNPs in the Big Y gaps as the first set of FullGenomes results and the 1000 Genomes Project results.

      Does anyone have any thoughts as to why the Big Y results have so many 1 base gaps? And why some of the gaps appear to be the sites of new SNPs that should be found in that person's Big Y results?
      Yes. The answer seems to be purported heterozygosity.
      For example, in my bed file I have
      chrY 8597503 8600008
      chrY 8600009 8609432

      which means 8600009 is a one-base gap.

      The vcf file gives
      chrY 8600009 . C T 41.1612 REJECTED . GT 0/1

      which is a rejection due to purported heterozygosity.
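The same check can be scripted for any gap position. This is only a sketch: it assumes the usual VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample) and the "REJECTED" filter value seen in the line above. The function name and the returned labels are my own invention.

```python
import re


def classify_position(vcf_lines, chrom, pos):
    """Classify a 1-based position against VCF records.

    Returns "absent", "reported", "rejected", or "rejected_heterozygous".
    ASSUMPTION: FILTER is column 7 and uses the value "REJECTED".
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header line
        f = line.split()
        if len(f) < 7 or f[0] != chrom or f[1] != str(pos):
            continue
        rejected = f[6] == "REJECTED"
        het = False
        if len(f) >= 10:
            keys = f[8].split(":")
            vals = f[9].split(":")
            if "GT" in keys and keys.index("GT") < len(vals):
                # Genotype alleles are separated by "/" (unphased) or "|" (phased).
                alleles = re.split(r"[/|]", vals[keys.index("GT")])
                het = "." not in alleles and len(set(alleles)) > 1
        if rejected and het:
            return "rejected_heterozygous"
        return "rejected" if rejected else "reported"
    return "absent"
```

A gap that comes back "rejected_heterozygous" matches the case above, while "absent" means the position has no record in the vcf file at all.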

      FTDNA may be using a different criterion for rejecting
      such things. I can't tell anything about this yet as
      I have no BigY bam files for people for whom I have
      the matching bed and vcf files.

      As soon as I get a matching bam file I will check
      to see how their caller compares to mine.
      Last edited by dtvmcdonald; 16 April 2014, 03:48 PM.



      • #4
        Originally posted by dtvmcdonald
        The vcf file gives
        chrY 8600009 . C T 41.1612 REJECTED . GT 0/1

        which is a rejection due to purported heterozygosity.
        Hmmm. That's not what I'm seeing for the 4 SNPs I mentioned. None of them are listed in the vcf file, so it doesn't look like they were sequenced and rejected. It looks like one position in the middle of a sequence was skipped.

        I'm not sure which is stranger, 1 skipped position in a sequence, or purported heterozygosity on the Y chromosome. Either way, I wish FTDNA would explain what's going on.
        Last edited by dbl hlx; 17 April 2014, 07:09 PM.

