I'm comparing Big Y results with FullGenomes results for two men who were previously R1b-DF27** but now have many new shared SNPs.
There are 12,968 segments in the Big Y .bed file I'm looking at. Many of the segments are separated by 1 base gaps. In other words, there appears to be a single unreported position in the middle of an otherwise continuous sequence, which seems very strange. I wouldn't expect so many 1 base gaps given the read lengths that this technology uses. If a position was sequenced but it wasn't certain which base (C, T, A, or G) was there, I'd expect Big Y to report it in the .vcf file as "REJECTED", rather than treating it as a position that was not sequenced.
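For anyone who wants to count these gaps in their own file, here is a minimal Python sketch. It assumes the .bed file follows the usual BED convention (tab- or space-separated chromosome, 0-based start, exclusive end) and that the Y chromosome is labeled "chrY"; both are assumptions and may need adjusting for a particular file. The function name find_single_base_gaps is made up for illustration.

```python
from typing import Iterable, List


def find_single_base_gaps(bed_lines: Iterable[str], chrom: str = "chrY") -> List[int]:
    """Return 1-based positions that sit alone between two covered segments.

    Assumes standard BED coordinates: 0-based start, exclusive end.
    """
    intervals = []
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, track/browser headers, and other chromosomes.
        if len(fields) < 3 or fields[0] != chrom:
            continue
        intervals.append((int(fields[1]), int(fields[2])))
    intervals.sort()

    gaps = []
    covered_end = None  # exclusive end of everything covered so far
    for start, end in intervals:
        if covered_end is not None and start - covered_end == 1:
            # The 0-based position covered_end is missing; report it 1-based.
            gaps.append(covered_end + 1)
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps


# Made-up example segments, not real Big Y data.
example = ["chrY\t100\t200", "chrY\t201\t300", "chrY\t305\t400"]
print(find_single_base_gaps(example))  # [201]
```

With a real file, the lines can be passed in directly from an open file handle, and len() of the result gives the number of 1 base gaps.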
I have found three cases so far where a new SNP reported by FullGenomes is at the same position as a 1 base gap in the Big Y results. This makes me suspect that the same SNP also exists in the person who had the Big Y test, but that something about that SNP caused the Big Y process not to report that position. Two of the three cases involved transversion mutations, which are much rarer than transition mutations.
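The comparison itself can be sketched in Python along these lines. This is only an illustration under assumptions: the Big Y segments are given as sorted, non-overlapping (start, end) pairs in BED coordinates (0-based start, exclusive end), the SNP positions from the other test are 1-based and on the same reference build, and the name classify_positions and the numbers in the example are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def classify_positions(
    intervals: Sequence[Tuple[int, int]], positions: Sequence[int]
) -> Dict[int, str]:
    """Label each 1-based position as covered, single_base_gap, or not_covered.

    intervals must be sorted, non-overlapping, 0-based start, exclusive end.
    """
    starts: List[int] = [start for start, _ in intervals]
    labels: Dict[int, str] = {}
    for pos in positions:
        p0 = pos - 1  # convert to the 0-based coordinate used by BED
        i = bisect_right(starts, p0) - 1  # last segment starting at or before p0
        if i >= 0 and p0 < intervals[i][1]:
            labels[pos] = "covered"
        elif (
            i >= 0
            and i + 1 < len(intervals)
            and intervals[i][1] == p0          # previous segment ends right here
            and intervals[i + 1][0] == p0 + 1  # next segment starts right after
        ):
            labels[pos] = "single_base_gap"
        else:
            labels[pos] = "not_covered"
    return labels


# Made-up segments and SNP positions, for illustration only.
segments = [(100, 200), (201, 300), (305, 400)]
print(classify_positions(segments, [150, 201, 303]))
```

Any SNP labeled single_base_gap is a candidate for the situation described above: the other test reports a variant exactly where the Big Y coverage skips one base.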
There are also 1000 Genomes results for two of these three Big Y gaps for a man who is in the same subclade of DF27. They match the newly discovered SNPs in the FullGenomes results.
Has anyone else seen cases where 1 base gaps in Big Y data might correspond to unreported SNPs? What causes Big Y to report 1 base gaps in sequences?
Jim Turner