
BIG Y and SNP Discovery This area is for talk about BIG Y results.

  #1  
Old 21st April 2014, 10:37 AM
sailingdeac sailingdeac is offline
FTDNA Customer
 
Join Date: Feb 2006
Posts: 21
Some SNPs previously tested not available in Big Y Known SNPs

Last year, through both Walk the Y and individual SNP testing, several new SNPs were discovered and reported for several E1a1 men.

One E1a1 member, 1003, has received his Big Y results.

Last year he was Walk the Y tested and showed positive for L631. Six other E1a1 men also tested positive for L631. Yet when we view Known SNPs/Show All, L631 shows a question mark under “Derived”. Why is it a question mark when they tested positive for it last year?

Last year SNP testing of other E1a1 men showed positive for L1241, yet Known SNPs/Show All shows “no matching records found”. Why is this SNP missing altogether?

Last year SNP testing of several E1a1 men showed positive for L1238, yet Known SNPs/Show All shows L1238 with a question mark. Why is it a question mark when several men tested positive for it last year?
  #2  
Old 22nd April 2014, 11:19 AM
ajmr1a1 ajmr1a1 is offline
FTDNA Customer
 
Join Date: May 2012
Posts: 382
Blog Entries: 5
No call?

Perhaps the "?" signifies a "no-call"?

Apparently no-calls are abundant in BigY even with 50X coverage.

The WTY result is more accurate than the BigY.
  #3  
Old 22nd April 2014, 11:27 AM
dtvmcdonald dtvmcdonald is offline
mtDNA: | Big Y Pending
 
Join Date: Mar 2011
Posts: 178
The Big Y results shown on the web page are not definitive.
Even the results shown in the vcf/bed files are not definitive. You have to examine the bam file
and use judgement to see the probabilities. This is unfortunate but true.
  #4  
Old 24th April 2014, 07:33 AM
sailingdeac sailingdeac is offline
FTDNA Customer
 
Join Date: Feb 2006
Posts: 21
totally meaningless answer for most

Except for those with considerable expertise, your answer tells little. Explain "definitive" as it applies here. Let's say one examines the bam files, uses "judgement" and finds the same situation I described. What does that say: that the WTY testing was "definitive" and the Big Y not? Or that I used poor "judgement"?
  #5  
Old 24th April 2014, 08:30 AM
dtvmcdonald dtvmcdonald is offline
mtDNA: | Big Y Pending
 
Join Date: Mar 2011
Posts: 178
Unfortunately (there's that word again) none of these tests
may be 100% reliable in a given case. Even the WTY, even
the $39 tests. As to the latter ... until the BigY came
in I thought I was the only L175+ person ever tested.
BigY found another. A look by FTDNA showed that the
$39 test had been read wrong. The bam file agreed.

I'll try to explain, at least for the BigY. The BigY
generates traces, similar to the $39 tests, but these are
read entirely by machine and are not available to humans,
at least not normally. The bam file contains large numbers
of "reads" at each spot that the test covers. For Full
Genomes that's most of the Y that has been sequenced
by any method; for BigY, about half of that. The reads
come in strips about 100 bases long, which are assembled
by the computer into a vast overlapping array. In many
places there are one or more strips starting at each and
every base.

Each strip is assigned a number which tells how sure the
computer is that the strip is in the right place. It also
assigns a number which assesses the probability that
a given base in the strip was read right. These are
very different ideas. I should add that the single
strip read by Sanger sequencing ($39 test) has the same
problems but being longer the location is surer.
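
As a rough illustration of those two numbers: in bam files both the per-strip mapping quality and the per-base quality are conventionally stored on the Phred scale, Q = -10 * log10(probability of error). The short Python sketch below only shows that conversion; the function names are made up for this example and do not come from any FTDNA tool.

```python
import math

def phred_to_error_prob(q):
    """Probability that a call is wrong, given its Phred-scaled quality q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred-scaled quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# The same scale is used for "is this base right?" and
# "is this strip in the right place?", but they are separate scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

So a base quality of 20 means roughly a 1 in 100 chance that the base was misread, and a mapping quality of 20 means roughly a 1 in 100 chance that the whole strip was put in the wrong place.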

The computer looks at the pile of strips and sees that at
one position all strips agree and all have both quality
scores high, so it assigns that base confidently. It looks
at another base and sees, say, 90% one allele and 10% another.
It has to assess the quality scores to decide whether it's the
90% call or a no-call. At 90-10 this should be easy, but
apparently it's not, since I see differences between the
bam and vcf files for similar cases.
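
The decision described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not FTDNA's actual pipeline: the function name `call_position`, the tuple layout, and every threshold value are hypothetical choices made for illustration.

```python
from collections import Counter

def call_position(pileup, min_base_q=20, min_map_q=30,
                  min_depth=4, min_fraction=0.9):
    """Return the called base, or '?' for a no-call.

    pileup: list of (base, base_quality, mapping_quality) tuples, one per
    strip covering the position. All thresholds are illustrative guesses.
    """
    # Keep only strips that pass both quality filters.
    good = [base for base, bq, mq in pileup
            if bq >= min_base_q and mq >= min_map_q]
    if len(good) < min_depth:
        return "?"  # too little trustworthy evidence
    base, count = Counter(good).most_common(1)[0]
    # Call the majority allele only if it dominates clearly enough.
    return base if count / len(good) >= min_fraction else "?"

clean = [("T", 35, 60)] * 9 + [("C", 34, 60)]      # 90-10, all high quality
noisy = [("T", 35, 60)] * 9 + [("C", 8, 60)]       # the odd read is low quality
shaky = [("T", 35, 60)] * 6 + [("C", 35, 60)] * 4  # 60-40 split
low   = [("T", 35, 10)] * 10                       # strips poorly placed

for name, pile in [("clean", clean), ("noisy", noisy),
                   ("shaky", shaky), ("low", low)]:
    print(name, call_position(pile))
```

Small changes to the thresholds flip borderline positions between a call and a "?", which is one plausible way two pipelines can disagree on the same reads.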

Human judgement can also come in. For example,
in the Clan Donald I found four mutations at
22270062, 22270127, 22271724 and 22271726 that are
calls in some people's vcfs and no-calls in other people's.
In every case the ones that were called were either
all ancestral or all derived in a given person. They appear
in the genealogy tree at the same spot. I looked at them in
all the bam files I have received so far. It turns out that
all of the locations in all of the people with no-calls
were very close to being calls of the allele I expected.
I "judge" that in fact every person really IS either
+ or - for all four. This is using Bayesian statistics,
done in my head. It's important to us because
these occur at THE critical point in the genealogy,
and verify it.
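
One way to write down that kind of reasoning is the Python sketch below. It is not the calculation that was actually done: it assumes the four sites always share one state in a given person, that strips err independently at a fixed rate, and a 50-50 prior. The function name and the read counts are invented for the example.

```python
def posterior_derived(counts, error_rate=0.05, prior=0.5):
    """Posterior probability that a person is derived at all linked sites.

    counts: list of (derived_reads, ancestral_reads), one pair per position.
    Assumes the sites share one state and reads err independently.
    """
    like_derived = prior
    like_ancestral = 1.0 - prior
    for derived, ancestral in counts:
        # Under "derived", derived reads are correct and ancestral reads are errors.
        like_derived *= (1 - error_rate) ** derived * error_rate ** ancestral
        # Under "ancestral", it is the other way round.
        like_ancestral *= error_rate ** derived * (1 - error_rate) ** ancestral
    return like_derived / (like_derived + like_ancestral)

# Made-up read counts: each site alone is thin, but together they agree.
counts = [(2, 0), (3, 1), (2, 0), (1, 0)]
print("one thin site:", posterior_derived([(1, 0)]))
print("all four sites:", posterior_derived(counts))
```

The point of the sketch is that several weak, near-call positions that always travel together can add up to strong evidence, even when each one alone is reported as a no-call.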
  #6  
Old 24th April 2014, 09:32 AM
Ann Turner Ann Turner is offline
FTDNA Customer
 
Join Date: Apr 2003
Posts: 1,117
Thanks for the background information, Doug. This reminds me of a poster presentation about low concordance between different variant calling pipelines. It may be dated now, and it was particularly about exome sequencing, but it made it abundantly clear that the raw data isn't as clear-cut as we might envision.

http://lyonlab.cshl.edu/presentation...ng_poster2.pdf
  #7  
Old 24th April 2014, 10:26 AM
Kathleen Carrow Kathleen Carrow is offline
mtDNA: J2b1a1(a) | Y-DNA: I2a1a | Horse Person
 
Join Date: Apr 2006
Location: NC formerly NJ
Posts: 1,095
Quote:
Originally Posted by dtvmcdonald View Post
Unfortunately (there's that word again) none of these tests
may be 100% reliable in a given case.

I'll try to explain, at least for the BigY. The BigY
generates traces, similar to the $39 tests but these are
read entirely by machine and are not available to humans,
at least not normally. The bam file contains large numbers
of "reads" at each spot that the test covers. For Full Genomes that's most of the Y that has been sequenced
by any method, for BigY, about half of that. The reads
come in strips about 100 bases long, which are assembled
by the computer into a vast overlapping array. In many places there is one or more strips starting at each and every base.

Each strip is assigned a number which tells how sure the
computer is that the strip is in the right place. It also
assigns a number which assesses the probability that
a given base in the strip was read right. These are
very different ideas. I should add that the single
strip read by Sanger sequencing ($39 test) has the same
problems but being longer the location is surer.

.
Doug McDonald
Thank you for posting that. I think that was very important information.