Novel variants per generation = 1?

  • Novel variants per generation = 1?

    My 5th cousin just got his BIG Y results. Comparing our Novel Variants, we have 106 in common. Then there are 6 I have and he does not and 9 he has that I do not.

    Our common ancestor was born in 1743 and the sons we descend from were born in the early 1770s. This works out to around one change per generation for each of us. No way to know if these are all new or some cases where one of us reverted back to a prior variant.

    How does this sound?

    We think our common 11th and 12th cousins who tested also have results, but we have not seen them yet. Those would have the last common ancestor around 450 years ago. They are a father-son pair and were 1 GD apart on previous Y tests.
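The comparison in this opening post can be sketched in a few lines of Python. This is only an illustration: the positions are made up to reproduce the 106 / 6 / 9 counts, the 250-year span is the round figure quoted later in the thread, and the 30-year generation length is an assumption, not something stated by the poster.

```python
def compare_variants(mine, cousins):
    """Split two collections of variant positions into shared and private sets."""
    mine, cousins = set(mine), set(cousins)
    return mine & cousins, mine - cousins, cousins - mine


def per_generation(private_count, years_since_ancestor, years_per_generation=30.0):
    """Private variants divided by the estimated number of generations.

    years_per_generation=30 is an assumed average, not a value from the thread.
    """
    generations = years_since_ancestor / years_per_generation
    return private_count / generations


# Made-up positions, sized to match the counts in the post (106 / 6 / 9).
shared_positions = range(1000, 1106)
mine = list(shared_positions) + list(range(2000, 2006))
cousins = list(shared_positions) + list(range(3000, 3009))

shared, only_mine, only_cousins = compare_variants(mine, cousins)
print(len(shared), len(only_mine), len(only_cousins))
print(round(per_generation(len(only_mine), 250), 2))
print(round(per_generation(len(only_cousins), 250), 2))
```

Under those assumptions both lines land near one private variant per generation, before any cross-checking of rejected calls or unread positions.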

  • #2
    John

    thanks - this is useful information for those of us doing mass comparisons at haplogroup/sub-group level.
    One question: have you cross checked your novel SNPs in your cousin's vcf file and vice versa? If one of you has a low quality ("rejected") result for one of the other's novel SNPs it may mean that the SNP in fact pre-dates your common ancestor.

    Gareth

    • #3
      It's worse and better than what Gareth said.
      Worse because you really need to look at the bam file.
      This means you need to download and install a bam file
      viewer (I use Bamview) and download the 1 gigabyte bam file.

      Better because a look at the proper place in the files
      may show that they really match even if FTDNA is deeming
      one a no-call rather than just medium confidence.

      Their caller is being overly picky in some cases.
      If a SNP, or worse an indel, is near the end of a particular
      read segment, just by accident the remaining segment may
      realign with the "wrong" allele in place ... but a look
      at the bam will show that, in fact, the "right" allele
      also fits exactly equally well with the correct alignment:
      this means that you .. and the program .. should ignore
      that particular read. You can ignore it; the program can't
      do so retroactively.

      In other cases the caller is just being too picky
      for a haploid organism such as the Y chromosome.
      In the R1a Clan Donald I found five such SNPs that are phylogenetically equivalent at a critical moment in time,
      and some possibly represent one event. I did this
      by comparing all the vcf and bed files and examining
      by hand the bams of all cases of up to three nocalls out of
      19 people.

      • #4
        Originally posted by GarethH View Post
        John

        thanks - this is useful information for those of us doing mass comparisons at haplogroup/sub-group level.
        One question: have you cross checked your novel SNPs in your cousin's vcf file and vice versa? If one of you has a low quality ("rejected") result for one of the other's novel SNPs it may mean that the SNP in fact pre-dates your common ancestor.

        Gareth
        That is a good idea! I need to look around the forum and figure out how to open the vcf in a readable format. I looked at the novel variants first because there are not so many of them. I presume looking at known SNPs comes next. When we get the more distant cousins results we may have an interesting timeline - 450 years, 250 years, and 25 years.

        • #5
          Vcf and bed files are both plain tab-delimited text files
          that Excel can read. Open Excel and then load them in it.
          Note that comparing vcf files is not sufficient ...
          if two vcf files both call a particular location,
          that's it. But if one has a call and the other has
          no info on a location, you need to look in the bed
          file to see if it is a no-call or a call of the reference allele. The bed file has lines like

          7601335 7602044

          in it. This means that it successfully read locations
          7601336 through 7602044. Note the difference in the first
          number. This means it did NOT read 7601335. If it
          successfully read a location and that location is not
          in the vcf file, it's the reference allele.

          Doug McDonald
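The rule in this post (in the vcf means a call, read but not in the vcf means reference, not read means no-call) could be sketched as below. The function names are made up for illustration, the code assumes the BED intervals do not overlap, and it treats any vcf entry as a call without looking at whether that entry was rejected.

```python
import bisect


def parse_bed(lines):
    """Return sorted (start, end) pairs from BED lines.

    Accepts the two-number form shown in the post as well as the usual
    three-column form that begins with a chromosome name.
    """
    intervals = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        if len(fields) >= 3:
            start, end = fields[1], fields[2]
        else:
            start, end = fields[0], fields[1]
        intervals.append((int(start), int(end)))
    return sorted(intervals)


def is_read(intervals, position):
    """True if position falls in an interval: start excluded, end included."""
    starts = [start for start, _ in intervals]
    # Index of the last interval whose start is strictly below the position.
    i = bisect.bisect_left(starts, position) - 1
    return i >= 0 and position <= intervals[i][1]


def classify(position, vcf_positions, intervals):
    """Label a position as 'variant', 'reference' or 'no-call'."""
    if position in vcf_positions:
        return "variant"
    if is_read(intervals, position):
        return "reference"
    return "no-call"


intervals = parse_bed(["7601335 7602044"])
vcf_positions = {7601500}  # hypothetical variant position for the example
for pos in (7601335, 7601336, 7601500, 7602044, 7602045):
    print(pos, classify(pos, vcf_positions, intervals))
```

With the example line from the post, 7601335 and 7602045 come out as no-calls, while 7601336 and 7602044 come out as reference.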

          • #6
            thanks Doug

            I note I can also read the vcf file with wordpad and with the Integrative Genomics Viewer, which is cool but not much handier for this task.

            http://www.broadinstitute.org/software/igv/download

            I have not looked at the bed file yet.

            What I have is 6 cases where I have a variant and my cousin does not, 4 cases where my cousin has a variant and I do not, and 5 cases where I have a rejected call and my cousin has a variant.

            I am puzzled by the quality value. Is there a cutoff to reject? Most of the rejected values seem to be 500 or less but some are as high as the pass values. Or I may be reading this wrong.
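One way to look at the quality question is to group the vcf rows by their FILTER column and compare the QUAL ranges. The sketch below assumes the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, ...); the sample rows and the "REJECTED" label are invented for the example and are not taken from an actual Big Y file. It only summarizes the numbers and does not reveal what rule the caller applies.

```python
from collections import defaultdict


def qual_range_by_filter(vcf_lines):
    """Map each FILTER value to (count, lowest QUAL, highest QUAL)."""
    quals = defaultdict(list)
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip short rows and rows with a missing QUAL value.
        if len(fields) < 7 or fields[5] == ".":
            continue
        quals[fields[6]].append(float(fields[5]))
    return {name: (len(v), min(v), max(v)) for name, v in quals.items()}


# Invented rows for illustration only.
sample = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrY\t7601500\t.\tA\tG\t812.5\tPASS\t.",
    "chrY\t7601900\t.\tC\tT\t140.0\tREJECTED\t.",
    "chrY\t8000100\t.\tG\tA\t905.0\tREJECTED\t.",
]

for name, (count, low, high) in qual_range_by_filter(sample).items():
    print(name, count, low, high)
```

If the rejected rows overlap the passing rows in QUAL, as in the invented sample, then QUAL alone is probably not the deciding criterion.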

            • #7
              Originally posted by dtvmcdonald View Post
              Vcf and bed files are both plain tab-delimited text files
              that Excel can read. Open Excel and then load them in it.
              Note that comparing vcf files is not sufficient ...
              if two vcf files both call a particular location,
              that's it. But if one has a call and the other has
              no info on a location, you need to look in the bed
              file to see if it is a no-call or a call of the reference allele. The bed file has lines like

              7601335 7602044

              in it. This means that it successfully read locations
              7601336 through 7602044. Note the difference in the first
              number. This means it did NOT read 7601335. If it
              successfully read a location and that location is not
              in the vcf file, it's the reference allele.

              Doug McDonald

              Of the 6 locations where I have data and my 5th cousin does not:

              1 is in a gap between BED file entries
              3 are the first position of a BED entry and therefore not read
              2 are in the middle of a BED segment but not in the VCF, therefore a reference allele value.

              So 2 'real' changes and 4 that might be related to the reading?

              Of the 9 locations my cousin has in the VCF and I do not have:

              5 are in my VCF as Rejected
              1 is in a gap between BED file entries
              1 is the first position of a BED file entry
              2 are in the middle of a BED segment but not in the VCF

              So again 2 'real' changes.

              I guess the 'real' changes could either be becoming a novel variant or losing a novel variant.

              Maybe when I have the 11th and 12th cousin results that will be clearer?

              Right now, would it be true to say that each of the two descendant lines has had 2 changes since 1743, the birth of the common ancestor?

              • #8
                The below is based on my understanding:

                There should be about 130 mutations per generation across the whole genome, that's from father to son. So, for the Y chromosome alone, it is approximately 130 × 59 million / 3.2 billion = 2.396875. So, there should be around 2 to 3 novel variants from father to son. Novel variants are not SNPs but mutations specific to each person or within a family. To consider a mutation as a SNP, I think it must be around 0.05% in the population of 500k.

                Given that I have 450 Novel Variants while you have only around 106, many of my Novel Variants are potential SNPs yet to be discovered, as the database doesn't have a large enough Asian/Indian population for them to reach the 0.5% frequency.
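The arithmetic in this post is a simple proportion and can be written out as a small Python function. The defaults are the figures quoted above (130 mutations, 59 million and 3.2 billion base pairs); the function name and the shorter 10 million base-pair region are hypothetical, included only to show that the expected count scales with the length actually examined.

```python
def expected_mutations(per_genome=130, region_bp=59e6, genome_bp=3.2e9):
    """Expected new mutations per generation falling inside a region,
    assuming they are spread evenly along the genome."""
    return per_genome * region_bp / genome_bp


print(expected_mutations())                # whole-Y figure from the post
print(expected_mutations(region_bp=10e6))  # hypothetical shorter region
```

The even-spread assumption is a simplification, so the result is best read as a rough expectation rather than a prediction for any one father-son pair.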

                • #9
                  Originally posted by felix View Post
                  The below is based on my understanding:

                  There should be about 130 mutations per generation across the whole genome, that's from father to son. So, for the Y chromosome alone, it is approximately 130 × 59 million / 3.2 billion = 2.396875. So, there should be around 2 to 3 novel variants from father to son. Novel variants are not SNPs but mutations specific to each person or within a family. To consider a mutation as a SNP, I think it must be around 0.05% in the population of 500k.


                  Given that I have 450 Novel Variants while you have only around 106, many of my Novel Variants are potential SNPs yet to be discovered, as the database doesn't have a large enough Asian/Indian population for them to reach the 0.5% frequency.
                  There's been a trend away from using the term polymorphism, with its connotation of a certain frequency in some population. (For autosomal DNA, 1% was a typical number.) In fact, dbSNP now uses the term SNV (Single Nucleotide Variant). They haven't changed the name of the database, though, and we'll probably continue to use the term SNP indefinitely, too.

                  The ISOGG criteria for adding a SNP to the tree are in the process of revision. The frequency criterion is very difficult to demonstrate. The focus will be on demonstrating a certain amount of variability within the new subclade, so you won't be able to add a SNP that is found only in you and your closest relatives.

                  Your calculation about the number of variants arising between father/son is based on the entire length of the Y. Not all of the Y has even been mapped, and we won't be able to observe that number of mutations (at least for the foreseeable future).

                  And yes, you're right -- you have more "novel" variants because the database doesn't contain a large enough sample of Asian/Indian populations.

                  • #10
                    Originally posted by felix View Post
                    So, for the Y chromosome alone, it is approximately 130 × 59 million / 3.2 billion = 2.396875. So, there should be around 2 to 3 novel variants from father to son.
                    So my single data point and your calculation agree. Not bad for starters.

                    If a mutation happened 2000 years ago it has had time to spread to lots of descendants and show up in a large region.

                    If it happened 600 years ago it would not have spread so much but it might be a proto-genealogical link to location of ancestors in a time before good records and current surname systems.

                    To me the Big Y and similar tests function on several levels - they can fill in the big picture of haplotypes over thousands of years, they can fill in some regional change and population movement, and they can help us identify and probe clans and families.

                    When I have the novel variant count for my more distant surname cousin, the rate will be a check on the presumed common ancestor in the 1500s. If it comes out at, say, 4 variants per generation, the presumption would be unlikely: it might mean the surname is older and the ancestors spread out maybe 200-400 years earlier. That might make for interesting genealogical research. If it comes out around 2, the estimate is more likely to be right.

                    • #11
                      Known SNPs vs. Novel Variants

                      What is the difference in Big Y testing between Known SNPs and Novel Variants in the testing that is done and in the way that the calls are made? Is it simply that some variants are on the known SNP list and are reported that way, or is there more to it?

                      • #12
                        Originally posted by morrisondna View Post
                        What is the difference in Big Y testing between Known SNPs and Novel Variants in the testing that is done and in the way that the calls are made? Is it simply that some variants are on the known SNP list and are reported that way, or is there more to it?
                        The difference is in the frequency of occurrence in the population. A SNP is widely distributed, while a Novel Variant is restricted to you and/or your close relatives.

                        • #13
                          Felix, so are you saying that the Big Y test is run and then whatever results are found are then separated into Known SNPs and Novel Variants?

                          In other words, is there any special effort put forth to find the values of Known SNPs as part of Big Y testing, or are the Known SNPs just as likely to be missed as Novel Variants?

                          Thanks...

                          • #14
                            My understanding is that if the SNP was in the FTDNA SNP database then it appears in the SNP list. Otherwise it's CURRENTLY on the novel variants list.

                            Earl.

                            • #15
                              Originally posted by felix View Post
                              The difference is in the frequency of occurrence in the population. A SNP is widely distributed, while a Novel Variant is restricted to you and/or your close relatives.
                              Felix, that's not the case, at least not right now. Earl has the correct answer:

                              Originally posted by Earl Davis View Post
                              My understanding is that if the SNP was in the FTDNA SNP database then it appears in the SNP list. Otherwise it's CURRENTLY on the novel variants list.
                              Further to Earl's answer:

                              There are many SNPs currently on the Novel Variants list that are actually high up on the tree and will be moved to Known SNPs eventually. There are also Novel Variants that will be found to define new subclades, so they'll be assigned SNP names and will be moved to Known SNPs as well. Of course, there will also be SNPs in Novel Variants that are "private" or only found in a family or small group of people.

                              Elise
