SNP technical question

  • SNP technical question

    If one looks at a sequence of SNPs (realizing of course that there are typically a lot of DNA residues between the SNPs, which we can ignore for the purposes of this discussion), what are the details of how they are arranged?

    Let me explain my question a little better by example. Suppose a series of SNPs looks like this:

    ...AGATTACCCATAGT...
    ...ATATTCCCCATAGT...

    Are all of the bases on the top line from one chromosome and all of the bases on the bottom line from the other chromosome? Or are they randomly distributed?

    When I say randomly distributed I mean the following, taking the top series: Let us say the first one, an "A", comes from chromosome 21M (where 21M means chromosome 21 inherited from the father), and the next residue, a "G" comes from 21F (inherited from the mother), perhaps the third one comes from 21F, and the fourth one comes from 21M, and so forth. That would be random.

    Non random would be all of the top residues come from, let us say 21M and all of the bottom ones come from 21F, or alternatively all of the top ones come from 21F and all of the bottom ones come from 21M.

    If my explanation of what I mean is not clear enough please comment, and I will try to explain it better.

    One reason why this can be significant is that if one is writing some do-it-yourself SNP comparison software it is important to know which method is used in reporting the SNPs.

    Thanks.

  • #2
    This article from DNAeXplained should answer some of your questions.

    https://dna-explained.com/2017/01/19...false-matches/

    More information is available by reading articles on autosomal DNA phasing.



    • #3
      Originally posted by jimbirk View Post
      This article from DNAeXplained should answer some of your questions.

      https://dna-explained.com/2017/01/19...false-matches/

      More information is available by reading articles on autosomal DNA phasing.
      Thanks. I will read that article to see if it helps.



      • #4
        Originally posted by massmanute View Post
        Thanks. I will read that article to see if it helps.

        This might be more like what you are looking for:


        https://segmentology.org/2018/05/02/...ut-base-pairs/

        Jack



        • #5
          Originally posted by massmanute View Post
          Thanks. I will read that article to see if it helps.
          I do not think the article directly answers your question.

          I am just an amateur genealogist, but it is my understanding that the "flipping" occurs not at individual pairs but in small segments.

          It is probably one of those highly advanced technologies in common use that only people who work in the area can fully and clearly understand, since there are just too many prerequisites.

          Decades ago popular science journals were trying to explain transistors, then computers and integrated circuits. Nowadays, a tiny percentage of users would be able to understand details of their smartphone internals (if they wanted to).


          Mr. W.



          • #6
            Let's use some actual raw DNA data. Here is part of my raw DNA file for chromosome 1 from FTDNA (please don't clone me!):

            "rs7526076","1","998395","AG"
            "rs3766192","1","1017197","CC"
            "rs3766191","1","1017587","CT"
            "rs9442372","1","1018704","AA"
            "rs10907177","1","1021346","AG"
            "rs3737728","1","1021415","AG"
            "rs10907178","1","1021583","AC"
            "rs9442398","1","1021695","AG"
            "rs9442400","1","1025301","CC"

            In each pair you cannot tell which allele is maternal or paternal.

            At position 998395, anyone with "AA", "AC", "AG" or "AT" will match my "AG" because the "A" matches. (Anyone carrying a "G", such as "CG", "GG" or "GT", matches as well.)

            At position 1017197, anyone with "AC", "CC", "CG" or "CT" will match my "CC" because the "C" matches.

            This is why you want to require a long minimum run of consecutive matches: it avoids false matches.

            The other way to minimize false matches is to first use an algorithm to phase the DNA into maternal and paternal alleles.
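For anyone writing do-it-yourself comparison software, the half-match rule described above can be sketched in Python roughly as follows. This is only an illustration, not how any testing company implements matching: the function names are made up for this example, the rows are the first four from the excerpt above, and the sketch assumes a single chromosome and the quoted four-column layout shown in that excerpt.

```python
import csv
import io

# First four rows of the raw-data excerpt above: RSID, chromosome, position, genotype.
RAW = '''"rs7526076","1","998395","AG"
"rs3766192","1","1017197","CC"
"rs3766191","1","1017587","CT"
"rs9442372","1","1018704","AA"
'''

def read_genotypes(text):
    """Return {(chromosome, position): genotype} from quoted four-column CSV text."""
    rows = csv.reader(io.StringIO(text))
    return {(chrom, int(pos)): gt for _rsid, chrom, pos, gt in rows}

def half_match(gt_a, gt_b):
    """True if two unphased genotypes share at least one allele."""
    return bool(set(gt_a) & set(gt_b))

def longest_run(mine, other):
    """Longest run of consecutive half-matching positions found in both kits.

    Assumption: both kits cover a single chromosome, so sorting the keys
    puts the positions in order along that chromosome.
    """
    best = run = 0
    for key in sorted(set(mine) & set(other)):
        run = run + 1 if half_match(mine[key], other[key]) else 0
        best = max(best, run)
    return best
```

A real tool would compare the longest run against a minimum threshold, which is the "long run of matches" requirement mentioned above; the threshold itself is a design choice and is not specified in this thread.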



            • #7
              In reply to jimbirk (#6):
              That's very helpful. Thanks.



              • #8
                Allele associations

                Originally posted by massmanute View Post
                That's very helpful. Thanks.
                I had the same question: whether allele 1 and allele 2 are each associated with a single chromosome (maternal/paternal or paternal/maternal). I have not been able to find any information on this, and you seem to have the same question. I compared my FTDNA and Ancestry raw data, and the two sets of alleles either matched or matched in reverse. When I tried it with 23andMe, it all fell apart. Still, with FTDNA and Ancestry lining up, it is very suggestive that allele 1 comes from either the father's or the mother's chromosome and allele 2 from the other. Of course, this could vary from chromosome to chromosome, but it did not change in my comparison. I found it a bit strange that they matched up, given that the data came from different labs (and yes, I ignored SNPs that were not in common). So have you gotten any more information about this?



                • #9
                  In reply to Bill_VT (#8):
                  Note: when comparing raw data files from different companies, keep the following in mind.

                  1) Each SNP has a forward and a reverse orientation value.
                  Each single chromosome of each pair is a double helix of DNA. If one side is "A" the other side is "T"; if one side is "C" the other side is "G". One side is the forward orientation, the other side the reverse orientation.
                  The test only reads one side of the double helix.
                  Sometimes the value is given in forward orientation, other times in reverse orientation.
                  E.g., one company may list the values as AG and the other as TC. In this instance it is the same result; one is reported in forward orientation and the other in reverse orientation.

                  2) Use position numbers to compare, not the SNP name/RSID. Depending on the RSID build each company uses, the RSID names may vary, because SNPs/RSIDs are merged or renamed with each new RSID build. (This is not the same thing as Build 36 or Build 37, which refers to the positions in the file.)

                  3) Make sure you are using the same build number for each company when comparing position numbers.
                  23andMe and Ancestry raw data is Build 37.
                  FamilyTreeDNA offers both Build 36 and Build 37 raw data.
                  GEDmatch is Build 36.
                  Genesis is Build 37.
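Points 1 to 3 can be combined into a small comparison routine. The Python below is a minimal sketch under stated assumptions: each kit has already been loaded into a dictionary keyed by (chromosome, position), both kits use the same build, and all function names are invented for this example. Two limits of the approach are worth knowing: "AT" and "CG" genotypes read the same on either strand, so orientation cannot be detected for them, and accepting the complement also means "AA" in one file will be counted as agreeing with "TT" in the other.

```python
# Maps each base to its partner on the opposite strand of the double helix.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(gt):
    """Order-insensitive form of a two-letter genotype, e.g. 'GA' -> 'AG'."""
    return "".join(sorted(gt))

def same_call(gt_a, gt_b):
    """True if two genotypes agree directly or after complementing one of them."""
    a = canonical(gt_a)
    return a == canonical(gt_b) or a == canonical(gt_b.translate(COMPLEMENT))

def compare_kits(kit_a, kit_b):
    """Return (agreeing, disagreeing) counts over positions present in both kits.

    Each kit is a dict {(chromosome, position): genotype}. Keys are positions,
    not RSIDs, and both kits are assumed to use the same build.
    """
    shared = kit_a.keys() & kit_b.keys()
    agree = sum(same_call(kit_a[key], kit_b[key]) for key in shared)
    return agree, len(shared) - agree
```

With this, `same_call("AG", "TC")` is true, which is the forward/reverse case described in point 1.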

                  EDIT
                  In regards to allele 1 or 2 being maternal or paternal.

                  DNA is broken into fragments when it is tested. There may be runs within a chromosome where Allele 1 (or Allele 2) is consistently maternal or paternal, but not all of it will be; it just depends on the size of the fragment read. Generally, which allele is maternal and which is paternal will flip back and forth along each chromosome.
                  The test does not read each single chromosome separately.

                  The only way to truly determine this is to phase the results with at least one parent.
                  Last edited by prairielad; 6th August 2018, 08:14 PM.
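To illustrate what phasing with one parent means at a single SNP, here is a minimal Python sketch. It is a simplified illustration and not the algorithm used by any particular phasing tool: the function name is hypothetical, and it assumes the child's and parent's genotypes are reported on the same strand at the same position.

```python
def phase_with_parent(child_gt, parent_gt):
    """Split a child's genotype into (allele from this parent, allele from the other parent).

    Returns None when the position cannot be resolved from this parent alone.
    Assumption: both genotypes are two-letter calls on the same strand.
    """
    a, b = child_gt
    if a == b:
        # Homozygous child: both alleles are the same, so the split is trivial,
        # provided the parent actually carries that allele.
        return (a, b) if a in parent_gt else None
    shared = [allele for allele in (a, b) if allele in parent_gt]
    if len(shared) != 1:
        # Either the parent carries both of the child's alleles (ambiguous)
        # or neither of them (inconsistent call or genotyping error).
        return None
    inherited = shared[0]
    other = b if inherited == a else a
    return inherited, other
```

For example, a child with "AG" and a parent with "AA" resolves to A from that parent and G from the other, while a child and parent who are both "AG" cannot be resolved at that position without more information.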



                  • #10
                    Ordered alleles?

                    prairielad, your points about comparing are useful. However, for FTDNA and Ancestry, the results at all matched locations were identical. Your point might account for the 23andMe difference that I saw, meaning the files are not that different after all. At first I thought the order might always be alphabetical, but it is not: I found combinations of all 4 bases in both orders. I will keep your point in mind if I do more comparisons. One question is whether there is a simple description online of the measurement process used for the chromosomes. That might at least answer the question in part for me. I am not as concerned about which allele is associated with which parent, but rather whether one allele is consistently associated with one parent and the other allele with the other parent: a phasing, in a sense. I might check the raw data files of my wife and MIL. That might shed some light on it for me, since she should share half of her DNA with her mom.

                    If I understand your edit, fragments of the chromosomes are measured (their location can be determined from the surrounding base pairs). Then the data is put together, but the results for the 2 chromosomes may get mixed in the results file. I may check whether the university offers tours that demonstrate the measurement. That might help my understanding (not the details, just the basic process, which I have not been able to find).

                    Thanks for the feedback.



                    • #11
                      allele associations

                      I ran a comparison of my wife and MIL (my parents are long dead) and have reached a partial conclusion. The results are for chromosome 1 (the rest should be similar). Comparing allele 1 of my wife to both alleles of my MIL showed a random match (no long segments). The result was similar for allele 2 of my wife. When I checked for the half-match, comparing both sets of alleles, the match was complete at basically all positions. SO: the alleles in your raw data are not each associated with mom or dad, but are assigned randomly. Why the ordering of my raw data from FTDNA and Ancestry is essentially the same is not clear. The alleles are not ordered alphabetically, since only 4 of the 16 possible ordered pairs of bases have a zero count (6 would be zero if the order were alphabetical). Here they are:

                      AA 114933
                      AC 19502
                      AG 84127
                      AT 136
                      CA 0
                      CC 134394
                      CG 191
                      CT 0
                      GA 0
                      GC 190
                      GG 133723
                      GT 0
                      TA 121
                      TC 84237
                      TG 19184
                      TT 115316

                      alpha 103956
                      alpha rev 103732
                      equal 498366
                      total 706054
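A tally like the one above can be produced with a few lines of Python. This is a sketch of one way to do it, not necessarily how the counts in this post were generated; the function name is made up for this example, and it assumes the genotype column has already been read into a list of two-letter strings.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def order_summary(genotypes):
    """Tally ordered two-letter genotypes and classify them by allele order.

    'alpha' means the two alleles are in alphabetical order (e.g. AG),
    'alpha rev' means reverse alphabetical order (e.g. GA), and
    'equal' means both alleles are the same (e.g. AA).
    No-calls and anything that is not two A/C/G/T letters are skipped.
    """
    counts = Counter(
        gt for gt in genotypes
        if len(gt) == 2 and set(gt) <= VALID_BASES
    )
    summary = {"alpha": 0, "alpha rev": 0, "equal": 0}
    for gt, n in counts.items():
        first, second = gt
        if first == second:
            summary["equal"] += n
        elif first < second:
            summary["alpha"] += n
        else:
            summary["alpha rev"] += n
    summary["total"] = summary["alpha"] + summary["alpha rev"] + summary["equal"]
    return counts, summary
```

If a file were strictly alphabetical, the "alpha rev" class would be zero; a strictly reverse-alphabetical file would have zero in "alpha".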

                      I can't address the why of the results, but at least it answers the original question, I think.

                      Here are the plots in the attachments. For the allele comparison the match is for x.5 and mismatch at x.0. For the match plot, the match is 1.0 and mismatch is 0.0. The conclusion is rather obvious that for whatever reason, the alleles are not ordered by a specific chromosome.
                      Attached Files
