
Gedmatch: what is phased data?


  • Gedmatch: what is phased data?

    I tried the new Gedmatch.com utility "Generate Phased Data" using the raw data uploaded there from both my father's and my own "Family Finder" tests.
    The Gedmatch accounts for my father's data and my data each have numerous matches but the two resulting phased data "accounts" each have only one match. That single match is to the same individual.

    What is the significance of this?




    (For anyone interested: the two phased accounts are: PF6931P1,PF6931M1)

  • #2
    This will enable you to tell whether a relative is maternal or paternal. The P1 and M1 designations are for paternal and maternal DNA.



    Comment


    • #3
      I still do not understand.

      Why is there only one match to each of the phased accounts?
      Why is this one match in both phased accounts to the SAME individual?

      EDIT:
      I rechecked and there are now many more matches in each phased account. I suppose either intermediate results were initially posted or an error has been corrected.
      Last edited by Bertp; 23rd August 2012, 08:01 AM.

      Comment


      • #4
        I am a novice here so I may be getting this totally wrong, but I am still seeing some data that I would not expect.

        For instance:

        one of the matches from my Dad's data (F249972)

        (M163446) tot cM = 29.6 L cM = 17.8

        and in my data (F6931)
        (M163446) tot cM =34.8 L cM = 29.7

        Because my values are larger than my father's in both instances, I would think this suggests that both parents are related to this individual. And because my largest cM is greater than my father's largest cM, my mother has likely inherited a larger matching sequence than my father.

        Now for the phased results:

        paternal (PF6931P1)
        (M163446) tot cM = 29.6 L cM =29.6
        Total cM is equal to the value in my father's test match, BUT largest cM is much greater in the phased results! That shouldn't be, right?
        Also, the fact that this value is exactly equal to the tot cM makes this old programmer a bit uneasy.

        maternal (PF6931M1)
        A match for M163446 does not exist, which is unexpected for the reason explained previously.

        Comment


        • #5
          Okay, I looked at this, and it appears that the shared segment crosses the centromere, and the unphased data shows a break in the segments there for your father but not for you.

          If you use the Chromosome browser, you will see that the two broken segments on your father's DNA add up to 29.637 cM, against 29.655 cM for your single unbroken segment. The phasing has just eliminated the discrepancy and smoothed out the differences, which is exactly what it should do.

          The other segment you share with this person, 5.2 cM and only 582 SNPs (under 700 is frequently meaningless), is revealed as noise by the phasing. Since it doesn't show up as maternal either, it is just that, noise. The shared DNA is unambiguously paternal.

          One caveat here: with a match like this, a single long segment and no other shared DNA, the relationship is very likely a good deal more distant than projected.
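
          To make the arithmetic concrete, here is a rough Python sketch of joining two segments that are split by a short gap. It is only an illustration, not Gedmatch's actual method: the function name, the chromosome number, the Mb positions and the gap threshold are all invented. The only real figures are the 17.8 cM largest segment and the 29.6 cM total from the father's match quoted in this thread, which put the second piece at roughly 11.8 cM.

```python
# Hypothetical sketch: join matching segments that are separated by a short
# gap (for example at the centromere). Not Gedmatch's actual algorithm.

def merge_across_gap(segments, max_gap_mb):
    """segments: list of (chromosome, start_mb, end_mb, cm) tuples."""
    merged = []
    for chrom, start, end, cm in sorted(segments):
        prev = merged[-1] if merged else None
        if prev and prev[0] == chrom and start - prev[2] <= max_gap_mb:
            # Same chromosome and a small gap: treat as one longer segment.
            merged[-1] = (chrom, prev[1], max(prev[2], end), prev[3] + cm)
        else:
            merged.append((chrom, start, end, cm))
    return merged


# Father vs. M163446: total 29.6 cM, largest 17.8 cM, so the other piece is
# about 11.8 cM. The chromosome number and Mb positions below are invented.
father = [(1, 100.0, 120.0, 17.8), (1, 125.0, 140.0, 11.8)]
joined = merge_across_gap(father, max_gap_mb=10.0)
longest = max(seg[3] for seg in joined)
print(round(longest, 1))
```

          Once the two pieces are treated as one segment, "largest cM" and "total cM" are the same number, which is the pattern seen in the phased paternal result.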




          Comment


          • #6
            Thanks so much for the help Mark!

            By the way, I (but not my father) have a match with a 23andMe user where total cM and longest block are both 68.7 cM! It does seem that I have more sizeable "Longest cM = Total cM" cases with data from other companies. Perhaps the different methods for handling segments crossing the centromere are part of the explanation?

            Comment


            • #7
              You're welcome. I honestly don't know what to make of a single segment that long and no other shared segments. Perhaps someone more knowledgeable can weigh in.


              Comment


              • #8
                Originally posted by Bertp
                ... the two resulting phased data "accounts" each have only one match. That single match is to the same individual.

                Do these two phased data "accounts" represent half the genome of your father that you share AND half the genome of your mother that you share (that can be deduced from your sharing with your father)?

                Comment


                • #9
                  Originally posted by tomcat View Post
                  Do these two phased data "accounts" represent half the genome of your father that you share AND half the genome of your mother that you share (that can be deduced from your sharing with your father)?
                  Yes that appears to be what was generated

                  Comment


                  • #10
                    Gedmatch.com

                    RE "I rechecked and now there are now many more matches in each phased account. I suppose either intermediate results were initially posted or an error has been corrected "

                    The Gedmatch phasing is a two-step process. The first step is the phasing of
                    the data and the creation of the maternal (M1) and paternal (P1) IDs;
                    this step takes a minute or two.
                    The second step is to match those two phased IDs against the full Gedmatch
                    database. This step takes a lot of computing and may take up to 24 hours
                    to complete.

                    Comment


                    • #11
                      Can someone smarter than me explain what I'm seeing when I compare my M1 and P1 files to the Gedmatch database, and how I should interpret the total/longest cM numbers? I understand them when I use my own data, but I'm confused about what I have when I use the phased files.

                      I phased my own test against my father's.

                      Comment


                      • #12
                        Phased M1 and P1

                        When you look at the matches for M1, those are the ones on your mother's side.
                        When you look at the matches for P1, those are the ones on your father's side.

                        Thus, in a perfect matching world, the M1 plus P1 matches should equal the
                        matches you see from your unphased data.

                        But alas, the matching is imperfect, especially for segments less than
                        12 cM long.
                        For unphased data, I believe that 50% of the matches at 7 cM are random and not real.
                        Once you have partially phased data, the 50% point drops to 3.5 cM.

                        So phasing should give you two things: one, it eliminates many, but not all,
                        of the random matches; and two, it helps tell whether the match is on your mother's side or your father's side.

                        The longest segment to fail to hold up after phasing is 14.7 cM long;
                        at 10 cM, 10% of the unphased segments will fail to match after phasing.
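
                        The bookkeeping described above could be sketched in Python roughly as follows. This is an illustration only: the function name and the X-prefixed kit numbers are made up, and the match lists are assumed to have been copied by hand from the three Gedmatch result pages. M163446 is the paternal-only match discussed earlier in this thread.

```python
# Hypothetical sketch: sort unphased matches by which phased kit confirms them.

def classify_matches(unphased, p1, m1):
    """Each argument is a set of kit numbers; returns {kit: side}."""
    sides = {}
    for kit in sorted(unphased):
        if kit in p1 and kit in m1:
            sides[kit] = "both sides"
        elif kit in p1:
            sides[kit] = "paternal"
        elif kit in m1:
            sides[kit] = "maternal"
        else:
            # Seen only in the unphased results: possibly a random match.
            sides[kit] = "not confirmed by phasing"
    return sides


unphased = {"M163446", "X000001", "X000002"}  # X kits are invented examples
p1 = {"M163446"}
m1 = {"X000001"}
sides = classify_matches(unphased, p1, m1)
for kit, side in sides.items():
    print(kit, side)
```

                        A match that lands in "not confirmed by phasing" is not proven false, but for short segments it is the kind of match that phasing is expected to weed out.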

                        Comment


                        • #13
                          Interesting... So how would I interpret the following external match:

                          Me: 87cM/18cM
                          Father: 30cM/9.8cM
                          P1: 46.3cM/8.5cM
                          M1: 87cM/18cM

                          This is likely someone related to both of my parents, and the P1 has higher cM than my father alone because the phasing can't differentiate between my parents?

                          Comment


                          • #14
                            I did the same thing and didn't really know what P1 and M1 were. I had an idea they meant Paternal/Maternal, BUT I only uploaded my own and my father's data, so how could it generate an M1 file without my mother's autosomal file? The reason I have doubts is that when I look at the admix for the M1 file, it clearly has nothing to do with my mother. She does not have AmerIndian, Tibetan, South American etc... HOWEVER, my dad's mother does! Did this separate my dad's autosomal DNA from his mother's?

                            Comment


                            • #15
                              I believe phasing in this case is just simple subtraction. I got 50% of my autosomal DNA (the 22 non-sex-determining chromosomes) from my mother and the other 50% from my father.

                              If my father is not available but I get my mother tested, I can subtract all of the DNA that I share with her, and what is left must have come from my father. That is how you get a Paternal and a Maternal phased file.

                              The phased file that corresponds to the parent who was tested isn't all that useful, since you'll have their full result already, but the missing parent's phased data is gold, since there is no other way to get it in most cases. It is, however, only a partial picture.
                              Last edited by 1_mke; 31st August 2012, 12:06 AM.
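
                              The "simple subtraction" idea can be sketched per SNP in Python. This is a minimal illustration of one-parent phasing in general, not Gedmatch's actual code: the function names, the two-letter genotype strings and the rsA/rsB/rsC marker names are assumptions made for the example.

```python
# Hypothetical sketch of phasing a child against ONE tested parent.
# Genotypes are two-letter strings such as "AG"; names are invented.

def phase_snp(child, tested_parent):
    """Return (allele_from_tested_parent, allele_from_other_parent),
    or None when the SNP cannot be resolved."""
    a, b = child
    if a == b:
        # Homozygous child: each parent must have passed on the same allele.
        return (a, a) if a in tested_parent else None
    carried = [x for x in (a, b) if x in tested_parent]
    if len(carried) != 1:
        # Parent carries both alleles (ambiguous) or neither (inconsistent).
        return None
    given = carried[0]
    other = b if given == a else a
    return (given, other)


def phase_kit(child, parent):
    """child/parent: {snp_name: genotype}. Returns two partial haplotypes."""
    from_parent, from_other = {}, {}
    for snp, genotype in child.items():
        result = phase_snp(genotype, parent.get(snp, ""))
        if result is not None:
            from_parent[snp], from_other[snp] = result
    return from_parent, from_other


child = {"rsA": "AG", "rsB": "CC", "rsC": "AG"}
father = {"rsA": "AA", "rsB": "CT", "rsC": "AG"}
p1, m1 = phase_kit(child, father)
print(p1)
print(m1)
```

                              SNPs where the child and the tested parent are both heterozygous for the same two alleles (rsC here) stay unresolved, which is one reason the resulting file is only a partial picture.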

                              Comment
