Phasing an X using sibling data ???

  • Phasing an X using sibling data ???

    I have asked this question in a couple other forums and received no answer or suggestions, so hoping for at least a little conversation from folks here.

    How would I go about phasing an X? This is not your typical parent-child phasing, but I think it could be done with the data I have. I've looked at gedmatch and the Genetic Genealogy Tools at y-str and I don't see a utility that fits my needs. I don't have much experience with any of the tools yet, so perhaps I am missing something or I need to do it in baby steps. Here is my situation ---

    I have 4 full siblings, a female (x from Mom & x from Dad) and 3 males (x only from Mom). They are the only surviving members of the oldest generation in my family tree (i.e. this is as good as the data is going to get). These are FTDNA kits I am working with.

    The female and the youngest male (let's call him M3) have absolutely no X chromosome matches with each other. This suggests to me that they inherited fully intact and opposite X chromosomes from Mom. The only other possibility I can think of is that Mom's X crossed over at exactly the same location twice (once for each of them), which seems unlikely, but probably not impossible. After studying the siblings in a chromosome browser, I am pretty sure they are un-crossed Xs.

    The other two male siblings (M1 & M2) share large sections of their X with the female and M3. I estimate between M1 & M2, I have ~80% coverage of the female's maternal X.

    So, I believe I have an entire X chromosome from M3 that is maternal and ~80% of the 2nd maternal X from the other 3 offspring.

    Now, if I can identify ~80% of the female's maternal X, it stands to reason I can also identify ~80% of the female's paternal X.
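    One way to check a coverage estimate like the ~80% above is to merge the shared segments and measure their union. The following is only a minimal sketch: the chromosome length and every segment coordinate are hypothetical placeholders, not values from these kits, and it measures base-pair coverage rather than cM.

```python
def merge_intervals(segments):
    """Merge overlapping (start, end) segments into a sorted, non-overlapping list."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(segments, chrom_length):
    """Fraction of the chromosome covered by the union of the segments."""
    covered = sum(end - start for start, end in merge_intervals(segments))
    return covered / chrom_length


# Hypothetical values for illustration only (not taken from any real kit).
X_LENGTH = 155_000_000
m1_shared_with_f = [(0, 40_000_000), (100_000_000, 155_000_000)]
m2_shared_with_f = [(30_000_000, 70_000_000), (120_000_000, 155_000_000)]

fraction = coverage_fraction(m1_shared_with_f + m2_shared_with_f, X_LENGTH)
print(f"{fraction:.0%} of F's maternal X is covered by M1 or M2")
```

    With real kits, the segment lists would come from the start-stop positions reported for F vs M1 and F vs M2.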

    I am a visual learner, so I created a colorized pedigree and custom colored chromosome graphics to help me understand this. I have posted them on dropbox https://dl.dropboxusercontent.com/u/...s-X_chart2.png (pedigree) and https://dl.dropboxusercontent.com/u/...ns_X_chart.jpg (chromosome) if anyone is interested.

    If anyone sees a flaw in my logic or has more experience with this please speak up.

    I understand how to use this X chromosome knowledge and triangulation to locate matches, but at GEDmatch it seems like it would be a great time saver to have separate kits to work with. The paternal grandmother is a mystery woman, but I have narrowed it down to two women. My ultimate goal is to identify her DNA signature. Having ~80% of her X seems like a good way to begin building a virtual ancestor.

    How do I go about phasing the family Xs? All ideas welcome.

  • #2
    These are always really fun puzzles. To me the most likely explanation is that the mother made 7 crossovers. The crossovers for her daughter (F) and her son (M3) are so close together that you can't see the difference, so it appears to be a double crossover. You would have to look at the raw data at the nearby loci to see where the mismatches are. Mom made three crossovers for M1 and two for M2. There was one each for M3 and F in the same or a nearby location, at the second cut from the left. So draw a new black bar immediately adjacent to the second cut.

    Going from left to right, Mom made the first crossover for M1, the second and third in the same location for F and M3, the 4th for M2, the 5th for M1, the 6th for M2 and the last for M1. She makes these crossovers independently for each child. Finding cousins who match these segments might confirm which side of the family (maternal grandfather or maternal grandmother) these came from. Biogeographical analysis of matches or painting the X would help.

    My hypothesis so far is that the female and M3 do not each have unrecombined X chromosomes, but that the first ~6th of each comes from opposite maternal grandparents and the rest of each X comes from the other maternal grandparent. But see if you can come up with a different explanation.

    You need to label each cut to see which child is the most likely to have received that crossover. Then label each segment as to grandparent it may have come from. Right now all you can say is grandparent one and grandparent two. You won't know which is maternal grandfather and which is maternal grandmother but those are the only two choices. Go back and reconstruct the G1 and G2 match for each child's X chromosome.
    Last edited by Kathy Johnston; 2 April 2014, 09:55 AM.

    • #3
      Attached picture roughly shows how I phased my father's and two of his siblings X's in a spreadsheet. My full spreadsheet actually contains, myself and 5 other siblings as well as my Mother and two of her siblings.

      Comparing the raw data of 4 siblings should give you a better picture of where the different grandparents' Xs (the mother's maternal and paternal X) are recombined in each sibling. For me it was easier than trying to figure it out by comparing shared sections from the chromosome browser.

      The red and yellow columns show where siblings match one another: yellow is a match, red is no match, and green is a match on both the maternal and paternal sides.

      By comparing raw data you will probably see that what looks like a break in the same position between siblings actually is not; the breaks differ by a few hundred SNP positions.

      The second picture is a representation of my father's and his siblings' Xs based on my spreadsheet, with a representation of FTDNA's chromosome browser below it.
      Attached Files
      Last edited by prairielad; 18 August 2014, 03:09 PM.

      • #4
        For example, F, M1, M2 and M3 might have received the following from Grandparent 1 (G1) and Grandparent 2 (G2):
        F--- G1G1G1G1G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2;
        M1-- G2G2G1G1G1G1G1G1G1G1G1G1G1G2G2G2G2G2G2G2G2G2G2G2G2G2G1;
        M2-- G1G1G1G1G1G1G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G1G1G1G1G1;
        M3-- G2G2G2G2G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1
        Last edited by Kathy Johnston; 2 April 2014, 09:58 AM.

        • #5
          Need more details please

          Originally posted by Kathy Johnston View Post
          To me the most likely explanation is that the mother made 7 crossovers.
          You seem to be basing your explanation solely on the graphic. But you need to remember I created this by cutting and pasting from several different screenshots, and it is certainly NOT exact. The start-stop data does not suggest there is any gap at all between Female and M3. I realize this could be due to errors, etc. - but is it? Why is your explanation better than mine?

          I am not trying to be confrontational, just trying to understand and learn.

          It was suggested that "Biogeographical analysis of matches or painting the X would help". What are you referring to as "biogeographical analysis", where do I find this tool, and in what way will it help? How is chromosome painting different from the chromosome browser? How do I compare painted chromosomes, and what am I looking for once I paint a chromosome?

          Thanks for taking the time to respond.

          • #6
            Wow - looks a bit intimidating!

            Originally posted by prairielad View Post
            Attached picture roughly shows how I phased my father's and two of his siblings X's in a spreadsheet.
            prairielad

            I think I understand what you are doing. Interesting.

            The red & yellow columns for the sibling matches - you are getting this data from match tables and coloring it by hand?

            You are manually assigning each location (snp) of each sibling to either maternal or paternal based on sex and matching criteria (or have you come up with a formula to calculate this for you???) and then extending that to the two maternal grandparents based on sibling matching.

            Looks like a lot of work! But if it gets me where I want to go I am willing to give it a try.

            Thanks for your help.

            • #7
              Extracting the Paternal X

              I did something similar using my Dad vs 2 female siblings, but my goal was to extract the paternal X chromosome that was received by my aunts.

              Wherever a male and a female sibling match on the X chromosome, you know that they share one common maternal allele. Therefore, you can deduce the female's paternal allele at each position in the matching segment (e.g. male G, female AG - therefore the paternal allele is A).
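              The deduction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, a male X call may appear as one letter ("G") or as a doubled letter ("GG") depending on the raw data file, and the rule only applies inside a segment where the brother and sister actually match.

```python
VALID = set("ACGT")


def male_allele(call):
    """Return the brother's single X allele from 'G' or 'GG', else None."""
    alleles = set(call)
    if len(alleles) == 1 and alleles <= VALID:
        return call[0]
    return None  # no-call, empty, or an unexpected heterozygous male call


def paternal_allele(brother_call, sister_call):
    """Deduce the sister's paternal allele at one SNP.

    Assumes the siblings share their maternal allele at this position.
    Returns None when the position cannot be resolved.
    """
    shared = male_allele(brother_call)
    if shared is None or len(sister_call) != 2 or not set(sister_call) <= VALID:
        return None
    if shared not in sister_call:
        return None  # inconsistent: not a matching position, or a miscall
    first, second = sister_call
    # Remove one copy of the shared maternal allele; what remains is paternal.
    return second if first == shared else first


print(paternal_allele("G", "AG"))
```

              Running this over every SNP in the matching segments would give the same result as the manual method, with None marking the positions that still need a decision by hand.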

              I wish I knew how to make an Excel macro for this but it was still not too bad doing this manually.

              I plan on asking the Gedmatch guys if I can upload this data as a research profile.
              Any matches on the x with this research profile would be a more precise confirmation that the connection is from my paternal grandfather's side of the family. (I hope)

              • #8
                Originally posted by Canyon Wolf View Post
                prairielad

                I think I understand what you are doing. Interesting.

                The red & yellow columns for the sibling matches - you are getting this data from match tables and coloring it by hand?
                I use conditional formatting in those columns, based on values entered from a separate spreadsheet. In that separate spreadsheet, I enter the start and stop points of all the shared segments between two kits (fully identical segments and half-identical segments), and it assigns a value to each position based on whether it is a full, half or no match. I then copy and paste the results into the main spreadsheet's conditionally formatted columns, which automatically display a color based on the values entered.

                I use David Pike's Utility Search for Shared DNA Segments in Two Raw Data Files for determining these start stop points
                http://www.math.mun.ca/~dapike/FF23utils/pair-comp.php
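                The position-labelling step described above could be sketched as follows. This is not how the spreadsheet or David Pike's utility works internally; it assumes the segments have already been reduced to (start, end) position pairs, and the numeric codes 2/1/0 and all sample positions are made up for illustration.

```python
FULL, HALF, NONE = 2, 1, 0  # hypothetical codes for full, half and no match


def inside(position, segments):
    """True if the position falls within any (start, end) segment, inclusive."""
    return any(start <= position <= end for start, end in segments)


def label_positions(positions, half_segments, full_segments):
    """Assign a match code to each SNP position for one pair of kits."""
    labels = []
    for position in positions:
        if inside(position, full_segments):
            labels.append(FULL)   # check full first: it lies within a half match
        elif inside(position, half_segments):
            labels.append(HALF)
        else:
            labels.append(NONE)
    return labels


# Made-up SNP positions and segments, for illustration only.
snp_positions = [100, 200, 300, 400, 500]
half_identical = [(150, 450)]
fully_identical = [(250, 350)]

print(label_positions(snp_positions, half_identical, fully_identical))
```

                The resulting list lines up one value per SNP row, which is what the conditionally formatted columns would then color.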

                You are manually assigning each location (snp) of each sibling to either maternal or paternal based on sex and matching criteria (or have you come up with a formula to calculate this for you???) and then extending that to the two maternal grandparents based on sibling matching.

                Looks like a lot of work! But if it gets me where I want to go I am willing to give it a try.

                Thanks for your help.
                A bit of both, formulas and manually.
                My initial template separates out, to the left and right of the Result columns, any homozygous alleles using a basic formula like =IF(G4=H4,G4,"-") for the column to the left of the results and =IF(F4=G4,H4,IF(F4=H4,G4,"-")) for the column to the right of the results. Columns G and H hold the two individual allele values for that position. The spreadsheet is conditionally formatted to highlight in red any values returned as - (no call/undetermined) so they are easy to locate visually.
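                For anyone more comfortable with a script than with Excel, the two formulas translate directly. The sketch below mirrors them cell for cell; the function names are made up, and f, g and h stand for the values in columns F, G and H of one row.

```python
def left_allele(g, h):
    """Equivalent of =IF(G4=H4,G4,"-"): keep the allele only if homozygous."""
    return g if g == h else "-"


def right_allele(f, g, h):
    """Equivalent of =IF(F4=G4,H4,IF(F4=H4,G4,"-")).

    Given the allele already placed in the left column (f), return the
    other allele of the pair, or "-" if f matches neither.
    """
    if f == g:
        return h
    if f == h:
        return g
    return "-"


# One homozygous row and one heterozygous row (made-up values).
for g, h in [("A", "A"), ("A", "G")]:
    f = left_allele(g, h)
    print(g + h, "->", f, right_allele(f, g, h))
```

                As in the spreadsheet, heterozygous rows come back as "-" on both sides until the left value is filled in from the shared-segment information, after which right_allele returns the other allele.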

                I then manually enter different formulas throughout, based on the shared sections, and autofill them down each segment. I enter values by hand where my basic formulas fail to resolve a position.

                It usually takes me an evening to set up the spreadsheet and enter all the values (mine have 13 family members: mother, maternal aunt, maternal uncle, myself and 5 siblings, father, paternal uncle, paternal aunt, and my father's paternal 1st cousin, whose results help determine which is my father's maternal and which his paternal chromosome).
                It then takes another few days (depending on the size of the chromosome) of manually scrolling through, entering values and adjusting formulas where needed.

                It is a bit of work, especially on the larger chromosomes.
                There is probably a much easier way, but I only know how to use basic formulas in Excel.

                • #9
                  I agree, there can be more than one explanation, especially when you only have four sibs. So let's try it.
                  As you say, it is unlikely that Mom crossed over twice in the same place, but it has to happen somewhere. It is uncommon, but I can find no other explanation so far; this time the shared crossover is between the other two children.
                  This works too. F, M1, M2, M3 might have received the following from Grandparent 1 (G1) and Grandparent 2 (G2)


                  F--- G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1;
                  M1-- G2G2G1G1G2G2G2G2G2G2G2G2G2G1G1G1G1G1G1G1G1G1G1G1G1G1G2;
                  M2-- G1G1G1G1G2G2G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G1G2G2G2G2G2;
                  M3-- G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2G2;

                  Here the mom had to make a crossover at the same site for M1 and M2 in the second slice position but it is invisible in the browser because there simply is a match but with two different grandparents back to back. Can you see where the parent makes a same-site crossover between two children, again?

                  Here there are two children that got no crossovers at all. One received 4 crossovers and another one received 3 crossovers but these two have one crossover in the same place.

                  So it is a matter of one, one, two and three crossovers in my first attempt versus none, none, 3 and 4 in the second scenario. Sometimes you just have to try the most likely scenario first. But you also need to prove these hypotheses with actual raw data and matching cousins, aunts, uncles if you don't have grandparents.
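                  The two scenarios can be checked mechanically. The sketch below is only an illustration: the run lengths are transcribed from the two G1/G2 layouts in this thread, each unit is a schematic block rather than a SNP, and F's row stands for her maternal X only. It counts the crossovers per child and tests whether the two scenarios would look identical in a sibling-vs-sibling browser comparison.

```python
from itertools import combinations


def expand(runs):
    """Turn [("G1", 4), ("G2", 23)] into a list of per-block grandparent labels."""
    return [label for label, count in runs for _ in range(count)]


def count_crossovers(blocks):
    """Number of places where the grandparent label changes along the X."""
    return sum(1 for left, right in zip(blocks, blocks[1:]) if left != right)


def pairwise_matches(sibs):
    """For each sibling pair, mark the blocks where both carry the same grandparent."""
    return {
        (a, b): [x == y for x, y in zip(sibs[a], sibs[b])]
        for a, b in combinations(sorted(sibs), 2)
    }


# Run lengths transcribed from the first and second proposals in this thread.
scenario_1 = {
    "F": expand([("G1", 4), ("G2", 23)]),
    "M1": expand([("G2", 2), ("G1", 11), ("G2", 13), ("G1", 1)]),
    "M2": expand([("G1", 6), ("G2", 16), ("G1", 5)]),
    "M3": expand([("G2", 4), ("G1", 23)]),
}
scenario_2 = {
    "F": expand([("G1", 27)]),
    "M1": expand([("G2", 2), ("G1", 2), ("G2", 9), ("G1", 13), ("G2", 1)]),
    "M2": expand([("G1", 4), ("G2", 2), ("G1", 16), ("G2", 5)]),
    "M3": expand([("G2", 27)]),
}

counts_1 = {name: count_crossovers(x) for name, x in scenario_1.items()}
counts_2 = {name: count_crossovers(x) for name, x in scenario_2.items()}
same_browser_view = pairwise_matches(scenario_1) == pairwise_matches(scenario_2)

print(counts_1, counts_2, same_browser_view)
```

                  If the pairwise patterns come out equal, sibling comparisons alone cannot separate the two proposals, which is why matches with more distant relatives or the raw data near the second cut are needed.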

                  Right now we don't have good admixture tools for the X unless Doug MacDonald is providing it. Gedmatch has admixture tools for the autosomes only. I have had some success at 23andMe with their ancestry composition, speculative and their mapping on the X chromosome of Countries of Ancestry. But you have to have specific countries identified on the X. I am lucky because one X from my mother is some sort of German/Baltic/S.European and the other is British Isles (if you can believe any of it). We need better software tools. Hint, hint.

                  • #10
                    Originally posted by Canyon Wolf View Post
                    You seem to be basing you explanation solely on the graphic. But you need to remember I created this with cut & paste from several different screen shots and it is certainly NOT exact. <snip>
                    Just to let you know, I LOVE your graphic because I could assign at least 5 crossover points to individuals in 5 minutes. It is a great screening tool when you only have siblings, but it does not replace matching data with more distant relatives. This is the link you gave and I would like to use this picture in a PowerPoint presentation if you don't mind.
                    https://dl.dropboxusercontent.com/u/...ns_X_chart.jpg
                    You don't have to have exact lineups, as long as it is close.

                    When you compare M1 to his sibs you can immediately tell that there are three obvious crossover points his mom made for him. There are two obvious cuts when M2 is compared to the other sibs. Five crossover points can be assigned right off the bat because you have stacked these four browser results on top of each other. This is what I would call a quick screening test. Therefore browser data alone is nothing to sneeze at. Sometimes there are holes in the data because of SNP-poor regions or errors, so it cannot always be depended on. But it is a start in some cases. You can come up with a hypothesis that needs more evidence to prove it.

                    The only difficult assignment was with the second thin vertical bar below in your image. My first inclination was that Female and M3 had received cuts in the same place. An alternative assessment was that M1 and M2 got their cuts in the same place instead, creating matching haploblocks back to back that came from two grandparents. Either way, two of the sibs had to receive these cuts. That is where reconstruction of the grandparents is valuable, even if there are two proposals (not just one) so far.

                    If some day in the future M1 and M2 match one grandparent (G1) in the first half of the combined block (in the middle of the small arm) and the other grandparent (G2) in the second half of the combined block, then you know there was a cut made in the middle. If, on the other hand, the sister (F) matches a close cousin in the first part of her X but M3 does not match that cousin until right at the cutoff location, then you can surmise that F and M3 both got cuts in their X from the mom around the same spot, and that one matches G1 first and the other matches G2 first. Then the crossover switch occurred near the center of the small arm to flip the matching to the other grandparent. These puzzles are not that easy to understand, but when you are lucky enough to have such great images, it is so much less complicated to reconstruct what might have happened.

                    • #11
                      Yes - More tools please!

                      Oh yes, we do indeed need more tools

                      Still not sure what an admix/geographic analysis might contribute to a phasing question, especially in this case, where the 4 full siblings' ancestry is all British Isles as far back as the tree goes (and it goes a long way!). Their chromosome paintings produce a very monotone and boring trip through the genome.

                      Not so sure I agree with your statement that a crossover has to happen someplace. Roberta Estes is of the opinion crossover does not occur on the X nearly as often as on the autosomes.

                      http://dna-explained.com/2014/01/23/...osome-that-is/

                      I think I will take a deep breath and attempt to phase using prairielad's spreadsheet example. Now that he has provided a few more details, I think it is worth a try.

                      Thanks, everyone, for your input.

                      • #12
                        Permission Granted

                        "This is the link you gave and I would like to use this picture in a PowerPoint presentation if you don't mind."

                        Yes, certainly, you may use the graphic for educational purposes to help others understand this wonderful, interesting, and frustrating new frontier.

                        • #13
                          Admixture tools can sometimes show an abrupt change at a crossover point but that remains to be seen on these X chromosomes. Best to wait for the research to catch up. Not all British Isles admixture paintings are so homogeneous as demonstrated by the tools available at GEDmatch.

                          I meant that a crossover at position number two has to occur somewhere in these offspring and you have to account for it. It is visible so it exists. Who actually got that particular crossover from Mom? It looks like two sibs got that crossover nearby because you can't explain it any other way. We may not be able to tell which two sibs got it until you have a lot more information from matches but I would look to see if there are any unreported breaks between M1 and M2 that just happened to be tolerated by the software program.

                          It is also possible that chromatids can cross over at the very tips and not be noticed, but I have never seen any data on that. Yes, we commonly see what at least visibly appear to be unrecombined chromosomes on both the autosomes and the X.

                          One thing I should mention is that we need full matching between sibs if you want to compare a sister with another sister. You are lucky that there was only one female in your comparison. That is why it is a nice family to show in an FTDNA chromosome browser. I wish FTDNA would institute full matching segments so that we would not have to rely so much on GEDmatch or David Pike's tool. Females also tolerate slightly longer matching segments because of heterozygosity, so that can make a difference in segment length.

                          • #14
                            Phasing Female Sibs and Using Biogeographical Analysis (When Available)

                            Ann Turner's "Going Through a Phase: Haplotyping the Female X Chromosomes" is important to read if you are female.
                            http://www.jogg.info/42/

                            Below is a graphic demonstrating the phasing of sibs using raw data. Long before we had any chromosome browsers for genealogists we were having fun with the X chromosome because it was the easiest chromosome to phase and paint. Most of this work was discussed at the now defunct DNA Forums site. Sean MacGorman Powell helped by painting some of these chromosomes.

                            Several years ago a woman who self-identified as African American asked if I could phase her siblings based on a region of the X chromosome that appeared to have some typically European and typically African sequences. Her parents were deceased but I think she suspected that her mother carried two different ancestral populations on her two Xs. Sometimes these sections are only found on a tiny part of the X but can be quite important if you have significant differences in haplotypes for various populations. She didn't really know about her father's X but he had identified as African-American. Naturally you can't always tell exactly where an ancestor migrated from. Sometimes the sequences are not clearly associated with any one ethnic group. But in this case at least the phasing worked well for this particular section.

                            I had discovered some haplotypes involving specific X Chromosome mutations that appeared to be typical of Africans who came into America directly from Western Africa. This was just a tiny dissection of a segment representing markers on the X. HapMap.org and a large group of genealogists provided much of the biogeographical analysis for this test case.

                            If you have a Google account, the raw data was colorized in the following graphic, where you will see the results of this kind of phasing:

                            https://drive.google.com/file/d/0Bx3...it?usp=sharing

                            Or, if you prefer Dropbox, the same graphic is here but not as clear:
                            https://www.dropbox.com/sc/91ucsl264umt9fk/ABfyxm_iyZ

                            Notice that Sister A and Sister B were the only two to have results in the beginning. The brother was tested after we made the assessment. Sister A and B were colorized based on what we knew about the European and African sequences.

                            See column 3: The father was then typed except for the two SNPs with the question mark (?). Most markers were easy because two full sisters always share their father's results. So you had to record the SNP that the two sisters had in common at each marker. The only markers that could not be determined were the ones in which the sisters were heterozygous AG and AG.

                            See columns 4, 5 and 6: Sister A minus the European sequence equaled the father's sequence. We now had figured out the two missing SNPs based on what we knew about biogeographical analysis. We figured that the mother had one European X segment. As I recall, the father also matched some Africans at HapMap.

                            See columns 7, 8 and 9: Sister B minus her father's results equaled the mother's second sequence in this block. How did we know we got it correct? Because after we made this assessment, their brother tested and he happened to match the predicted mother's 2nd X chromosome. We were able to confirm the hypothesis with real data. The brother had a slightly different haplotype than his father, but I suspect this is a different branch of the same African tree. Can there really be phylogenetic trees on the X? Yes, I think so, in some identifiable sections. This particular European section is a very homogeneous haplotype (provided by Version 2 of 23andMe), but the African sequences seem to show more diversity here, with occasional SNP differences.
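                            The sister-to-sister logic in columns 3 through 9 can be sketched as below. This is a simplified illustration, not the procedure used for the graphic: the function names and the sample genotypes are made up, and it leaves out the biogeographical step that was used to resolve the two ambiguous SNPs.

```python
VALID = set("ACGT")


def father_allele(sister_a, sister_b):
    """Allele both full sisters must carry from their father at one X SNP.

    Returns None when both are heterozygous for the same pair (e.g. AG and AG),
    when a call is missing, or when the calls are inconsistent.
    """
    if not (set(sister_a) <= VALID and set(sister_b) <= VALID):
        return None
    common = set(sister_a) & set(sister_b)
    return common.pop() if len(common) == 1 else None


def remove_one(genotype, allele):
    """Sister's genotype minus her father's allele leaves her mother's allele."""
    if allele is None or len(genotype) != 2 or allele not in genotype:
        return None
    first, second = genotype
    return second if first == allele else first


# Made-up genotypes for three SNPs, for illustration only.
sister_a = ["AG", "CC", "AG"]
sister_b = ["AA", "CT", "AG"]

father = [father_allele(a, b) for a, b in zip(sister_a, sister_b)]
mother_to_a = [remove_one(a, f) for a, f in zip(sister_a, father)]
mother_to_b = [remove_one(b, f) for b, f in zip(sister_b, father)]

print(father, mother_to_a, mother_to_b)
```

                            A None in the father's list marks a position like the two question-mark SNPs, where something beyond the sisters' own results is needed to decide.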

                            • #15
                              Thank you for that link. The last sentence, "the region around the centromere.....does not recombine as often."

                              What is the reason for that?
