a curiosity that needs explaining


  • a curiosity that needs explaining

    Take a GEDmatch kit and run an 'X one-to-many' with default parameters. A kit for a female will typically give you more matches. If GEDmatch finds enough regular autosomal matching segments, it will estimate the number of generations to the MRCA for the X-match. Click the upward arrowhead over the generation column to get a sorted list. I think you will find that, for most of your kits, the X-matches have an estimated generations to MRCA of more than six. Isn't it interesting how the matches are concentrated around such a high number of generations to MRCA?
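To make the sorting step concrete, here is a minimal Python sketch. It assumes you have copied the X-match list by hand into rows of (kit, X cM, estimated generations), with None where GEDmatch gave no estimate; the row layout and the function names are hypothetical, not anything GEDmatch exports.

```python
# Hypothetical row layout: (kit_id, x_cm, generations_estimate).
# generations_estimate is None when GEDmatch showed no estimate for that match.

def sort_by_generations(matches):
    """Sort ascending by generation estimate; rows without an estimate go last."""
    return sorted(matches, key=lambda m: (m[2] is None, m[2] if m[2] is not None else 0.0))


def share_beyond(matches, threshold=6.0):
    """Fraction of the matches that have an estimate whose estimate is above `threshold`."""
    estimates = [m[2] for m in matches if m[2] is not None]
    if not estimates:
        return 0.0
    return sum(1 for g in estimates if g > threshold) / len(estimates)


# Illustrative, made-up rows (not real GEDmatch data).
matches = [("A1", 12.3, 6.8), ("B2", 9.1, None), ("C3", 15.0, 4.1), ("D4", 8.2, 7.5)]
for kit, x_cm, gens in sort_by_generations(matches):
    print(kit, x_cm, gens)
print("share above 6 generations:", share_beyond(matches))
```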

    Explanations anyone?

    Jack Wyatt
    Last edited by georgian1950; 18th October 2016, 02:36 AM. Reason: spelling

  • #2
    Yes, this observation is real, but the generations reported by GEDmatch are based on the autosomes only, as far as I can tell. Part of the explanation should be that there are only half as many opportunities for recombination on the X chromosome (per generation), because recombination is only possible in the maternal meiosis. The paternal meiosis can't "recombine" the single X chromosome.
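A small Python sketch of the recombination point, under simplifying assumptions: each step along the line is one meiosis, the autosomes recombine in every meiosis, and the X recombines only when the transmitting parent is female. The helper names are hypothetical, and the 100/m figure assumes crossovers fall as a Poisson process at about one per 100 cM per meiosis (a textbook simplification), so that after m meioses the spacing between crossovers is exponential with mean 100/m cM.

```python
def count_meioses(transmitting_sexes):
    """transmitting_sexes: 'M' or 'F' for the parent in each transmission along
    the path (both sides of the common ancestor). Returns (autosomal, x_recombining)."""
    for s in transmitting_sexes:
        if s not in ("M", "F"):
            raise ValueError("use 'M' or 'F' for each transmitting parent")
    autosomal = len(transmitting_sexes)
    # A father passes his single X on unchanged, so only mothers' meioses count for X.
    x_recombining = sum(1 for s in transmitting_sexes if s == "F")
    return autosomal, x_recombining


def mean_cm_between_crossovers(meioses):
    """Mean spacing (cM) between crossovers accumulated over `meioses` meioses."""
    return float("inf") if meioses == 0 else 100.0 / meioses


auto, x = count_meioses("FMFMFM")  # a line that alternates mother and father
print(auto, x, mean_cm_between_crossovers(auto), mean_cm_between_crossovers(x))
```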

    What I don't know, and can't easily determine, is whether there is actually a significant positive correlation between the length of the matching X segments and the estimated number of generations based on the autosomes. That would be pretty strange, wouldn't it?
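One way to look for such a correlation is a plain Pearson correlation. This sketch assumes you have collected paired values (longest X segment in cM, autosomal generation estimate) for a set of matches; it computes r only, and whether r is significant would still need a proper test and a reasonable sample size.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the inputs is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical usage: x_cm[i] and gens[i] belong to the same match.
x_cm = [12.3, 15.0, 8.2, 20.1]
gens = [6.8, 4.1, 7.5, 5.2]
print("r =", pearson(x_cm, gens))
```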



    • #3
      I've been playing around with X-matches and have been itching to find a meaningful X-match on GEDmatch.

      For one (female) kit I manage some of the longest matching segments on the X-chromosome are with people who live around the shores of the Baltic sea, and the occasional weak autosomal match pops up from that area as well. One has a tree on here in Russian. A male double cousin has recently tested elsewhere and I'm (very impatiently!) waiting for him to upload to Gedmatch.



      • #4
        Originally posted by John McCoy
        Yes, this observation is real, but the generations reported by GEDmatch are based on the autosomes only, as far as I can tell. Part of the explanation should be that there are only half as many opportunities for recombination on the X chromosome (per generation), because recombination is only possible in the maternal meiosis. The paternal meiosis can't "recombine" the single X chromosome.

        What I don't know, and can't easily determine, is whether there is actually a significant positive correlation between the length of the matching X segments and the estimated number of generations based on the autosomes. That would be pretty strange, wouldn't it?
        Unfortunately getting information on methodology out of the GEDmatch principals can be difficult for various reasons. However I am going to assume that they have taken things like your objections into account and trust what they come up with.

        As for your objection about recombination: only one possible X-line alternates male and female. All of the other possible lines have a smaller proportion of males. Maybe averaging over all of those lines does push the expected number of generations to MRCA back a bit, but that does not change the results much. That is, the generations to MRCA are clustered far back for one's X-matches. If one were to guess how far back the MRCA is, it might be more like three or four generations, rather than the more distant matches we are seeing.
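The X-line argument can be checked by brute force. This hypothetical Python sketch enumerates, for a given number of generations, the ancestor-sex sequences (parent first) along which X-DNA can reach the tester, using one rule: a male gets his X only from his mother, so a male is never followed by another male. The counts follow the Fibonacci pattern, and the mean male fraction can be compared with the maximum.

```python
from itertools import product


def x_lines(generations, tester_sex):
    """Ancestor-sex strings (parent first) for every X-line of the given length."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    lines = []
    for seq in product("MF", repeat=generations):
        chain = tester_sex + "".join(seq)
        # A male receives his X only from his mother, so "MM" can never appear.
        if "MM" not in chain:
            lines.append("".join(seq))
    return lines


def male_fraction(line):
    return line.count("M") / len(line)


def mean_male_fraction(lines):
    return sum(male_fraction(line) for line in lines) / len(lines)


for g in range(1, 7):
    lines = x_lines(g, "F")
    print(g, len(lines), max(male_fraction(l) for l in lines), round(mean_male_fraction(lines), 3))
```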

        Jack



        • #5
          Originally posted by georgian1950
          Unfortunately getting information on methodology out of the GEDmatch principals can be difficult for various reasons. However I am going to assume that they have taken things like your objections into account and trust what they come up with.

          As for your objection about recombination: only one possible X-line alternates male and female. All of the other possible lines have a smaller proportion of males. Maybe averaging over all of those lines does push the expected number of generations to MRCA back a bit, but that does not change the results much. That is, the generations to MRCA are clustered far back for one's X-matches. If one were to guess how far back the MRCA is, it might be more like three or four generations, rather than the more distant matches we are seeing.

          Jack

          I have a relative with a 68.4 cM match on the X and the two females appear to be 3C1R based on a paper trail.

          Autosomes show
          9.6 cM on Chr 1
          22.1 cM on Chr 12
          20.3 cM on Chr 18

          Estimated number of generations to MRCA is 4.1 at GEDmatch based on the autosomes.

          These two have a maximum number of males in the line of descent. I think that if you have a long segment match on the X along with just a few smaller segments on the autosomes you can often predict that there will be a lot of alternating males. At least you know which line to start with when looking for a match.
          Kathy


          • #6
            Originally posted by Kathy Johnston
            I have a relative with a 68.4 cM match on the X and the two females appear to be 3C1R based on a paper trail.

            Autosomes show
            9.6 cM on Chr 1
            22.1 cM on Chr 12
            20.3 cM on Chr 18

            Estimated number of generations to MRCA is 4.1 at GEDmatch based on the autosomes.

            These two have a maximum number of males in the line of descent. I think that if you have a long segment match on the X along with just a few smaller segments on the autosomes you can often predict that there will be a lot of alternating males. At least you know which line to start with when looking for a match.
            Kathy
            Thanks Kathy for a great example. It is interesting that the generations to MRCA based on the autosomes is in the ballpark of the predicted relationship.

            I am curious about the ranges for the autosomal segments.

            If you think about it, any time you have an X-match, you probably have some matching on the autosomes, even though we might not have chosen parameters that would allow us to see it. Given that almost all of our X-matches cluster around 6-8 generations to MRCA (calculated from the autosomes), the X-match likely has about the same distance to MRCA.

            If we skip down our X-DNA match list past the ones with a more recent MRCA, run the detailed comparisons ourselves on a number of kits and save the matching segments, it will not be long before we find many kits triangulating on a few segments. My interpretation of this is that most of us share a common ancestor within the last 300 years.
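The "save the segments and look for pile-ups" step can be sketched in Python. It assumes the saved segments are rows of (kit, chromosome, start, end) in base-pair positions, treated as half-open intervals; the names are hypothetical. It only finds places where many kits' segments overlap with yours; proper triangulation also requires that those kits match each other on the same segment.

```python
def pileup(segments, chromosome):
    """Split `chromosome` at every segment boundary and list the kits covering
    each piece. Returns (start, end, sorted_kits) for pieces covered by any kit."""
    segs = [(kit, s, e) for kit, chrom, s, e in segments if chrom == chromosome and e > s]
    cuts = sorted({p for _, s, e in segs for p in (s, e)})
    result = []
    for left, right in zip(cuts, cuts[1:]):
        kits = sorted({kit for kit, s, e in segs if s <= left and e >= right})
        if kits:
            result.append((left, right, kits))
    return result


def overlap_candidates(segments, chromosome, min_kits=3):
    """Pieces of the chromosome where at least `min_kits` saved segments overlap."""
    return [piece for piece in pileup(segments, chromosome) if len(piece[2]) >= min_kits]


# Hypothetical saved segments (kit, chromosome, start_bp, end_bp).
saved = [("A", "X", 100, 500), ("B", "X", 200, 600), ("C", "X", 300, 400), ("D", "7", 0, 1000)]
print(overlap_candidates(saved, "X"))
```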

            Note: when you select a kit from your match list to do an 'X one-to-one' comparison, GEDmatch will use 700 SNPs instead of the default of 500 SNPs. You will need to enter the 500 parameter yourself.

            Jack



            • #7
              What puzzles me is the vast difference in number of matches between males and females.

              A one-to-many (X-match) on GEDmatch for one male kit (7 cM+, 700 SNP+) brings up 79 matches, of which 28 are also autosomal matches.

              Incredibly, I had been corresponding with one of these 79 matches about genealogy for a couple of years before he tested. The common ancestors are from the 16th century and are paternal, so they don't explain the X-match. She's not an autosomal match.

              By contrast a couple of female kits (unrelated to the male) have over 2000 X-matches.

              I would have thought a female would have twice as many matches as a male, not 20 or 30-fold as many.

              I'm wondering whether Gedmatch can't phase the two X-chromosomes of the females and overlapping segments on the two X-chromosomes are getting read as one longer segment.



              • #8
                Originally posted by ltd-jean-pull
                What puzzles me is the vast difference in number of matches between males and females.

                A one-to-many (X-match) on GEDmatch for one male kit (7 cM+, 700 SNP+) brings up 79 matches, of which 28 are also autosomal matches.

                Incredibly, I had been corresponding with one of these 79 matches about genealogy for a couple of years before he tested. The common ancestors are from the 16th century and are paternal, so they don't explain the X-match. She's not an autosomal match.

                By contrast a couple of female kits (unrelated to the male) have over 2000 X-matches.

                I would have thought a female would have twice as many matches as a male, not 20 or 30-fold as many.

                I'm wondering whether Gedmatch can't phase the two X-chromosomes of the females and overlapping segments on the two X-chromosomes are getting read as one longer segment.
                Really, none of these matching segments that show up on the 'X one-to-many' match list about seven generations back are IBD. I think my little exercise does show that most of us share a particular common ancestor within the last 300 years. These matching segments are actually built up from DNA that both parents inherited from that common ancestor, but there is nothing false about their match with the common ancestor.

                Why more X-matches for females? If the kit belongs to a male, he has only one X chromosome, from his mother, and that is all he has for X-matching. A female has two X chromosomes, so her total is the union of the matches on each individual X chromosome. In addition, by taking matching alleles from each chromosome that were not sufficient on their own to meet the matching criteria and combining them, additional matching segments are reconstructed.
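A toy Python illustration of the "combining" effect, assuming one female kit with known phase (two X haplotypes, one letter per SNP) compared with a male kit (one X). With phase, each of her haplotypes matches his X for only 4 SNPs in a row; without phase, a SNP counts as matching whenever he carries either of her alleles, and the run doubles. This is a hypothetical model for illustration, not GEDmatch's actual matching code.

```python
def longest_run(flags):
    """Length of the longest run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def phased_longest(haplotype, other):
    """Longest run where one specific haplotype is identical to the other X."""
    return longest_run(a == b for a, b in zip(haplotype, other))


def unphased_longest(hap1, hap2, other):
    """Without phase, a SNP 'matches' if the other X carries either allele."""
    return longest_run(o in (a, b) for a, b, o in zip(hap1, hap2, other))


female_hap1 = "AAAAGGGG"
female_hap2 = "CCCCTTTT"
male_x = "AAAATTTT"
print(phased_longest(female_hap1, male_x), phased_longest(female_hap2, male_x))
print(unphased_longest(female_hap1, female_hap2, male_x))
```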

                Jack



                • #9
                  In addition to these points, we have to be careful about generalizations derived from anecdotal evidence. The actual average number of X matches for males versus females needs to be explored with a large but "random" sample of kits. The sample will still show "sampling bias" because the available kits mainly come from the US. Will females from such a large sample have twice as many X matches? 30 times as many? Some other result? Let's find out!

                  Exactly how to accomplish this, I'm not sure, but I suppose it might be instructive to collect X match data for every 10th match to kits obtained by a GEDmatch GEDCOM search for "John Smith". The kits with an ancestor "John Smith" should themselves be a fairly random bunch, since the odds of two Smiths being related should be quite small.
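If someone does collect the data, the tallying is simple. This Python sketch assumes a hypothetical list of (kit, sex, number of X-matches) rows gathered by hand; every_nth picks every 10th entry from a list, as suggested above. The names and rows are made up for illustration.

```python
def every_nth(items, n=10, offset=0):
    """Systematic sample: every n-th item, starting at `offset`."""
    return items[offset::n]


def mean_x_matches_by_sex(sample):
    """sample: rows of (kit_id, sex, x_match_count). Returns {sex: mean count}."""
    totals, counts = {}, {}
    for _, sex, n_matches in sample:
        totals[sex] = totals.get(sex, 0) + n_matches
        counts[sex] = counts.get(sex, 0) + 1
    return {sex: totals[sex] / counts[sex] for sex in totals}


def female_to_male_ratio(means):
    if not means.get("M") or "F" not in means:
        raise ValueError("need both a female mean and a non-zero male mean")
    return means["F"] / means["M"]


rows = [("a", "F", 100), ("b", "F", 300), ("c", "M", 50)]
means = mean_x_matches_by_sex(rows)
print(means, female_to_male_ratio(means))
```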



                  • #10
                    Originally posted by John McCoy
                    In addition to these points, we have to be careful about generalizations derived from anecdotal evidence. The actual average number of X matches for males versus females needs to be explored with a large but "random" sample of kits. The sample will still show "sampling bias" because the available kits mainly come from the US. Will females from such a large sample have twice as many X matches? 30 times as many? Some other result? Let's find out!

                    Exactly how to accomplish this, I'm not sure, but I suppose it might be instructive to collect X match data for every 10th match to kits obtained by a GEDmatch GEDCOM search for "John Smith". The kits with an ancestor "John Smith" should themselves be a fairly random bunch, since the odds of two Smiths being related should be quite small.
                    Hi John, how about trying the exercise I talked about in this thread? I would like to see if you get the results I predict (i.e. most kits triangulate on a handful of segments, suggesting that you share a common ancestor with the owners of those kits within the last 300 years).

                    There is really no point in random sampling until we prove or disprove my hypothesis about the common ancestor for most of us. If this common ancestor existed, that information would pretty much change everything.

                    Thanks,

                    Jack
                    Last edited by georgian1950; 28th November 2016, 10:37 AM. Reason: additional thought



                    • #11
                      Originally posted by georgian1950
                      Hi John, how about trying the exercise I talked about in this thread? I would like to see if you get the results I predict (i.e. most kits triangulate on a handful of segments, suggesting that you share a common ancestor with the owners of those kits within the last 300 years).

                      There is really no point in random sampling until we prove or disprove my hypothesis about the common ancestor for most of us. If this common ancestor existed, that information would pretty much change everything.

                      Thanks,

                      Jack
                      I have an even better test of your hypothesis of a common ancestor going back 300 years. Use the GEDmatch tool called "GEDmatch Archaic DNA matches" at https://www.gedmatch.com/archaic1.php. Use the same low level of shared segment cM that you've used to claim a segment goes back to a common ancestor who lived 300 years ago. I believe you've told us that your methodology relies on segments below 5 cM, as low as 1 cM.

                      At the 1 cM level, I get all 22 chromosomes lit up with shared segments with these ancient DNA samples:

                      F999916 - LBK, Stuttgart, 7,000 years old
                      F999918 - Luxemburg, 8,000 years old
                      F999937 - Hungary, 7,200 years old
                      plus others that are several thousand years old, which are not as lit up but still have many shared segments

                      At the 2 cM level, that reduces the shared segments significantly, but I still get multiple shared segments on multiple chromosomes with the three samples I mentioned above.

                      At the 3 cM level, I still get 7 shared segments with F999916, 2 shared segments with F999918 and 8 shared segments with F999937.

                      At the 4 cM level, I still get 1 shared segment with F999916 but none with the other two I mentioned above.

                      It's only at the 5 cM level that I don't share any segments with ancient DNA remains in the GEDmatch database.

                      My very sincere question is: how can you be so sure that the tiny segments your methodology uses give you shared segments that go back to a common ancestor 300 years ago? According to what I found, any segment below 4 cM could easily represent a common ancestor several thousand years ago. Even at 4 cM, there's certainly a chance that the common ancestor lived thousands of years ago.
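For a rough sense of scale, consider a simple Poisson crossover model (about one crossover per 100 cM per meiosis, ignoring chromosome ends, SNP density and false matches). In it, segment lengths after m meioses are exponential with mean 100/m cM, and two cousins whose common ancestor is g generations back on each side are separated by m = 2g meioses. The sketch below is an illustration under those assumptions, not a dating method.

```python
import math


def p_segment_at_least(length_cm, meioses):
    """P(a segment is >= length_cm) when lengths are exponential with mean 100/meioses cM."""
    return math.exp(-meioses * length_cm / 100.0)


def generations_for_mean_length(length_cm):
    """Generations back on each side (meioses = 2g) at which the mean length is length_cm."""
    return 100.0 / (2.0 * length_cm)


for cm in (1, 2, 3, 4, 5, 7):
    print(cm, "cM mean length at about", generations_for_mean_length(cm), "generations per side")
```

At 25 to 30 years per generation, the 50 generations that this model pairs with a 1 cM mean length is well over a thousand years.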

                      This proves to me that using shared segments below 5 cM, which is not recommended or used by the three testing companies, is not a wise choice when looking for common ancestry within a genealogical time frame. While segments below 5 cM may represent a common ancestor within the last few hundred years, there's not much assurance that that's true. Perhaps if you can come up with paper-trail documentation to support a common ancestor, I could accept using segments below 5 cM, but without that I could just as easily say that the common ancestor lived thousands of years ago.

                      If you disagree with this, please tell me what's wrong in my thinking.



                      • #12
                        Originally posted by MMaddi
                        If you disagree with this, please tell me what's wrong in my thinking.
                        Well, my current exercise uses default parameters.

                        As for comparisons with ancients, it is more like a parlour game. The science is not well developed on that.

                        Jack



                        • #13
                          Originally posted by georgian1950
                          Well, my current exercise uses default parameters.

                          As for comparisons with ancients, it is more like a parlour game. The science is not well developed on that.

                          Jack
                          Hmm... to me a 1 cM shared segment is a 1 cM shared segment.

                          I don't see a difference whether you apply it to two living individuals or to one living individual and ancient remains. The same goes for a 2 cM segment or a 3 or 4 cM segment. The tool is still checking the database for a run of matching SNPs of sufficient size, whether it's comparing against a database of living individuals or ancient remains.
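The rule being described can be sketched in Python: scan SNPs in map order, find runs where the two kits share at least one allele, and keep the runs that pass both a cM threshold and a SNP-count threshold (7.0 cM and 500 SNPs here, the defaults mentioned in this thread). This is a hypothetical simplification; GEDmatch's actual algorithm (for example, how it treats isolated mismatches) is not described here, and the function names are illustrative.

```python
def compatible_flags(genotypes_a, genotypes_b):
    """Per SNP, True when two unphased genotypes (e.g. "AG", "GG") share an allele."""
    return [bool(set(a) & set(b)) for a, b in zip(genotypes_a, genotypes_b)]


def find_segments(positions_cm, compatible, min_cm=7.0, min_snps=500):
    """Return (first_index, last_index, length_cm) for each run of compatible SNPs
    meeting both thresholds. positions_cm are ascending genetic-map positions."""
    if len(positions_cm) != len(compatible):
        raise ValueError("need one position per SNP")
    segments = []
    start = None
    # A trailing False acts as a sentinel so a run reaching the last SNP is closed.
    for i, ok in enumerate(list(compatible) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            end = i - 1
            length = positions_cm[end] - positions_cm[start]
            if end - start + 1 >= min_snps and length >= min_cm:
                segments.append((start, end, length))
            start = None
    return segments


# The same thresholds apply whichever database the other kit comes from.
positions = [float(i) for i in range(20)]
flags = [True] * 10 + [False] + [True] * 9
print(find_segments(positions, flags, min_cm=8.5, min_snps=5))
```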

                          You'll have to come up with a more convincing reason than what you've given.



                          • #14
                            Assessing the genealogical significance of short segments by means of a robust statistical analysis seems like a very difficult problem to me. I was able to think of a statistical experiment that I believe could tell us something about male versus female X matches, if anyone has the time to gather the data. That's as far as I have got!



                            • #15
                              Originally posted by John McCoy
                              Assessing the genealogical significance of short segments by means of a robust statistical analysis seems like a very difficult problem to me. I was able to think of a statistical experiment that I believe could tell us something about male versus female X matches, if anyone has the time to gather the data. That's as far as I have got!
                              Again, my exercise uses default parameters, 7.0 cM and 500 SNPs.

                              Jack
