The root of the Y-DNA haplogroup tree

  • The root of the Y-DNA haplogroup tree

    I've been looking into some of the concepts around Y-DNA testing while waiting for my results to come in, and have a question about haplogroups.

    The root of the haplogroup tree has two branches from the root "Y": "A", with mutations M91 and P97, and "BT", with mutations M42, M94 and various others.

    (I realise that there is a more modern version of the tree with A1a-T etc. but for simplicity's sake I'll stick to the 2008 version for now.)

    My question is this: why was the root chosen so that "A" had two mutations from the root? Why not one mutation for "A" and the inverse of the other mutation for "BT"?

    Even if you found DNA from Homo neanderthalensis that had neither of the mutations, wouldn't that just tell you about the common ancestor of "Y" and Neanderthals, rather than about "Y" himself?

    Is it chosen this way because it's not possible to determine which of the mutations M91 or P97 came first? (Presumably it's astronomically unlikely that they both arose in the same generation.)

  • #2
    Back when the backbone nomenclature of the Y-tree was established, not all mutations (SNPs) had been identified. Although the Y-tree is periodically modified deep within a branch, the main nomenclature of haplogroups doesn't change; that would just be too overwhelming.

    You might note that P could just as well be called QR. I think there are very, very few people in the P haplogroup who don't also belong to either Q or R.

    Timothy Peterman


    • #3
      I have no issue with the nomenclature -- "Y" could be called "Adam" or "ABT" or anything you like.

      I've actually found my answer in the Wikipedia article about Haplogroup A:

      The M91 and P97 mutations distinguish Haplogroup A from Haplogroup BT. Within Haplogroup A chromosomes, the M91 marker consists of a stretch of 8 T nucleobase units. In Haplogroup BT and chimpanzee chromosomes, this marker consists of 9 T nucleobase units. This pattern suggested that the 9T stretch of Haplogroup BT was the ancestral version and that Haplogroup A was formed by the deletion of one nucleobase.
      In other words, the assumption was that the mutation state of "Y" would be the same as that of an earlier ancestor, in this case the common ancestor with chimpanzees. But the article goes on:

      But according to Cruciani et al. 2011, the region surrounding the M91 marker is a mutational hotspot prone to recurrent mutations. It is therefore possible that the 8T stretch of Haplogroup A may be the ancestral state of M91 and the 9T of Haplogroup BT may be the derived state that arose by an insertion of 1T. This would explain why subclades A1b and A1a-T, the deepest branches of Haplogroup A, both possess the 8T stretch. Furthermore Cruciani et al. 2011 determined that the P97 marker, which is also used to identify haplogroup A, possessed the ancestral state in haplogroup A but the derived state in Haplogroup BT.
      In other words, it is indeed dangerous to make assumptions about the clean slate of "Y" based on a distant cousin. Indeed, thinking about the danger of a "mutation hotspot" was exactly what led me to ask this question, in consideration of the way lightning struck twice at SRY10831 (very unlikely in a flat distribution of mutations).
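The outgroup reasoning in the quoted passages can be sketched in a few lines of Python. This is only an illustration: the function name `polarize` is made up for the example, and the only data used are the M91 states quoted above (8T in A, 9T in BT and in chimpanzee).

```python
def polarize(marker_states, assumed_ancestral):
    """Label each haplogroup's state as 'ancestral' or 'derived',
    given an assumption about what the ancestral state was."""
    return {
        haplogroup: "ancestral" if state == assumed_ancestral else "derived"
        for haplogroup, state in marker_states.items()
    }

# M91 poly-T length as quoted from the Haplogroup A article
m91 = {"A": "8T", "BT": "9T"}

# Assumption 1: the chimpanzee state (9T) is ancestral
chimp_based = polarize(m91, "9T")

# Assumption 2: 8T is ancestral (the possibility raised by the hotspot argument)
hotspot_based = polarize(m91, "8T")

print(chimp_based)
print(hotspot_based)
```

The observed states are identical in both calls; only the assumed ancestral state changes, and with it which branch is said to carry the mutation.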

      So now my question would be -- why have Cruciani et al. chosen the root ("clean slate", "ancestral state") such that "A1b" is defined by (V148, V149, V150, V151, V152, V13, V154, V157, V158, V159, V161, V162, V163, V164, V165, V166, V167, V169, V170, V172, V173, V176, V177, V181, V190, V195, V196, V223, V229, V233, V239) and "A1a-T" by (V168, V171, V174, V203, V238, V241, V250)? And presumably the answer is "because that's their best guess from looking at chimpanzee data".

      But now that the danger of a mutation hotspot has been identified, it seems likely that comparison with a closer cousin than the chimpanzee would re-root the tree again.


      • #4
        I think the subclades of higher clades are determined if:

        1) the marker that will define the subclade is not found outside of its parent clade.

        2) the marker only covers part of the population in the higher clade.

        As we all know R1b1a2 is usually described as being the clade of the M269 marker. This, of course, comprises a vast population. So how do we know that L23 is nested beneath M269, rather than the other way around?

        Because all L23 men share the M269 mutation, some men who are L23- are also M269+, only a subset of M269+ men are also L23+, and no one outside of M269 is L23+.
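The two conditions above amount to a proper-subset test on the sets of men who test positive for each marker. A minimal sketch, assuming a made-up table of test results (the sample names and the helper `is_nested_under` are hypothetical, not taken from any real database):

```python
def is_nested_under(samples, child, parent):
    """Return True if every carrier of `child` also carries `parent`,
    and at least one carrier of `parent` lacks `child`."""
    child_pos = {name for name, markers in samples.items() if child in markers}
    parent_pos = {name for name, markers in samples.items() if parent in markers}
    # "<" on sets is the proper-subset test
    return bool(child_pos) and child_pos < parent_pos

# Hypothetical results: each man maps to the set of markers he is positive for
samples = {
    "man1": {"M269", "L23"},
    "man2": {"M269", "L23"},
    "man3": {"M269"},        # M269+ but L23-
    "man4": set(),           # outside M269 entirely
}

l23_under_m269 = is_nested_under(samples, "L23", "M269")
m269_under_l23 = is_nested_under(samples, "M269", "L23")
print(l23_under_m269, m269_under_l23)
```

Note that the test only describes the sampled men; it says nothing about back-mutations or about carriers who have not been tested.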

        As for your question about the ancestral condition of Y: you will note that most charts don't ascribe mutations to Y. He could have had the markers of A (M91 & P97), or he could have had the markers of BT. After the split, each haplogroup had its characteristic markers.

        Timothy Peterman


        • #5
          Not wanting to seem overly philosophical about things, but a certain understanding helps me to grasp it better. Our western culture has a habit of naming or labeling things. If a thing has a name or a label, then we think we understand it. DNA research is a very young science, much of which is unknown. I think it premature to trust names and labels and treat them as facts. Ask again in 20 years and you may get a different answer.


          • #6
            Brunetmj, you're absolutely right. The nomenclature that has been assigned is arbitrary & as more data is found, the tree will make less & less sense (i.e., why is T now shown with L, instead of down there past S, where it used to be?). The answer is, of course, that after the T folk were labelled as T, a unifying SNP was found that pulls them together with L into the LT subclade.

            I predict that an SNP will eventually be found that unifies IJK with G, to the exclusion of H. I base this on geography. Since H is found almost exclusively in India, it makes sense that H broke off before F(xH) moved northwest into the Near East/Middle East.

            Timothy Peterman


            • #7
              I find the Y-DNA super haplogroups to be quite interesting. By super haplogroup I mean root haplogroups like IJ and P. As was mentioned, P could basically also be called QR. When it comes to the peopling of West Eurasia, these two super haplogroups are very important, as they make up the majority of Y-DNA lineages present there. I can see the peopling of West Eurasia possibly going like this:

              1. IJ originated in the Middle East maybe around 40,000 years ago. Then, possibly not much later, IJ people moved into Europe as well, where after some time a man had a mutation that marks his descendants as being in haplogroup I. These may have been the first Homo sapiens to enter Europe. The cousins of these IJ people who remained in the Middle East eventually became the haplogroup J people of today. The haplogroup J people of the Middle East may have been among the first peoples to truly practice agriculture, and some may then have spread into the Mediterranean areas of Europe during the Neolithic and in the periods that followed.

              2. Haplogroup P originated in Central Asia, and its descendant haplogroups Q and R also originated there. The Q people moved to the east and west; some stayed in Asia, while others moved on to the Americas and some moved towards Europe in smaller numbers. Haplogroup R, on the other hand, had a massive expansion in numerous directions, but mostly to the west towards Europe. Most of R settled down north of the Black Sea and later, possibly with the adoption of agriculture and horse breeding, R began to move deep into Europe.


              • #8
                Originally posted by T E Peterman
                I think the subclades of higher clades are determined if:

                1) the marker that will define the subclade is not found outside of its parent clade.

                2) the marker only covers part of the population in the higher clade.
                Which is why the root is a special case -- we don't know what's in the parent clade. And assumptions based on very distant cousins (chimpanzees) turn out very likely to be erroneous in light of new data.

                As we all know R1b1a2 is usually described as being the clade of the M269 marker. This, of course, comprises a vast population. So how do we know that L23 is nested beneath M269, rather than the other way around?
                In short, we don't, but we consider it so much more likely than the alternative (that L23 happened before M269 and then a back-mutation at the position of L23 happened after M269) that we speak of it that way. And yet a reversal of this kind has been demonstrated to be what likely happened at the root -- though it was always vastly more likely to have happened there than in R1b1a, because the common ancestor from which the assumption is derived is many orders of magnitude further away.


                • #9
                  The principle of parsimony is ultimately what governs how we align clades & subclades.

                  You are correct about the root. We can't say if the root was more like BT or more like A.

                  Timothy Peterman
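Why parsimony alone cannot settle the state of the root can be shown with a small count. This is a sketch under simplifying assumptions: only the two branches A and BT and the two markers M91 and P97 are considered, and the state labels "a" and "b" are arbitrary placeholders (state "a" is whatever A carries, state "b" whatever BT carries).

```python
from itertools import product

def changes(root, tip):
    """Count markers whose state differs between the root and a tip."""
    return sum(root[marker] != tip[marker] for marker in root)

markers = ["M91", "P97"]
tip_a = {"M91": "a", "P97": "a"}    # states seen in haplogroup A (placeholder labels)
tip_bt = {"M91": "b", "P97": "b"}   # states seen in haplogroup BT (placeholder labels)

# Try every possible combination of states at the root "Y"
scores = {}
for states in product("ab", repeat=len(markers)):
    root = dict(zip(markers, states))
    scores[states] = changes(root, tip_a) + changes(root, tip_bt)

for states, score in scores.items():
    print(states, score)
```

Every candidate root needs the same total of two mutations, so with only A and BT to compare, counting mutations cannot say which branch each mutation belongs on. Some outside information, such as an outgroup, is needed to break the tie.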


                  • #10
                    Thanks Timothy -- "parsimony" was just the term I needed to give me a foothold in further research.
