I can't follow a simple mtdna cladogram

  • I can't follow a simple mtdna cladogram

    I saw the cladograms here:




    I melded them into a graph to explain what I don't understand:

    [it's an attachment at the bottom]

    The gray cladogram on the left is part of their depiction of the L1-3 haplotree. The bigger color picture in the background is their expansion of this tree showing where their tested samples fell in the cladogram.

    The big yellow blob at the top is labeled L1a1a and has all the L1a1a samples inside it. It is connected by a line called "278" to the next yellow blob, which is labeled L1a1. I see that this fits exactly with the gray cladogram: L1a1 -> 16278 -> L1a1a. And then this yellow blob is connected by a line called "168" to a circle that is called L1a*. Again, this fits exactly with the gray cladogram. The problems start with L1a*.

    The little white circle has a "#" next to it, and to the left you can see a "#" followed by a series of 10 mutations. Obviously, this is a depiction of the lines that connect to the white circle, aka L1a*. In the gray cladogram there are 6 mutations before L1a. I've circled them and pointed to them next to the "#". They are 129, 148, 172, 188G, 278, and 320. Five of them are present next to the "#", but 278 is missing.

    The next problem is that there are still more numbers in the series next to the "#", so I'm supposing that they must have included all the other mutations that lead to the root of L. Unfortunately, as you can see, I have circled 3 mutations in the gray cladogram (187, 189, 311) that are depicted as the branch leading off to L2/3*, yet they are included next to the "#". How can L1 be positive for 3 mutations that define L2/3*?

    Finally, one of the mutations next to the "#" (mutation 223) I wasn't even able to locate. I'm wondering if perhaps they made a mistake and misprinted the missing 278 as 223?

    One more weird thing is why they put the word "Root" between L1* and L1d/f/a/k. Branching off to the left and right of L1* are branches L1e/c/b. You can see this better in the original PubMed images I linked above (it's the first link).
    Attached Files

  • #2
    argiedude:

    Just a quick guess. The grey graph depicts the relation of the various subgroups relative to each other. It means that they differ from each other in the mutations indicated along the lines. But nothing is implied regarding their relation to CRS.

    In the other table, I suspect the numbers next to the # are the other mutations relative to CRS, not to the root. So take for instance 16223. Essentially all African sequences have the same 16223, so that would not be shown in the grey table. However, CRS has a different 16223 (the mutation happened down in the tree, between N and R), so that's why when you compare the L sequences to CRS, 16223 shows up. Or take 16187. L2 has a different 16187 relative to L1, so that's where it shows up in the grey chart: between L1 and L2. In that scheme, CRS is downstream of L3, so CRS has the same 16187 as L2, but a different one relative to L1. That's why it is shown next to the # in the colored chart: it is a mutation seen between CRS and L1. Remember that CRS is downstream from N, which is itself downstream of L3.
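    A minimal Python sketch of the reasoning above, using a hypothetical, heavily simplified tree: the branch placements are illustrative, loosely based on the positions mentioned in this thread, and are not taken from the Salas paper. If each mutation happens once and never reverts, the differences between two sequences are exactly the mutations on the path from one up to their common ancestor and back down to the other. So comparing L1a to CRS picks up mutations from the L2/L3 side (16187, 16189, 16311) and from the N-to-R branch (16223), not just L1a's own.

```python
"""Toy illustration: why a list of differences against CRS picks up
mutations from the other side of the tree. The tree shape and mutation
placement are hypothetical, loosely based on positions discussed here."""

# child -> (parent, mutations on the branch leading down to child)
TREE = {
    "L1a": ("root", {"16129", "16148", "16172", "16188G", "16320"}),
    "L2/3": ("root", {"16187", "16189", "16311"}),
    "N": ("L2/3", set()),
    "R": ("N", {"16223"}),
    "CRS": ("R", set()),
}


def mutations_from_root(node):
    """Collect every branch mutation on the path from the root down to node."""
    found = set()
    while node != "root":
        parent, muts = TREE[node]
        found |= muts
        node = parent
    return found


def differences(a, b):
    """Positions where a and b differ, assuming each mutation occurred once
    and never reverted. Mutations on their shared ancestral path cancel out."""
    return mutations_from_root(a) ^ mutations_from_root(b)


print(sorted(differences("L1a", "CRS")))
# ['16129', '16148', '16172', '16187', '16188G', '16189', '16223', '16311', '16320']
```

    Comparing against a sister lineage instead of CRS, e.g. differences("L1a", "L2/3"), drops 16223, which is why a root-relative tree and a CRS-relative list can look inconsistent even when both are correct.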

    It would be nice if these relations were shown in a tree-like form, but I don't know of anything of the sort for HVR1 in L sequences.

    But I don't have the Salas paper with me now, so I'd have to double-check. Note, incidentally, that more recent papers have slightly improved on the interpretation of the L tree. Now, L1a/d/k/f of that paper are called L0a, L0d, L0f, and L0k, the reason being that they are higher up in the tree, while L1b, L1c, and L1e instead all belong to the same subbranch. The exact placement of the root of the tree among L0a/d/f/k is still debated.

    cacio



    • #3
      Great response, cacio, thanks.

      Originally posted by cacio
      argiedude:

      Just a quick guess. The grey graph depicts the relation of the various subgroups relative to each other. It means that they differ from each other in the mutations indicated along the lines. But nothing is implied regarding their relation to CRS.

      In the other table, I suspect the numbers next to the # are the other mutations relative to CRS, not to the root. So take for instance 16223. Essentially all African sequences have the same 16223, so that would not be shown in the grey table. However, CRS has a different 16223 (the mutation happened down in the tree, between N and R), so that's why when you compare the L sequences to CRS, 16223 shows up. Or take 16187. L2 has a different 16187 relative to L1, so that's where it shows up in the grey chart: between L1 and L2. In that scheme, CRS is downstream of L3, so CRS has the same 16187 as L2, but a different one relative to L1. That's why it is shown next to the # in the colored chart: it is a mutation seen between CRS and L1. Remember that CRS is downstream from N, which is itself downstream of L3.

      This makes sense, and it all checks out.


      Originally posted by cacio
      It would be nice if these relations were shown in a tree-like form, but I don't know of anything of the sort for HVR1 in L sequences.

      Aha! So I'm not alone! I have quite a few L, H, U, and other charts, and I have to try to fit them together myself. It's all a mess; they don't fit with each other. It's just crazy.


      Originally posted by cacio
      But I don't have the Salas paper with me now, so I'd have to double-check. Note, incidentally, that more recent papers have slightly improved on the interpretation of the L tree. Now, L1a/d/k/f of that paper are called L0a, L0d, L0f, and L0k, the reason being that they are higher up in the tree, while L1b, L1c, and L1e instead all belong to the same subbranch. The exact placement of the root of the tree among L0a/d/f/k is still debated.

      cacio

      You really cleared things up for me with that last observation. I also realized later that the cladogram I posted is from 2002. The changes in the field of genetics make the world of computers seem to be standing still.



      • #4
        argiedude:

        There are several trees around, but they are based on the coding region, not on HVR1. For instance, you can check Ian Logan's:



        or the big one at mitomap

        (check mitomap tree)

        Part of the reason is that HVR positions mutate back and forth too often, whereas coding-region mutations tend to be more stable, and an HVR mutation is usually not the defining mutation for a haplogroup. So HVR is not good for defining trees.

        cacio



        • #5
          Originally posted by argiedude

          You, my friend, are not alone.



          • #6
            Thanks, Cacio, for the mitomap link. I've been doing a ton of stuff with it (you can see it at dna-forums), and it has really given me a new perspective on mtDNA. I recommend that map to anyone who is confused about mtDNA.

            I have a specific question about the tree, though.

            Many times, a lineage will have two daughter branches that then strangely converge into a single granddaughter branch. How is that possible? Here's an example; there are hundreds like this:
            Attached Files



            • #7
              Argiedude:

              That is clearly a mistake, though I guess that to find out exactly which one, it may be necessary to look at the original sequences. If they happen to be in Ian Logan's trees as well, that can be checked.

              cacio



              • #8
                Originally posted by argiedude
                Thanks, Cacio, for the mitomap link. I've been doing a ton of stuff with it (you can see it at dna-forums), and it has really given me a new perspective on mtDNA. I recommend that map to anyone who is confused about mtDNA.

                I have a specific question about the tree, though.

                Many times, a lineage will have two daughter branches that then strangely converge into a single granddaughter branch. How is that possible? Here's an example; there are hundreds like this:

                It would not be possible if each mutation in the coding region had occurred once and only once, then never changed back again. However, even the coding region can have parallel and reverse mutations, and then it becomes more difficult to construct a single tree. There are alternative ways to get to the same end result: a network or "reticulation."

                In your example, the sequences in the center would have all four mutations. One explanation could be that the sequences on the sides could have had two reverse mutations.

                It's also possible that the published sequences used to construct the tree contain errors.

                Ann Turner
                co-author (with Megan Smolenyak) of "Trace Your Roots with DNA"
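                One way to see why a drawing can show a lineage splitting and then rejoining: a minimal Python sketch with hypothetical haplotypes and made-up mutation names (m1 to m4), not the sequences from the attached example. Counting one event for every position that differs across an edge (forward, parallel, or reverse), the center sequence costs the same whether it hangs below the left or the right branch, so neither single tree is preferred and both links may be drawn, forming a loop.

```python
"""Toy sketch of a reticulation: two different trees explain the same four
haplotypes with the same number of mutation events. Haplotypes and the
mutation names m1..m4 are hypothetical."""

# Each haplotype is the set of derived mutations it carries.
HAPLOTYPES = {
    "parent": frozenset(),
    "left": frozenset({"m1", "m2"}),
    "right": frozenset({"m3", "m4"}),
    "center": frozenset({"m1", "m2", "m3", "m4"}),
}


def tree_cost(edges):
    """Count mutation events on a tree given as (parent, child) edges:
    every position that differs across an edge costs one event."""
    return sum(len(HAPLOTYPES[p] ^ HAPLOTYPES[c]) for p, c in edges)


# Option 1: center descends from left; m3 and m4 arise a second time (parallel).
via_left = [("parent", "left"), ("parent", "right"), ("left", "center")]
# Option 2: center descends from right; m1 and m2 arise a second time.
via_right = [("parent", "left"), ("parent", "right"), ("right", "center")]

print(tree_cost(via_left), tree_cost(via_right))  # 6 6
```

                A tie like this only says the data cannot choose between the trees; as noted above, errors in the published sequences are another possible cause, and checking the original sequences is still needed.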



                • #9
                  Originally posted by Ann Turner
                  One explanation could be that the sequences on the sides could have had two reverse mutations.

                  It's also possible that the published sequences used to construct the tree contain errors.

                  Hello Ann. Obviously, the latter two explanations are the most plausible ones. By contrast, a parallel development of these rare mutations on the same background seems improbable.

