E3b project cladograms

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by Jim Denning
    Victor, you keep showing this, and I look and say, what's this saying? I don't know the numbers, and it doesn't mean anything without something. What's it supposed to be saying?
    Sorry, Jim. I know this is not very easy to understand just as it isn't very easy to explain either.

    Basically, these diagrams or cladograms are a visual representation of the TMRCA tables that you see in many DNA surname projects. In fact, the diagrams are generated from a TMRCA table. But they show something that the simple tables don't. Our premise is that, just as it is possible to statistically "predict" a haplogroup from the allele values, it is also possible to "infer" the subclades from the pairwise genetic distances among haplotypes.
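A minimal sketch of the pairwise-distance step, assuming each haplotype is stored as a marker-to-repeat-count dictionary and using a plain step-wise distance (the sum of absolute repeat differences). FTDNA's own counting rules for multi-copy markers are not modelled, and the kit IDs and repeat values are hypothetical.

```python
from itertools import combinations


def stepwise_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of absolute repeat-count differences over the markers both haplotypes share."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def distance_matrix(haplotypes: dict[str, dict[str, int]]) -> dict[tuple[str, str], int]:
    """Step-wise distance for every unordered pair of kit IDs (IDs sorted within each pair)."""
    return {
        (x, y): stepwise_distance(haplotypes[x], haplotypes[y])
        for x, y in combinations(sorted(haplotypes), 2)
    }


# Hypothetical haplotypes; the values are illustrative, not project data.
haplotypes = {
    "kit_A": {"DYS19": 13, "DYS391": 10, "DYS385a": 16, "DYS385b": 18},
    "kit_B": {"DYS19": 13, "DYS391": 10, "DYS385a": 16, "DYS385b": 17},
    "kit_C": {"DYS19": 14, "DYS391": 11, "DYS385a": 16, "DYS385b": 18},
}

for pair, gd in distance_matrix(haplotypes).items():
    print(pair, gd)
```

Here kit_A and kit_B differ only by one step at DYS385b, so their distance is 1; kit_B and kit_C differ at three markers and get 3.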

    The software converts the units in the distance matrix (generations or years) to bifurcating branches that group the haplotypes into somewhat definable clusters. These clusters, assuming that distinct haplotypes are correlated to specific SNPs, should correspond to haplogroup subclades.
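One common way to turn a distance matrix into bifurcating branches is UPGMA (average-linkage clustering): repeatedly join the two closest clusters and average their distances to everything else. This is a swapped-in illustration of the general idea, not the Kitsch method named elsewhere in the thread, and the input format (a dict keyed by frozenset pairs) is an assumption of this sketch.

```python
def upgma(dist: dict[frozenset, float]):
    """Build a nested-tuple tree by UPGMA from pairwise leaf distances.

    `dist` must contain frozenset({x, y}) -> distance for every pair of at least
    two leaves. Leaf names must not start with "_c" (used for internal labels).
    """
    leaves = sorted({name for pair in dist for name in pair})
    clusters = {leaf: (leaf, 1) for leaf in leaves}  # label -> (subtree, leaf count)
    d = dict(dist)
    next_id = 0
    while len(clusters) > 1:
        # Closest pair of clusters; ties broken by name so the result is deterministic.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        new = f"_c{next_id}"
        next_id += 1
        # Distance from the merged cluster is the size-weighted average of its parts.
        for other in clusters:
            d[frozenset((new, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        # Forget every distance that involves the two clusters just merged.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    (tree, _), = clusters.values()
    return tree


# Hypothetical distances between three kits.
dist = {
    frozenset(("kit_A", "kit_B")): 1,
    frozenset(("kit_A", "kit_C")): 2,
    frozenset(("kit_B", "kit_C")): 3,
}
print(upgma(dist))  # (('kit_A', 'kit_B'), 'kit_C')
```

The closest pair (kit_A, kit_B) is joined first, and kit_C attaches to that branch afterwards, which is the kind of grouping the cladograms display.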

    Right now we have just a few subclades confirmed by SNP, and we haven't found a contradicting result yet. We're waiting for more SNP results from the E3b project participants to see if the software algorithm and the diagrams hold up.

    If you have any other question I'll try my best to answer it.

    Victor



    • #17
      Isn't it necessary to really see the DNA?

      I mean, yeah, you and I might get the same overall reading, but maybe your 16-18 at 385a-b is slightly different. How would a program see that? I mean, FTDNA makes a big thing of seeing the differences.



      • #18
        Exactly. What the software does is amplify those small differences. The whole point of this exercise is to see the big picture, the panoramic view so to speak, and then say: here I am, or here you are!



        • #19
          E3b - 25 Marker Network Diagram

          http://img.villagephotos.com/p/2005-...124_25M_ND.jpg


          Data for diagram found at:
          http://www.familytreedna.com/public/freemanDNAProject/



          • #20
            Excellent Work

            Excellent work, Victor. Thank you for your effort. If I am reading the data correctly, my subclade should be M78. You should be receiving confirmation in a few weeks, as I will be ordering the subclade test shortly.

            Any particular reason that 10640 is way out on a limb in this diagram:

            Earlier Diagram

            I do not see it duplicated in later diagrams.

            Again, excellent work and I am most appreciative.

            Best,

            Rick



            • #21
              In other words, all our haplotypes are very similar at the 12 marker level regardless of what subclades we belong to. It is at a higher number of markers where the differences (or genetic distance) start to show.
              I touched on this point in another thread. IMHO the 12 marker level is insufficient data for building the cladogram. You would need an incredibly large sample for good results at only the 12 marker level.
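To make the 12-marker point concrete, here is a sketch with two hypothetical haplotypes that agree on every marker of the 12-marker panel but differ at a few of the extra markers in the 25-marker panel. The panel lists are assumed to follow FTDNA's 12- and 25-marker panels and should be checked; the distance is a plain step-wise count, and the repeat values are placeholders, not real E3b data.

```python
# Assumed FTDNA panel composition (verify before relying on it).
PANEL_12 = ["DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b",
            "DYS426", "DYS388", "DYS439", "DYS389i", "DYS392", "DYS389ii"]
PANEL_25 = PANEL_12 + ["DYS458", "DYS459a", "DYS459b", "DYS455", "DYS454",
                       "DYS447", "DYS437", "DYS448", "DYS449",
                       "DYS464a", "DYS464b", "DYS464c", "DYS464d"]


def distance_on(panel: list[str], a: dict[str, int], b: dict[str, int]) -> int:
    """Step-wise distance restricted to the markers in `panel`."""
    return sum(abs(a[m] - b[m]) for m in panel)


# Placeholder repeat counts: every marker set to 12, then three markers that
# are only in the 25-marker panel changed in the second haplotype.
hap_1 = {m: 12 for m in PANEL_25}
hap_2 = {**hap_1, "DYS458": 13, "DYS449": 10, "DYS464c": 13}

print(distance_on(PANEL_12, hap_1, hap_2))  # 0: indistinguishable at 12 markers
print(distance_on(PANEL_25, hap_1, hap_2))  # 4: 1 + 2 + 1 steps at 25 markers
```

With only the 12-marker data, a clustering program has nothing to separate these two; the extra markers are what give it a distance to work with.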

              I applaud your efforts, and your willingness to share. It is my hope that in the near future the Genographic Project will start putting out the same type of information for the global tree. It will be interesting to see if your results concur with theirs...



              • #22
                Originally posted by Rossi
                ...
                Any particular reason that 10640 is way out on a limb in this diagram:

                Earlier Diagram

                I do not see it duplicated in later diagrams.
                Rick
                Rick,

                Actually, 10640 is duplicated in the latest diagram, but the perspective is a little different. Every time new records are added, the Fluxus software rearranges the relative positions of each haplotype.

                If you want, I can send you a small PDF file that shows the ID numbers of all haplotypes in the diagram. The same goes for anyone else interested.

                I'd be interested in your deep clade results. There are some others of us who are also expecting to hear news in the coming weeks.

                Victor



                • #23
                  Originally posted by EBurgess
                  I touched on this point in another thread. IMHO the 12 marker level is insufficient data for building the cladogram. You would need an incredibly large sample for good results at only the 12 marker level.

                  I applaud your efforts, and your willingness to share. It is my hope that in the near future the Genographic Project will start putting out the same type of information for the global tree. It will be interesting to see if your results concur with theirs...
                  Right. I'm aware of discussions elsewhere about the limited value of 12-marker haplotypes to distinctly resolve the sub-branching of haplogroup E3b and/or other haplogroups.

                  However, it could also be that what matters isn't only the number of markers but which markers are selected for the analysis. For example, in the study "Phylogeographic Analysis of Haplogroup E3b (E-M215) Y Chromosomes Reveals Multiple Migratory Events Within and Out Of Africa", the researchers use only 11 markers, as the quote below shows, and they are still able to plot the structure of the haplogroup, even the four sub-clusters within E-M78.

                  We further typed 509 of the 515 E3b subjects for seven GATA STR (A7.1, A7.2, and A10 [White et al. 1999]; DYS19, DYS391, and DYS393 [Roewer et al. 1992, 1996]; and DYS439 [Ayub et al. 2000]) and four CA dinucleotide repeat (YCAIIa, YCAIIb, DYS413a, and DYS413b [Mathias et al. 1994; Malaspina et al. 1997]) polymorphisms.
                  http://www.pubmedcentral.gov/article...=figure&id=FG2

                  Of the markers employed in Cruciani's study, listed below, only the first four (DYS19, DYS391, DYS393, and DYS439) coincide with our 12-marker panel.

                  DYS19
                  DYS391
                  DYS393
                  DYS439
                  GATA A7.1 (DYS460)*
                  GATA A7.2 (DYS462)
                  GATA A10
                  YCAIIa*
                  YCAIIb*
                  DYS413a
                  DYS413b

                  Even selecting from the 37-marker panel (*) there would still be four markers missing. (Can someone verify if I got the equivalent names right?)
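The counting above can be checked with simple set arithmetic. The panel memberships below are taken from the post itself (DYS19, DYS391, DYS393 and DYS439 as the markers shared with the 12-marker panel, plus the three asterisked 37-marker ones), so the sketch inherits the post's open question about the equivalent names.

```python
# Marker names exactly as listed in the post.
CRUCIANI_MARKERS = {"DYS19", "DYS391", "DYS393", "DYS439", "GATA A7.1",
                    "GATA A7.2", "GATA A10", "YCAIIa", "YCAIIb",
                    "DYS413a", "DYS413b"}
IN_12_PANEL = {"DYS19", "DYS391", "DYS393", "DYS439"}
ADDED_BY_37_PANEL = {"GATA A7.1", "YCAIIa", "YCAIIb"}  # the asterisked entries

covered_by_37 = CRUCIANI_MARKERS & (IN_12_PANEL | ADDED_BY_37_PANEL)
missing = sorted(CRUCIANI_MARKERS - covered_by_37)

print(len(CRUCIANI_MARKERS), len(covered_by_37), missing)
# 11 7 ['DYS413a', 'DYS413b', 'GATA A10', 'GATA A7.2']
```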

                  As for the Genographic Project making their data available in some shape or form, that would be great, although I doubt they will do it spontaneously. That's why we're motivated (on a very small scale) to make our feeble attempts and try to make some sense of all this as best we can.

                  Victor



                  • #24
                    Sorry Victor, by duplicated I meant out on a similar limb, so to speak. I was not clear. I am trying to figure out what the meaning of the "perspective" would be, if any.

                    Yes, I would like the ID numbers for all the haplotypes in the diagram. Do you need my email address?

                    Thanks,

                    Rick



                    • #25

                      Victor, I think I now understand the Fluxus diagram. I need the haplotype labels or at least mine.

                      Thanks,

                      Rick



                      • #26
                        Originally posted by Rossi
                        Sorry Victor, by duplicated I meant out on a similiar limb, so to speak. I was not clear. I am trying to figure out what would be the meaning of the "perspective", if any.

                        Yes, I would like the ID numbers for all the haplotypes in the diagram. Do you need my email address?

                        Thanks,

                        Rick
                        Rick,

                        Haplotype 10640 is indeed out on a similar limb, but the new diagram inverted the positions of the two main clusters. It is as if we were looking from the opposite side. As I explained, when new records are added to the haplotype dataset, the Fluxus software rearranges the diagram as it inserts the new data. Also, although it isn't very clear in the images, and unlike plain phylogenetic trees or cladograms, the network in these diagrams is supposed to be a three-dimensional structure.

                        Please send me a private message with an email address where I can send you the file with the haplotype numbers.

                        Victor



                        • #27
                          E3b Cladograms

                          Dear friends of E3b,

                          I have again created a couple of cladograms with the latest available record set from the E3b Haplogroup Project.

                          One was generated with 12 markers and the other with 25 markers. Although we have previously discussed the limited value of a 12-marker cladogram for correctly inferring subclades, I've decided to do it anyway for the benefit of participants who have recently joined the project and only have a 12-marker haplotype.

                          There are currently 135 records in the project, of which 6 have an SNP result that I know of. But new SNP results are coming soon, and then we'll see if the clustering pattern still holds up. Also, if anyone else in the project already has a "snip" result not in the cladograms, I would appreciate it if you could post it here so we can learn more about our haplogroup.
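Checking whether the clustering pattern "holds up" can be made mechanical: for each cluster in a diagram, collect the SNP results of its members and flag any cluster that mixes subclades. A sketch, assuming cluster assignments have already been read off a cladogram by hand; the kit IDs, cluster labels and results below are hypothetical.

```python
from collections import defaultdict


def mixed_clusters(cluster_of: dict[str, str], snp_of: dict[str, str]) -> dict[str, set[str]]:
    """Return {cluster: subclades} for clusters whose SNP-tested members disagree."""
    found = defaultdict(set)
    for kit, subclade in snp_of.items():
        if kit in cluster_of:  # skip SNP results for kits not placed in the diagram
            found[cluster_of[kit]].add(subclade)
    return {c: subs for c, subs in found.items() if len(subs) > 1}


# Hypothetical cluster assignments and SNP results.
cluster_of = {"kit_1": "cl_1", "kit_2": "cl_1", "kit_3": "cl_2", "kit_4": "cl_2"}
snp_of = {"kit_1": "M78", "kit_2": "M78", "kit_3": "M81", "kit_4": "M78"}

for cluster, subclades in mixed_clusters(cluster_of, snp_of).items():
    print(cluster, sorted(subclades))  # cl_2 ['M78', 'M81']
```

A subclade spread over several clusters is not flagged here; whether that counts as a contradiction is a separate judgment.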

                          So, for whatever it's worth, here are the links to the diagrams. As always, your comments are welcome.

                          12-Marker Cladogram
                          http://img.villagephotos.com/p/2005-...60201_12CG.jpg
                          25-Marker Cladogram
                          http://img.villagephotos.com/p/2005-...60201_25CG.jpg

                          _Victor


                          p.s. The data was processed with the Kitsch module of PHYLIP (the Phylogeny Inference Package), and the diagrams were generated with Tree Explorer. The raw data was prepared with McGee's Y-DNA Tools.
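For anyone wanting to repeat the PHYLIP step, its distance programs such as Kitsch read a plain-text input file: the number of taxa on the first line, then one row per taxon with the name in the first 10 columns followed by its distances. A minimal writer, assuming the square-matrix layout and a distance dict keyed by frozenset pairs (both choices of this sketch, not a description of McGee's tool output); check the PHYLIP distance-matrix documentation before use.

```python
def to_phylip_matrix(names: list[str], dist: dict[frozenset, float]) -> str:
    """Format a symmetric distance matrix as a PHYLIP square-matrix input file."""
    lines = [f"{len(names):5d}"]
    for a in names:
        row = [0.0 if a == b else dist[frozenset((a, b))] for b in names]
        label = a[:10].ljust(10)  # name field is 10 columns; longer names get truncated
        lines.append(label + "".join(f"{v:8.4f}" for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical kit IDs and distances.
names = ["kit_A", "kit_B", "kit_C"]
dist = {
    frozenset(("kit_A", "kit_B")): 1.0,
    frozenset(("kit_A", "kit_C")): 2.0,
    frozenset(("kit_B", "kit_C")): 3.0,
}
print(to_phylip_matrix(names, dist), end="")
```

Saving the output as a file named `infile` is the usual way to feed it to a PHYLIP program. Because names are cut to 10 characters, kit IDs should be unique within that width.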
                          Last edited by Victor; 31 January 2006, 09:08 AM.



                          • #28
                            However, it could also be that what matters isn't only the count of markers but which markers are selected for the analysis. For example in the study Phylogeographic Analysis of Haplogroup E3b (E-M215) Y Chromosomes Reveals Multiple Migratory Events Within and Out Of Africa, the researchers use only 11 markers, as the quote below shows, and still they are able to plot the structure of the haplogroups, even the four sub-clusters within E-M78.
                            I have no doubt that at the higher-level branches only a few STRs are required. I just think that when you start breaking it down further, say to the twig level, you will need more markers because the variability in individual marker values will decrease. The recent marketing around Niall of the Nine Hostages hints at this when they compare the 12-marker vs. 25-marker tests: you get a better picture if you include the DYS464 cluster.

                            That being said, I am not clear on how old some of the sub-branches are. I have to do some more reading.

                            I am personally interested in E3b because my maternal grandfather's family name, which traces back to northern France in the late 1500s, appears to be E3b. I will need to get an uncle to participate!



                            • #29
                              I agree. A higher number of markers will always produce better discriminating resolution for certain clustering purposes. For example, a comparison based on the 37-marker panel is much better at defining the "twigs" in a large DNA surname project, where participants share a common ancestor within the last few centuries, or even, as in your example of Niall of the Nine Hostages, one who dates back to the 5th (?) century.

                              On the other hand, my logic tells me (although I could be wrong) that in a haplogroup project like E3b, where the common ancestor dates back several thousand years, an intermediate number of selected markers may produce better clustering results than including a lot of fast-mutating markers, which could add a lot of noise to the calculations. Besides, the 37-marker haplotypes in our record set are too few, so I cannot fully test that idea.
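The idea of leaving out fast-mutating markers can be sketched as a distance that only counts markers whose assumed mutation rate is at or below a cutoff. The marker names and rates here are made-up placeholders, not real estimates; actual per-marker rates would have to come from the literature.

```python
def distance_slow_markers(a: dict[str, int], b: dict[str, int],
                          rates: dict[str, float], max_rate: float) -> int:
    """Step-wise distance over shared markers whose assumed rate is <= max_rate."""
    keep = [m for m, r in rates.items() if r <= max_rate and m in a and m in b]
    return sum(abs(a[m] - b[m]) for m in keep)


# Placeholder markers and rates (hypothetical, for illustration only).
rates = {"slow_1": 0.001, "slow_2": 0.001, "fast_1": 0.01}
a = {"slow_1": 10, "slow_2": 12, "fast_1": 15}
b = {"slow_1": 11, "slow_2": 12, "fast_1": 18}

print(distance_slow_markers(a, b, rates, max_rate=1.0))    # 4: all markers counted
print(distance_slow_markers(a, b, rates, max_rate=0.005))  # 1: fast_1 excluded
```

Most of the distance between these two placeholder haplotypes comes from the single fast marker; dropping it changes the picture, which is the noise effect described above.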

                              And as you know, these software tools allow us to do these learning exercises and help us visually grasp an otherwise unintelligible rosary of numbers. I hope everyone understands that no guarantee about the accuracy of the results can be given.



                              • #30
                                Victor, Thank you for the quick response.

                                I looked at the microsatellite graph you linked to in your post. Very interesting: the E3b1 alpha sub-group looks predominantly European, while the other cluster looks almost exclusively East African. So I suspect that 2 of the 4 regions (sub-clades?) will hold up even when additional markers are considered.

                                So I believe that you will be able to identify some sub-clades from STRs. I just wonder: if an individual haplotype is not a perfect match to the modal haplotype STRs, how close do you have to be before considering an SNP test to confirm the results? Also, how far back in time do we estimate these splits to have occurred? Is there a way to measure this, given that SNPs are on average over 5000 years old? I am just trying to get my head around the practical implications.


                                For example: I find a 23-of-25 match with another individual who is also R1b but does not share my surname. Was it convergence, or is this guy more closely related to me than I would otherwise think? If STRs are that predictive with only 4 or 5 markers...
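A rough way to put a time depth on a genetic distance, under a simple model: if each of n markers mutates by single steps at an assumed average rate μ per generation, two lineages separated for T generations accumulate about 2μnT steps between them (ignoring back-mutation and convergence, the very complication raised above), so T ≈ GD / (2μn). The rate of 0.002 per marker per generation and the 25-year generation length are assumptions chosen only for illustration, and the result is a point estimate with a wide spread.

```python
def tmrca_generations(gd: float, n_markers: int, mu: float) -> float:
    """Point estimate T = GD / (2 * mu * n) under a single-step, no-back-mutation model."""
    if n_markers <= 0 or mu <= 0:
        raise ValueError("n_markers and mu must be positive")
    return gd / (2 * mu * n_markers)


def tmrca_years(gd: float, n_markers: int, mu: float, years_per_generation: float) -> float:
    """Same estimate converted to years with an assumed generation length."""
    return tmrca_generations(gd, n_markers, mu) * years_per_generation


# Assumed values: mu = 0.002 mutations/marker/generation, 25 years/generation.
# A 23-of-25 match where both mismatches are one step apart gives GD = 2.
print(tmrca_generations(2, 25, 0.002))  # about 20 generations
print(tmrca_years(2, 25, 0.002, 25))    # about 500 years
```

The estimate scales directly with the assumed rate, so halving μ doubles the answer; that sensitivity is one reason SNP testing is used to confirm what the STRs suggest.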

                                BTW: I really appreciate your feedback

