E3b project cladograms

  • Rossi
    replied
    Excellent Work

    Excellent work, Victor. Thank you for your effort. If I am reading the data correctly, my subclade should be M78. You should be receiving confirmation in a few weeks, as I will be ordering the subclade test shortly.

    Any particular reason that 10640 is way out on a limb in this diagram:

    Earlier Diagram

    I do not see it duplicated in later diagrams.

    Again, excellent work and I am most appreciative.

    Best,

    Rick



  • Victor
    replied
    E3b - 25 Marker Network Diagram

    http://img.villagephotos.com/p/2005-...124_25M_ND.jpg


    Data for diagram found at:
    http://www.familytreedna.com/public/freemanDNAProject/



  • Victor
    replied
    Originally posted by Jim Denning
    isnt it necessary to really see the dna

    i mean yeah you and me might get the same whole reading but maybe your 16-18 385a-b are slightly different how would a program see that
    i mean ftdna makes a big thing of seeing the differences
    Exactly. What the software does is amplify those small differences. The whole point of this exercise is to see the big picture; the panoramic view so to speak, and then say: here I am, or here you are!



  • Jim Denning
    replied
    Originally posted by Victor
    Sorry, Jim. I know this is not very easy to understand just as it isn't very easy to explain either.

    Basically, these diagrams or cladograms are a visual representation of the TMRCA tables that you see in many DNA surname projects. In fact, the diagrams are generated out of a TMRCA table. But they show something else that the simple tables don't. Our premise is that just as it is possible to statistically "predict" a haplogroup by the allele values, it is also possible to "infer" the subclades by the pairwise genetic distance among haplotypes.

    The software converts the units in the distance matrix (generations or years) to bifurcating branches that group the haplotypes into somewhat definable clusters. These clusters, assuming that distinct haplotypes are correlated to specific SNPs, should correspond to haplogroup subclades.

    Right now we have just a few subclades confirmed by SNP and we haven't found a contradicting result yet. We're waiting for more SNP results from the E3b project participants to see if the software algorithm and the diagrams hold.

    If you have any other question I'll try my best to answer it.

    Victor
    isnt it necessary to really see the dna


    i mean yeah you and me might get the same whole reading but maybe your 16-18 385a-b are slightly different how would a program see that
    i mean ftdna makes a big thing of seeing the differences



  • Victor
    replied
    Originally posted by Jim Denning
    Victor you keep showing this and i look and say whats this saying
    i dont know the numbers and it doesnt mean anything without something.
    whats it supposed to be saying
    Sorry, Jim. I know this is not very easy to understand just as it isn't very easy to explain either.

    Basically, these diagrams or cladograms are a visual representation of the TMRCA tables that you see in many DNA surname projects. In fact, the diagrams are generated out of a TMRCA table. But they show something else that the simple tables don't. Our premise is that just as it is possible to statistically "predict" a haplogroup by the allele values, it is also possible to "infer" the subclades by the pairwise genetic distance among haplotypes.

    The software converts the units in the distance matrix (generations or years) to bifurcating branches that group the haplotypes into somewhat definable clusters. These clusters, assuming that distinct haplotypes are correlated to specific SNPs, should correspond to haplogroup subclades.
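
    As a rough illustration of the pairwise genetic distance idea above, here is a minimal Python sketch. The kit labels and allele values are made up for the example (they are not project data), and the two metrics are simple stand-ins: a real TMRCA table applies a mutation-rate model on top of counts like these.

```python
from itertools import combinations

# Hypothetical 12 marker haplotypes keyed by made-up labels (not project data).
haplotypes = {
    "A": [13, 24, 13, 10, 16, 18, 11, 12, 12, 13, 11, 30],
    "B": [13, 24, 13, 10, 16, 19, 11, 12, 12, 13, 11, 30],
    "C": [13, 23, 13, 8, 17, 19, 11, 12, 11, 13, 11, 31],
}

def infinite_allele_distance(h1, h2):
    """Count the markers whose values differ, whatever the size of the gap."""
    return sum(1 for a, b in zip(h1, h2) if a != b)

def stepwise_distance(h1, h2):
    """Sum the absolute repeat-count differences across all markers."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def distance_matrix(haps, metric):
    """Build a symmetric matrix (dict of dicts) of pairwise distances."""
    ids = sorted(haps)
    matrix = {i: {j: 0 for j in ids} for i in ids}
    for i, j in combinations(ids, 2):
        matrix[i][j] = matrix[j][i] = metric(haps[i], haps[j])
    return matrix

if __name__ == "__main__":
    for metric in (infinite_allele_distance, stepwise_distance):
        print(metric.__name__, distance_matrix(haplotypes, metric))
```

    In this toy data A and B differ at a single marker, while C sits several mutations away from both, which is the kind of gap the tree-building software turns into separate branches.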

    Right now we have just a few subclades confirmed by SNP and we haven't found a contradicting result yet. We're waiting for more SNP results from the E3b project participants to see if the software algorithm and the diagrams hold.

    If you have any other question I'll try my best to answer it.

    Victor



  • Jim Denning
    replied
    Originally posted by Victor
    To look at a sample E3b cladogram click here.
    The colors were added to highlight what I believe to be the three main subclades, that is E3b3, E3b1 and E3b2 in the order of appearance.
    (I'll be posting a cladogram with the latest data soon.)


    Victor you keep showing this and i look and say whats this saying
    i dont know the numbers and it doesnt mean anything without something.
    whats it supposed to be saying



  • Victor
    replied
    Originally posted by Victor
    Cladogram from 25 marker panel haplotypes.

    http://img.villagephotos.com/p/2005-...1_25_phylo.jpg

    Note: TMRCA calculated using a constant mutation rate of 0.0024; units are generations not years.
    The diagram from the URL above is missing the legend for the branch labels.

    triangle = M123 (E3b3)
    square = M81 (E3b2)
    rhombus = M78 (E3b1)



  • Victor
    replied
    Cladogram from 25 marker panel haplotypes.

    http://img.villagephotos.com/p/2005-...1_25_phylo.jpg

    Note: TMRCA calculated using a constant mutation rate of 0.0024; units are generations not years.
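
    For readers wondering how a mismatch count turns into generations, here is a minimal sketch of one simple point estimate under the infinite allele model, using the 0.0024 rate from the note. The function name and the example counts are hypothetical, and this is not necessarily the calculation behind the diagram; TMRCA utilities usually report probability levels rather than a single number.

```python
import math

def tmrca_generations(mismatches, markers, mutation_rate=0.0024):
    """Point estimate of generations back to the most recent common ancestor.

    Assumption: each marker independently stays unchanged along both
    lineages with probability (1 - mutation_rate) ** (2 * t). Setting that
    equal to the observed matching fraction and solving for t gives the
    estimate below.
    """
    if not 0 <= mismatches < markers:
        raise ValueError("mismatches must be between 0 and markers - 1")
    inverse_match = markers / (markers - mismatches)
    return math.log(inverse_match) / (-2 * math.log(1 - mutation_rate))

if __name__ == "__main__":
    # Hypothetical pair: 2 mismatching markers on a 25 marker panel.
    print(f"about {tmrca_generations(2, 25):.1f} generations")
```

    The result is in generations, as in the note; converting to years would need an assumed generation length.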



  • Victor
    replied
    Median joining networks - E3b

    As of today, the E3b project stands at 115 records.

    This time I want to share with you some fluxus diagrams. The first shows the global picture generated by all haplotypes (at 12 marker level). As in a previous diagram with fewer haplotypes, the two-cluster pattern remains very similar.
    http://img.villagephotos.com/p/2005-...23/fluxus4.jpg

    The second diagram is a zoom-in of the cluster where the M81 haplotype is located. The numbers correspond to the individual haplotype IDs.
    http://img.villagephotos.com/p/2005-...23/fluxus2.jpg

    The third graphic is a bit more crowded and some ID numbers overlap, making them hard to identify. This cluster might need to be analyzed on its own as it seems to show considerable variation.
    http://img.villagephotos.com/p/2005-...23/fluxus3.jpg


    p.s. An excellent document on network diagrams (although it doesn't always open correctly) is here:
    http://dimacs.rutgers.edu/Workshops/...orksTREE01.pdf

    Victor



  • Victor
    replied
    Limited value of 12 marker cladograms

    Originally posted by Bill Harvey
    Victor,

    Regarding the placement of #19310 and #32872 on the 12 marker cladogram - an analysis of the non-modal STRs reveals that #19310 had a total of 7 out of 12 non-modal markers - each of the seven was one off from modal- whereas #32872 had three non-modal markers one of which was 8 where the modal was 10 (DYS 391).
    Good observation, Bill. The uniqueness of these two haplotypes seems to support our assumption that distinct subclades should have accumulated enough distance from other subclades to make them stand apart. This pre-supposes that there's a correlation between the appearance of a defining Unique Event Polymorphism or SNP and haplotype allele values.

    Also of interest is #N15407 which actually is shown as a separate clade if I am reading the graph correctly. Here again there is a total of seven non-modal markers but DYS 385-a has a genetic distance of 3 from modal. In fact DYS 385-a and DYS 385-b being at a double 19,19 is unique in your sample base. The only sure way to determine whether this kit# indicates M123 or another sub-clade would be to extend the STR marker string for an educated guess or be SNP tested and be sure.

    Interestingly, #19310 has a total of seven non-modal markers in the second panel of 13 markers with three of the seven each showing a genetic distance of 3 and a fourth mutation (DYS458) has a genetic distance of 4 from modal.
    The software works by first finding the two haplotypes with the greatest distance between each other and placing them at opposite ends. It then rearranges the remaining haplotypes according to their genetic distance/proximity from each other. At the 12 marker level it is #N15407 that appears in the top branch, although that doesn't necessarily mean that it is a distinct subclade. At the 25 marker level it is #19310 which has the greatest distance from the bottom cluster. In the latter case we have confirmed that #19310 was indeed in a different subclade. In the former case (#N15407), when we get a SNP result it could also turn out to be in a different subclade, or it may simply be that the 12 marker based distance matrix doesn't sufficiently amplify the haplotype distinctions. In other words, all our haplotypes are very similar at the 12 marker level regardless of what subclades we belong to. It is at a higher number of markers where the differences (or genetic distance) start to show.
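
    To make that first step concrete, here is a small Python sketch that finds the most distant pair of haplotypes and shows how the spread grows when more markers are compared. The labels and values are invented, the 3 versus 6 marker split merely stands in for 12 versus 25, and the code only mirrors the description in this post; it is not the actual PHYLIP algorithm.

```python
from itertools import combinations

# Hypothetical haplotypes (invented values, 6 markers each).
haplotypes = {
    "X": [13, 24, 13, 10, 16, 18],
    "Y": [13, 24, 13, 11, 19, 18],
    "Z": [13, 23, 13, 10, 16, 21],
}

def genetic_distance(h1, h2, markers=None):
    """Sum of absolute step differences over the first `markers` positions.

    markers=None compares every position.
    """
    pairs = list(zip(h1, h2))[:markers]
    return sum(abs(a - b) for a, b in pairs)

def farthest_pair(haps, markers=None):
    """Return (distance, id1, id2) for the most distant pair of haplotypes."""
    return max(
        (genetic_distance(haps[i], haps[j], markers), i, j)
        for i, j in combinations(sorted(haps), 2)
    )

if __name__ == "__main__":
    print("short panel:", farthest_pair(haplotypes, markers=3))
    print("full panel: ", farthest_pair(haplotypes))
```

    On the short panel every pair is at most one step apart, so there is little for a clustering program to work with; on the full panel the same haplotypes separate clearly.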

    When comparing the relatively clear and precise distinction of the known clade groupings shown at 25 marker level with the non-distinct and confused graphing at 12 markers, there doesn't appear to be much value given by the 12 marker level - considering the amount of work you put into the cladogram generation.
    Agreed again. The main reason I have decided to make the 12 marker diagrams is for those persons who have only tested the 12 marker panel and might come and browse this forum. I would encourage them, if possible, to upgrade their haplotypes and to confirm their subclade by SNP testing.

    Now.... for 37 marker cladograms..... I believe I see some benefit to be gained in defining marker patterns for use in STR prognosticating but we undoubtedly need a larger database and many more SNP tested samples to say for sure.

    Food for thought?

    Bill
    Right. The E3b project database is slowly growing. We just broke the 100 haplotypes mark. As to the 37 marker cladograms, when I figure out the bug in the software I'll post a new diagram.

    Victor



  • Bill Harvey
    replied
    New SNP results - 12 marker cladogram questions?

    Victor,

    Regarding the placement of #19310 and #32872 on the 12 marker cladogram - an analysis of the non-modal STRs reveals that #19310 had a total of 7 out of 12 non-modal markers - each of the seven was one off from modal- whereas #32872 had three non-modal markers one of which was 8 where the modal was 10 (DYS 391).

    Also of interest is #N15407 which actually is shown as a separate clade if I am reading the graph correctly. Here again there is a total of seven non-modal markers but DYS 385-a has a genetic distance of 3 from modal. In fact DYS 385-a and DYS 385-b being at a double 19,19 is unique in your sample base. The only sure way to determine whether this kit# indicates M123 or another sub-clade would be to extend the STR marker string for an educated guess or be SNP tested and be sure.

    Interestingly, #19310 has a total of seven non-modal markers in the second panel of 13 markers with three of the seven each showing a genetic distance of 3 and a fourth mutation (DYS458) has a genetic distance of 4 from modal.
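
    The non-modal counts above can be reproduced with a few lines of Python. This is only a sketch: the kit labels and allele values are invented (4 markers instead of 12), and the modal haplotype is taken simply as the most common value at each marker.

```python
from collections import Counter

# Hypothetical haplotypes (invented labels and values, 4 markers each).
haplotypes = {
    "k1": [13, 24, 10, 16],
    "k2": [13, 24, 10, 16],
    "k3": [13, 24, 10, 17],
    "k4": [13, 23, 8, 16],
    "k5": [14, 24, 10, 16],
}

def modal_haplotype(haps):
    """Most common value at each marker position across all haplotypes."""
    columns = zip(*haps.values())
    return [Counter(column).most_common(1)[0][0] for column in columns]

def non_modal_markers(haplotype, modal):
    """List (marker index, offset from modal) for every non-modal marker."""
    return [
        (index, value - modal_value)
        for index, (value, modal_value) in enumerate(zip(haplotype, modal))
        if value != modal_value
    ]

if __name__ == "__main__":
    modal = modal_haplotype(haplotypes)
    for kit, haplotype in haplotypes.items():
        print(kit, non_modal_markers(haplotype, modal))
```

    Here k4 has two non-modal markers, one of them two steps below modal, which is the same kind of pattern as the 8 versus modal 10 noted for DYS 391.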

    When comparing the relatively clear and precise distinction of the known clade groupings shown at 25 marker level with the non-distinct and confused graphing at 12 markers, there doesn't appear to be much value given by the 12 marker level - considering the amount of work you put into the cladogram generation.

    Now.... for 37 marker cladograms..... I believe I see some benefit to be gained in defining marker patterns for use in STR prognosticating but we undoubtedly need a larger database and many more SNP tested samples to say for sure.

    Food for thought?

    Bill



  • Victor
    replied
    Originally posted by Jim Denning
    please tell me you're using more than just the e3b group

    how can you use that few people to decide anything

    i look at the chart and i dont know what it represents at least now it has some explanation
    why not just use ftdnas maybe i am missing something please tell me what it is
    Hello Jim,

    With a couple of exceptions when I've inserted one or two external haplotypes, I've been using exclusively the dataset from the E3b project.

    I agree that the sample size is small to make any definitive conclusions about anything but so far the latest SNP confirmations have not contradicted the clustering generated by the software application.

    Maybe if I describe briefly the process used to get from haplotype dataset to cladogram you'll get a better understanding.

    For example, for the latest cladograms I start by creating two files out of the whole E3b project dataset: one with 12 marker records and another with 25 marker records. I run each one through the whole following process.

    Generate a PHYLIP (Phylogeny Inference Package) compatible data file using McGee's YDNA comparison utility, out of the TMRCA table (infinite allele model) with the following settings:
    Probability 50%,
    Mutation Rate FTDNATiP(tm) 0.004..0.009,
    Units 25 years/generation.

    Next, process the data using the Kitsch module of the PHYLIP package with the following settings:
    Method Fitch-Margoliash
    Lower triangular data matrix
    Randomized input order (seed = 9, 11 times)

    Finally draw the cladogram/phylogram/radial tree from the resulting "phylip" file using either TreeView or TreeExplorer.

    The comments and shading are added with a regular graphics editor.
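
    For anyone curious what the intermediate data file looks like, here is a minimal Python sketch that writes a distance matrix in PHYLIP's lower-triangular layout (a count line, then one row per haplotype with a 10 character name followed by the distances to the earlier rows). The kit names and distances are invented, and the function is a hypothetical stand-in; it is not the output of McGee's utility.

```python
def to_phylip_lower_triangular(ids, matrix):
    """Format a symmetric distance matrix as PHYLIP lower-triangular text.

    ids    -- list of labels, in the row order to write
    matrix -- dict of dicts, matrix[a][b] = distance between a and b
    """
    lines = [f"{len(ids):5d}"]
    for row, name in enumerate(ids):
        # PHYLIP names occupy exactly 10 characters, padded with blanks.
        label = name[:10].ljust(10)
        # Row i lists only the distances to rows 0 .. i-1.
        values = "".join(f" {matrix[name][ids[col]]:.4f}" for col in range(row))
        lines.append(label + values)
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical TMRCA distances (in generations) between three made-up kits.
    ids = ["kit1", "kit2", "kit3"]
    distances = {
        "kit1": {"kit1": 0, "kit2": 12, "kit3": 30},
        "kit2": {"kit1": 12, "kit2": 0, "kit3": 24},
        "kit3": {"kit1": 30, "kit2": 24, "kit3": 0},
    }
    print(to_phylip_lower_triangular(ids, distances), end="")
```

    A file like this is what the tree-building step reads when the lower triangular data matrix setting is selected.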

    So, in essence, what the inference software does is rearrange and cluster the haplotypes on the basis of genetic distance amongst all haplotypes in the dataset, as measured by the TMRCA table. Our assumption is that the main branches in the diagram could correlate to corresponding subclades.

    Of course all of this is just experimental and an attempt to understand better the branching of our haplogroup.

    Regards,

    Victor



  • Jim Denning
    replied
    Originally posted by Victor
    Bill,



    To get another idea about distances see this phylogenetic tree that shows the number of steps between each node.



    Victor

    please tell me you're using more than just the e3b group

    how can you use that few people to decide anything

    i look at the chart and i dont know what it represents at least now it has some explanation
    why not just use ftdnas maybe i am missing something please tell me what it is



  • Victor
    replied
    New SNP results

    Hi Everyone,

    The E3b Project is now approaching the 100 haplotype mark. We currently have 5 confirmed SNPs (beyond M35): two M78+ (E3b1), one M81+ (E3b2) and two M123+ (E3b3).

    Below are the links to the latest 12 marker and 25 marker based cladograms.
    The comparison of these two diagrams helps to illustrate one important consideration about the inference of subclades by the genetic distance amongst haplotypes in our recordset.

    The objective is to find the optimal marker count for the inference software to process when generating the cladograms. Which has the higher predictive value, 12 or 25?

    12 marker cladogram:
    http://img.villagephotos.com/p/2005-...-12-051213.jpg

    25 marker cladogram:
    http://img.villagephotos.com/p/2005-...25-051213x.jpg

    The main observation I made on the 12 marker cladogram is that the second M123+ SNP (32872) does not cluster next to or very near our first M123+ (19310). This raises the question of whether all those haplotypes located between these two could also be M123+, or whether the 12 marker based cladogram doesn't provide enough resolution for the software to create the correct clustering.

    Unfortunately, the haplotype of the second confirmed M123+ SNP (32872) has only been tested at 12 markers, so we can't know for now. Maybe when we get a few more SNP results we will know the answer. The other confirmed SNPs seem to support the model in both cladograms. I'm inclined to think that the 25 marker cladogram produces better results. Any comments?

    Victor


    p.s. @ Bill, I tried to run the 37 markers also but I was getting a run time error that I could not pinpoint and correct. I'll keep trying.



  • Victor
    replied
    Originally posted by Bill Harvey
    Victor ,

    This newest version gives the same basic info as the prior format (my personal favorite of the three recent trees) but seems to accentuate the difference M81 shows in relationship to M78 and M123 - these latter indicate a distinct separation from one another but not as sharply as M81 shows a separation from the other two clades in your latest effort.

    Is this primarily due to only having one test in the M123 clade? - whereas the other two have numerous test samples.

    I would like to see the results of all 37 markers in a run using the same format as your prior tree. I would like to begin trying to sort out the potential M123 testees from all the rest and am curious as to whether the additional 12 markers will be of any assistance in determining specific modal differences?

    If it is a lot of work - just forget about doing it. I'll probably take forever to figure it out anyway!

    Bill
    Bill,

    Regarding the separation of M123 and M78 in the median-joining diagram, I think it depends not on the number of test samples but on genetic distance. The branch length is supposed to be proportional to the number of steps between one haplotype and another. In other words, E3b3 seems to be closer to E3b1.

    In the previous illustration, the curved branches tree, the E3b2 cluster appears next to E3b3, but this proximity does not necessarily reflect genetic distance. That's why I made a note that the tree doesn't reflect the chronology of genetic events. The main usefulness of that diagram is to show the clustering of haplotypes. The ordinal position of the haplotypes is only relevant within their own clade.

    To get another idea about distances see this phylogenetic tree that shows the number of steps between each node.

    Next time I'll also do the 37 marker haplotypes to see if there is a similar or different pattern.

    Victor
