DIYDodecad 2.0

  • DIYDodecad 2.0

    Dienekes released a new version of DIYDodecad, which adds the ability to analyze your raw data by chromosome, segment, or specified region. The finer-grained analysis takes longer than the genome-wide analysis: my MacBook (running Windows XP) takes 10-15 minutes to do a genome-wide run (the original mode), but almost 45 minutes to do a by-segment run with his suggested parameters.

    The result is a text file giving the percentage of each of the 12 Dodecad ancestral components for each 500-SNP segment. Then, using the newly included R script, paint_byseg.r, you can create graphs from your results. This is a similar analysis to the chromosome diagrams from Dr McDonald, but not quite as pretty. Dienekes warns that an entire chromosome graph with all 12 components can be messy, so there are options to graph shorter segments, or only the top ancestral components. To illustrate what they look like, I attached the plots from my chromosome 6:
    • 6-all.png shows all 12 components, and is rather messy.
    • 6.3.png shows only the top 3. It is much easier to read, but in this case, the top 3 weren't enough to show the majority "West Asian" component from 80-100 Mb, and significant "South Asian" admixture from 70-80 Mb.
    • 6-6.png shows only the top 6, which includes enough to correct the deficiencies in the top 3 graph, but it's still mostly readable. This seems to be the best compromise; with 7 or more, some of the colors start getting difficult to distinguish from each other.
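To make the "top N components" idea concrete, here is a minimal Python sketch. It is not how paint_byseg.r works internally (I haven't checked how it ranks components); it simply assumes the ranking is by mean percentage across segments. The component names and numbers are made-up toy data, not real results.

```python
from statistics import mean

def top_components(segments, names, n=6):
    """Return the n component names with the highest mean percentage.

    segments: list of per-segment percentage lists, one value per component.
    names:    component names, in the same order as the values.
    Assumption: components are ranked by their mean over all segments.
    """
    averages = {
        name: mean(seg[i] for seg in segments)
        for i, name in enumerate(names)
    }
    ranked = sorted(averages, key=averages.get, reverse=True)
    return ranked[:n]

# Toy data (hypothetical): 3 segments, 4 components, each row sums to 100.
names = ["West_Asian", "South_Asian", "Mediterranean", "West_European"]
segments = [
    [10.0, 5.0, 40.0, 45.0],
    [60.0, 0.0, 15.0, 25.0],
    [15.0, 30.0, 25.0, 30.0],
]

top2 = top_components(segments, names, n=2)
print(top2)
```

Note that "South_Asian" reaches 30% in the third toy segment but still drops out of the top 2, which is the same effect as the top-3 plot hiding a component that is only locally strong.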

    I foresee this being useful in conjunction with FTDNA's chromosome browser. For example, if a few of my matches and I share a segment on one chromosome, I could check that segment with DIYDodecad 2.0 and see what the ancestral components are. It may help narrow down which ancestors we should concentrate our research on to find a common ancestor. Also, there are probably better ways to visualize the results of this analysis. Some enterprising individual could create a script to plot chromosome diagrams like Dr McDonald's, or some novel graphic nobody's thought of yet.
    Attached Files

  • #2
    Wooo


    • #3
      Did you guys make any changes to the dv3?


      • #4
        The byseg option is pretty awesome, but it's taking a lot of time. Well worth it though.
        Last edited by Taz85; 17th August 2011, 08:04 PM.


        • #5
          Originally posted by Taz85
          Did you guys make any changes to the dv3?
          Yes, I replaced the last line, "genomewide," with his suggested values for doing it by segment:
          byseg
          500
          50
          Now I'm taking some overlapping segments between multiple matches and me, then I'm going to re-run DIYDodecad for just those segments. Hopefully, it reveals some profound insights into the ancestry of the segments, and I'll be able to figure out our common ancestor.
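For anyone who would rather script that edit than do it by hand, here is a rough Python sketch. It only assumes what is described above: the parameter file ends with a "genomewide" line, which gets replaced by "byseg", "500", and "50". The function name and the placeholder lines in the example are hypothetical; 500 is the number of SNPs per segment, and the meaning of the 50 isn't spelled out in this thread, so it is just passed through.

```python
def switch_to_byseg(par_text, snps_per_segment=500, second_param=50):
    """Replace a trailing 'genomewide' line with the three by-segment lines.

    par_text: full contents of the parameter file as a string.
    second_param: the '50' from the suggested values; its meaning is not
    stated in the thread, so it is written out unchanged.
    """
    lines = par_text.rstrip("\n").split("\n")
    if lines[-1].strip() != "genomewide":
        raise ValueError("expected the last line to be 'genomewide'")
    # Swap the single last line for the three by-segment lines.
    lines[-1:] = ["byseg", str(snps_per_segment), str(second_param)]
    return "\n".join(lines) + "\n"

# Placeholder contents: the earlier lines stand in for whatever the real
# parameter file contains, and are left untouched.
example = "placeholder line 1\nplaceholder line 2\ngenomewide\n"
print(switch_to_byseg(example))
```

To use it on a real file, read the file into a string, pass it through the function, and write the result to a copy, so the original genome-wide parameter file stays intact.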


          • #6
            Here's my Chromosome 1.
            Attached Files


            • #7
              I've noticed a significant amount of Asian on all of my chromosomes, much of which did not show up in McDonald's analysis. Also, I see now that PF is way off, like I thought. I show a high amount of Med, none of which showed up on PF, as you can see in my sig. I figured as much, knowing at least 3 of my lines are from Southern Italy.


              • #8
                Chromosome #2 - Top 6
                Attached Files


                • #9
                  Originally posted by Taz85
                  I've noticed a significant amount of Asian on all of my chromosomes, much of which did not show up in McDonald's analysis. Also, I see now that PF is way off, like I thought. I show a high amount of Med, none of which showed up on PF, as you can see in my sig. I figured as much, knowing at least 3 of my lines are from Southern Italy.
                  I don't think it's a case of PF being wrong, just more coarsely-grained. When you average the entire genome, it's bound to gloss over minor admixtures, which show up quite strongly in more finely-grained analysis.
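A quick bit of arithmetic shows how averaging hides a minor admixture. This is a toy Python sketch with made-up numbers, assuming 100 equal-sized segments and a component that is strong in only 3 of them.

```python
def genomewide_average(segment_percentages):
    """Plain mean of one component's percentage across equal-sized segments."""
    return sum(segment_percentages) / len(segment_percentages)

# Made-up example: the component is 60% in 3 segments and absent elsewhere.
segments = [60.0] * 3 + [0.0] * 97

peak = max(segments)
avg = genomewide_average(segments)
print(f"strongest segment: {peak}%, genome-wide average: {avg}%")
```

The by-segment view shows a clear 60% signal in a few places, while the genome-wide figure for the same component comes out under 2%, small enough to be lost in the noise.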


                  • #10
                    Originally posted by nathanm
                    I don't think it's a case of PF being wrong, just more coarsely-grained. When you average the entire genome, it's bound to gloss over minor admixtures, which show up quite strongly in more finely-grained analysis.
                    DIY shows me as 26.10% Med. PF shows me as 0%. I have family trees going back to Italy in the late 1700s. I don't think anything like 26% is small. That's a major error on PF's part.


                    • #11
                      Here are my percentages per chromosome. You can take the data from each chromosome and enter it into Oracle.
                      Attached Files


                      • #12
                        Originally posted by Taz85
                        DIY shows me as 26.10% Med. PF shows me as 0%. I have family trees going back to Italy in the late 1700s. I don't think anything like 26% is small. That's a major error on PF's part.
                        That's comparing apples to oranges. The "Mediterranean" Dodecad ancestral component isn't the same as any of the reference populations in PF. It's one of 12 ancestral populations inferred by Admixture, from a specific set of genetic inputs. You could run the exact same analysis, but get very different results, if you use different inputs. Even if you ran the same test twice with the exact same inputs, you could get slightly different results, because the random number generator is seeded from the current time.

                        The results in your sig from both PF and Dr McDonald are pretty consistent, and entirely reasonable for someone with Italian ancestry. Remember, your Dodecad Oracle results don't show much Italian either; they say you're closer to Slovenian genetically.


                        • #13
                          Originally posted by nathanm
                          That's comparing apples to oranges. The "Mediterranean" Dodecad ancestral component isn't the same as any of the reference populations in PF. It's one of 12 ancestral populations inferred by Admixture, from a specific set of genetic inputs. You could run the exact same analysis, but get very different results, if you use different inputs. Even if you ran the same test twice with the exact same inputs, you could get slightly different results, because the random number generator is seeded from the current time.

                          The results in your sig from both PF and Dr McDonald are pretty consistent, and entirely reasonable for someone with Italian ancestry. Remember, your Dodecad Oracle results don't show much Italian either; they say you're closer to Slovenian genetically.
                          So then what is the point of any of it, if you're going to get totally different results? I would think if one data set shows you as 0% Med, and the other shows 26%, it just makes you more confused, and makes it totally pointless.


                          • #14
                            I haven't looked into it deeply, but I'm pretty sure DIY is using ancestral populations while PF (and McDonald) are using current comparisons. Regardless, DIY is one method and PF and/or McDonald are another. Consider McDonald a "newer version" of PF.

                            It's true that you may not get the same results from McDonald as you do from DIY. You should get similar results between PF and McDonald, however, unless the McDonald test made a special sample comparison that PF doesn't have. The two use the same process, but McDonald has more samples and better reports. Actually, PF can produce another report too, a spot-on-the-map report, but FTDNA has not released it.

                            Expect differences when the processes are not the same. Also, DNA comes from many mixtures of past ancestry and it is unique to every person. Your DNA may lean toward one side of the family and your sibling's may not. Just because you see something on the paper trail does not mean it will show up on the PF, McDonald or DIY report.

                            These admixture reports are not the same as a list of shared segments to other family members. These tests use SNPs. Tracing a segment down from ancestors is one thing but trying to trace all the individual SNPs is a different issue. Who knows where all the SNPs came from. They may have even combined from various ancestors to report something else.

                            In some cases it is simply cutting edge science for you.
                            Last edited by mkdexter; 18th August 2011, 01:22 AM.


                            • #15
                              Originally posted by Taz85
                              So then what is the point of any of it, if you're going to get totally different results? I would think if one data set shows you as 0% Med, and the other shows 26%, it just makes you more confused, and makes it totally pointless.
                              The only analysis showing any "Mediterranean" is Dodecad. PF doesn't have any "Mediterranean" samples. However, they do have reference populations that test high in Dodecad's "Mediterranean" component. According to the FAQ, most of the reference populations for PF are from the Human Genetic Diversity Project (HGDP), plus a few other datasets. The Dodecad project includes those, so you can see what their percentages for the Dodecad ancestral components are in this spreadsheet. The HGDP populations highest for the "Mediterranean" component are:
                              Code:
                              Sardinian	55.5
                              North_Italian	49.6
                              Tuscan		47.8
                              Basque		45.6
                              French		33.9
                              Druze		31.5
                              Palestinian	27.4
                              Orcadian	24.8
                              Bedouin		21
                              So your French, Orcadian, Bedouin, and Druze results from PF are consistent with the "Mediterranean" result from Dodecad. Like Matt said above, this is still pretty bleeding edge science. That's why people are submitting their data to Dr McDonald, Dodecad, and other projects. Their methods are different, and might provide different insights into our ancestry. I've learned a lot about how this stuff works in my own experiments with Plink and Admixture, by following this tutorial.

                              FTDNA is a business, so they're naturally going to take a conservative approach. There are more reference populations available, but it might be a while before they add any more. Dr McDonald is doing the analysis for free in his spare time, probably applying the techniques he developed in his primary research. Dienekes is a pseudonymous hobbyist, who obviously knows what he's doing, and managed to amass a larger dataset than any currently published paper.
                              Originally posted by mkdexter
                              I haven't looked into it deeply, but I'm pretty sure DIY is using ancestral populations while PF (and McDonald) are using current comparisons.
                              They're all using current populations for comparison. There simply aren't enough ancient DNA samples, from a wide enough area, to use in this kind of analysis. The method Dienekes used to derive his components is described in a series of blog posts (here, here, here, and here). He created synthetic "zombie" populations from the allele frequencies of K=12 unsupervised Admixture runs, and now uses those as ancestral components to analyze all the samples. It solves the problem of deciding which reference population should represent a whole continent or sub-continent.
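As I understand the idea, a "zombie" individual can be generated by drawing a genotype at each SNP from the inferred allele frequency of one component. The Python sketch below is my own illustration of that, not Dienekes' code: it assumes each of the two allele copies is drawn independently with probability p, and the frequencies, function names, and population size are made up.

```python
import random

def simulate_individual(allele_freqs, rng):
    """Draw one synthetic genotype per SNP: 0, 1, or 2 copies of the allele.

    Assumption: the two copies are independent draws with probability p,
    where p is the component's allele frequency at that SNP.
    """
    return [
        int(rng.random() < p) + int(rng.random() < p)
        for p in allele_freqs
    ]

def simulate_population(allele_freqs, n_individuals, seed=0):
    """Build a synthetic population for one ancestral component."""
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    return [simulate_individual(allele_freqs, rng) for _ in range(n_individuals)]

# Made-up allele frequencies for 4 SNPs in one hypothetical component.
freqs = [0.0, 0.25, 0.5, 1.0]
zombies = simulate_population(freqs, n_individuals=5)
for genotype in zombies:
    print(genotype)
```

A population built this way is "pure" for its component by construction, which is why it can stand in as a reference without having to pick any one real population.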
