Extent of DNA database

  • #1

    Does anyone know how many people (an unduplicated count) have been tested for their Y-DNA or mtDNA haplogroups? The information on FTDNA, Ysearch, and Mitosearch would suggest to me fewer than 100,000, though of course I could have misunderstood what I read. Most of the papers I have read base their analysis on a thousand or fewer participants, and they do not say whether those participants were drawn from a database used for other studies or were newly tested. I guess I am trying to get a grasp on what portion of the world, and of specific high-profile groups (Europeans, Native Americans, etc.), has actually been tested. From that, I wonder what checks have been made on the statistical validity of generalized population conclusions drawn from those proportions.

  • #2
    I heard too that the Genographic Project had reached and surpassed 100,000 people, though of course a lot of the data is not publicly available. Ian Logan, on his mtDNA website, keeps a count of mtDNA full genomic sequences (FGS); I don't remember how many there are. And YHRD contains the STR results of many papers, but not all.

    As far as I know, all scientific papers say which samples are new and which were taken from other sources, although sometimes this is buried in a supplement or an appendix. But as far as I know there is no central repository of these results, so it's hard to know the totals.

    cacio

    • #3
      Originally posted by Deirwha
      Does anyone know how many people, unduplicated count, have been tested as to either their y or mt DNA haplogroups? The information on FtDNA, yseach, and mitosearch would suggest to me less than 100,000.
      According to the info on FTDNA's home page, as of today, that lab alone is up to 142,323 Y-DNA and 81,336 mtDNA records. SMGF, according to its home page, has collected over 90,000 samples, so that's potentially 90k each for Y-DNA and mtDNA (though who knows how many of those have been processed yet; probably nowhere near that number).

      If FTDNA's claim that they have several times more records than all the other labs combined is correct, then I'm guessing we're looking at perhaps 200,000 Y-DNA records and 125,000 mtDNA records (I could be way off; that's just an off-the-cuff guess).
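The arithmetic behind a guess like this can be sketched as follows. If one lab holds k times as many records as all other labs combined, the total across labs is that lab's count times (1 + 1/k). The values of k are assumptions, since "several times" is not a precise figure, and the sketch ignores SMGF and anyone who tested at more than one lab.

```python
# Back-of-envelope total across all labs, assuming one lab holds k times as many
# records as every other lab combined. The k values tried below are assumptions.

FTDNA_Y = 142_323   # Y-DNA records quoted from FTDNA's home page in this thread
FTDNA_MT = 81_336   # mtDNA records quoted from the same page


def total_records(lab_count: int, k: float) -> float:
    """Total records across all labs if this lab holds k times everyone else combined."""
    others = lab_count / k          # all other labs together
    return lab_count + others       # equivalent to lab_count * (1 + 1/k)


for k in (2, 3, 4):
    print(f"k={k}: ~{total_records(FTDNA_Y, k):,.0f} Y-DNA, "
          f"~{total_records(FTDNA_MT, k):,.0f} mtDNA")
```

With k = 2 this gives roughly 213,000 Y-DNA and 122,000 mtDNA records, close to the guess above; a larger k shrinks both totals.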

      • #4
        Originally posted by Deirwha
        Does anyone know how many people, unduplicated count, have been tested as to either their y or mt DNA haplogroups?
        I just reread your question and realized that the numbers I quoted earlier do not represent people who have had their haplogroups definitively confirmed by SNP (deep-clade) testing, if that's what you meant. So who knows how much smaller that total is than the number of people who have only been Y-STR or mtDNA HVR tested, with their haplogroups merely estimated.

        • #5
          More info

          From my FTDNA page five minutes ago, just in terms of Y-DNA results:

          "RECENT ANCESTRAL ORIGINS (RAO) database of Y-DNA results to learn about your ancestry. We have entries from 159 countries in the RAO database, which contains 142323 samples. This represents 30550 unique 12 marker haplotypes. The RAO contains 77636 25 marker samples with 49334 unique haplotypes, and 58681 37 marker samples with 51093 unique haplotypes."

          That information does not tell me the UNDUPLICATED COUNT of persons who have samples. I know, because I am in this count, that I have a 12-marker sample, a 25-marker sample, a 37-marker sample, a 67-marker sample, and a deep clade test. So am I counted once, four times, or five times? Assuming arguendo that most people go on to take at least two tests, are they counted once or twice? Assuming FTDNA's statement that the database includes 51,093 unique haplotypes is grounded in actual test results, are we then assured there are at least 51,093 unduplicated samples? And if there are 51,093 unique haplotypes out of a pool of 58,681 37-marker samples, what does that tell us statistically about the validity of this pool for projecting ratios of unique haplotypes onto a population, the world, whose size must be closer to 6 billion than 4.5 billion by now? I have not looked it up in a while.

          My first question is: what is the unduplicated count?

          My second question is, what does that unduplicated count tell us that is statistically valid with respect to the wider population?

          To be clear, no criticism of FTDNA is implied by what I have asked. I am trying to get a handle on how I should evaluate claims made about the history and associations of various clades.
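The ratios in the RAO figures quoted above can be worked out directly. Because each man carries exactly one Y-STR haplotype for a given marker panel, the number of unique haplotypes can never exceed the number of different men tested on that panel; so, assuming every haplotype is recorded correctly, 51,093 unique 37-marker haplotypes implies at least 51,093 different men with 37-marker results. The sketch below uses only the quoted figures and shows the ratio of unique haplotypes to samples rising as more markers are tested, since longer haplotypes are less likely to match exactly.

```python
# Y-DNA figures quoted from FTDNA's RAO page in this thread:
# marker panel -> (samples, unique haplotypes)
RAO = {
    12: (142_323, 30_550),
    25: (77_636, 49_334),
    37: (58_681, 51_093),
}


def unique_share(samples: int, unique: int) -> float:
    """Ratio of unique haplotypes to samples (1.0 = no two samples share a haplotype)."""
    if unique > samples:
        # Each sample carries one haplotype, so this would indicate a data error.
        raise ValueError("more unique haplotypes than samples")
    return unique / samples


for markers, (samples, unique) in RAO.items():
    print(f"{markers:>2} markers: {unique:,} unique / {samples:,} samples "
          f"= {unique_share(samples, unique):.1%}")
```

This prints roughly 21%, 64%, and 87% for the 12-, 25-, and 37-marker panels.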

          • #6
            Originally posted by Deirwha
            My first question is, what is the unduplicated count.

            My second question is, what does that unduplicated count tell us that is statistically valid with respect to the wider population?

            I have always assumed that the "number of records" posted on FTDNA's home page represents the number of different people who contributed to that category, not the number of different types of tests that any given person has contributed to that total. In other words, if a person took a 12-marker Y-DNA test and then upgraded in steps, first to 37, and then to 67 markers, that only counts as one Y-DNA record. It's a count of the number of records, with each person being a "record," not a count of the number of tests of a given type. At least that's the way I see it.

            When they refer to the number of unique haplotypes... let's say for the sake of discussion that 100 people in their database are all perfect 12/12 matches, and have only tested 12 markers. That counts as one unique haplotype, not 100.
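The distinction between tests, people, and unique haplotypes can be illustrated with a tiny made-up log. The kit IDs and haplotype labels are hypothetical placeholders, and the sketch assumes, as described above, that an upgrade adds to a person's existing record rather than creating a new one.

```python
from collections import namedtuple

# Hypothetical test log: one row per test ordered. Kit IDs and haplotype
# labels are made-up placeholders, not real data.
Test = namedtuple("Test", ["kit_id", "panel", "haplotype12"])

tests = [
    Test("K1", 12, "HAP-A"),
    Test("K1", 37, "HAP-A"),   # K1 upgrades; same man, same first 12 markers
    Test("K1", 67, "HAP-A"),
    Test("K2", 12, "HAP-A"),   # different man, exact 12/12 match with K1
    Test("K3", 12, "HAP-B"),
]

test_count = len(tests)                                  # every order counted
people = {t.kit_id for t in tests}                       # one record per person
haplotype_by_person = {t.kit_id: t.haplotype12 for t in tests}
unique_haplotypes = set(haplotype_by_person.values())    # identical haplotypes collapse

print(test_count, len(people), len(unique_haplotypes))  # 5 3 2
```

Five tests, three people, two unique 12-marker haplotypes: K1 and K2 collapse into one haplotype, just as the 100 identical 12/12 matches above would.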

            As for the statistical validity of this sample size... this is one of my pet peeves about the state of DNA testing right now. People tend to draw sweeping conclusions about the geographic distributions of certain haplogroups without taking into consideration (or without giving it enough weight) that samples are very strongly biased toward certain ancestral regions (e.g., Ireland and Great Britain).
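One common way to reduce the kind of sampling bias described above is post-stratification: estimate a haplogroup's frequency separately in each region, then weight each region by its share of the population being described instead of by how many people from it happened to test. This is a general survey-statistics technique, not something the labs are known to apply to their databases; every number below is hypothetical, and the method cannot help for regions with few or no testers.

```python
# Hypothetical sample: region -> (men tested, men in some haplogroup X).
sample = {"Region A": (800, 640), "Region B": (200, 60)}

# Hypothetical share of each region in the population being described.
population_share = {"Region A": 0.2, "Region B": 0.8}


def naive_frequency(sample: dict) -> float:
    """Pool everyone together, so heavily tested regions dominate."""
    tested = sum(n for n, _ in sample.values())
    carriers = sum(k for _, k in sample.values())
    return carriers / tested


def post_stratified_frequency(sample: dict, share: dict) -> float:
    """Weight each region's observed frequency by its population share."""
    return sum(share[region] * k / n for region, (n, k) in sample.items())


print(f"naive:      {naive_frequency(sample):.2f}")                              # 0.70
print(f"reweighted: {post_stratified_frequency(sample, population_share):.2f}")  # 0.40
```

The oversampled Region A pulls the pooled estimate up to 0.70, while weighting by the assumed population shares brings it down to 0.40.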

            That said, completely different questions can be addressed using entirely different statistics: estimates of mutation rates, for example, may depend less on the geographic makeup of the sample than statistical questions about the origins or present distributions of haplogroups do.

            I tend to take the Recent Ancestral Origins page with a grain of salt, simply because it is so biased toward the countries that are popular with DNA testing right now, though perhaps it is more useful for some haplogroups than for others. This is not the fault of the testing labs, just a matter of the current makeup of the databases.

            • #7
              I think we have an accord

              I would like someone from National Geographic or FTDNA to address our questions, just to know for sure. On the other hand, it appears we share some perceptions about how to interpret the results, whichever way the unduplicated count question breaks.

              • #8
                http://www.nationalgeographic.com/ng...gle/wells.html

                Over 250,000 people have tested with the genographic project.

                • #9
                  I had ALL of my DNA tests done by FTDNA; I had every test done.
                  When I type in my surname, it shows only one test completed.
                  I also donated all of the information to science.

                  • #10
                    I had my mtDNA test done through the Genographic Project; FTDNA is the lab they send DNA samples to. My result was H. I joined this forum around the time I got my mtDNA result, in June 2006.
                    Later on I had the H subclade test through FTDNA directly, and I'm H1.

                    • #11
                      So, is the consensus

                      that we are looking at somewhat more than 250,000 unduplicated tests in aggregate? Do we know how many are mtDNA and how many are Y-DNA?

                      • #12
                        Originally posted by Deirwha
                        that we are looking at somewhat in excess of 250,000 unduplicated count tests in aggregate? Do we know how many are mtDNA and how many are y?

                        I don't know how many are Y-DNA and how many are mtDNA. Their website says over a quarter million people.

                        • #13
                          Well, thank you for the first figure

                          250,000 sounds like a lot, and it is more than I thought. Of course, it is only a fraction of what is out there waiting to be discovered. Hey, I like that analogy: those of us who have hitched on to this thing are like Star Trek, to Infinity and Beyond! And the adventure has just begun.

                          • #14
                            You're welcome.
                            Yes, it is a fascinating adventure.

                            I would like to see the Genographic Project combine their Y-DNA and mtDNA results with autosomal DNA results. Thankfully, though, DNA Tribes specializes in detailed autosomal reports.
                            Last edited by rainbow; 14 December 2008, 05:04 PM.

                            • #15
                              I wonder what percentage of the Genographic Project's results has been transferred over to FTDNA's database (it's not automatic; you have to request the transfer) and is therefore actually comparable with other testees. The number of people stated to be in FTDNA's database includes a substantial proportion of the 250k people who have tested with the GP (and conversely, a substantial proportion of people who tested with FTDNA have transferred their results over to the GP).
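The aggregate "unduplicated count" question is an inclusion-exclusion problem: the number of people in either of two databases is the sum of their sizes minus the number of people in both. The overlap is exactly what nobody in this thread knows, so the sketch also gives the range of possible totals. The figures are those quoted in the thread (over 250,000 Genographic participants, 142,323 FTDNA Y-DNA records, treated here as one per person); the 100,000 overlap in the example is purely hypothetical.

```python
def unduplicated_total(a: int, b: int, overlap: int) -> int:
    """People in either database: |A or B| = |A| + |B| - |A and B|."""
    if not 0 <= overlap <= min(a, b):
        raise ValueError("overlap must be between 0 and the smaller database size")
    return a + b - overlap


def union_bounds(a: int, b: int) -> tuple:
    """Smallest and largest possible unduplicated total when the overlap is unknown."""
    return max(a, b), a + b  # full overlap vs. no overlap


GENOGRAPHIC = 250_000  # "over 250,000 people", so a lower figure
FTDNA_Y = 142_323      # FTDNA Y-DNA records, treated here as one per person

print(union_bounds(GENOGRAPHIC, FTDNA_Y))                 # (250000, 392323)
print(unduplicated_total(GENOGRAPHIC, FTDNA_Y, 100_000))  # 292323
```

Without knowing the overlap, the combined count for these two figures alone could be anywhere from 250,000 to 392,323, before counting FTDNA's mtDNA-only customers or any other lab.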

                              Comment

                              Working...
                              X