
ySearch & haplogroup


  • ySearch & haplogroup

    My brother's haplogroup is reported as R-M269, but that is not an option in the dropdown list when creating his profile in ySearch. Instead there are R1xxxxxxx entries to choose from, and we are not sure which one corresponds to his group. How does one determine the detailed group that corresponds to the overall grouping? Please advise.

  • #2
    This site explains:


    • #3
      Originally posted by Hneel
      The site gives the correspondence between SNPs (M269) and haplogroups (in this case R1b1a2), but unfortunately it does not explain things.

      In short, the Y-DNA Haplogroup Tree is in a fluid state (it is being redrawn), so it is more convenient to refer to a haplogroup by its defining SNP, for example R-M269. On the other hand, that notation says nothing about the haplogroup's position on the tree, whereas the notation R1b1a2 does.
      So the same path can be written either way:
      R → R1 → R1b → R1b1 → R1b1a → R1b1a2
      R → R-M173 → R-M343 → R-P25_1 → R-P297 → R-M269
      I am still not explaining why... but maybe someone has a reference to a detailed, up-to-date description of the issue...
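
As a rough illustration of the two notations above, the two paths can be zipped into lookup tables. This is only a sketch: the names are copied from the two lines in this post, the helper `path_to` is made up for the example, and the pairing holds only for this snapshot of the tree, since longhand names change when the tree is redrawn.

```python
# The two parallel paths from the post: longhand names and the
# SNP-based (shorthand) name for each step from R down to R1b1a2.
LONGHAND = ["R", "R1", "R1b", "R1b1", "R1b1a", "R1b1a2"]
SHORTHAND = ["R", "R-M173", "R-M343", "R-P25_1", "R-P297", "R-M269"]

# Lookup tables in both directions (valid only for this tree version).
to_longhand = dict(zip(SHORTHAND, LONGHAND))
to_shorthand = dict(zip(LONGHAND, SHORTHAND))


def path_to(shorthand_name):
    """Hypothetical helper: longhand path from the root to the given SNP name."""
    depth = SHORTHAND.index(shorthand_name)
    return LONGHAND[: depth + 1]


print(to_longhand["R-M269"])
print(" -> ".join(path_to("R-M269")))
```

So for the original question, R-M269 would correspond to the R1b1a2 entry, assuming the dropdown uses the same tree version as the path quoted here.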


      • #4
        Originally posted by Yapper
        My brother's haplogroup indicates R-M269, however, it is not an option to select from dropdown list within ySearch when creating his profile. Instead there are R1xxxxxxx to choose from but we are not sure which one corresponds to his grouping. How does one determine corresponding detailed group for overall grouping? Please advise
        If he has 67 markers tested, his closest matches may indicate which group he belongs to. A lot of testers at ysearch are SNP tested. Over the last eight years I have observed that the significant matches are those who match him exactly at 12 markers and are still a close match at 67. I hope this helps.


        • #5
          Found the official FTDNA explanation

          The Methodology Behind the 2014 Y-DNA Haplotree

          Family Tree DNA created the 2014 Y-DNA Haplotree in partnership with the National Geographic Genographic Project using the proprietary GenoChip. Launched publicly in late 2012, the chip tests approximately 10,000 Y-DNA SNPs that had not, at the time, been phylogenetically classified. The team used the first 50,000 male samples with the highest quality results to determine SNP positions. Using only tests with the highest possible “call rate” meant more available data, since those samples had the highest percentage of SNPs that produced results, or “calls.”

          In some cases, SNPs that were on the 2010 Y-DNA Haplotree didn’t work well on the GenoChip, so the team used Sanger sequencing on anonymous samples to test those SNPs and to confirm ambiguous locations. For example, if it wasn’t clear if a clade was a brother (parallel) clade, or a downstream clade, they tested for it.

          The scope of the project did not include going farther than SNPs currently on the GenoChip in order to base the tree on the most data available at the time, with the cutoff for inclusion being about November of 2013.

          Where data were clearly missing or underrepresented, the team curated additional data from the chip where it was available in later samples. For example, there were very few Haplogroup M samples in the original dataset of 50,000, so to ensure coverage, the team went through eligible Geno 2.0 samples submitted after November, 2013, to pull additional Haplogroup M data. That additional research was not necessary on, for example, the robust Haplogroup R dataset, for which they had a significant number of samples.

          Family Tree DNA, again in partnership with the Genographic Project, is committed to releasing at least one update to the tree this year. The next iteration will be more comprehensive, including data from external sources such as known Sanger data, Big Y testing, and publications. If the team gets direct access to raw data from other large companies’ tests, then that information will be included as well. We are also committed to at least one update per year in the future.

          Known SNPs will not intentionally be renamed. Their original names will be used since they represent the original discoverers of the SNP. If there are two names, one will be chosen to be displayed and the additional name will be available in the additional data, but the team is taking care not to make synonymous SNPs seem as if they are two separate SNPs. Some examples of that may exist initially, but as more SNPs are vetted, and as the team learns more, those examples will be removed.

          In addition, positions or markers within STRs, as they are discovered, or large insertion/deletion events inside homopolymers, potentially may also be curated from additional data because the event cannot accurately be proven. A homopolymer is a sequence of identical bases, such as AAAAAAAAA or TTTTTTTTT. In such cases it’s impossible to tell which of the bases the insertion is, or if/where one was deleted. With technology such as Next Generation Sequencing, trying to get SNPs in regions such as STRs or homopolymers doesn’t make sense because we’re discovering non-ambiguous SNPs that define the same branches, so we can use the non-ambiguous SNPs instead.
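
The homopolymer point can be checked with a small sketch (not part of the FTDNA text; the function names are invented for illustration). Deleting any single base from a run of identical bases gives the same result, so the sequence alone cannot say which base was lost, while in a mixed sequence every deletion leaves a distinct trace.

```python
def delete_at(seq, i):
    """Return seq with the base at index i removed."""
    return seq[:i] + seq[i + 1:]


def distinct_deletion_results(seq):
    """Set of all sequences reachable by deleting exactly one base."""
    return {delete_at(seq, i) for i in range(len(seq))}


homopolymer = "AAAAAAAAA"
mixed = "ACGTACGTA"  # illustrative sequence with no repeated neighbours

# Nine possible deletion sites, but only one observable outcome.
print(len(distinct_deletion_results(homopolymer)))  # 1
# Here each of the nine deletion sites gives a different sequence.
print(len(distinct_deletion_results(mixed)))  # 9
```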

          Some SNPs from the 2010 tree have been intentionally removed. In some cases, those were SNPs for which the team never saw a positive result, so while it may be a legitimate SNP, even haplogroup defining, it was outside of the current scope of the tree. In other cases, the SNP was found in so many locations that it could cause the orientation of the tree to be drawn in more than one way. If the SNP could legitimately be positioned in more than one haplogroup, the team deemed that SNP to not be haplogroup defining, but rather a highly polymorphic location.

          To that end, SNPs no longer have .1, .2, or .3 designations. For example, J-L147.1 is simply J-L147, and I-147.2 is simply I-147. Those SNPs are positioned in the same place, but back-end programming will assign the appropriate haplogroup using other available information such as additional SNPs tested or haplogroup origins listed. If other SNPs have been tested and can unambiguously prove the location of the multi-locus SNP for the sample, then that data is used. If not, matching haplogroup origin information is used.

          We will also move to shorthand haplogroup designations exclusively. Since we’re committing to at least one iteration of the tree per year, using longhand that could change with each update would be too confusing. For example, Haplogroup O used to have three branches: O1, O2, and O3. A SNP was discovered that combined O1 and O2, so they became O1a and O1b.

          There are over 1200 branches on the 2014 Y Haplogroup tree, as compared to about 400 on the 2010 tree. Those branches contain over 6200 SNPs, so we’ve chosen to display select SNPs as “active” with an adjacent “More” button to show the synonymous SNPs if you choose.

          The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and will be accessible via the Genographic Project website.
          2014-04-25 by Janine Cloud


          • #6
            I use this method for finding subclades.
            Enter the ysearch ID
            Show users that tested at least 67 of the markers that I did.

            Maximum genetic distance of 1 per marker compared above 57 markers.
            Do not limit search by last name.

            In most cases the first 12 markers match.
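
The search settings above can be sketched in code, under some loud assumptions: this is not ySearch's actual implementation, genetic distance is simplified to the sum of absolute differences in repeat counts, and "1 per marker compared above 57 markers" is read as an allowance of (markers compared - 57) x 1. The marker names and the functions `genetic_distance` and `passes_filter` are hypothetical.

```python
def genetic_distance(a, b):
    """Simplified distance: sum of absolute repeat-count differences
    over the markers both haplotypes have. Returns (distance, shared)."""
    shared = a.keys() & b.keys()
    distance = sum(abs(a[m] - b[m]) for m in shared)
    return distance, len(shared)


def passes_filter(query, candidate, min_shared=67, per_marker=1, base=57):
    """Assumed reading of the search form: require at least min_shared
    markers in common, then allow per_marker steps of distance for each
    marker compared beyond base."""
    distance, shared = genetic_distance(query, candidate)
    if shared < min_shared:
        return False
    allowed = per_marker * max(0, shared - base)
    return distance <= allowed


# Placeholder haplotypes: 67 made-up markers, all with repeat count 13.
query = {f"m{i}": 13 for i in range(1, 68)}
close = dict(query, m20=14, m40=12, m60=14)          # distance 3
short = {f"m{i}": 13 for i in range(1, 38)}          # only 37 markers

print(passes_filter(query, close))   # True
print(passes_filter(query, short))   # False
```

With 67 markers compared, this reading allows a distance of up to 10, which is why the first 12 markers will usually still match among the results.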