
Sort order within subgroups; almost-but-not-quite numeric?


  • Sort order within subgroups; almost-but-not-quite numeric?

    Can someone confirm my suspicions about sort order within subgroups in Y results pages? It's automatic, right, and no way to change it?

    I think it's doing a numeric sort order on columns from left to right, which would be simple and sensible--except for multiple-copy markers, where it seems to be sadly and inconsistently reverting to lexicographical order.

    E.g. looking at the FGC32899 subgroup under the M222 project (as it is now), in DYS439 the first kit is 11 and the next three are 12, so far so good, but the next difference is DYS459b. The first three are 9-10, and the fourth is 9-9. Numerically 9-9 should come before 9-10, but lexicographically "9-9" > "9-10".

    Likewise, in the Kennedy project, I see:

    29 | 17 | 9-10
    29 | 17 | 9-10
    29 | 17 | 9-11
    29 | 17 | 9-9 ???
    29 | 17 | 9-9

    It might require some extra logic to compare these markers where the copy numbers differ, but sorting multicopy markers lexicographically is lazy, wrong, and inconsistent.
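    To illustrate the kind of extra logic I mean, here is a minimal Python sketch. It assumes each multi-copy value is a plain string of integers joined by hyphens, and `marker_key` is a hypothetical helper of mine, not anything taken from the site's actual code. Sorting the Kennedy values as text reproduces the order shown above, while sorting on the parsed integers gives the numeric order.

```python
def marker_key(value: str) -> tuple[int, ...]:
    """Split a multi-copy value such as "9-10" into a tuple of ints.

    Assumption: values contain only integers separated by hyphens.
    """
    return tuple(int(part) for part in value.split("-"))


# The DYS459 values from the Kennedy example, as strings.
values = ["9-10", "9-10", "9-11", "9-9", "9-9"]

# Plain string sort: compares character by character, so "9-1..." < "9-9".
lexicographic = sorted(values)

# Numeric sort: compares the parsed copies as integers.
numeric = sorted(values, key=marker_key)

print(lexicographic)  # ['9-10', '9-10', '9-11', '9-9', '9-9']
print(numeric)        # ['9-9', '9-9', '9-10', '9-10', '9-11']
```

    Where copy counts differ, Python's tuple comparison places the shorter tuple first when it is a prefix of the longer one. That is just one possible rule for this sketch, not a claim about how it ought to be done for STRs.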

    This would just be an annoyance with the results presentation and grouping, but it also makes me worry that genetic distance computations on multicopy markers are wrong too. I have one kit with an uncommon 6-copy DYS464 that has a distance-1 match when comparing 12 markers (for a 37-marker test), yet that match doesn't show up at all under 37 markers.

  • #2
    As far as I can tell, the order is automatic and there is no way to change it. I had assumed the kits were sorted in some fashion but hadn't figured out how. I think you are 100% correct that they are sorted lexicographically.

    Re genetic distance calculation - I don't think irregularities in the way the kits are displayed on the results page necessarily mean there are problems with the genetic distance calculations. I have noticed that the coloration of the values on the results page is not fully consistent with genetic distance. I think the code base they use for the display page is probably different from the code they use for their genetic distance calculations.


    • #3
      There is no way for an admin to sort them. I don't think the left to right order is completely correct. I have seen some kits where they have the DYS460 and Y-GATA-H4 as 10 and 10. Then the next kits are 10 and 9 and those should go first if it really is an order from left to right. Perhaps it is mostly left to right but some STRs are in a different order. On top of that, I have noticed some kits switch order in the list when their STRs are exactly identical.


      • #4
        Originally posted by The_Contemplator
        I have seen some kits where they have the DYS460 and Y-GATA-H4 as 10 and 10. Then the next kits are 10 and 9 and those should go first if it really is an order from left to right.
        In lexicographical order, 10 10 comes before 10 9, just like in a computer file folder a file with the name 10-1-2017-stuff is listed above one with the name 9-1-2017-stuff because the sorting is done character by character and 1 comes before 9.
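        A quick Python sketch of the same point, assuming the page compares each row's values as text from left to right (a guess about the behaviour, not a look at the real code). The row data and variable names are made up for illustration.

```python
# Two kits with (DYS460, Y-GATA-H4) values stored as text.
rows = [("10", "9"), ("10", "10")]

# Text comparison: "10" < "9" because the character "1" sorts before "9".
as_text = sorted(rows)

# Numeric comparison: convert every value to an int before comparing.
as_numbers = sorted(rows, key=lambda row: tuple(int(v) for v in row))

print(as_text)     # [('10', '10'), ('10', '9')]
print(as_numbers)  # [('10', '9'), ('10', '10')]

# The file-name analogy behaves the same way.
files = sorted(["9-1-2017-stuff", "10-1-2017-stuff"])
print(files)       # ['10-1-2017-stuff', '9-1-2017-stuff']
```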
        Last edited by TwiddlingThumbs; 16 September 2017, 07:50 AM.


        • #5
          Yes, I get that, but these are separate STR markers and not multi-copy STRs like DYS464, where lexicographical order has already been observed. Perhaps this shows the sort isn't just partly lexicographical but entirely so.

          I, too, thought it was just that way for multicopy STRs but looking at the results of a haplogroup, it shows from the first STR that it is lexicographical. View subgroup A's last STR results shown here.


          • #6
            Agreed. It seems to be lexicographical for all the markers, not just the multi-value ones.