Changed ranking of matches

  • Changed ranking of matches

    In my spreadsheet I have my matches listed by total cM. However, I think they have changed how they rank matches, perhaps basing the ranking more on relationship. At first it looked as if some matches were missing, but when I sorted by cM it made sense. It also looks like when two people share the same total cM, the next ranking criterion is the longest block.

  • #2
    The lack of a consistent sort order, both here and in GEDmatch, has been a minor irritation for a long time. I expect each vendor (not just FTDNA and GEDmatch) has its own algorithm for ranking autosomal DNA matches. In some cases (GEDmatch) it may be possible to sort on either the longest matching segment or the total shared cM (subject to whatever segment-matching parameters are used, which may also change from time to time). So far, so good: the order of the results seems reasonable.

    The irritation comes when I try to review matches in descending order of closeness, a few pages at a time. At least when I was doing this systematically on GEDmatch, the order of matches with the same (rounded) total shared cM would change from one day to the next. In other words, the sort does not produce a unique, fixed sequence: if I go back to the kit where I stopped the previous day, the surrounding kits may no longer be in the same order. That made it difficult to be sure I had actually looked at every match and hadn't missed any.

    The likely data-processing explanation is that the "sort key" (the list of variables used to sort the data) does not include enough variables to make it unique, so kits with identical key values can come out in a different order each time the data are retrieved and sorted. Assuring a repeatable sort sequence is a basic problem of software engineering, and there's no reason it can't be solved here: it is always possible to add a final variable to the sort key, such as the timestamp when the kit was processed (or, if timestamps can coincide, a unique kit number).

    In my opinion, extending the sort key to produce a reliable, unique sort sequence would add no cost for any vendor, and it would make it easier for customers to work with the results in spreadsheets and similar tools they use to analyze their data. A reliable sort order would also seem essential to the vendors' own quality assurance activities.
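As a minimal sketch of the sort-key argument in this thread, the Python below assumes each match record carries a total shared cM, a longest block in cM, and a unique kit identifier. The `Match` class, its field names, the helper functions, and the kit numbers are all hypothetical and are not any vendor's actual data model. Ranking on total cM alone leaves tied matches in whatever order the records happened to arrive. Appending longest block (the tie-breaker observed in the first post) and then the kit identifier gives every match a unique position, which also makes it possible to resume from the last kit reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    kit_id: str           # hypothetical unique kit identifier
    total_cm: float       # total shared cM
    longest_block: float  # longest shared segment, in cM


def sort_key(m: Match) -> tuple:
    """Full sort key: total cM (descending), then longest block
    (descending), then kit_id (ascending) as a final tie-breaker.
    Because kit_id is unique, no two matches share a key."""
    return (-m.total_cm, -m.longest_block, m.kit_id)


def ranked(matches: list[Match]) -> list[Match]:
    """Rank matches in a fixed order that does not depend on input order."""
    return sorted(matches, key=sort_key)


def ids_by_cm_only(matches: list[Match]) -> list[str]:
    """Rank on total cM alone; tied kits keep their arrival order."""
    return [m.kit_id for m in sorted(matches, key=lambda m: -m.total_cm)]


def resume_after(ranked_matches: list[Match], last_seen: Match) -> list[Match]:
    """Return every match that sorts strictly after the last kit reviewed,
    so a review can pick up where it stopped, regardless of page breaks."""
    last_key = sort_key(last_seen)
    return [m for m in ranked_matches if sort_key(m) > last_key]


# Hypothetical example data: three matches tie at 45.0 cM.
matches = [
    Match("A101", 45.0, 12.0),
    Match("B202", 45.0, 15.5),
    Match("C303", 45.0, 12.0),
    Match("D404", 60.2, 30.1),
    Match("E505", 20.0, 8.0),
]

if __name__ == "__main__":
    day1 = matches
    day2 = list(reversed(matches))  # same records, different arrival order

    # Total cM only: the 45.0 cM kits come out in a different order each day.
    print(ids_by_cm_only(day1))  # ['D404', 'A101', 'B202', 'C303', 'E505']
    print(ids_by_cm_only(day2))  # ['D404', 'C303', 'B202', 'A101', 'E505']

    # Full key: identical order on both days.
    print([m.kit_id for m in ranked(day1)])  # ['D404', 'B202', 'A101', 'C303', 'E505']
    print([m.kit_id for m in ranked(day2)])  # ['D404', 'B202', 'A101', 'C303', 'E505']

    # Resume after B202, the last kit reviewed the previous day.
    print([m.kit_id for m in resume_after(ranked(day2), Match("B202", 45.0, 15.5))])
    # ['A101', 'C303', 'E505']
```

Python's `sorted` is itself stable, so tied records keep their input order. The day-to-day reshuffling described above arises when the records come back from storage in a different order each time, which is why it is the key, not the sort algorithm, that has to be made unique.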