It works!! blind phasing using spectral clustering

  • It works!! blind phasing using spectral clustering

    I just wanted to share a very cool result I just got.
    I tried to find a way to do phasing using only the information supplied by FTDNA, without having an actual DNA sample from either parent. The phasing is made ONLY from the Family Finder (FF) matching segments.

    After a few months of thinking about the problem, I think I finally found a solution. I used spectral clustering with mixed membership on the matches in each 0.5 Mbp window, and here are the results:

    The link below shows the DNA I share with my maternal grandmother:

    And this is the phasing result, obtained without using any known relationships!!:

    It is almost 100% accurate!!
    Of course I cannot know which of my grandparents the DNA comes from, but I think it is still very cool.

    What do you think?
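
For readers who want a feel for the idea described above, here is a minimal sketch of a two-way spectral split with soft memberships. It is not the poster's actual method, which is not given in the thread. Everything named here is an assumption made for illustration: the `(name, start_bp, end_bp)` segment tuples, the helper names `matches_in_window` and `soft_two_way_split`, the affinity matrix (imagined as "how strongly two matches appear in common with each other"), and the linear rescaling of the Fiedler vector as a crude stand-in for "mixed membership".

```python
import numpy as np


def matches_in_window(segments, window_start, width=500_000):
    """Names of matches whose segment overlaps [window_start, window_start + width).

    `segments` is a hypothetical list of (name, start_bp, end_bp) tuples.
    """
    window_end = window_start + width
    return sorted({name for name, start, end in segments
                   if start < window_end and end > window_start})


def soft_two_way_split(affinity):
    """Soft two-group split from the Fiedler vector of the normalized Laplacian.

    Returns one value in [0, 1] per item; values near 0 or 1 mean a clear
    assignment, values near 0.5 mean an ambiguous one. The rescaling is an
    illustrative choice, not a proper mixed-membership model.
    """
    A = np.asarray(affinity, dtype=float)
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt    # second-smallest eigenvector
    scale = np.abs(fiedler).max() or 1.0
    return 0.5 * (1.0 + fiedler / scale)


# Toy data (made up): three matches, two of which overlap the first window.
segments = [("A", 1_000_000, 3_000_000),
            ("B", 1_200_000, 1_400_000),
            ("C", 2_600_000, 4_000_000)]
window_names = matches_in_window(segments, 1_000_000)

# Toy affinity (made up): two tight groups of three matches, weakly linked.
block = np.ones((3, 3))
demo_affinity = np.block([[block, 0.05 * block],
                          [0.05 * block, block]])
np.fill_diagonal(demo_affinity, 0.0)
membership = soft_two_way_split(demo_affinity)
```

As in the original post, the two groups come out unlabeled: the sign of the eigenvector is arbitrary, so the split says which matches belong together but not which side of the family each group represents.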

  • #2
    I think we need more information.

    I think we need the ability to try this out on a lot of folks to see if the results are consistent.


    • #3
      I think this ought to work fairly well, but yes, I'd like to see more details and test it on a few more people.

      I've also thought it might be interesting to use fuzzy logic, and even very tiny match segments from known or supposed relationships, to come up with a probability map that shows how likely it is that each bit of your DNA came from a certain branch of the family. For example, when I have a group of 5 or so known relatives who descend from the same distant ancestor, I often see segments of 1-2 cM that all 5 share. While a single person sharing 1 cM is basically meaningless, since it could very well be IBS, it seems like the more known relatives from the same ancestor who share that segment, the more likely it is to be IBD. If one were able to determine that multiple small segments were likely from this ancestor, the process could also be reversed: if a potential match shared a bunch of small segments that had all been ID'd as likely from a certain source, they could possibly be treated just like a single larger segment.
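
One rough way to picture the "more relatives sharing, more likely IBD" intuition is a naive odds update, sketched below. This is only an illustration under strong assumptions: the function name `ibd_probability` is hypothetical, the default `prior` and `likelihood_ratio` are placeholder numbers and not empirical estimates, and the relatives are treated as independent pieces of evidence, which real relatives are not.

```python
def ibd_probability(n_sharing, prior=0.05, likelihood_ratio=3.0):
    """Naive posterior probability that a small segment is IBD.

    n_sharing        -- number of known relatives from the same ancestor
                        who share the segment
    prior            -- assumed probability of IBD before any evidence
                        (placeholder value)
    likelihood_ratio -- assumed factor by which each sharing relative
                        multiplies the odds (placeholder value)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior) * likelihood_ratio ** n_sharing
    return odds / (1.0 + odds)


# Probability for 0 to 5 sharing relatives, under the placeholder numbers.
probability_by_count = [ibd_probability(n) for n in range(6)]
```

A probability map along the lines suggested above would repeat this per segment and per family branch; choosing defensible values for the prior and the likelihood ratio is the hard part and is not addressed here.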


      • #4
        I did some more tests and it is not as consistent as I thought.
        I am working on the mathematics again.

        Anyway, if anyone is interested, I'm posting the paper and slides I relied on.