Get error when uploading AncestryDNA

  • Same

    Any response yet from FTDNA?

    Comment


    • They could fix this

      I noticed the file size difference in August but did nothing else since my files were not affected. This week I had a file that was the small size and could not be fixed with the header change. It took me nearly a day to write the program to fix it, and I'm not a professional programmer. Here is what I discovered (since I put some counters in my program):

      The working V2 Ancestry file has 668,962 lines, 19 of which are header and the last one is blank.

      The broken Ancestry file has 650,430 lines, with the same header lines and final blank line.

      Both files have a number of "null" records where the allele values are "0" rather than A, T, G, or C. Ancestry documents this. There are also some records with values of "I" or "D" -- I can't find documentation of these (ideas?).
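A minimal Python sketch of the kind of counters described above. It assumes the usual AncestryDNA raw-data layout (comment lines starting with `#`, one `rsid chromosome position allele1 allele2` column-header row, then tab-separated records). The name `tally_calls` and the sample rows are hypothetical; this is not the poster's C++ program.

```python
from collections import Counter
import io

def tally_calls(lines):
    """Count header lines, blank lines, and call types in an AncestryDNA-style file."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(("#", "rsid")):
            # Assumption: '#' comments plus the column-header row make up the header.
            counts["header"] += 1
        else:
            _rsid, _chrom, _pos, a1, a2 = stripped.split("\t")
            alleles = {a1, a2}
            if "0" in alleles:
                counts["null"] += 1      # no-call record
            elif alleles & {"I", "D"}:
                counts["indel"] += 1     # insertion/deletion marker
            else:
                counts["called"] += 1    # ordinary A/C/G/T call
    return counts

# Tiny made-up sample, not real data.
sample = io.StringIO(
    "#AncestryDNA raw data download\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs1\t1\t100\tA\tG\n"
    "rs2\t1\t200\t0\t0\n"
    "rs3\t1\t300\tI\tD\n"
    "\n"
)
counts = tally_calls(sample)
print(dict(counts))
```

Running the same tally over a working and a broken file would be one way to reproduce line counts like the ones quoted above.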

      The rs identification strings are different for some of the locations; however, the positions on the chromosome are consistent.

      Given all these factors, I was able to create a master template with all null values (leaving the I's and D's where I found them, for no reason). I then matched up the chromosomes and locations, giving every location the same rsID name as the known good file. I then inserted the results (A,T,C, or G) from the damaged file.

      During the comparison, there were 1,048 records discarded without a matching rsID from the good V2 file.

      The resulting "fixed" file is the exact size and length required by FTDNA upload and includes results ONLY from the DNA subject's faulty file. Missing data that was replaced is null as described by Ancestry's documentation.
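The template approach described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual program: the tab-separated layout, the helper names `parse_rows` and `fill_template`, and the rule that only A/C/G/T calls overwrite the template are my reading of the description.

```python
CALLED = set("ACGT")
INDEL = {"I", "D"}

def parse_rows(lines):
    """Yield (rsid, chrom, pos, a1, a2), skipping '#' comments, the column header, and blanks."""
    for line in lines:
        s = line.strip()
        if s and not s.startswith(("#", "rsid")):
            yield tuple(s.split("\t"))

def fill_template(good_lines, damaged_lines):
    """Rebuild the damaged file on the good file's row layout.

    Returns (fixed_rows, discarded), where discarded counts damaged-file
    positions that do not exist in the good file.
    """
    damaged = {(c, p): (a1, a2) for _, c, p, a1, a2 in parse_rows(damaged_lines)}
    fixed, used = [], set()
    for rsid, chrom, pos, a1, a2 in parse_rows(good_lines):
        key = (chrom, pos)
        # Blank the template call unless it is an insertion/deletion marker.
        if not {a1, a2} & INDEL:
            a1, a2 = "0", "0"
        if key in damaged:
            used.add(key)
            d1, d2 = damaged[key]
            # Only real base calls from the damaged file overwrite the template.
            if d1 in CALLED and d2 in CALLED:
                a1, a2 = d1, d2
        # The rsID always comes from the good file, whatever the damaged file called it.
        fixed.append((rsid, chrom, pos, a1, a2))
    return fixed, len(damaged) - len(used)

# Made-up rows for illustration only.
header = "rsid\tchromosome\tposition\tallele1\tallele2"
good = [header, "rs1\t1\t100\tA\tA", "rs2\t1\t200\tI\tI",
        "rs30\t1\t300\tC\tC", "rs4\t1\t400\tG\tG"]
damaged = [header, "rs1\t1\t100\tG\tT", "rs3\t1\t300\tT\tT", "rs9\t1\t999\tA\tA"]

fixed, discarded = fill_template(good, damaged)
for row in fixed:
    print("\t".join(row))
print("discarded:", discarded)
```

Because the output follows the good file row for row, it has the same number of records as the good file no matter how many positions the damaged file is missing.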

      To test the result, I loaded the original Ancestry raw download and the "fixed" file onto GEDmatch. I then ran a one-to-one compare. The match is identical, similar to running the kit against its own kit number. The new results are only 2-10 SNPs different from the original (expected with the missing data). I have tested this on 3 of the damaged (size of 17,184 KB) files with consistent results.

      GEDmatch reports all cM matches between the "before" and "after" files are exact, all matching segments are the same length, with the same start and end positions. This new file does upload to FTDNA with no complaint.
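For a quick local check before uploading anywhere, something like the Python sketch below could count how many calls differ between the original and the fixed file. It is not how GEDmatch compares kits (that works on matching segments); it only compares unordered genotypes at positions called in both files. The function names and sample rows are hypothetical.

```python
def calls_by_position(lines):
    """Map (chromosome, position) -> unordered genotype, ignoring no-calls."""
    calls = {}
    for line in lines:
        s = line.strip()
        if not s or s.startswith(("#", "rsid")):
            continue
        _, chrom, pos, a1, a2 = s.split("\t")
        if a1 != "0" and a2 != "0":
            calls[(chrom, pos)] = frozenset((a1, a2))
    return calls

def compare_files(original_lines, fixed_lines):
    """Return (shared, differing) counts over positions called in both files."""
    original = calls_by_position(original_lines)
    fixed = calls_by_position(fixed_lines)
    shared = original.keys() & fixed.keys()
    differing = sum(1 for key in shared if original[key] != fixed[key])
    return len(shared), differing

# Made-up rows for illustration only.
original = ["rs1\t1\t100\tA\tG", "rs2\t1\t200\tC\tC", "rs3\t1\t300\t0\t0"]
repaired = ["rs1\t1\t100\tG\tA", "rs2\t1\t200\tC\tT", "rs3x\t1\t300\t0\t0"]

shared, differing = compare_files(original, repaired)
print(f"{shared} shared calls, {differing} differing")
```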

      During the conversion, several details were noted:

      Chr 1: 50618 records added (52 were already null) -- 63 records were discarded // 27 modified rsID's

      Chr 2: 52461 records added (64 were already null) -- 49 records were discarded // 25 modified rsID's

      Chr 3: 41069 records added (45 were already null) -- 41 records were discarded // 11 modified rsID's
      .....
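Per-chromosome counters like these could be collected with a Python sketch along the following lines. The meaning of each counter is my interpretation of the list above: "added" is a damaged-file record whose position exists in the good file, "already null" is such a record whose call was 0, "discarded" is a record with no matching position, and "modified rsID" is a matching position with a different name. The function names are hypothetical.

```python
from collections import defaultdict

def rows(lines):
    """Yield the five tab-separated fields of each data row."""
    for line in lines:
        s = line.strip()
        if s and not s.startswith(("#", "rsid")):
            yield s.split("\t")

def per_chromosome_report(good_lines, damaged_lines):
    """Tally added / already-null / discarded / renamed records per chromosome."""
    good = {(c, p): rsid for rsid, c, p, _, _ in rows(good_lines)}
    report = defaultdict(
        lambda: {"added": 0, "already_null": 0, "discarded": 0, "renamed": 0}
    )
    for rsid, chrom, pos, a1, a2 in rows(damaged_lines):
        stats = report[chrom]
        if (chrom, pos) not in good:
            stats["discarded"] += 1
            continue
        stats["added"] += 1
        if a1 == "0" or a2 == "0":
            stats["already_null"] += 1
        if good[(chrom, pos)] != rsid:
            stats["renamed"] += 1
    return dict(report)

# Made-up rows for illustration only.
good = ["rs1\t1\t100\tA\tA", "rs20\t1\t200\tC\tC", "rs3\t2\t50\tG\tG"]
damaged = ["rs1\t1\t100\tA\tA", "rs2\t1\t200\t0\t0",
           "rs8\t1\t999\tC\tC", "rs3\t2\t50\tG\tG"]

report = per_chromosome_report(good, damaged)
for chrom, s in report.items():
    print(f"Chr {chrom}: {s['added']} records added ({s['already_null']} were already null)"
          f" -- {s['discarded']} records were discarded // {s['renamed']} modified rsIDs")
```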

      I could go on, but those of you following this thread get the picture: It took me just over a day from decision to completion, and it's not my business.

      I would share the program (it runs in a command shell on Windows) but I'm not set up for testing on different computers and not sure I want the liability for user error. But we're talking about less than 500 lines of ugly, amateur C++ code that I'm too self-conscious to post publicly. But if I can do it, Ancestry could easily provide this and FTDNA should provide this. It makes me wonder why they don't.

      Comment


      • Originally posted by OldFinneyKid View Post

        .......

        Both files have a number of "null" records where the allele values are "0" rather than A, T, G, or C. Ancestry documents this. There are also some records with values of "I" or "D" -- I can't find documentation of these (ideas?).

        The rs identification strings are different for some of the locations; however, the positions on the chromosome are consistent.

        ........
        I and D refer to insertion and deletion.

        rs numbers merge with others when the reference build is updated. One would have to look up an rsID to see whether it has merged with another and so changed its name.
        dbSNP is a public-domain archive for human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.


        attached image on onedrive of merged rsids I have identified
        Attached Files
        Last edited by prairielad; 20 October 2017, 10:34 PM.

        Comment


        • Originally posted by OldFinneyKid View Post
          I noticed the file size difference in August but did nothing else since my files were not affected. This week I had a file that was the small size and could not be fixed with the header change.

          ........

          Given all these factors, I was able to create a master template with all null values (leaving the I's and D's where I found them, for no reason). I then matched up the chromosomes and locations, giving every location the same rsID name as the known good file. I then inserted the results (A,T,C, or G) from the damaged file.

          ........
          I've done what I thought was exactly the same thing, but the result wasn't accepted by the FTDNA uploader. GEDmatch did accept it, though, and I got similar results to yours when comparing the files there. I'm not sure exactly what the difference between our two methods is, but I would love to swap source code and compare.

          Here is the kit # for the unmodified file (the v2 that FTDNA won't accept): A084257

          And here is the kit # for the modified file, made the way you did it (using a zeroed-out template from a working v2 file): A245328

          Comment


          • Originally posted by chrisbonisa View Post
            I've done what I thought was exactly the same thing, but the result wasn't accepted by the FTDNA uploader. GEDmatch did accept it, though, and I got similar results to yours when comparing the files there. I'm not sure exactly what the difference between our two methods is, but I would love to swap source code and compare.

            Here is the kit # for the unmodified file (the v2 that FTDNA won't accept): A084257

            And here is the kit # for the modified file, made the way you did it (using a zeroed-out template from a working v2 file): A245328
            I've realized the main difference here is that you left the I's and D's in the template, whereas I zeroed everything (I didn't even realize they were in there). I'm going to try that and see if it works....

            Comment


            • upload file revisions

              Originally posted by prairielad View Post
              I and D refer to insertion and deletion

              rs numbers merge with others when the reference build is updated. One would have to look up an rsID to see whether it has merged with another and so changed its name.
              Thank you, then leaving them intact with the I's and D's was the correct decision. Good to know.

              I understand what you say about looking up rsIDs to determine their status, from the standpoint of strict adherence to recognized standards. It is obvious FTDNA must reference the absolute positions instead of the names -- the names do change. The positions cannot change: they can only be included or excluded. So the practical decision to match up the Ancestry-provided positions with the FTDNA positions, whatever FTDNA currently names them, does bypass the exercise of looking up what they should have been named. The rsID name doesn't appear to matter for the autosomal transfer -- using the previously accepted Ancestry file as a template for correcting the incompatible file automatically compensates for the rsID revisions. I checked several of the rsIDs from your list and they were all processed correctly.
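To illustrate why matching on position sidesteps the rsID lookups, here is a small Python sketch. It assumes two AncestryDNA-style files and uses hypothetical helper names; it counts how many markers line up when joined by rsID versus by (chromosome, position), and lists the markers whose name alone differs.

```python
def index_rows(lines):
    """Return two indexes of the data rows: rsID -> position and position -> rsID."""
    by_rsid, by_pos = {}, {}
    for line in lines:
        s = line.strip()
        if not s or s.startswith(("#", "rsid")):
            continue
        rsid, chrom, pos = s.split("\t")[:3]
        by_rsid[rsid] = (chrom, pos)
        by_pos[(chrom, pos)] = rsid
    return by_rsid, by_pos

def compare_join_keys(template_lines, incoming_lines):
    """Return (matched_by_rsid, matched_by_position, renamed markers)."""
    t_rsid, t_pos = index_rows(template_lines)
    i_rsid, i_pos = index_rows(incoming_lines)
    shared_positions = t_pos.keys() & i_pos.keys()
    # Same position, different name: (position, incoming rsID, template rsID).
    renamed = sorted(
        (key, i_pos[key], t_pos[key])
        for key in shared_positions
        if t_pos[key] != i_pos[key]
    )
    return len(t_rsid.keys() & i_rsid.keys()), len(shared_positions), renamed

# Made-up rows for illustration only.
template = ["rs1\t1\t100\tA\tA", "rs20\t1\t200\tC\tC", "rs3\t1\t300\tG\tG"]
incoming = ["rs1\t1\t100\tA\tG", "rs2\t1\t200\tC\tT", "rs3\t1\t300\tG\tG"]

by_rsid, by_position, renamed = compare_join_keys(template, incoming)
print(by_rsid, by_position, renamed)
```

In this toy case the rsID join silently drops the renamed marker, while the position join keeps it and the template supplies the name.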

              Thanks for the feedback.

              Comment


              • Comparisons

                Originally posted by chrisbonisa View Post
                I'm not sure exactly what the difference between our 2 methods were, but I would love to swap source code and compare?
                I sent you a private message yesterday...? After I posted my original suggestion I downloaded the free Visual Studio and turned it into a Windows console program you could try.
                Last edited by OldFinneyKid; 22 October 2017, 09:53 AM.

                Comment


                • Originally posted by OldFinneyKid View Post
                  I sent you a private message yesterday...? After I posted my original suggestion I downloaded the free Visual Studio and turned it into a Windows console program you could try.
                  Yeah, I didn't notice the PM, but I've read and responded to it. I've built out a tool that seems to work 100%. The key to solving it is a combination of what I was doing before (filling in a blank template made from a working file with the data from the non-working one) and two changes: keeping the I's and D's from the template instead of zeroing the entire thing, and filling everything in by position number instead of by rsID, which is what I was doing the first go-around.

                  I stuck it up on one of my websites where I'm doing some other DNA stuff, anyone can feel free to use it if they wish:



                  If you do use the tool, please respond in here and let others know if it works or not. So far in my testing it's working 100%, but the more people that try it, the more we will know whether this is a solution until FTDNA resolves it themselves.

                  Comment


                  • Originally posted by chrisbonisa View Post
                    Yeah, I didn't notice the PM, but I've read and responded to it. I've built out a tool that seems to work 100%. The key to solving it is a combination of what I was doing before (filling in a blank template made from a working file with the data from the non-working one) and two changes: keeping the I's and D's from the template instead of zeroing the entire thing, and filling everything in by position number instead of by rsID, which is what I was doing the first go-around.

                    I stuck it up on one of my websites where I'm doing some other DNA stuff, anyone can feel free to use it if they wish:



                    If you do use the tool, please respond in here and let others know if it works or not. So far in my testing it's working 100%, but the more people that try it, the more we will know whether this is a solution until FTDNA resolves it themselves.
                    You should end up seeing something like this...
                    Attached Files

                    Comment


                    • Glad I could help

                      I'm glad you could use the info about the additional records in the template. I, too, have tested mine 4 times now and feel good about the reliability.

                      For anyone wishing to use it, it's at AncestryDNAFix. Feel free to use it, I don't need any feedback unless you need something....
                      Attached Files
                      Last edited by OldFinneyKid; 22 October 2017, 06:30 PM.

                      Comment


                      • Originally posted by OldFinneyKid View Post
                        I'm glad you could use the info about the additional records in the template. I, too, have tested mine 4 times now and feel good about the reliability.

                        For anyone wishing to use it, it's at AncestryDNAFix. Feel free to use it, I don't need any feedback unless you need something....
                        Yes, thank you for those additional ideas; that was really the last key to getting the darn thing working! Hopefully these 2 tools between them will solve our problems for now! And I agree with your earlier comments: there is no reason FTDNA cannot do this themselves. It's really not rocket science once you know what's wrong with the file.

                        Comment


                        • 2 work-arounds for FTDNA

                          Originally posted by chrisbonisa View Post
                          Hopefully between these 2 tools it will solve our problems for now!
                          Yeah, there are 2 complete methods -- send the DNA file to your server and get it back finished, or process it locally on your own PC. Something for everyone.

                          FTDNA could easily incorporate these checks into their uploads if they really wanted the autosomal transfer to work.

                          Comment


                          • OldFinneyKid and chrisbonisa, you may want to give FTDNA an extra not-so-gentle nudge by submitting the AncestryDNAFix link, or your methods, via the Family Tree DNA Feedback form, because they don't usually read the forums.

                            Comment


                            • Originally posted by KATM View Post
                              OldFinneyKid and chrisbonisa, you may want to give FTDNA an extra not-so-gentle nudge by submitting the AncestryDNAFix link, or your methods, via the Family Tree DNA Feedback form, because they don't usually read the forums.
                              Good idea! I've sent them a message...

                              Comment


                              • Follow up

                                Originally posted by chrisbonisa View Post
                                Yeah, I didn't notice the PM, but I've read and responded to it. I've built out a tool that seems to work 100%. The key to solving it is a combination of what I was doing before (filling in a blank template made from a working file with the data from the non-working one) and two changes: keeping the I's and D's from the template instead of zeroing the entire thing, and filling everything in by position number instead of by rsID, which is what I was doing the first go-around.

                                I stuck it up on one of my websites where I'm doing some other DNA stuff, anyone can feel free to use it if they wish:



                                If you do use the tool, please respond in here and let others know if it works or not. So far in my testing it's working 100%, but the more people that try it, the more we will know whether this is a solution until FTDNA resolves it themselves.
                                Just a follow-up to this: my mom's DNA, which I put through this tool and uploaded successfully to FTDNA, has finished processing and shows all the matches I would expect from a normal upload! I can now use the matrix tool and narrow down which lines of my tree my matches are coming from!

                                Comment
