Get error when uploading AncestryDNA

OldFinneyKid · 22 October 2017, 06:28 PM

Glad I could help

I'm glad you could use the info about the additional records in the template. I, too, have tested mine 4 times now and feel good about the reliability.

For anyone wishing to use it, it's at AncestryDNAFix. Feel free to use it; I don't need any feedback unless you need something.

Attached Files

Using AncestryDNAFix.pdf (208.2 KB, 26 views)

chrisbonisa · 22 October 2017, 06:23 PM

Originally posted by chrisbonisa

Yeah, I didn't notice the PM, but I've read and responded to it. I've built out a tool that seems to work 100%. The key to solving it is a combination of what I was doing before (filling in the blank template of a working file with the data from the non-working one) plus two changes: keeping the I's and D's from the template instead of zeroing the entire thing, and filling everything in by position number instead of by RSID, which is what I was doing the first go-around.

I stuck it up on one of my websites where I'm doing some other DNA stuff; anyone can feel free to use it if they wish:

Mapmy23 Tool

http://www.mapmy23.com/tools/ancestry_ftdna_fix.php

mapmy23.com

If you do use the tool, please respond in here and let others know whether it works. So far in my testing it's working 100%, but the more people who try it, the better we will know whether this is a workable solution until FTDNA resolves it themselves.

You should end up seeing something like this...

Attached Files

Screen Shot 2017-10-22 at 7.22.29 PM.jpg (115.9 KB, 17 views)

chrisbonisa · 22 October 2017, 06:06 PM

Originally posted by OldFinneyKid

I sent you a private message yesterday...? After I posted my original suggestion I downloaded the free Visual Studio and turned it into a Windows console program you could try.

Yeah, I didn't notice the PM, but I've read and responded to it. I've built out a tool that seems to work 100%. The key to solving it is a combination of what I was doing before (filling in the blank template of a working file with the data from the non-working one) plus two changes: keeping the I's and D's from the template instead of zeroing the entire thing, and filling everything in by position number instead of by RSID, which is what I was doing the first go-around.

I stuck it up on one of my websites where I'm doing some other DNA stuff; anyone can feel free to use it if they wish:

Mapmy23 Tool

http://www.mapmy23.com/tools/ancestry_ftdna_fix.php

mapmy23.com

If you do use the tool, please respond in here and let others know whether it works. So far in my testing it's working 100%, but the more people who try it, the better we will know whether this is a workable solution until FTDNA resolves it themselves.

OldFinneyKid · 22 October 2017, 09:50 AM

Comparisons

Originally posted by chrisbonisa

I'm not sure exactly what the difference between our 2 methods was, but I would love to swap source code and compare.

I sent you a private message yesterday...? After I posted my original suggestion I downloaded the free Visual Studio and turned it into a Windows console program you could try.

OldFinneyKid · 22 October 2017, 09:46 AM

upload file revisions

Originally posted by prairielad

I and D refer to insertion and deletion.

rs numbers merge with others when the reference build is updated. One would have to look up the rsID to see whether it has merged with another and thus changed name.

Thank you, then leaving them intact with the I's and D's was the correct decision. Good to know.

I understand what you say about looking up rsIDs to determine status, from the standpoint of strict adherence to recognized standards. It is obvious FTDNA must reference the absolute positions instead of the names -- the names do change. The positions cannot change: they can only be included or excluded. So the practical decision to match up the Ancestry-provided positions with the FTDNA positions, whatever FTDNA currently names them, does bypass the exercise of looking up what they should have been named. But the rsID name doesn't appear to matter with autosomal transfer -- using the previously accepted Ancestry file as a template for correcting the incompatible file automatically compensates for the rsID revisions. I checked several of the rsIDs from your list and they were all processed correctly.
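To illustrate the point about names versus positions, here is a rough Python sketch of how renamed rsIDs could be spotted by matching records on chromosome and position. The function name and the sample rsIDs are made up for illustration; it is not the code either of us actually ran.

```python
def renamed_rsids(good_rows, other_rows):
    """List positions where the two files use different rsID names.

    Each row is (rsid, chromosome, position). Records are matched on
    (chromosome, position) only, so a renamed rsID still lines up.
    """
    good_names = {(chrom, pos): rsid for rsid, chrom, pos in good_rows}
    renames = []
    for rsid, chrom, pos in other_rows:
        known = good_names.get((chrom, pos))
        if known is not None and known != rsid:
            renames.append((chrom, pos, rsid, known))
    return renames


# Hypothetical records: the second position carries a different name in each file.
good = [("rs100", "1", 1000), ("rs200", "1", 2000)]
other = [("rs100", "1", 1000), ("rs999", "1", 2000), ("rs300", "2", 500)]
print(renamed_rsids(good, other))
```

A position that exists in only one file (like the chromosome 2 record here) is simply not reported; it is a missing or extra record, not a rename.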

Thanks for the feedback.

chrisbonisa · 22 October 2017, 09:34 AM

Originally posted by chrisbonisa

I've done what I thought was the exact same thing, but it wasn't accepted by the FTDNA uploader. It was accepted by GEDmatch, though, and I got similar results to yours comparing those. I'm not sure exactly what the difference between our 2 methods was, but I would love to swap source code and compare.

Here is the kit # for the unmodified file (the v2 file that FTDNA won't accept): A084257

And here is the kit # for the modified file doing what you did (using a zeroed-out template from a working v2 file): A245328

I've realized the main difference is that you left the I's and D's in the template, whereas I zeroed everything (I didn't even realize they were in there). I'm going to try that and see if it works....

chrisbonisa · 22 October 2017, 09:30 AM

Originally posted by OldFinneyKid

I noticed the file size difference in August but did nothing else since my files were not affected. This week I had a file that was the small size and could not be fixed with the header change. It took me nearly a day to write the program to fix it, and I'm not a professional programmer. Here is what I discovered (since I put some counters in my program):

The working V2 Ancestry file has 668,962 lines, 19 of which are header and the last one is blank.

The broken Ancestry file has 650,430 lines, with the same header lines and final blank line.

Both files have a number of "null" records where the allele values are "0" rather than A, T, G, or C. Ancestry documents this. There are also some records with values of "I" or "D" -- I can't find documentation of these (ideas?).

The rs identification strings are different for some of the locations; however, the positions on the chromosome are consistent.

Given all these factors, I was able to create a master template with all null values (leaving the I's and D's where I found them, for no reason). I then matched up the chromosomes and locations, giving every location the same rsID name as the known good file. I then inserted the results (A, T, C, or G) from the damaged file.

During the comparison, there were 1,048 records discarded without a matching rsID from the good V2 file.

The resulting "fixed" file is the exact size and length required by FTDNA upload and includes results ONLY from the DNA subject's faulty file. Missing data that was replaced is null as described by Ancestry's documentation.

To test the result, I loaded the original Ancestry raw download and the "fixed" file onto GEDmatch. I then ran a one-to-one compare. The match is identical, similar to running the kit against its own kit number. The new results are only 2-10 SNPs different from the original (expected with the missing data). I have tested this on 3 of the damaged (size of 17,184 KB) files with consistent results.

GEDmatch reports all cM matches between the "before" and "after" files are exact, all matching segments are the same length, with the same start and end positions. This new file does upload to FTDNA with no complaint.

During the conversion, several details were noted:

Chr 1: 50618 records added (52 were already null) -- 63 records were discarded // 27 modified rsIDs

Chr 2: 52461 records added (64 were already null) -- 49 records were discarded // 25 modified rsIDs

Chr 3: 41069 records added (45 were already null) -- 41 records were discarded // 11 modified rsIDs
.....

I could go on, but those of you following this thread get the picture: It took me just over a day from decision to completion, and it's not my business.

I would share the program (it runs in a command shell on Windows) but I'm not set up for testing on different computers and not sure I want the liability for user error. But we're talking about less than 500 lines of ugly, amateur C++ code that I'm too self-conscious to post publicly. But if I can do it, Ancestry could easily provide this and FTDNA should provide this. It makes me wonder why they don't.

I've done what I thought was the exact same thing, but it wasn't accepted by the FTDNA uploader. It was accepted by GEDmatch, though, and I got similar results to yours comparing those. I'm not sure exactly what the difference between our 2 methods was, but I would love to swap source code and compare.

Here is the kit # for the unmodified file (the v2 file that FTDNA won't accept): A084257

And here is the kit # for the modified file doing what you did (using a zeroed-out template from a working v2 file): A245328

prairielad · 20 October 2017, 10:21 PM

Originally posted by OldFinneyKid

.......

Both files have a number of "null" records where the allele values are "0" rather than A, T, G, or C. Ancestry documents this. There are also some records with values of "I" or "D" -- I can't find documentation of these (ideas?).

The rs identification strings are different for some of the locations; however, the positions on the chromosome are consistent.

........

I and D refer to insertion and deletion.

rs numbers merge with others when the reference build is updated. One would have to look up the rsID to see whether it has merged with another and thus changed name.

Home - SNP - NCBI

https://www.ncbi.nlm.nih.gov/snp

dbSNP is a public-domain archive for human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Attached is an image on OneDrive of the merged rsIDs I have identified:

https://1drv.ms/i/s!Al27wnXopRKxhwndtftCo9GljnJU

Attached Files

merged report.png (83.1 KB, 3 views)

OldFinneyKid · 20 October 2017, 06:29 PM

They could fix this

I noticed the file size difference in August but did nothing else since my files were not affected. This week I had a file that was the small size and could not be fixed with the header change. It took me nearly a day to write the program to fix it, and I'm not a professional programmer. Here is what I discovered (since I put some counters in my program):

The working V2 Ancestry file has 668,962 lines, 19 of which are header and the last one is blank.

The broken Ancestry file has 650,430 lines, with the same header lines and final blank line.
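The line counts above reduce to record counts with a little arithmetic. A minimal Python sketch, assuming 19 header lines and one trailing blank line as described; the function names are made up, and the `#` comment prefix plus a single column-name row are assumptions about the file layout:

```python
def data_records(total_lines, header_lines=19, trailing_blank=1):
    """Number of SNP records left after removing header and blank lines."""
    return total_lines - header_lines - trailing_blank


def summarize_lines(text):
    """Count header, data, and blank lines in raw-file text.

    Assumes comment lines start with '#' and that the first other
    non-blank line is the column-name row (counted as header).
    """
    counts = {"header": 0, "data": 0, "blank": 0}
    seen_column_row = False
    for line in text.split("\n"):
        if not line.strip():
            counts["blank"] += 1
        elif line.startswith("#"):
            counts["header"] += 1
        elif not seen_column_row:
            counts["header"] += 1
            seen_column_row = True
        else:
            counts["data"] += 1
    return counts


good = data_records(668_962)    # working V2 file
broken = data_records(650_430)  # broken file
print(good, broken, good - broken)  # prints 668942 650410 18532
```

The resulting 668,942 and 650,410 records agree with the v2.0a and v2.0b counts reported earlier in the thread.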

Both files have a number of "null" records where the allele values are "0" rather than A, T, G, or C. Ancestry documents this. There are also some records with values of "I" or "D" -- I can't find documentation of these (ideas?).

The rs identification strings are different for some of the locations; however, the positions on the chromosome are consistent.

Given all these factors, I was able to create a master template with all null values (leaving the I's and D's where I found them, for no reason). I then matched up the chromosomes and locations, giving every location the same rsID name as the known good file. I then inserted the results (A, T, C, or G) from the damaged file.
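For readers who want to see the shape of that procedure, here is a rough Python sketch (not the C++ program itself). Rows are (rsid, chromosome, position, allele1, allele2) tuples; the function names and sample records are made up. One assumption to flag loudly: template records that hold an I or D are left exactly as found and are not overwritten, which is only one possible reading of the description above.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}
INDELS = {"I", "D"}


def blank_template(good_rows):
    """Copy the known-good file, nulling every call except I and D."""
    template = []
    for rsid, chrom, pos, a1, a2 in good_rows:
        a1 = a1 if a1 in INDELS else "0"
        a2 = a2 if a2 in INDELS else "0"
        template.append((rsid, chrom, pos, a1, a2))
    return template


def fill_template(good_rows, broken_rows):
    """Insert the broken file's A/C/G/T calls into the template by position."""
    calls = {(chrom, pos): (a1, a2) for _, chrom, pos, a1, a2 in broken_rows}
    fixed = []
    for rsid, chrom, pos, a1, a2 in blank_template(good_rows):
        call = calls.get((chrom, pos))
        is_null = (a1, a2) == ("0", "0")
        # Assumption: only fully null template records receive data.
        if is_null and call and set(call) <= NUCLEOTIDES:
            a1, a2 = call
        fixed.append((rsid, chrom, pos, a1, a2))
    known = {(chrom, pos) for _, chrom, pos, _, _ in good_rows}
    discarded = [row for row in broken_rows if (row[1], row[2]) not in known]
    return fixed, discarded


# Hypothetical records for illustration only.
good = [
    ("rs1", "1", 100, "A", "G"),
    ("rs2", "1", 200, "I", "D"),
    ("rs3", "2", 50, "C", "C"),
    ("rs4", "2", 60, "T", "T"),
]
broken = [
    ("rs1", "1", 100, "T", "T"),
    ("rs9", "2", 50, "G", "A"),   # same position as rs3, different name
    ("rs7", "3", 10, "A", "A"),   # position not in the good file
]
fixed, discarded = fill_template(good, broken)
for row in fixed:
    print(*row, sep="\t")
```

The output always has exactly as many records as the known-good file, carries the good file's rsID names, and stays null wherever the broken file had no data.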

During the comparison, there were 1,048 records discarded without a matching rsID from the good V2 file.

The resulting "fixed" file is the exact size and length required by FTDNA upload and includes results ONLY from the DNA subject's faulty file. Missing data that was replaced is null as described by Ancestry's documentation.

To test the result, I loaded the original Ancestry raw download and the "fixed" file onto GEDmatch. I then ran a one-to-one compare. The match is identical, similar to running the kit against its own kit number. The new results are only 2-10 SNPs different from the original (expected with the missing data). I have tested this on 3 of the damaged (size of 17,184 KB) files with consistent results.

GEDmatch reports all cM matches between the "before" and "after" files are exact, all matching segments are the same length, with the same start and end positions. This new file does upload to FTDNA with no complaint.
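A local sanity check can complement the GEDmatch comparison. The Python sketch below is not what GEDmatch does; it only confirms that every A/C/G/T call in the fixed file matches the original download at the same chromosome and position. The function name and sample rows are made up for illustration.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def conflicting_calls(original_rows, fixed_rows):
    """Return fixed-file calls that disagree with the original download.

    Rows are (rsid, chromosome, position, allele1, allele2). Null ("0")
    and I/D records in the fixed file are skipped, since they carry no
    A/C/G/T call to verify.
    """
    original = {(chrom, pos): (a1, a2) for _, chrom, pos, a1, a2 in original_rows}
    conflicts = []
    for _, chrom, pos, a1, a2 in fixed_rows:
        if not {a1, a2} <= NUCLEOTIDES:
            continue
        expected = original.get((chrom, pos))
        if expected != (a1, a2):
            conflicts.append((chrom, pos, expected, (a1, a2)))
    return conflicts


# Hypothetical records for illustration only.
original = [("rs1", "1", 100, "T", "T"), ("rs9", "2", 50, "G", "A")]
fixed = [
    ("rs1", "1", 100, "T", "T"),
    ("rs2", "1", 200, "I", "D"),
    ("rs3", "2", 50, "G", "A"),
    ("rs4", "2", 60, "0", "0"),
]
print(conflicting_calls(original, fixed))  # an empty list means no conflicts
```

An empty result means the fixed file contains no nucleotide call that differs from the subject's own data.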

During the conversion, several details were noted:

Chr 1: 50618 records added (52 were already null) -- 63 records were discarded // 27 modified rsIDs

Chr 2: 52461 records added (64 were already null) -- 49 records were discarded // 25 modified rsIDs

Chr 3: 41069 records added (45 were already null) -- 41 records were discarded // 11 modified rsIDs
.....

I could go on, but those of you following this thread get the picture: It took me just over a day from decision to completion, and it's not my business.

I would share the program (it runs in a command shell on Windows) but I'm not set up for testing on different computers and not sure I want the liability for user error. But we're talking about less than 500 lines of ugly, amateur C++ code that I'm too self-conscious to post publicly. But if I can do it, Ancestry could easily provide this and FTDNA should provide this. It makes me wonder why they don't.

hansonrf · 20 October 2017, 12:56 PM

Same

Any response yet from FTDNA?

Zack · 17 October 2017, 09:51 AM

AncestryDNA kit received on 10/17/17

Upload attempt (many) on 10/17/17

Cannot upload successfully; the headers already had the values suggested earlier in the thread.

Error Code: 'The file is an unsupported version or in a corrupt/malformed format.'

Zip file is 5,607 KB
Extracted *.txt is 17,184 KB

WilliamKF · 15 October 2017, 11:13 AM

GEDmatch works fine, but FTDNA rejects

I'm having this issue too with my cousin's results that just posted at Ancestry.com last week. GEDmatch.com accepted the transfer with no issues, same for MyHeritage.com.

chrisbonisa · 13 October 2017, 02:58 PM

Originally posted by aprilmcg123

Unfortunately, this doesn't make sense as being just a new chip issue between Ancestry and FTDNA, as I have evidence of several profiles processed using the same chip (V2 according to the file) within days of each other where the size difference is occurring, and some of them will upload and others (those with the missing SNPs) will not.

I'm curious about this myself. I wonder if it's a new chip not fully rolled out yet? Or perhaps new software? But yes, I agree, it's inconsistent and confusing.

aprilmcg123 · 13 October 2017, 09:25 AM

Originally posted by chrisbonisa

I've just done a similar comparison using a PHP/MySQL database and came up with the exact same results as you with my broken file (2.0b):

v2.0a_count - 668942
v2.0b_count - 650410
diff_count - 18532
v2.0a_missing - 1354
v2.0b_missing - 19886

This suggests it's not an error on the part of Ancestry but either a different chip or simply differently parsed results for some reason. This also means it's most likely that FTDNA is just not able to handle this new scheme: it's probably seeing the "2.0" in the file header and looking for some exact SNP schema in the file, and since it doesn't match, it rejects it. Not counting the new unique SNPs, this is merely a difference of ~3%, which should still leave plenty of overlap for FTDNA to handle this, if they would simply try. All other sites are able to accept this type of export without a problem and generate matches from it; I've tried GEDmatch, MyHeritage, WeGene, DNA.Land, etc. FTDNA is the only one erroring out...
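The counts above can be cross-checked with a few lines of Python. One assumption, since the labels are not spelled out: v2.0a_missing is read as SNPs present in the 2.0b file but absent from 2.0a, and v2.0b_missing as SNPs present in 2.0a but absent from 2.0b. Under that reading the numbers reconcile exactly, and the "~3%" figure is the share of 2.0a SNPs missing from 2.0b.

```python
v2a_count = 668_942    # records in the working (2.0a) file
v2b_count = 650_410    # records in the broken (2.0b) file
v2a_missing = 1_354    # assumed: in 2.0b but not in 2.0a
v2b_missing = 19_886   # assumed: in 2.0a but not in 2.0b

diff_count = v2a_count - v2b_count
net_missing = v2b_missing - v2a_missing
share_missing = v2b_missing / v2a_count

# Starting from 2.0a, drop what 2.0b lacks and add what only 2.0b has.
reconciled = v2a_count - v2b_missing + v2a_missing

print(diff_count, net_missing)   # prints 18532 18532
print(reconciled == v2b_count)   # prints True
print(f"{share_missing:.2%}")    # prints 2.97%
```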

FTDNA - you are losing money and customers. Ancestry is using this new schema for more and more exports; we need a fix ASAP!

Unfortunately, this doesn't make sense as being just a new chip issue between Ancestry and FTDNA, as I have evidence of several profiles processed using the same chip (V2 according to the file) within days of each other where the size difference is occurring, and some of them will upload and others (those with the missing SNPs) will not.

chrisbonisa · 12 October 2017, 03:14 PM

Originally posted by ltd-jean-pull

This post dated the 27th August was the first to give us a hint that the current issue with uploading Ancestry autosomal data to FamilyTreeDNA might be due to a change in file size, i.e. missing data.

Some might think that FamilyTreeDNA ought to make it a priority to sort out this issue so that they can upload a file that is in a different format from the files FtDNA is set up to accept, even though FtDNA doesn't get informed by Ancestry that they're going to change the files. Ancestry can't inform their customer service reps or customers of the file change, so I doubt they'd pick up the phone and tell a competitor.

Anyway, back to August 27th. What a shame that some computer geek who works for FtDNA who was stuck at home with water lapping around the porch didn't somehow manage to log-in from home and start working on the coding. And how unreasonable of them to close down the server for a few days meaning that e-mails couldn't be received during the flooding.

I can understand people's frustration, but some people need to get some perspective. FamilyTreeDNA's priority is their PAYING customers who tested with them, and in the last few weeks they've been working in very trying circumstances.

Hopefully it will be sorted in due course, but it wouldn't surprise me if Ancestry continues to muck around with file size. I have no idea why they do it.

This is completely the wrong approach. First of all, there are TONS of us who do the transfer and pay to upgrade it to get the chromosome browser. As a matter of fact, FTDNA isn't of much use to me without that, as they have far fewer people than Ancestry or 23andme. FTDNA can only be successful if they allow transfers to build up their user base over time. Also, lots of us, me included, pay for other products and want to have all of our data aggregated in one place. I paid the $160 for the y-dna-37 test and will most likely upgrade that (+$100). And the $19 for each transfer upgrade is pure profit for them, and likely the same amount of profit they receive from autosomal tests directly through them after lab costs.

Regarding the hurricane, I run an IT company with a datacenter and central office in Tampa, FL.....you know, where hurricane Irma hit. We had emergency contingency plans in place for this sort of thing years ago, and because of that the impact was nil, even though power was out for almost all of our local employees and there was flooding everywhere. We are nowhere near as large as FTDNA, and they are more than capable of handling emergencies; I'm sure they did in fact deal with it in a responsible way. I don't think this problem is in any way related to the hurricane. Not to mention that was like 6 weeks ago.
