Big-Y BAM Analysis Tool


  • It didn't work at all for me. The program completed, but showed 0 at every marker. This was on an FTDNA BAM file.

    EDIT: Is it possible that the tool you're using doesn't know how to read Windows files? The very beginning of the log looks ridiculous:
    ---
    [E::hts_open] fail to open file 'D:\Big'
    [bam_sort_core] fail to open file D:\Big
    ---

    As if it doesn't understand that Windows file names can include the space character!
    Last edited by lgmayka; 5 July 2015, 09:25 AM.
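The truncated path `D:\Big` in the log is what one would expect if the full path were split on whitespace before it reached the file-opening call. A minimal sketch of that failure mode, assuming (the thread does not confirm this) that the tool builds a samtools command line from a single string. The example path and the helper `build_sort_args` are hypothetical:

```python
import subprocess

# Hypothetical path containing a space, similar to the one in the log above.
bam_path = r"D:\Big Y\sample.bam"

# Naive approach: build one string and split it on whitespace.
# The path is cut at the space, so the program would only see 'D:\Big'.
naive_args = ("samtools sort " + bam_path).split()

def build_sort_args(bam, out_bam):
    """Return an argument list in which each path stays a single argument."""
    return ["samtools", "sort", "-o", out_bam, bam]

safe_args = build_sort_args(bam_path, r"D:\Big Y\sorted.bam")

# To actually run it (requires samtools on PATH):
# subprocess.run(safe_args, check=True)
```

Passing a list to `subprocess.run` avoids any whitespace splitting, so no manual quoting of the path is needed.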

    Comment


    • Originally posted by lgmayka View Post
      It didn't work at all for me. The program completed, but showed 0 at every marker. This was on an FTDNA BAM file.

      EDIT: Is it possible that the tool you're using doesn't know how to read Windows files? The very beginning of the log looks ridiculous:
      ---
      [E::hts_open] fail to open file 'D:\Big'
      [bam_sort_core] fail to open file D:\Big
      ---

      As if it doesn't understand that Windows file names can include the space character!
      It's late here; I will fix this bug by tomorrow.

      But if you place the BAM file in a path that doesn't have spaces, does it work? How accurate is it, and what are the reliability percentages?

      Comment


      • Originally posted by felix View Post
        But if you place the BAM file in a path that doesn't have spaces, does it work? How accurate is it, and what are the reliability percentages?
        It kind of, sort of, works. Out of 111 markers, 49 are green ("reliable"). Most of them look plausible, though some do not. They do not match FTDNA's results, but then my specific purpose was to check whether a sample was switched, and the program's results are far too erratic for that purpose.

        The red values are generally implausible.

        If nothing else, this makes me admire YFull's work all the more. YFull is able to read about 100 of the 111 standard markers, almost always with no deviation from FTDNA's results.

        Comment


        • I tried again on a BAM file known to be correct. (YFull's STRs exactly match FTDNA's.)

          The first attempt failed. The tool created a 76MB file called "out" instead of a folder called "out". I renamed that file and hand-created an empty folder called "out".

          The second attempt succeeded, in a way. 35 out of 111 markers are in green. Of those 35, only 16 match FTDNA's numbers.
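The "out" file versus "out" folder problem can be guarded against before any results are written. A minimal sketch, assuming the tool expects a directory at the output path; `ensure_output_dir` is a hypothetical helper, not part of the tool:

```python
from pathlib import Path

def ensure_output_dir(path):
    """Create the output directory, refusing to proceed if a file is in the way."""
    out = Path(path)
    if out.exists() and not out.is_dir():
        # A regular file named like the output folder would otherwise
        # silently absorb or block the results.
        raise NotADirectoryError(f"{out} exists but is not a directory")
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling `ensure_output_dir("out")` at startup would turn the silent failure described above into an explicit error message.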

          Comment


          • Originally posted by lgmayka View Post
            I tried again on a BAM file known to be correct. (YFull's STRs exactly match FTDNA's.)

            The first attempt failed. The tool created a 76MB file called "out" instead of a folder called "out". I renamed that file and hand-created an empty folder called "out".

            The second attempt succeeded, in a way. 35 out of 111 markers are in green. Of those 35, only 16 match FTDNA's numbers.
            The differences can be fixed. How large are the differences? Just 1 or 2? How does the detailed FASTA-like display with the actual repeats look for those mismatches?

            Comment


            • Originally posted by felix View Post
              It's late here; I will fix this bug by tomorrow.

              But if you place the BAM file in a path that doesn't have spaces, does it work? How accurate is it, and what are the reliability percentages?
              Fixed the bug and reuploaded.

              Comment


              • Felix,

                I ran your STR tool without problems, and it took about an hour to complete the analysis on my computer. Unfortunately there were some issues with the results: the majority of the calls were considered unreliable, and 24 of 111 were reported as null vs. 13 of 111 for YFull's analysis (and none for FTDNA's Y-111 test).

                For a quick comparison of the first 12 markers, the deltas between the tool's calls and the actual values were: -2, -2, 0, -2, (null), (null), -1, 0, 0, (null), 0, -12

                Hope this helps
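The delta list above can be tallied mechanically. A small sketch using only the twelve deltas quoted in the post, with `None` standing in for a null call; `summarize_deltas` is a hypothetical helper name:

```python
# Deltas for the first 12 markers as reported in the post; None marks a null call.
deltas = [-2, -2, 0, -2, None, None, -1, 0, 0, None, 0, -12]

def summarize_deltas(values):
    """Count exact matches (delta 0), mismatches, and null calls."""
    nulls = sum(1 for v in values if v is None)
    matches = sum(1 for v in values if v == 0)
    mismatches = len(values) - nulls - matches
    return {"match": matches, "mismatch": mismatches, "null": nulls}

summary = summarize_deltas(deltas)
print(summary)  # {'match': 4, 'mismatch': 5, 'null': 3}
```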

                Comment


                • Originally posted by KSDA View Post
                  Felix,

                  I ran your STR tool without problems, and it took about an hour to complete the analysis on my computer. Unfortunately there were some issues with the results: the majority of the calls were considered unreliable, and 24 of 111 were reported as null vs. 13 of 111 for YFull's analysis (and none for FTDNA's Y-111 test).

                  For a quick comparison of the first 12 markers, the deltas between the tool's calls and the actual values were: -2, -2, 0, -2, (null), (null), -1, 0, 0, (null), 0, -12

                  Hope this helps
                  So 4 values in YDNA12 match, which is good to hear. For the null (or 0) results, I believe there are no values at that Y location. Regarding the -2 and -1 deltas, are they marked unreliable? What are the reliability percentages? If you click on the numbers, you can drill down to look at the repeats and the missing sites.

                  Unfortunately, nothing can be done for unreliable values, as there are no values at those sites and I emit only confident sites from the BAM. I try to eliminate a Y-STR if the repeat pattern doesn't follow the motif with the minimum number of repeats, to make sure the Y-STR values are as accurate as possible.

                  Comment


                  • Originally posted by felix View Post
                    So 4 values in YDNA12 match, which is good to hear. For the null (or 0) results, I believe there are no values at that Y location. Regarding the -2 and -1 deltas, are they marked unreliable? What are the reliability percentages? If you click on the numbers, you can drill down to look at the repeats and the missing sites.

                    Unfortunately, nothing can be done for unreliable values, as there are no values at those sites and I emit only confident sites from the BAM. I try to eliminate a Y-STR if the repeat pattern doesn't follow the motif with the minimum number of repeats, to make sure the Y-STR values are as accurate as possible.
                    Yes, it seems the variant caller is emitting only the most confident sites and it's impacting the STR calls. The small difference values and several of the null results are explained by some of the locations having a small variation in values (e.g. 29T/2C -> T and 51T/4C -> T but both called as '.'), and the rest of the null calls seem to be for multi-copy markers - I can go into IGV and manually count the repeats in the BAM so they're definitely there.

                    All of the first 12 are rated as unreliable, and looking at 13-25 four of them are called reliable and three are correct while one is an unexpected null value at a multi-copy marker. It also looks like only 100% reliability calls are considered confident results, 99% isn't, and there are a couple that have values >100%.
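The 29T/2C and 51T/4C examples above show how a strict confidence cutoff can turn a clear majority into a no-call. A minimal sketch of a majority-fraction rule, offered as an illustration only: the thread does not say how the variant caller decides, and both the function `call_base` and the threshold values are assumptions:

```python
def call_base(counts, min_fraction):
    """Return the majority base, or '.' if its share of reads is below min_fraction."""
    total = sum(counts.values())
    if total == 0:
        return "."
    base, depth = max(counts.items(), key=lambda kv: kv[1])
    return base if depth / total >= min_fraction else "."

# Read counts taken from the two examples in the post above.
site_a = {"T": 29, "C": 2}   # T share is 29/31, about 0.935
site_b = {"T": 51, "C": 4}   # T share is 51/55, about 0.927

lenient = [call_base(site_a, 0.90), call_base(site_b, 0.90)]
strict = [call_base(site_a, 0.95), call_base(site_b, 0.95)]
```

With the assumed 0.90 cutoff both sites are called as T, while the assumed 0.95 cutoff leaves both as '.', which is the trade-off discussed in this exchange.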

                    Comment


                    • Originally posted by KSDA View Post
                      Yes, it seems the variant caller is emitting only the most confident sites and it's impacting the STR calls. The small difference values and several of the null results are explained by some of the locations having a small variation in values (e.g. 29T/2C -> T and 51T/4C -> T but both called as '.'), and the rest of the null calls seem to be for multi-copy markers - I can go into IGV and manually count the repeats in the BAM so they're definitely there.

                      All of the first 12 are rated as unreliable, and looking at 13-25 four of them are called reliable and three are correct while one is an unexpected null value at a multi-copy marker. It also looks like only 100% reliability calls are considered confident results, 99% isn't, and there are a couple that have values >100%.
                      The only problem is that a 99% reliable call isn't really 99% accurate, because the reliability calculation takes only the available sites into consideration, and that remaining 1% still has a 50% chance of a mutation at one site, which can reduce the Y-STR value by 1 or 2. I compared my results with my Y-DNA 64 results from FTDNA (tested and confirmed) and noticed that emitting non-confident sites from the BAM to get more sites may be a bad idea, since it gives a lot of wrong values (at least for me). I believe we can treat the values marked reliable as accurate and ignore the rest, rather than having a lot of unreliable values from non-confident sites.

                      Comment


                      • I downloaded and used your program today. Comparing the STR values with what I have on my Y-37 test, the results were completely off. It had me worried for a bit that my Big Y test was bad.
                        DYS 393 was off by -1. DYS 390 was off by +2. DYS 385 was off by -11 and -2. DYS 464 was off by -17, -17, -17, -17.

                        These are just a handful of examples.

                        Comment


                        • Originally posted by Muldurath View Post
                          I downloaded and used your program today. Comparing the STR values with what I have on my Y-37 test, the results were completely off. It had me worried for a bit that my Big Y test was bad.
                          DYS 393 was off by -1. DYS 390 was off by +2. DYS 385 was off by -11 and -2. DYS 464 was off by -17, -17, -17, -17.

                          These are just a handful of examples.
                          What is the reliability of those values? If the program says not reliable, then they are not to be used and will be off.

                          Comment


                          • DYS 303 and 390 were at 100% reliability. DYS 19 and 391 were unreliable, but were actually correct. The vast majority of the values were not correct at all.

                            Comment


                            • Originally posted by Muldurath View Post
                              DYS 303 and 390 were at 100% reliability. DYS 19 and 391 were unreliable, but were actually correct. The vast majority of the values were not correct at all.
                              If DYS 303 and 390 are 100% reliable, you can go to the details and count the repeats yourself to see the value you are getting for each Y-STR. That's exactly what is in the BAM.

                              If the manual repeat count differs from what the program reports at the top, I can fix that. However, if the manual count matches what the program reports, I cannot fix it, and the likely explanation is a difference in motif or nomenclature. Can you paste a screenshot of the details with the bases for DYS 303 and 390? That would help me check whether the motif interpretations are correct.
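Counting the repeats by hand, as suggested above, amounts to finding the longest uninterrupted run of the motif. A minimal sketch of that count, assuming a simple single-motif STR; the sequence below is made up for illustration and is not from any BAM, and `longest_tandem_run` is a hypothetical helper rather than the tool's own code:

```python
import re

def longest_tandem_run(sequence, motif):
    """Return the repeat count of the longest uninterrupted run of motif."""
    pattern = re.compile("(?:" + re.escape(motif) + ")+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence)]
    return max(run_lengths, default=0) // len(motif)

# Made-up example: three GATA repeats flanked by other bases.
example = "CCGATAGATAGATATT"
repeats = longest_tandem_run(example, "GATA")
print(repeats)  # 3
```

Real markers can have compound motifs and nomenclature offsets, which this sketch does not handle, so a mismatch with a lab-reported value does not by itself indicate a counting bug.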

                              Comment


                              • See attached to compare.
                                Attached Files

                                Comment
