Big-Y BAM Analysis Tool


  • #46
    variants_y.csv empty

    Hi Felix,

    I downloaded the patch and unzipped it on my Windows 7 laptop, reran Big-Y BAM Analysis UI.exe on my Big Y .bam file and got a variants_y.csv file in the out directory, but it has no records. I'm not sure what specifically to expect in this file in terms of (novel) variants, but surely it should have something > 0?



    • #47
      Originally posted by coupster View Post
      Hi Felix,

      I downloaded the patch and unzipped it on my Windows 7 laptop, reran Big-Y BAM Analysis UI.exe on my Big Y .bam file and got a variants_y.csv file in the out directory, but it has no records. I'm not sure what specifically to expect in this file in terms of (novel) variants, but surely it should have something > 0?
      Are there any records in bigy_out_variants.vcf in the extracted folder?



      • #48
        Originally posted by coupster View Post
        Hi Felix,

        I downloaded the patch and unzipped it on my Windows 7 laptop, reran Big-Y BAM Analysis UI.exe on my Big Y .bam file and got a variants_y.csv file in the out directory, but it has no records. I'm not sure what specifically to expect in this file in terms of (novel) variants, but surely it should have something > 0?
        Now I understand why. You must unzip into the older version of the tool and overwrite existing files. The patch cannot work independently.



        • #49
          Cranking the analysis

          The current updates to the toolkit are processing my FGC file. I noticed the following comment saying that I had too many reads at one location.
          Attached Files



          • #50
            Originally posted by coupster View Post
            Hi Felix,

            I downloaded the patch and unzipped it on my Windows 7 laptop, reran Big-Y BAM Analysis UI.exe on my Big Y .bam file and got a variants_y.csv file in the out directory, but it has no records. I'm not sure what specifically to expect in this file in terms of (novel) variants, but surely it should have something > 0?
            My apologies. I just found a bug that could be related, and I have uploaded a new patch which should fix it.
            Download: patch.zip.

            If bigy_out.vcf exists, then after applying the patch you can run the following commands.

            Code:
            bin\cygwin\bin\bash.exe -c "cat bigy_out.vcf | grep chrY | grep 1/1 > bigy_out_variants.vcf"
            
            bin\jre\bin\java.exe -Xmx2g -classpath bin\bigyvcf\bigyvcf.jar;. fc.id.au.BigYVariantsY bigy_out_variants.vcf
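
For anyone curious what the first command does, here is a minimal Python sketch of the same filtering step. It is only an illustration, not part of the tool: the function names are made up for this example, and it mirrors the two `grep` calls as plain substring matches on each line (it does not parse the VCF columns).

```python
def filter_homozygous_chry(vcf_lines):
    """Keep lines containing both 'chrY' and '1/1', like `grep chrY | grep 1/1`.

    This is a plain substring test, the same as the grep pipeline, so it does
    not check which VCF column the text appears in.
    """
    return [line for line in vcf_lines if "chrY" in line and "1/1" in line]


def filter_vcf_file(src_path, dst_path):
    """Read src_path, write the matching lines to dst_path, return the count."""
    with open(src_path, "r") as src:
        kept = filter_homozygous_chry(src.readlines())
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)
```

Calling `filter_vcf_file("bigy_out.vcf", "bigy_out_variants.vcf")` would return the number of lines kept, which is a quick way to see whether the variants file will be empty before running the Java step.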



            • #51
              Big-Y BAM STR Analysis Tool

              I also made a STR Analysis Tool which can extract STR values from Big-Y BAM.

              Download: Big-Y BAM STR Analysis (64 bit).zip (97.3 MB)



              • #52
                Originally posted by felix View Post
                I also made a STR Analysis Tool which can extract STR values from Big-Y BAM.

                Download: Big-Y BAM STR Analysis (64 bit).zip (97.3 MB)
                Felix -

                When I ran your STR analysis tool to extract STR values from my Big Y BAM file, my output file listed approximately 70 short tandem repeats. How many STRs should appear in my output file?

                Stephen



                • #53
                  Originally posted by Stephen Parrish View Post
                  Felix -

                  When I ran your STR analysis tool to extract STR values from my Big Y BAM file, my output file listed approximately 70 short tandem repeats. How many STRs should appear in my output file?

                  Stephen
                  Yes, pretty much that number. It can detect a maximum of 96 STRs (the list of STRs is in the ref\y.bed file).

                  Please note that BigY is not an STR test. The STR results from BigY are mostly correct for me when using this tool, but for some markers the value is one less than in the STR-specific tests I had done with FTDNA. The STRs that can be detected may vary depending on your BAM file.



                  • #54
                    Originally posted by felix View Post
                    Please note that BigY is not a test for STR, but the STRs results in BigY are mostly correct for me when using this tool but for some values, it is one less (when compared to STR specific tests which I had done with FTDNA). Depending on your BAM file, the STRs that can be detected may vary.
                    Felix, can you tell if there might be nomenclature differences for the STRs that differ?

                    For instance, if you take the sequence

                    GAATAATAATG

                    you could say there are three AAT repeats or two TAA repeats.
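
To make the counting concrete, here is a small Python sketch that finds the longest run of back-to-back copies of a motif in a sequence. The function name is made up for this illustration, and this is not how lobSTR or any lab assigns repeat counts; it only shows how the choice of motif changes the number.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one motif at a time while the copies are contiguous.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


print(longest_tandem_run("GAATAATAATG", "AAT"))  # 3
print(longest_tandem_run("GAATAATAATG", "TAA"))  # 2
```

The same stretch of DNA gives 3 or 2 depending on where the repeat is taken to start, which is the kind of off-by-one difference a nomenclature mismatch could produce.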



                    • #55
                      Originally posted by Ann Turner View Post
                      Felix, can you tell if there might be nomenclature differences for the STRs that differ?

                      For instance, if you take the sequence

                      GAATAATAATG

                      you could say there are three AAT repeats or two TAA repeats.
                      This conversion is done entirely by the lobSTR project. It gives me a final VCF, from which I extract the STR values.

                      The last section of the following page gives the details on how to convert/extract STR values from the lobSTR VCF output.
                      Link: http://melissagymrek.com/lobstr-code/ystr-codis.html

                      "SMGF nomenclature was used for: DYS389, DYS449, DYS452, DYS461, DYS463, and GATA-A10.
                      Genbase nomenclature was used for: DYS413a/b, DYS472, DYS487, DYS492, DYS494, DYS511, DYS520, DYS537, DYS568, DYS578, DYS590, DYS617, DYS640, DYS714 and DYS717."


                      Also,

                      "lobSTR results are given as the number of base pairs length difference from the reference sequence. To convert a lobSTR call at one of these loci to the standard nomenclature, use the simple formula: RefCopyNum + lobSTRAllele/MotifLength. A more in depth tutorial on doing this is coming soon."

                      Unfortunately, there is no in-depth tutorial available yet from lobSTR. The way I understood it, the value is computed from the VCF file as: REF value in INFO + (length of ALT sequence - length of REF sequence) / (motif length), when the ALT length is greater than the REF length. I also used VenterSurnameRecovery.pdf as a reference point in building the tool; it used an earlier version that outputs in a different tab format.
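
As a rough illustration of the formula described above, here is a minimal Python sketch. The function name and the sample values are hypothetical, and it is not the tool's actual code. It also applies the same arithmetic when ALT is shorter than REF, which is an assumption on top of what is stated above.

```python
def str_repeat_count(ref_copy_num, ref_seq, alt_seq, motif_len):
    """Compute RefCopyNum + (len(ALT) - len(REF)) / MotifLength.

    ref_copy_num: reference copy number (the REF value from the INFO column)
    ref_seq, alt_seq: the REF and ALT sequences from the VCF record
    motif_len: length of the repeat motif in base pairs
    """
    if motif_len <= 0:
        raise ValueError("motif_len must be positive")
    length_diff = len(alt_seq) - len(ref_seq)  # base-pair difference from reference
    return ref_copy_num + length_diff / motif_len


# Hypothetical record: reference has 3 copies of GATA, ALT has one extra copy.
print(str_repeat_count(3, "GATAGATAGATA", "GATAGATAGATAGATA", 4))  # 4.0
```

A result that is not a whole number would mean the length difference is not a multiple of the motif length, so such calls are worth a second look.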
                      Last edited by felix; 26 April 2014, 10:30 AM.



                      • #56
                        Originally posted by felix View Post
                        "SMGF nomenclature was used for: DYS389, DYS449, DYS452, DYS461, DYS463, and GATA-A10.
                        Genbase nomenclature was used for: DYS413a/b, DYS472, DYS487, DYS492, DYS494, DYS511, DYS520, DYS537, DYS568, DYS578, DYS590, DYS617, DYS640, DYS714 and DYS717
                        I now think I should account for this final adjustment as well when converting the values into FTDNA STR nomenclature.

                        http://www.smgf.org/ychromosome/marker_standards.jspx
                        http://www.ysearch.org/conversion_page.asp



                        • #57
                          I just got my BAM file downloaded and with the string of downloads and patches, I am a little confused on exactly the current procedure for downloading the utility. Do I need to download both the utility and the patch files, or is there a version containing the patches?

                          Thank you



                          • #58
                            Originally posted by JohnG View Post
                            I just got my BAM file downloaded and with the string of downloads and patches, I am a little confused on exactly the current procedure for downloading the utility. Do I need to download both the utility and the patch files, or is there a version containing the patches?

                            Thank you
                            No, just get the main download; don't download the patch. The patch is only needed if you already have a previous version and want to update it to the latest. I will remove the patch soon to avoid any confusion.



                            • #59
                              I downloaded and it worked great - only took 3 hours.



                              • #60
                                Ok, I have started to look at the results. I was not sure how to 'play with' them. On a first look, I note that one of the SNPs shown as ? in my directly reported results shows as positive in the output of the BAM analysis. So my first question is: how does the BAM analysis 'know better', or does it?

                                I have another ? that did not show on the BAM report.

                                The one that did show fills in a blank in my haplotree.

                                I plan to compare the mtDNA FASTA from the BAM with my reported result.

                                Anything else I should be doing?

