Big-Y BAM Analysis Tool


  • #61
    BAM analysis kit

    I updated the BAM Analysis Kit to work with all BAM files that use build 37 positions, including BigY BAMs. The reference used in BigY BAMs is Yoruba, and the BigY Analysis tool previously reported results against it. The kit now reports RSRS values directly, which will match your mtDNA mutations if you have done other tests. There is also a customizable UI that lets you run individual chromosomes on their own. I have bundled all the BAM-related tools into one, the BAM Analysis Kit, and made the other tools obsolete.

    Some features include:
    • Works on all BAM files with build 37 positions
    • Extracts SNPs from Autosomal DNA, X-DNA, Y-DNA and mtDNA.
    • Provides mtDNA FASTA.
    • Auto-converts Yoruba references in mtDNA and provides RSRS values.
    • Provides Y-SNPs in ISOGG Nomenclature.
    • Provides Y-STR markers.
    • Calculates Telomere Length.


    Link: BAM Analysis Kit


    • #62
      Bam files

      Now that I have successfully installed the tools, how or where do I get my *.bam file?
      Many thanks,
      Russ


      Originally posted by felix
      I updated the BAM Analysis Kit to work with all BAM files that use build 37 positions, including BigY BAMs. The reference used in BigY BAMs is Yoruba, and the BigY Analysis tool previously reported results against it. The kit now reports RSRS values directly, which will match your mtDNA mutations if you have done other tests. There is also a customizable UI that lets you run individual chromosomes on their own. I have bundled all the BAM-related tools into one, the BAM Analysis Kit, and made the other tools obsolete.

      Some features include:
      • Works on all BAM files with build 37 positions
      • Extracts SNPs from Autosomal DNA, X-DNA, Y-DNA and mtDNA.
      • Provides mtDNA FASTA.
      • Auto-converts Yoruba references in mtDNA and provides RSRS values.
      • Provides Y-SNPs in ISOGG Nomenclature.
      • Provides Y-STR markers.
      • Calculates Telomere Length.


      Link: BAM Analysis Kit


      • #63
        Originally posted by RussellC
        Now that I have successfully installed the tools, how or where do I get my *.bam file?
        Many thanks,
        Russ
        You should request your BAM file from the FTDNA helpdesk.


        • #64
          Bam files

          Many thanks Felix,

          Russell

          Originally posted by felix
          You should request your BAM file from the FTDNA helpdesk.


          • #65
            Felix-
            I ran the updated BAM Analysis Kit. The analysis ran perfectly!
            The "telomere" text file has no data; this is all that is displayed when I open it:
            "Telomere Length:" .
            There is also a "telseq.out" file, which I am unable to open. What program is needed to open the OUT file?
            Thanks


            • #66
              Originally posted by Հայկո
              Felix-
              I ran the updated BAM Analysis Kit. The analysis ran perfectly!
              The "telomere" text file has no data; this is all that is displayed when I open it:
              "Telomere Length:" .
              There is also a "telseq.out" file, which I am unable to open. What program is needed to open the OUT file?
              Thanks
              I also ran the telomere program and got this result.


              • #67
                Originally posted by JohnG
                I also ran the telomere program and got this result.
                Hi John, I presume you and Zuylin had a BAM file to use so that you could run the BAM Analysis Kit, and that you did get output files?

                I thought I ran mine correctly (after searching for the "Example Bam.bam" file), but received this error:
                "User error has occurred (Version3.1-1-g07a4bf8: )"
                no headers .....etc

                regs,
                Russ


                • #68
                  I ran the BAM Analysis Kit last week and got all the output files. I tried to run the telomere program later and got no length. That scared me: I am getting a year older every year and thought I had run out of telomere.


                  • #69
                    Originally posted by RussellC
                    I thought I ran mine correctly (after searching for the "Example Bam.bam" file), but received this error:
                    "User error has occurred (Version3.1-1-g07a4bf8: )"
                    no headers .....etc
                    The script, like GATK, requires read group headers. BigY and most other genetic genealogy BAM files come with read group headers, so I did not bother to fix BAMs without them before processing. Let me know if any genetic genealogy BAM gives such errors.
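
For anyone who wants to check a BAM for read group headers before running the kit, here is a minimal sketch in Python. It is not part of the BAM Analysis Kit; it assumes the standard BAM layout (BGZF compression, the magic bytes `BAM\1`, a 4-byte header length, then the plain-text header), and the function names `read_bam_header_text` and `read_group_ids` are made up for illustration.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BAM files are BGZF-compressed; BGZF is a series of gzip members,
    # so the standard gzip module can decompress it.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic bytes)")
        (text_length,) = struct.unpack("<i", handle.read(4))
        text = handle.read(text_length)
    return text.rstrip(b"\x00").decode("utf-8", errors="replace")


def read_group_ids(header_text):
    """Collect the ID field of every @RG line in the header text."""
    ids = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    return ids
```

If `read_group_ids(read_bam_header_text("your.bam"))` comes back empty, the file has no read groups and would probably trigger the kind of error quoted above.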


                    • #70
                      Originally posted by Հայկո
                      Felix-
                      I ran the updated BAM Analysis Kit. The analysis ran perfectly!
                      The "telomere" text file has no data; this is all that is displayed when I open it:
                      "Telomere Length:" .
                      There is also a "telseq.out" file, which I am unable to open. What program is needed to open the OUT file?
                      Thanks
                      Just open the telseq.out file in MS Word or WordPad. It is just a text file, but it has Unix line endings (I think). It is basically the raw output from telseq, with the column definitions listed below (ref: https://github.com/zd1/telseq/). The "telomere" text file just takes the LENGTH_ESTIMATE value from this raw data.

                      Column Definitions:
                      1. ReadGroup: read group that the result corresponds to. Defined by the RG tag in the BAM header.
                      2. Library: sequencing library that the read group belongs to.
                      3. Sample: defined by the SM tag in the BAM header.
                      4. Total: total number of reads in this read group.
                      5. Mapped: total number of mapped reads in this read group. Whether a read is mapped is determined by SAM flag 0x4.
                      6. Duplicates: total number of duplicate reads in this read group. Whether a read is a duplicate is determined by SAM flag 0x400.
                      7. LENGTH_ESTIMATE: estimated telomere length.
                      8. TEL0: read counts for reads containing no TTAGGG/CCCTAA repeats.
                      9. TEL1: read counts for reads containing only 1 TTAGGG/CCCTAA repeat.
                      10. TELn: read counts for reads containing only n TTAGGG/CCCTAA repeats.
                      11. TEL16: read counts for reads containing 16 TTAGGG/CCCTAA repeats.
                      12. GC0: read counts for reads with GC composition between 40%-42%.
                      13. GC1: read counts for reads with GC composition between 42%-44%.
                      14. GCn: read counts for reads with GC composition between (40%+n*2%)-(40%+(n+1)*2%).
                      15. GC9: read counts for reads with GC composition between 58%-60%.
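
If the file will not open usefully in a word processor, the estimate can also be pulled out with a few lines of Python. This is only a sketch: it assumes telseq.out is tab-separated text with a header row naming the columns above, and it looks for the column whose name ends in `ESTIMATE` rather than relying on one exact spelling. The function name `telomere_estimates` is made up for illustration.

```python
import csv


def telomere_estimates(lines):
    """Return {read group: telomere length estimate} from telseq output lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    columns = reader.fieldnames or []  # fieldnames is None when the input is empty
    estimate_column = next(
        (name for name in columns if name.endswith("ESTIMATE")), None
    )
    if estimate_column is None:
        raise ValueError("no estimate column found; telseq may have written no results")
    # Values are kept as strings so nothing is lost if a field is not numeric.
    return {row["ReadGroup"]: row[estimate_column] for row in reader}
```

Passing an open file, for example `telomere_estimates(open("telseq.out"))`, should give one entry per read group. An empty file raises the `ValueError`, which would point to telseq itself not having produced any results.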


                      • #71
                        BAM file with invalid header

                        Originally posted by felix
                        The script, like GATK, requires read group headers. BigY and most other genetic genealogy BAM files come with read group headers, so I did not bother to fix BAMs without them before processing. Let me know if any genetic genealogy BAM gives such errors.
                        Felix--

                        I've tried, without success, to use your BAM Analysis Kit on a .bam file posted at http://evolbio.ut.ee/leviteY/P3.recalQ.realigned.bam. (The website, run by the Estonian Biocentre, posts data from some recent papers.)

                        The kit reports:
                        [bam_header_read] invalid BAM binary header <this is not a BAM file>.
                        [bam_sort_core] truncated file. Continue anyway.

                        At this point, samtools.exe stops working.

                        Thanks for your help.

                        Jeff


                        • #72
                          Originally posted by JeffWexler
                          Felix--

                          I've tried, without success, to use your BAM Analysis Kit on a .bam file posted at http://evolbio.ut.ee/leviteY/P3.recalQ.realigned.bam. (The website, run by the Estonian Biocentre, posts data from some recent papers.)

                          The kit reports:
                          [bam_header_read] invalid BAM binary header <this is not a BAM file>.
                          [bam_sort_core] truncated file. Continue anyway.

                          At this point, samtools.exe stops working.

                          Thanks for your help.

                          Jeff
                          "Truncated file" sounds like a sign of insufficient disk space. How much free disk space do you have on the drive where you are running the tool?
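
Both possibilities, too little free space and an incomplete file, can be checked quickly. The sketch below is not part of the kit. It assumes the BAM was written with the 28-byte BGZF end-of-file marker described in the SAM/BAM specification (some older files may lack it even when complete), and the function names are made up for illustration.

```python
import os
import shutil

# The empty BGZF block that the SAM/BAM specification defines as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "42430200" "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof_marker(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


def free_gigabytes(directory):
    """Return the free space, in gigabytes, on the drive holding `directory`."""
    return shutil.disk_usage(directory).free / 1e9
```

A missing marker suggests the download or copy was cut short, while a small value from `free_gigabytes(".")` in the working folder would point to the disk-space explanation instead.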


                          • #73
                            Originally posted by JeffWexler
                            Felix--

                            I've tried, without success, to use your BAM Analysis Kit on a .bam file posted at http://evolbio.ut.ee/leviteY/P3.recalQ.realigned.bam. (The website, run by the Estonian Biocentre, posts data from some recent papers.)

                            The kit reports:
                            [bam_header_read] invalid BAM binary header <this is not a BAM file>.
                            [bam_sort_core] truncated file. Continue anyway.

                            At this point, samtools.exe stops working.

                            Thanks for your help.

                            Jeff
                            I downloaded the BAM file you mentioned and noticed that it contains only the Y chromosome. Splitting by chromosome during processing therefore produces 0-byte files for every chromosome except Y, which is expected and correct. To save time, select only Y during processing.


                            • #74
                              I reran the telomere program and got this:


                              **** Telomere Length ****

                              [bigy.bam]
                              Start analysing BAM bigy.bam
                              Specified BAM has 1 read groups
                              [scan] processed 10000000 reads
                              [scan] total reads in BAM scanned 10811987
                              Completed scanning BAM
                              1 [main] telseq 7980 cygwin_exception::open_stackdumpfile: Dumping stack trace to telseq.exe.stackdump
                              ******* RESULT ********
                              /bin/cat: telseq.out: No such file or directory
                              Telomere Length:


                              • #75
                                Originally posted by felix
                                Just open the telseq.out file in MS Word or WordPad. It is just a text file, but it has Unix line endings (I think). It is basically the raw output from telseq, with the column definitions listed below (ref: https://github.com/zd1/telseq/). The "telomere" text file just takes the LENGTH_ESTIMATE value from this raw data.

                                Column Definitions:
                                1. ReadGroup: read group that the result corresponds to. Defined by the RG tag in the BAM header.
                                2. Library: sequencing library that the read group belongs to.
                                3. Sample: defined by the SM tag in the BAM header.
                                4. Total: total number of reads in this read group.
                                5. Mapped: total number of mapped reads in this read group. Whether a read is mapped is determined by SAM flag 0x4.
                                6. Duplicates: total number of duplicate reads in this read group. Whether a read is a duplicate is determined by SAM flag 0x400.
                                7. LENGTH_ESTIMATE: estimated telomere length.
                                8. TEL0: read counts for reads containing no TTAGGG/CCCTAA repeats.
                                9. TEL1: read counts for reads containing only 1 TTAGGG/CCCTAA repeat.
                                10. TELn: read counts for reads containing only n TTAGGG/CCCTAA repeats.
                                11. TEL16: read counts for reads containing 16 TTAGGG/CCCTAA repeats.
                                12. GC0: read counts for reads with GC composition between 40%-42%.
                                13. GC1: read counts for reads with GC composition between 42%-44%.
                                14. GCn: read counts for reads with GC composition between (40%+n*2%)-(40%+(n+1)*2%).
                                15. GC9: read counts for reads with GC composition between 58%-60%.
                                Felix, there is no information when I open the telseq file with MS Word; see the attached photo.
                                Attached Files
