Steps/procedure verification for vcf conversion and post processing .BAM files

    Here is what I did, and I got some very interesting results. Can someone verify the steps?

    I converted a .BAM file to .VCF, with bigy.bam as my input.

    Code:
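    # Sort the BAM by coordinate; bigy_sorted is an output prefix, giving bigy_sorted.bam
    $ samtools sort bigy.bam bigy_sorted
    
    # Index the sorted BAM
    $ samtools index bigy_sorted.bam
    
    # Index the reference FASTA and build its sequence dictionary (GATK needs both)
    $ samtools faidx ucsc.hg19.fasta
    
    $ java -Xmx2g -jar ~/picard-tools/CreateSequenceDictionary.jar R=ucsc.hg19.fasta O=ucsc.hg19.dict
    
    # Find the intervals that need local realignment around indels
    $ java -Xmx2g -jar ~/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta \
        -I bigy_sorted.bam -o bigy.intervals
    
    # Realign the reads inside those intervals
    $ java -Xmx2g -jar ~/GATK/GenomeAnalysisTK.jar -T IndelRealigner -R ucsc.hg19.fasta \
        -I bigy_sorted.bam -targetIntervals bigy.intervals -o bigy_sorted_realigned.bam
    
    # Index the realigned BAM
    $ samtools index bigy_sorted_realigned.bam
    
    # Call genotypes, emitting every confident site (variant and reference)
    $ java -Xmx2g -jar ~/GATK/GenomeAnalysisTK.jar -l INFO -R ucsc.hg19.fasta -T UnifiedGenotyper \
         -I bigy_sorted_realigned.bam -rf BadCigar -o bigy_out.vcf \
         --output_mode EMIT_ALL_CONFIDENT_SITES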
    I got a ~5 GB VCF file, which I compressed to BigY_BAM_to_VCF.zip (237 MB). Is the above procedure for converting to VCF correct?

    Then I filtered for only the SNPs used by FTDNA (~700,000). That did not leave much: just 9,632 autosomal SNPs and 177 X-chromosome SNPs.
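    For anyone wanting to reproduce this kind of filter, here is a rough sketch of one way to do it with awk. It is not necessarily how the filtering above was done. The function name `filter_vcf_by_positions`, the file `snp_positions.tsv`, and its two-column layout (chromosome, 1-based position, tab-separated) are assumptions made for illustration. The chromosome names in the list must match those in the VCF (e.g. `chrX` for ucsc.hg19).

```shell
# Sketch: keep the VCF header plus records whose (CHROM, POS) is in a positions list.
# Assumed (hypothetical) list format: tab-separated chromosome and 1-based position,
# using the same chromosome names as the VCF.
filter_vcf_by_positions() {
    awk -F'\t' '
        FILENAME == ARGV[1] { wanted[$1 FS $2] = 1; next }  # load the positions list
        /^#/                { print; next }                 # pass VCF header lines through
        ($1 FS $2) in wanted                                # print records at listed positions
    ' "$1" "$2"
}

# Example use (snp_positions.tsv is a hypothetical file name):
# filter_vcf_by_positions snp_positions.tsv bigy_out.vcf > bigy_filtered.vcf
```

    Matching on position rather than rsID avoids depending on the ID column, which is only populated when a dbSNP file is supplied to the caller.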

    However, if I filter by all available SNPs for build 37 / snp138 (~60 million SNPs), I get more than 1.72 million autosomal SNPs and 53,693 X-chromosome SNPs. I was also able to get 41,983 Y-SNPs, about 8,000 more than were reported on the results page (in contrast, YFull reported 52,304 Y-SNPs). Are the 1.7 million autosomal SNPs correct?
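    One thing that may be worth checking when reading these counts: with `--output_mode EMIT_ALL_CONFIDENT_SITES`, the VCF contains confident reference calls as well as variant calls, so a count of positions overlapping a SNP list includes sites where the sample simply matches the reference. The sketch below tallies records by chromosome class and by whether ALT is `.` (no alternate allele). The function name `count_vcf_sites` is made up for illustration, and the chromosome naming is assumed to follow ucsc.hg19 (`chr1`..`chr22`, `chrX`, `chrY`).

```shell
# Sketch: tally VCF records by chromosome class and by variant vs. reference call.
# Assumes ucsc.hg19-style names (chr1..chr22, chrX, chrY); anything else is "other".
count_vcf_sites() {
    awk -F'\t' '
        /^#/ { next }                                     # skip header lines
        {
            if ($1 ~ /^chr([1-9]|1[0-9]|2[0-2])$/) class = "autosomal"
            else if ($1 == "chrX")                 class = "X"
            else if ($1 == "chrY")                 class = "Y"
            else                                   class = "other"
            kind = ($5 == ".") ? "reference" : "variant"  # ALT of "." means no alternate allele
            n[class "\t" kind]++
        }
        END { for (k in n) print k "\t" n[k] }            # class, kind, count
    ' "$1" | sort
}

# Example use:
# count_vcf_sites bigy_out.vcf
```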

    The converted files can be found here.

    I ran several random checks of this output against my autosomal DNA test results, and every SNP the two have in common seems to match.
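    A spot check like this could also be run over every shared position. The following is only a sketch under stated assumptions, not the procedure used above. It assumes a single-sample VCF with GT as the first FORMAT subfield, and a hypothetical three-column tab-separated chip table (chromosome, position, two-letter genotype such as `AG`) already converted to the same build and chromosome names as the VCF. The function name `compare_genotypes` is made up, and strand differences between the two sources are not handled.

```shell
# Sketch: count matching and mismatching genotypes between a chip table and a VCF.
# Assumed inputs: $1 = chip table (chrom, pos, two-letter genotype; hypothetical format),
#                 $2 = single-sample VCF whose FORMAT column starts with GT.
compare_genotypes() {
    awk -F'\t' '
        # Put the two allele letters in alphabetical order so AG and GA compare equal.
        function ordered(a, b) { return (a < b) ? a b : b a }
        FILENAME == ARGV[1] {                          # load the chip genotypes
            chip[$1 FS $2] = ordered(substr($3, 1, 1), substr($3, 2, 1)); next
        }
        /^#/ || !(($1 FS $2) in chip) { next }         # skip header and unshared positions
        {
            split($10, sample, ":")                    # GT is the first subfield
            if (split(sample[1], gt, "[/|]") != 2 || gt[1] == "." || gt[2] == ".") next
            allele[0] = $4                             # index 0 is REF, 1..n are ALT alleles
            n = split($5, alt, ",")
            for (i = 1; i <= n; i++) allele[i] = alt[i]
            a = allele[gt[1]]; b = allele[gt[2]]
            if (length(a) != 1 || length(b) != 1) next # ignore indels
            if (ordered(a, b) == chip[$1 FS $2]) match_count++; else mismatch++
        }
        END { printf "match\t%d\nmismatch\t%d\n", match_count, mismatch }
    ' "$1" "$2"
}

# Example use (chip_genotypes.tsv is a hypothetical file name):
# compare_genotypes chip_genotypes.tsv bigy_out.vcf
```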

    Are there any parameters I should add to the above commands to get better results?
    Last edited by felix; 14 April 2014, 01:27 PM.