HaplotypeCaller, StrandBiasBySample Annotation


I am using GATK 3.3-0 HaplotypeCaller for variant calling. When I run HaplotypeCaller with the command

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R all_chr_reordered.fa -I 30_S30.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 50 -o 30_S30_control1.vcf -L brca.intervals

This gives me all the variants I want. However, I also want the number of forward and reverse reads that support the REF and ALT alleles, so I added the StrandBiasBySample annotation when running HaplotypeCaller:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R all_chr_reordered.fa -I 30_S30.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 50 -o 30_S30_control2.vcf -L brca.intervals -A StrandBiasBySample

The SB field is added, but one variant that was in 30_S30_control1.vcf is absent from 30_S30_control2.vcf. All the remaining variants are there. The only difference between the two runs was adding -A StrandBiasBySample. Why is this one variant missing?

The missing variant:
17 41276152 . CAT C 615.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.147;ClippingRankSum=0.564;DP=639;FS=15.426;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-1.054;QD=0.96;ReadPosRankSum=0.698;SOR=2.181 GT:AD:DP:GQ:PL 0/1:565,70:635:99:653,0,18310
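For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two call sets could be diffed. It is not part of GATK; the function names variant_keys and missing_variants are made up for illustration, and it assumes tab-separated VCF data lines where a variant is identified by CHROM, POS, REF and ALT.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) for every data line of a VCF.

    Assumes standard tab-separated VCF columns; header lines start with '#'.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.rstrip("\n").split("\t")
        keys.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return keys


def missing_variants(first_lines, second_lines):
    """Return variants present in the first VCF but absent from the second."""
    return sorted(variant_keys(first_lines) - variant_keys(second_lines))
```

Calling missing_variants with the opened 30_S30_control1.vcf and 30_S30_control2.vcf files (in that order) should list only the records that disappeared once -A StrandBiasBySample was added.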

So I decided to run HaplotypeCaller without -A StrandBiasBySample and later add the annotations with VariantAnnotator. Here is the command:

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R all_chr_reordered.fa -I 30_S30.bam --variant 30_S30_control1.vcf -L 30_S30_control1.vcf -o 30_S30_control1_SBBS.vcf -A StrandBiasBySample

However, the output VCF file 30_S30_control1_SBBS.vcf did not differ from the input file 30_S30_control1.vcf except for the header; the SB field wasn't added. Why was the SB field not added? Is there any other way to get the number of forward and reverse reads?
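For context, this is roughly how I read the strand counts from records that do carry SB (as in 30_S30_control2.vcf). It is only a sketch: strand_counts is a hypothetical helper name, and it assumes the SB value holds four comma-separated integers in the order ref-forward, ref-reverse, alt-forward, alt-reverse, which is my understanding of the annotation's layout.

```python
def strand_counts(format_field, sample_field):
    """Extract per-strand read counts from one sample column of a VCF record.

    format_field: the FORMAT column, e.g. "GT:AD:DP:GQ:PL:SB"
    sample_field: the matching sample column
    Returns None when the record has no usable SB entry.
    Assumed SB order: ref forward, ref reverse, alt forward, alt reverse.
    """
    sample = dict(zip(format_field.split(":"), sample_field.split(":")))
    raw = sample.get("SB")
    if raw is None or raw == ".":
        return None
    parts = raw.split(",")
    if len(parts) != 4:
        return None  # unexpected layout; do not guess
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(p) for p in parts)
    return {
        "ref_fwd": ref_fwd,
        "ref_rev": ref_rev,
        "alt_fwd": alt_fwd,
        "alt_rev": alt_rev,
    }
```

For the records in 30_S30_control1_SBBS.vcf this returns None, since the FORMAT column there is still GT:AD:DP:GQ:PL.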

Please find 30_S30_control1.vcf, 30_S30_control2.vcf and 30_S30_control1_SBBS.vcf attached.

