Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
HaplotypeCaller does not call variant depending on the interval list I use.
I am trying to explain why HaplotypeCaller calls a variant in some cases but not in others. I notice that a FN had very good coverage (>2000x) but wasn't being called. I ran HaplotypeCaller using an interval list of just that site, and the variant was emitted. I increased the size of the surrounding interval (10bp, 50bp, 100bp on either side), and the variant was still called. But if the interval gets too big (1000bp or larger), the gVCF does not have a call at the site:
chr pos . C . . END= GT:DP:GQ:MIN_DP:PL 0/0:2198:0:2192:0,0,0
I am curious why the PL's are all zero too!
The output for this site when I use the smaller interval lists is:
chr20 pos . C T,A, 36264.77 . BaseQRankSum=8.793;ClippingRankSum=-0.574;DP=2068;ExcessHet=3.0103;MLEAC=1,0,0;MLEAF=0.500,0.00,0.00;MQRankSum=1.003;RAW_MQ=7438970.00;ReadPosRankSum=-0.478 GT:AD:DP:GQ:PL:SB 0/1:1009,994,1,0:2004:99:36293,0,32500,39597,33950,77303,39022,33568,73438,72576:402,607,486,509
Here's my full command:
java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4096m -jar /path/to/GenomeAnalysisTK.jar -T HaplotypeCaller \ -R /path/to/hs38DH.fa \ --minPruning 3 --maxNumHaplotypesInPopulation 200 -variant_index_parameter 128000 -variant_index_type LINEAR --emitRefConfidence GVCF --max_alternate_alleles 3 --contamination_fraction_to_filter 0.0 \ -I /path/to/bam \ -L /path/to/interval/list \ -o /path/to/g.vcf \ -pairHMM VECTOR_LOGLESS_CACHING
GATK version: version 3.5-0-g36282e4
Note that running in
-genotyping_mode GENOTYPE_GIVEN_ALLELES calls it correctly.
Thanks for any help you could offer.