If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Force skip INDELs in GATK output
Dear GATK, I ma using GATK4, my syntax is:
gatk HaplotypeCaller -R $reference -I in.bam -ERC BP_RESOLUTION --max-alternate-alleles 1 -O out.raw.snps.g.vcf -L $SNP_TARGET
In some regions i have curious output (also compare with another variant callers + IGV).
Result from GATK in a few samples is:
SNP target is:
chr1 152145740 152145741 rs12144907
chr1 152145741 . G GGAGAGC,<NON_REF> 3985.64 . BaseQRankSum=-1.051;DP=317;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.789;RAW_MQandDP=1131778,317;ReadPosRankSum=1.158 GT:AD:DP:GQ:PL:SB0/1:138,172,0:310:99:3993,0,3067,4406,3603,8009:76,62,98,74
But in IGV I can see genotype GC (total count is 406, C is 208 and G is 197). I can see a few insert bases (called by GATK output GAGAGC), but only maybe in 3 reads (note: I do not downsampling BAM in IGV).
Result from bcftools is:
chr1 152145741 . G C,A,<*> 0 . DP=381;I16=102,28,110,23,5898,308852,6254,334182,7562,446018,7880,470048,2518,56072,2784,64140;QS=0.48217,0.514629,0.00320142,0;VDB=2.97088e-34;SGB=-0.693147;RPB=0.0605761;MQB=0.804333;MQSB=0.764982;BQB=0.578277;MQ0F=0 PL 255,0,255,255,255,255,255,255,255,255
Which seems to be correct to my observation in IGV. Note - I need parameter -ERC BP_RESOLUTION.
Any idea how to set up GATK to avoid this weird SNP calling in this region (and few others)? Maybe force skip insertion?
Thank you for any help and reply!