We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Force skip INDELs in GATK output

Dear GATK, I ma using GATK4, my syntax is:

gatk HaplotypeCaller -R $reference -I in.bam -ERC BP_RESOLUTION --max-alternate-alleles 1 -O out.raw.snps.g.vcf -L $SNP_TARGET

In some regions i have curious output (also compare with another variant callers + IGV).

Result from GATK in a few samples is:

SNP target is:

chr1 152145740 152145741 rs12144907

Output is:

chr1 152145741 . G GGAGAGC,<NON_REF> 3985.64 . BaseQRankSum=-1.051;DP=317;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.789;RAW_MQandDP=1131778,317;ReadPosRankSum=1.158 GT:AD:DP:GQ:PL:SB0/1:138,172,0:310:99:3993,0,3067,4406,3603,8009:76,62,98,74

But in IGV I can see genotype GC (total count is 406, C is 208 and G is 197). I can see a few insert bases (called by GATK output GAGAGC), but only maybe in 3 reads (note: I do not downsampling BAM in IGV).

Result from bcftools is:

chr1 152145741 . G C,A,<*> 0 . DP=381;I16=102,28,110,23,5898,308852,6254,334182,7562,446018,7880,470048,2518,56072,2784,64140;QS=0.48217,0.514629,0.00320142,0;VDB=2.97088e-34;SGB=-0.693147;RPB=0.0605761;MQB=0.804333;MQSB=0.764982;BQB=0.578277;MQ0F=0 PL 255,0,255,255,255,255,255,255,255,255

Which seems to be correct to my observation in IGV. Note - I need parameter -ERC BP_RESOLUTION.

Any idea how to set up GATK to avoid this weird SNP calling in this region (and few others)? Maybe force skip insertion?

Thank you for any help and reply!




  • bshifawbshifaw Member, Broadie, Moderator admin

    Are you looking for the following type of argument?

    The PCR indel model to use
    When calculating the likelihood of variants, we can try to correct for PCR errors that cause indel artifacts. The correction is based on the reference context, and acts specifically around repetitive sequences that tend to cause PCR errors). The variant likelihoods are penalized in increasing scale as the context around a putative indel is more repetitive (e.g. long homopolymer). The correction can be disabling by specifying '-pcrModel NONE'; in that case the default base insertion/deletion qualities will be used (or taken from the read if generated through the BaseRecalibrator). VERY IMPORTANT: when using PCR-free sequencing data we definitely recommend setting this argument to NONE.

    The --pcr-indel-model argument is an enumerated type (PCRErrorModel), which can have one of the following values:

    no specialized PCR error model will be applied; if base insertion/deletion qualities are present they will be used
    a most aggressive model will be applied that sacrifices true positives in order to remove more false positives
    a more aggressive model will be applied that sacrifices true positives in order to remove more false positives
    a less aggressive model will be applied that tries to maintain a high true positive rate at the expense of allowing more false positives

  • zgtmanzgtman Member

    Hello @bshifaw , thank you for reply. I tested to add parameter --pcr-indel-model to my workflow, but result is the same (mentioned above). Any other idea?

  • bshifawbshifaw Member, Broadie, Moderator admin

    Are you looking at the bamout in IGV produced by HC, as mentioned here? Also mind providing a screenshot.

    There are a few more knobs you can turn to decrease sensitivity, they are listed in the HC tool documentation along with their description (e.g. --standard-min-confidence-threshold-for-calling). Search for keywords like sensitivity and threshold and most of them should be highlighted.

Sign In or Register to comment.