We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

HaplotypeCaller: Alternate allele get called or not depending on -ip option

slegrasslegras Member
edited May 2018 in Ask the GATK team

Hi, I'm currently analyzing some data (exome-seq) using HaplotypeCaller and get what seems to me an odd behaviour:
The problem is that I've got a position which is clearly bi-allelic in IGV and that is said to have only the reference allele in the gVCF I'm generating with HaplotypeCaller.

Here is the command line I used:

nohup java -jar PATH/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R PATH/ucsc.hg19_noHaps.fasta \
-I PATH/JLCL254.realigned.recalibrated.bam \
-L PATH/merged.bed \
-ip 50 \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-o JLCL254.vcf 

Here is the gVCF line of the variant of interest:


chr1 912049 . T NON_REF . . END=912049 GT:DP:GQ:MIN_DP:PL 0/0:68:0:68:0,0,0

The variant of interest is located 19bp away from the captured region but with "-ip 50" it should be detected.

To check what is really analyzed, I output the bamout for all analyzed regions (-L PATH/merged.bed, -ip 50) and saw that the location of the variant is not analyzed (If I'm correct: as there is no coverage, this is not an active region).

Then I forced the bamout at the location +/-20nt around my variant to check whether some reads with the alternate allele are still kept. I used:
-L chr1:912029-912069 \
-forceActive \

Doing so I've being able to see that many reads with the alternate allele are indeed still kept. The gVCF file generated along with the bamout file contains now my variant of interest with the alternate allele:


chr1 912049 rs9803103 T C,NON_REF 1235.77 . BaseQRankSum=-0.677;ClippingRankSum=-0.942;DB;DP=61;MLEAC=1,0;MLEAF=0.500,0.00;MQ=54.88;MQRankSum=-0.600;ReadPosRankSum=-0.195 GT:AD:DP:GQ:PL:SB 0/1:19,42,0:61:99:1264,0,448,1321,574,1895:3,16,6,36

Please see an IGV screenshot:

Tracks are (from top to bottom):
* the original bam file
* the bamout for all captured regions (known from the file -L PATH/merged.bed)
* the forced bamout (at the location of the variant i.e -L chr1:912029-912069)
* merged.bed is the file used with the -L option.

Finally, I tried to call variants changing the -ip option to 100 and got the alternate allele called.

Please note that:
If I manually add/subtract 50bp to the closest target region boundaries, I've got the same result as with -ip 50.
If I manually add/subtract 100bp to the closest target region boundaries, I've got the same result as with -ip 100.

I tried several versions of GATK (3.46, 3.7, and always got the same results.

I may have miss something but so far I can't explain myself what's happening. Do you see any explanation for what I observe? Do you see any options I should use to overcome this?
Many thanks in advance for your help.

NB: java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)

Post edited by slegras on


  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭
    edited May 2018

    This problem is issued in github as a bug and still not fixed. As a workaround @shlee proposed to run exomes without bed files in HC and filter variants based on bed later on. I am trying to find out my samples to represent this problem but it seems they have gone with the wind and they are somewhere in my piles of archived sequences. Once I find them I will test this proposed solution.

  • slegrasslegras Member

    Thanks for your answer! I'll do like this then.


Sign In or Register to comment.