We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

problems emitting all sites with HaplotypeCaller


I'm running GATK v4.0.5.2 and Java v1.8.0_20

I'd like to get a VCF with all callable positions, including invariants. With GATK 3, using UnifiedGenotyper, this process works great. I'm trying to run with GATK 4 and I'm only getting the variant positions. Here is my command:

gatk HaplotypeCaller -R reference.fasta -I ECOLI_renamed_header.bam -O test.vcf --output-mode EMIT_ALL_SITES

The VCF only contains one position:



ADK1 460 . A G 81.28 . AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.09;SOR=2.833 GT:AD:DP:GQ:PL 1/1:0,3:3:9:109,9,0

When I run a comparable command with GATK 3, I get 536 positions reported, with only one of them being a variant.

Am I running this correctly?


  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @jasonsahl,

    You are saying you are able to run GATK3 HaplotypeCaller to give calls for every site but the same command does not work in GATK4.

    Can you try adding --all-site-pls to your command and see if this works? Alternatively, try -ERC BP_RESOLUTION.

  • jasonsahljasonsahl Member

    Thank you for your response. This worked: "-ERC BP_RESOLUTION". It works without setting an output mode, so it is unclear if that command is now obsolete. One question about this output and I can open a new question if needed, is why I get the warning in a situation like this:

    ADK1 1 . G . . . GT:AD:DP:GQ:PL 0:1,0:1:40:0,40

    I'm now running the command with -ploidy 1 for Bacteria, so at a coverage of 1, how could the call not be homozygous reference?


  • jasonsahljasonsahl Member

    Sorry, not sure this formatted correctly. Here is the output from GATK 4.

    'ADK1 1 . G NON_REF . . . GT:AD:DP:GQ:PL 0:1,0:1:40:0,40'

    Why would this be NON_REF at a coverage of 1?


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    I think you misunderstand the formatting of the GVCF record, @jasonsahl. That call is a reference call (hence the genotype assignment of 0) but since you're emitting a GVCF (not a regular VCF) you get a NON_REF symbolic allele included for every record, even the reference calls. We have documentation on this on the website, have a look at the docs + the videos on the presentations page.

  • jasonsahljasonsahl Member

    Ok, I tried to read through the documentation prior to submitting this question, but was confused. I'd rather not output GVCF, but looks like I must as emit_all_sites is not working as expected. Is there a way to run GentotypeGVCFs on my GVCF and include the invariant sites? I see that as an option in GATK 3, but looks like it's missing in GATK 4 and I don't see a comparable option.

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭


    Have a look at this thread.


Sign In or Register to comment.