problems emitting all sites with HaplotypeCaller

Hi,

I'm running GATK v4.0.5.2 and Java v1.8.0_20

I'd like to get a VCF with all callable positions, including invariants. With GATK 3, using UnifiedGenotyper, this process works great. I'm trying to run with GATK 4 and I'm only getting the variant positions. Here is my command:

gatk HaplotypeCaller -R reference.fasta -I ECOLI_renamed_header.bam -O test.vcf --output-mode EMIT_ALL_SITES

The VCF only contains one position:

source=HaplotypeCaller

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test

ADK1 460 . A G 81.28 . AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.09;SOR=2.833 GT:AD:DP:GQ:PL 1/1:0,3:3:9:109,9,0

When I run a comparable command with GATK 3, I get 536 positions reported, with only one of them being a variant.

Am I running this correctly?
thanks,
Jason

Answers

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @jasonsahl,

    You are saying you are able to run GATK3 HaplotypeCaller to give calls for every site but the same command does not work in GATK4.

    Can you try adding --all-site-pls to your command and see if this works? Alternatively, try -ERC BP_RESOLUTION.

  • jasonsahljasonsahl Member

    Thank you for your response. This worked: "-ERC BP_RESOLUTION". It works without setting an output mode, so it is unclear if that command is now obsolete. One question about this output and I can open a new question if needed, is why I get the warning in a situation like this:

    ADK1 1 . G . . . GT:AD:DP:GQ:PL 0:1,0:1:40:0,40

    I'm now running the command with -ploidy 1 for Bacteria, so at a coverage of 1, how could the call not be homozygous reference?

    thanks,
    Jason

  • jasonsahljasonsahl Member

    Sorry, not sure this formatted correctly. Here is the output from GATK 4.

    'ADK1 1 . G NON_REF . . . GT:AD:DP:GQ:PL 0:1,0:1:40:0,40'

    Why would this be NON_REF at a coverage of 1?

    thanks,

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    I think you misunderstand the formatting of the GVCF record, @jasonsahl. That call is a reference call (hence the genotype assignment of 0) but since you're emitting a GVCF (not a regular VCF) you get a NON_REF symbolic allele included for every record, even the reference calls. We have documentation on this on the website, have a look at the docs + the videos on the presentations page.

  • jasonsahljasonsahl Member

    Ok, I tried to read through the documentation prior to submitting this question, but was confused. I'd rather not output GVCF, but looks like I must as emit_all_sites is not working as expected. Is there a way to run GentotypeGVCFs on my GVCF and include the invariant sites? I see that as an option in GATK 3, but looks like it's missing in GATK 4 and I don't see a comparable option.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @jasonsahl
    Hi,

    Have a look at this thread.

    -Sheila

Sign In or Register to comment.