CombineGVCFs : key isn't defined in the VCFHeader

Hi there,

I am trying to combine gVCFs using CombineGVCFs and get the error: "Key END found in VariantContext field INFO at chr1:10439 but this key isn't defined in the VCFHeader".

The gVCFs are generated with GATK4 HaplotypeCaller:
gatk --java-options "-Xmx50G" HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I x.bam -O x.g.vcf.gz

The gVCF header contains "contig=<ID=chr1,length=248956422>", and the record at that position is:
chr1 10439 . AC A 359.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.836;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=2.063;MLEAC=1;MLEAF=0.500;MQ=37.15;MQRankSum=0.425;QD=12.85;ReadPosRankSum=-0.401;SOR=1.445 GT:AD:DP:GQ:PL 0/1:7,21:28:99:397,0,129

My question is: which key is the error referring to (maybe contig=<ID=chr1,length=248956422>?), and how should it be defined in the header given the above position?

Is there any obvious solution to this?
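The key in the message is the INFO field it names (END), not the contig line: GATK expects every INFO key used in a record to have a matching ##INFO=<ID=...> header line. A minimal Python sketch for checking which INFO keys a given gVCF uses without declaring them, assuming plain-text or gzip/bgzip-compressed input. The function names are hypothetical, and it only checks INFO keys, not FORMAT keys.

```python
import gzip
import re
import sys

# Matches the ID of an ##INFO header line, e.g. ##INFO=<ID=DP,...> -> "DP".
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def undeclared_info_keys(lines):
    """Return {key: first 'CHROM:POS'} for INFO keys used in records
    but not declared by an ##INFO header line."""
    declared = set()
    missing = {}
    for line in lines:
        if line.startswith("##"):
            m = INFO_ID.match(line)
            if m:
                declared.add(m.group(1))
            continue
        if line.startswith("#"):
            continue  # the #CHROM column-header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[7] == ".":
            continue  # no INFO column or empty INFO
        for entry in fields[7].split(";"):
            key = entry.split("=", 1)[0]  # flags have no "=value"
            if key not in declared and key not in missing:
                missing[key] = f"{fields[0]}:{fields[1]}"
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        for key, locus in undeclared_info_keys(handle).items():
            print(f"{key}\tfirst seen at {locus}")
```

Run it as, for example, `python check_info_keys.py x.g.vcf.gz` (hypothetical script name); each output line names an undeclared key and the first locus where it appears.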

Thank you!


  • estest Member

    So, I downloaded GATK 4 (v4.0.7.0) a week or so ago, and when I try to merge the thousands of gVCFs I generated with HaplotypeCaller, I get this error at loci scattered throughout them.

    The HaplotypeCaller call is:
    java -Xmx6g -jar /data1/bin/gatk4.jar HaplotypeCaller \
      --output-mode EMIT_ALL_CONFIDENT_SITES \
      --ERC GVCF \
      -R /data1/public/ref/hg38/gatk_bundle/Homo_sapiens_assembly38.fasta \
      --genotyping-mode DISCOVERY \
      -A BaseQuality \
      -A MappingQuality \
      -G StandardAnnotation \
      --min-base-quality-score 20 \
      --dbsnp /data1/public/ref/hg38/gatk_bundle/dbsnp_138.hg38.vcf.gz \
      -I #{input_bam} \
      -O #{name}.g.vcf

    And the CombineGVCFs call (whittled down to the smallest case that still exhibits the error):
    java -Xmx6g -jar /data1/bin/gatk4.jar CombineGVCFs \
      -R /data1/public/ref/hg38/gatk_bundle/Homo_sapiens_assembly38.fasta \
      -L chr1 -L chr2 -L chr3 -L chr4 -L chr5 -L chr6 -L chr7 -L chr8 -L chr9 -L chr10 -L chr11 -L chr12 \
      -L chr13 -L chr14 -L chr15 -L chr16 -L chr17 -L chr18 -L chr19 -L chr20 -L chr21 -L chr22 -L chrY -L chrX \
      -V /data1/stuff/intermediate/gvcf/drv-41826.g.vcf.gz \
      -V /data1/stuff/intermediate/gvcf/drv-250.g.vcf.gz \
      -O drv-0.g.vcf

    The final error from GATK:
    java.lang.IllegalStateException: Key END found in VariantContext field INFO at chr1:15811 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.

    The guilty lines are:
    chr1 15811 . TCTG . 103.27 . AN=2;DP=26;MQ=27.29 GT:AD:DP:MBQ:MMQ 0/0:26:26:41

    chr1 15811 . TCTG . 208.27 . AN=2;DP=60;MQ=36.41 GT:AD:DP:MBQ:MMQ 0/0:60:60:41

    When I grep for that position, I see a handful of similar-looking loci.

    I don't have END defined in the VCF header, but there are thousands of samples, so manually editing these files isn't terribly appealing if I can avoid it (and I haven't tested whether that would actually fix the problem).
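If declaring END does turn out to be enough (untested, as noted above), the header edit need not be manual. A minimal Python sketch that streams a VCF and inserts an END header line just before #CHROM when none is present. The exact Description text is an assumption modeled on what GATK usually writes in GVCF headers; compare it with a header from a GVCF your own GATK version produced. It does nothing else to the records.

```python
import gzip
import sys

# Assumption: modeled on the END line GATK normally writes in GVCF headers;
# verify against a header from your own GATK version before relying on it.
END_HEADER = ('##INFO=<ID=END,Number=1,Type=Integer,'
              'Description="Stop position of the interval">\n')


def add_end_header(lines):
    """Yield VCF lines, inserting END_HEADER before #CHROM if END is undeclared."""
    has_end = False
    for line in lines:
        if line.startswith("##INFO=<ID=END,"):
            has_end = True
        elif line.startswith("#CHROM") and not has_end:
            yield END_HEADER
        yield line


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Writes uncompressed output; re-compress with bgzip and re-index
    # (e.g. tabix -p vcf) before handing it back to GATK.
    with open_text(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(add_end_header(src))
```

For example, `python add_end_header.py drv-250.g.vcf.gz drv-250.fixed.g.vcf` (hypothetical script and output names). Writing uncompressed text and leaving compression to bgzip keeps the output indexable, which plain gzip output would not be.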

  • Sheila (Broad Institute) Member, Broadie, Moderator


    Hmm. Can you test the very latest version on a small snippet (from HaplotypeCaller to GenotypeGVCFs)?


  • estest Member

    I did download it earlier this week (I didn't realize I've been working on this for a month or so). My tests are still ongoing, but so far, if I use the gatk wrapper script instead of calling the jar directly with java -jar, it seems to be working. I'm increasing the number of samples as runs complete on a single machine; I've had too many resubmissions to want to submit everything to our cluster if it would have to be redone yet again.

    Is it possible that some of the additional flags the wrapper script passes to the java call are protecting it from whatever is causing the problem? Just swapping the newer GATK jar in for the old one and running the scripts as they were (java -jar ...) reproduced the same problem I had with last week's version.
