Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

wonky vcf header :(

Hi,

I've a problem with a VCF when combiningGVCFs, gatk/v3.4. "headerKey END found in VariantContext field INFO at chr2L:64003 but this key isn't defined in the VCFHeader"

HaplotypeCaller
-nct 30 --analysis_type HaplotypeCaller --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa \ --input_file ../../read_mapping/reference_line/RGnb/RGnb.bam \ --variant_index_type LINEAR --variant_index_parameter 128000 -mbq 10 -L chr2L \ --out ../../read_mapping/reference_line/RGnb/RGnb.g.vcf

VCF
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RGnb
chr2L 43265 . A G 2147.77 . AC=2;AF=1.00;AN=2;DP=54;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.59;SOR=1.118 GT:AD:DP:GQ:PL 1/1:0,54:54:99:2176,162,0
chr2L 64003 . GT G 57.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.987;ClippingRankSum=-0.189;DP=45;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.85;MQRankSum=0.525;QD=1.28;ReadPosRankSum=1.149;SOR=0.622 GT:AD:DP:GQ:PL 0/1:25,8:33:95:95,0,521

Combine three gvcfs
--analysis_type CombineGVCFs --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa \ --variant rg_GVCF.list --out ./RG.g.vcf

Error message:
java.lang.IllegalStateException: Key END found in VariantContext field INFO at chr2L:64003 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF

Previous issues for "key isn't defined in the VCFHeader"
http://gatkforums.broadinstitute.org/discussion/3962/genotypeandvalidate-error-key-callstatus-found-in-variantcontext-field-info#latest
http://gatkforums.broadinstitute.org/discussion/2331/select-variants-says-vcf-header-is-missing
https://github.com/charite/jannovar/issues/70
https://groups.google.com/forum/#!topic/gawed-awesome-analyzers/RjXtIuYQ_V8

Could anyone advise me on how to fix this?

Sincerely,

Blue

Comments

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi Blue,

    Is the VCF record you posted supposed to be part of one of the GVCFs you are trying to combine? The record you posted is not from a GVCF, it is from a VCF. CombineGVCFs only works on GVCFs.

    -Sheila

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi again Blue,

    It seems like this may be a bug introduced in one of the newer versions. See this thread: http://gatkforums.broadinstitute.org/discussion/3962/genotypeandvalidate-error-key-callstatus-found-in-variantcontext-field-info#latest

    Can you submit a bug report? Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • BlueBlue Member

    Hi again @Sheila

    I haven't submitted the bug report yet. I added --emitRefConfidence GVCF to fix the non-gvcf mistake.

    Now with Program Args:
    --analysis_type CombineGVCFs --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa --variant rg_GVCF.list --out RG.g.vcf
    Returns:
    ERROR MESSAGE: Argument with name '--variant' (-V) is missing.

    Difficult to determine whether the cause is still the vcf header, or am I missing something obvious. I'll start to put-together a bug report.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi Blue,

    Can you post the full stack trace you get?

    Thanks,
    Sheila

  • BlueBlue Member

    Hi @Sheila

    I assumed my initial problem was caused by inputting vcfs instead of gvcfs, I fixed that under your suggestion (with --emitRefConfidence GVCF). There was a problem in CombineGVCFs (recognising the vcf.list), and I had typos in my GenotypeGVCF command line. Thanks for your help. Apologies for the banal problem. Successful code is shown below.

    Sincerely,

    Blue

    Making bam list
    for i in ../../read_mapping/reference_line/*/RG??.bam
    do
    base_name=${i%.bam}
    Calling individual bams
    GenomeAnalysisTK -nct 30 -T HaplotypeCaller -R ../../reference_sequences/dmel/v6.0/dm6.fa -I $i --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -mbq 10 -L chr2L --out $base_name.g.vcf;
    done;

    Making gVCF list
    find ../../read_mapping/reference_line/RG??/ -type f \( -name "RG??.g.vcf" \) -exec ls "{}" \; > rg_GVCF.list
    Combining gVCFs
    GenomeAnalysisTK -T CombineGVCFs -R ../../reference_sequences/dmel/v6.0/dm6.fa -V rg_GVCF.list --out RG.g.vcf
    Joint genotyping
    GenomeAnalysisTK -T GenotypeGVCFs -R ../../reference_sequences/dmel/v6.0/dm6.fa -V RG.g.vcf --out RG_final_raw.vcf

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi Blue,

    I am happy you solved your problem :smile:

    -Sheila

Sign In or Register to comment.