wonky vcf header :(

Hi,

I've a problem with a VCF when combiningGVCFs, gatk/v3.4. "headerKey END found in VariantContext field INFO at chr2L:64003 but this key isn't defined in the VCFHeader"

HaplotypeCaller
-nct 30 --analysis_type HaplotypeCaller --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa \ --input_file ../../read_mapping/reference_line/RGnb/RGnb.bam \ --variant_index_type LINEAR --variant_index_parameter 128000 -mbq 10 -L chr2L \ --out ../../read_mapping/reference_line/RGnb/RGnb.g.vcf

VCF
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RGnb
chr2L 43265 . A G 2147.77 . AC=2;AF=1.00;AN=2;DP=54;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.59;SOR=1.118 GT:AD:DP:GQ:PL 1/1:0,54:54:99:2176,162,0
chr2L 64003 . GT G 57.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.987;ClippingRankSum=-0.189;DP=45;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.85;MQRankSum=0.525;QD=1.28;ReadPosRankSum=1.149;SOR=0.622 GT:AD:DP:GQ:PL 0/1:25,8:33:95:95,0,521

Combine three gvcfs
--analysis_type CombineGVCFs --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa \ --variant rg_GVCF.list --out ./RG.g.vcf

Error message:
java.lang.IllegalStateException: Key END found in VariantContext field INFO at chr2L:64003 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF

Previous issues for "key isn't defined in the VCFHeader"
http://gatkforums.broadinstitute.org/discussion/3962/genotypeandvalidate-error-key-callstatus-found-in-variantcontext-field-info#latest
http://gatkforums.broadinstitute.org/discussion/2331/select-variants-says-vcf-header-is-missing
https://github.com/charite/jannovar/issues/70
https://groups.google.com/forum/#!topic/gawed-awesome-analyzers/RjXtIuYQ_V8

Could anyone advise me on how to fix this?

Sincerely,

Blue

Comments

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi Blue,

    Is the VCF record you posted supposed to be part of one of the GVCFs you are trying to combine? The record you posted is not from a GVCF, it is from a VCF. CombineGVCFs only works on GVCFs.

    -Sheila

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi again Blue,

    It seems like this may be a bug introduced in one of the newer versions. See this thread: http://gatkforums.broadinstitute.org/discussion/3962/genotypeandvalidate-error-key-callstatus-found-in-variantcontext-field-info#latest

    Can you submit a bug report? Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • BlueBlue Member

    Hi again @Sheila

    I haven't submitted the bug report yet. I added --emitRefConfidence GVCF to fix the non-gvcf mistake.

    Now with Program Args:
    --analysis_type CombineGVCFs --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa --variant rg_GVCF.list --out RG.g.vcf
    Returns:
    ERROR MESSAGE: Argument with name '--variant' (-V) is missing.

    Difficult to determine whether the cause is still the vcf header, or am I missing something obvious. I'll start to put-together a bug report.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi Blue,

    Can you post the full stack trace you get?

    Thanks,
    Sheila

  • BlueBlue Member

    Hi @Sheila

    I assumed my initial problem was caused by inputting vcfs instead of gvcfs, I fixed that under your suggestion (with --emitRefConfidence GVCF). There was a problem in CombineGVCFs (recognising the vcf.list), and I had typos in my GenotypeGVCF command line. Thanks for your help. Apologies for the banal problem. Successful code is shown below.

    Sincerely,

    Blue

    Making bam list
    for i in ../../read_mapping/reference_line/*/RG??.bam
    do
    base_name=${i%.bam}
    Calling individual bams
    GenomeAnalysisTK -nct 30 -T HaplotypeCaller -R ../../reference_sequences/dmel/v6.0/dm6.fa -I $i --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -mbq 10 -L chr2L --out $base_name.g.vcf;
    done;

    Making gVCF list
    find ../../read_mapping/reference_line/RG??/ -type f \( -name "RG??.g.vcf" \) -exec ls "{}" \; > rg_GVCF.list
    Combining gVCFs
    GenomeAnalysisTK -T CombineGVCFs -R ../../reference_sequences/dmel/v6.0/dm6.fa -V rg_GVCF.list --out RG.g.vcf
    Joint genotyping
    GenomeAnalysisTK -T GenotypeGVCFs -R ../../reference_sequences/dmel/v6.0/dm6.fa -V RG.g.vcf --out RG_final_raw.vcf

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Blue
    Hi Blue,

    I am happy you solved your problem :smile:

    -Sheila

Sign In or Register to comment.