CombineVariants codec not working with VCFv4.1?

aminzia (Member)
edited June 2015 in Ask the GATK team

Hello there,

I am using CombineVariants to combine variants called by HaplotypeCaller on separate chromosomes. The VCFs come directly from HC without any changes, but CombineVariants does not seem to like the header. Please see the error below, which is unexpected given that everything worked fine before GATK v3.4 (except that the merging options have been modified and no longer work as they did before v3.3).

Thank you
Amin Zia

java -Xmx8g -Xms8g -jar ~/gatk-3.4.0/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T CombineVariants --variant:VCF1 chrM.gatk.vcf --variant:VCF2 chr1.gatk.vcf --variant:VCF3 chr2.gatk.vcf --assumeIdenticalSamples -genotypeMergeOptions PRIORITIZE -priority VCF1,VCF2,VCF3 -o genome.gatk.vcf

INFO 17:25:36,611 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:36,613 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
INFO 17:25:36,613 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 17:25:36,613 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 17:25:36,619 HelpFormatter - Program Args: -R /srv/gs1/projects/scg/Resources/GATK/hg19-3.0//ucsc.hg19.fasta -T CombineVariants --variant:VCF1 chrM.gatk.vcf --variant:VCF2 chr1.gatk.vcf --variant:VCF3 chr2.gatk.vcf --assumeIdenticalSamples -genotypeMergeOptions PRIORITIZE -priority VCF1,VCF2,VCF3 -o genome.gatk.vcf
INFO 17:25:36,622 HelpFormatter - Executing as [email protected] on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_03-b04.
INFO 17:25:36,622 HelpFormatter - Date/Time: 2015/06/29 17:25:36
INFO 17:25:36,622 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:36,622 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:37,369 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:25:37,518 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 17:25:38,886 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.4-0-g7e26428):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Unable to parse header with error: Your input file has a malformed header: This codec is strictly for VCFv3 and does not support VCFv4.1, for input source: chr2.gatk.vcf
ERROR ------------------------------------------------------------------------------------------

Answers

  • aminzia (Member)

    Thank you for your answer. It actually worked, though I'm still surprised that GATK treats that label as the name of a codec.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    edited July 2015

    @aminzia This is because the GATK engine will parse tags according to a position-sensitive logic:

    --variant:<FORMAT>,<NAME> file.vcf
    

    So if you only provide one tag, it will assume it is the format codec. It's not possible to specify a name tag without also specifying a format tag.

  • thibault (Broad Institute; Member, Broadie, Dev admin)

    Because the standard VCF 4 codec is named VCF, an alternate solution would be to specify the arguments like this:

    --variant:VCF,VCF1 chrM.gatk.vcf --variant:VCF,VCF2 chr1.gatk.vcf --variant:VCF,VCF3 chr2.gatk.vcf
    
  • aminzia (Member)

    Thank you all for your answers. I think this was not clear from the API pages, especially because the CombineVariants page explicitly mentions "-V:name,vcf", which seems to be the other way around, with no mention of how this information is decoded.

    But your explanations clear this up. Thank you.

    Regards
    Amin

    Issue · Github, by Geraldine_VdAuwera: Issue Number 1042, State: open
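
For anyone who finds this thread later, here is a rough shell sketch of the position-sensitive rule explained above, plus a small helper that builds the two-tag arguments suggested by thibault. It is only an illustration of the rule as described in this thread, not the GATK engine's actual parsing code; the function names parse_variant_tag and build_variant_args are made up for this example, and the VCF1, VCF2, ... labels simply follow the numbering used in the original command.

```shell
# Illustrative only: mimics the tag rule described in this thread.
# This is NOT how the GATK engine is implemented.

# parse_variant_tag TAGS
# TAGS is the text after "--variant:" (for example "VCF1" or "VCF,VCF1").
# Prints "format=<FORMAT> name=<NAME>"; name is empty when only one tag is given.
parse_variant_tag() (
    tags=$1
    case $tags in
        *,*) format=${tags%%,*}; name=${tags#*,} ;;  # first tag = format, second = name
        *)   format=$tags; name= ;;                  # a lone tag is read as the format codec
    esac
    printf 'format=%s name=%s\n' "$format" "$name"
)

# build_variant_args FILE...
# Prints "--variant:VCF,VCF<n> <file>" for each input file, using the
# two-tag form so the label cannot be mistaken for a codec name.
build_variant_args() (
    i=1
    out=
    for f in "$@"; do
        out="$out --variant:VCF,VCF$i $f"
        i=$((i + 1))
    done
    printf '%s\n' "${out# }"
)
```

Under this rule, parse_variant_tag VCF3 reports VCF3 as the format with no name, which is presumably why the label on chr2.gatk.vcf ended up selecting a VCFv3 codec in the original command. The output of build_variant_args chrM.gatk.vcf chr1.gatk.vcf chr2.gatk.vcf can be pasted into the CombineVariants command line in place of the single-tag arguments; the -priority VCF1,VCF2,VCF3 list stays the same.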