
VCF header with AD format 'Number=R' causes error in VQSR VariantRecalibrator and ApplyRecalibration

gtiao · Cambridge, MA · Member

I have a VCF header with the following number annotation for the AD field:

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

Here, R means the field has one value for each possible allele, including the reference (see: http://samtools.github.io/hts-specs/VCFv4.2.pdf).
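To make the meaning of Number=R concrete, here is a minimal Python sketch that parses a FORMAT header line and works out how many values a field should carry at a site. The function names `parse_format_header` and `expected_value_count` are hypothetical helpers written for this illustration, not part of GATK or any VCF library, and Number=G is deliberately left out because it also depends on ploidy.

```python
import re

def parse_format_header(line):
    """Parse a ##FORMAT=<...> header line into a dict of its key/value pairs."""
    match = re.match(r'##FORMAT=<(.*)>$', line.strip())
    if match is None:
        raise ValueError("not a FORMAT header line")
    # Values are either double-quoted (and may contain commas) or run up to the next comma.
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,]*)', match.group(1))
    return {key: value.strip('"') for key, value in pairs}

def expected_value_count(number, n_alt):
    """Return the expected number of values for a site with n_alt ALT alleles.

    Returns None when the count is unbounded (Number=.).
    """
    if number == "R":   # one value per allele, reference included
        return n_alt + 1
    if number == "A":   # one value per ALT allele
        return n_alt
    if number == ".":   # unknown or variable count
        return None
    return int(number)  # a fixed integer count

header = parse_format_header(
    '##FORMAT=<ID=AD,Number=R,Type=Integer,'
    'Description="Allelic depths for the ref and alt alleles in the order listed">'
)
# A site with two ALT alleles should carry three AD values (ref + 2 alts).
print(header["ID"], header["Number"], expected_value_count(header["Number"], 2))
```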

However, when I try to run VariantRecalibrator and ApplyRecalibration on this VCF, I get the error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.3-0-g37228af): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Unable to parse header with error: For input string: "R", for input source: /path/to/myfile.vcf.gz
##### ERROR ------------------------------------------------------------------------------------------

When I reheader the VCF to change the Number tag to Number=., the GATK modules work fine. Would it be possible to get this bug fixed? I'm working with a lot of large VCFs with this kind of header, and it's quite a headache to have to reheader them when the header is already properly formatted according to the VCF spec.
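For anyone hitting the same error, the reheader workaround described above could be sketched roughly as follows in Python. This is only an illustration under a few assumptions: the helper names `relax_ad_number` and `reheader` are hypothetical, the AD declaration is assumed to start with `##FORMAT=<ID=AD,` exactly as in the header shown earlier, and the output is written with Python's `gzip` module, which produces plain gzip rather than BGZF, so it would need recompressing with bgzip before indexing.

```python
import gzip

def relax_ad_number(lines):
    """Yield VCF lines unchanged, except that Number=R becomes Number=. in the AD FORMAT header."""
    for line in lines:
        # Only the AD FORMAT declaration is touched; records and other headers pass through.
        if line.startswith("##FORMAT=<ID=AD,"):
            line = line.replace(",Number=R,", ",Number=.,", 1)
        yield line

def reheader(src_path, dst_path):
    """Stream a gzip-compressed VCF to a new file with the relaxed AD header.

    Note: the output is plain gzip, not BGZF.
    """
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(relax_ad_number(src))
```

Because the file is streamed line by line, memory use stays flat even for large VCFs, although every record still has to be decompressed and recompressed.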

Thanks!

Grace

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @gtiao
    Hi Grace,

    Sorry for the delay. We have been away at a workshop. I am pretty sure this issue is fixed in the latest version. If it is not, let me know.

    Thanks,
    Sheila

  • sheilazt · Valencia · Member
    edited January 2017

    Dear Sheila,
    I'm currently running the latest version (3.7-0-gcfedb67) and, at least with the HaplotypeCaller tool, the GVCF files still contain Number=R in the header for AD.

    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

    Best regards,

    Sheila

    Issue · GitHub (by Sheila): issue number 1702, state closed, closed by vdauwera
  • Sheila · Broad Institute · Member, Broadie, Moderator

    @sheilazt
    Hi Sheila,

    Yes, I see that. Let me check with the team and get back to you.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @sheilazt
    Hi again Sheila,

    It looks like this is actually a VQSR issue. The VCF header is correct. I will put in a ticket.

    Thanks,
    Sheila

    Issue · GitHub (by Sheila): issue number 1714, state open
  • Sheila · Broad Institute · Member, Broadie, Moderator

    @gtiao @sheilazt
    Hi Grace and Sheila,

    Our developers have not been able to reproduce this issue. Would you be able to share a test case with us that reproduces the error? Instructions are here.

    Thanks,
    Sheila
