VCF header with AD format 'Number=R' causes error in VQSR VariantRecalibrator and ApplyRecalibration

gtiao · Cambridge, MA · Member

I have a VCF header with the following number annotation for the AD field:

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

Here, R means the field has one value for each possible allele, including the reference (see: http://samtools.github.io/hts-specs/VCFv4.2.pdf).
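As a rough illustration of what Number=R implies for a single record, the sketch below pairs each allele (REF first, then each ALT) with its AD value. The function name `allelic_depths` and the example values are hypothetical, and the sketch assumes every AD value is an integer (no missing `.` entries).

```python
def allelic_depths(ref, alt, ad):
    """Pair each allele with its AD value, assuming Number=R semantics.

    ref: the REF column
    alt: the ALT column (comma-separated, or "." when there is no ALT allele)
    ad:  one sample's AD value (comma-separated integers)
    """
    alts = [] if alt == "." else alt.split(",")
    alleles = [ref] + alts
    depths = [int(value) for value in ad.split(",")]
    # Number=R: exactly one value per allele, reference included.
    if len(depths) != len(alleles):
        raise ValueError(
            f"expected {len(alleles)} AD values, found {len(depths)}"
        )
    return dict(zip(alleles, depths))


# Hypothetical record: REF=A, ALT=G,T, AD=10,5,2
print(allelic_depths("A", "G,T", "10,5,2"))
```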

However, when I try to run VariantRecalibrator and ApplyRecalibration on this VCF, I get the error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.3-0-g37228af): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Unable to parse header with error: For input string: "R", for input source: /path/to/myfile.vcf.gz
##### ERROR ------------------------------------------------------------------------------------------

When I reheader the VCF to change the Number tag to Number=., the GATK modules work fine. Would it be possible to get this bug fixed? I'm working with a lot of large VCFs with this kind of header, and it's quite a headache to have to reheader all of them when the header is already properly formatted according to the VCF specification.
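For anyone needing the same workaround, the header change described above can be sketched in Python as follows. This is only an illustration, not the exact procedure used here: the function name `relax_ad_number` is hypothetical, and it only rewrites header text. The edited header would still have to be applied to the compressed VCF with a reheadering tool (for example `bcftools reheader`).

```python
def relax_ad_number(header_lines):
    """Return a copy of the header with Number=R changed to Number=.
    on the AD FORMAT line only; all other lines are left untouched."""
    out = []
    for line in header_lines:
        if line.startswith("##FORMAT=<ID=AD,"):
            # Replace only the first occurrence so the Description text is safe.
            line = line.replace("Number=R", "Number=.", 1)
        out.append(line)
    return out


header = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">',
]
for line in relax_ad_number(header):
    print(line)
```

Note that Number=. is less strict than Number=R, so this trades away a piece of header information purely to get past the parser.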

Thanks!

Grace

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @gtiao
    Hi Grace,

    Sorry for the delay. We have been away at a workshop. I am pretty sure this issue is fixed in the latest version. If it is not, let me know.

    Thanks,
    Sheila

  • sheilazt · Valencia · Member
    edited January 2017

    Dear Sheila,
    I'm currently running the latest version (3.7-0-gcfedb67) and, at least with the HaplotypeCaller tool, the GVCF files still contain Number=R in the header for AD.

    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

    Best regards,

    Sheila

    Issue · GitHub: number 1702, filed by Sheila, state closed, closed by vdauwera.

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @sheilazt
    Hi Sheila,

    Yes, I see that. Let me check with the team and get back to you.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @sheilazt
    Hi again Sheila,

    It looks like this is actually a VQSR issue. The VCF header is correct. I will put in a ticket.

    Thanks,
    Sheila

    Issue · GitHub: number 1714, filed by Sheila, state open.

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @gtiao @sheilazt
    Hi Grace and Sheila,

    Our developers have not been able to reproduce this issue. Would you be able to share a test case with us that reproduces the error? Instructions are here.

    Thanks,
    Sheila
