
GATK v4 Variant Recalibrator command line

rohitmande (San Diego, CA; Member)

Could someone please provide me with a sample command line to run Variant Recalibrator for GATK v4? I am running the tool using GATK 4 Alpha with the following command line:

~/gatk-protected/gatk-launch VariantRecalibrator -R ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/hg19/seq/hg19.fa -input Stromal-combined-New.vcf --resource hapmap,known=false,training=true,truth=true,prior=15.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf --resource omni,known=false,training=true,truth=true,prior=12.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/1000G_omni2.5.hg19.sites.vcf --resource 1000G,known=false,training=true,truth=false,prior=10.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/1000G_phase1.snps.high_confidence.hg19.sites.vcf --resource dbsnp,known=true,training=false,truth=false,prior=2.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/dbsnp_138.hg19.vcf -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -tranchesFile Stromal-combined-New.tranches --rscriptFile Stromal-combined-New.R

and I get the following error:
A USER ERROR has occurred: Invalid argument '/home/galaxy/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf'.

The command syntax follows the same pattern as the one on this documentation page:
https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantrecalibration_VariantRecalibrator.php

My Java version is "1.8.0_131".

Has the syntax been changed for GATK version 4?

Thank you very much.
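
One likely reading of the error above, inferred from the message itself rather than stated anywhere in this thread: the shell splits the command on whitespace, so the VCF path after `prior=15.0` reaches the tool as a separate token instead of as part of the `--resource` value. The sketch below only counts tokens; `count_tokens` is a hypothetical helper, and it says nothing about which joined form GATK actually accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many separate tokens the shell passed in.
count_tokens() { echo "$#"; }

# Space before the path: flag, tag string, and path arrive as three tokens.
with_space="$(count_tokens --resource hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.vcf)"

# No space: the tag string and the path travel together as a single value.
joined="$(count_tokens --resource hapmap,known=false,training=true,truth=true,prior=15.0,hapmap_3.3.hg19.sites.vcf)"

echo "with space: ${with_space} tokens; joined: ${joined} tokens"
```

The extra third token in the first form is consistent with the path being reported as an invalid argument.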

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rohitmande
    Hi,

    I think you need to change -resource HapMap to -resource:HapMap, known... Let us know if that works.

    -Sheila

    P.S. We are working on the GATK4 documentation, which should be out soon.

  • rohitmande (San Diego, CA; Member)

    @Sheila

    Just to clarify, should there be a space in between HapMap and known? And should there be a comma? Also, do there need to be any tags before I pass in the actual VCF?

    Thank you,

    Rohit

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rohitmande
    Hi Rohit,

    Sorry I misled you in my first post. It seems you should not have a space between prior=15.0 and ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf. Your command should work if you join them with a comma instead: prior=15.0,~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf

    -Sheila

  • oskarv (Bergen; Member)

    @Sheila, I can't get it to work despite trying your example. Here's my command line:

    python gatk-launch VariantRecalibrator -resource:mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -O output.recal --tranches_file tranches.tranches --variant GVCF.g.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum --mode INDEL --reference Homo_sapiens_assembly38.fasta
    

    And here's the error message:

    A USER ERROR has occurred: No value found for tagged argument: resource:mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    If I remove the colon like so:

    -resource mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    It produces this error message:

    A USER ERROR has occurred: Couldn't read file file:///data/workspace/wdl_pipeline/mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz. Error was: It doesn't exist.
    

    And if I keep the colon but remove the comma before the vcf file like so:

    -resource:mills,VCF,known=false,training=true,truth=true,prior=12.0 /data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    It gives this error message:

    A USER ERROR has occurred: Invalid argument '/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz'.
    

    So I've tried just about everything that makes sense to me. What am I missing?

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @oskarv
    Hi,

    Sorry for the delay. The GATK4 beta documentation is here. The example command should help.

    -Sheila

  • oskarv (Bergen; Member)

    @Sheila
    My preliminary test worked! Your link points to "4.beta.2", which didn't work, but changing it to "4.beta.3" made it work.
    Thanks for the help, I'm looking forward to the first stable release of GATK 4!

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @oskarv
    Hi,

    Glad to hear it!

    -Sheila
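
The thread ends without showing the form that finally worked. As a hedged sketch only: GATK4 documentation from around that period, as best I recall, attached the file to the tag string with a colon (`name,attributes:path`) as a single value after `--resource`. That separator is an assumption here, not something confirmed in this thread, and later GATK releases may use a different form, so check the tool's `--help` output for your version. `build_resource_arg` is a hypothetical helper that only assembles the string; it does not run GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins a resource name, its attributes, and a file path
# into one token. The ':' before the path is an ASSUMPTION about the GATK4
# beta syntax; verify it against the documentation for your version.
build_resource_arg() {
  local name="$1" attrs="$2" path="$3"
  printf '%s,%s:%s' "$name" "$attrs" "$path"
}

hapmap_arg="$(build_resource_arg hapmap \
  'known=false,training=true,truth=true,prior=15.0' \
  hapmap_3.3.hg19.sites.vcf)"

# Print the argument pair as it would appear on the command line.
echo "--resource ${hapmap_arg}"
```

Building the value in one place keeps stray spaces out of the tag string, which is the failure mode each attempt in this thread ran into.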
