GATK v4 Variant Recalibrator command line

rohitmande, San Diego, CA (Member)

Could someone please provide me with a sample command line to run Variant Recalibrator for GATK v4? I am running the tool using GATK 4 Alpha with the following command line:

~/gatk-protected/gatk-launch VariantRecalibrator \
    -R ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/hg19/seq/hg19.fa \
    -input Stromal-combined-New.vcf \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf \
    --resource omni,known=false,training=true,truth=true,prior=12.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/1000G_omni2.5.hg19.sites.vcf \
    --resource 1000G,known=false,training=true,truth=false,prior=10.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/dbsnp_138.hg19.vcf \
    -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff \
    -mode SNP \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -tranchesFile Stromal-combined-New.tranches \
    --rscriptFile Stromal-combined-New.R

and I get the following error:
A USER ERROR has occurred: Invalid argument '/home/galaxy/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf'.

The command syntax follows the same pattern as the one documented here:
https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantrecalibration_VariantRecalibrator.php

My Java version is java version "1.8.0_131"

Has the syntax been changed for GATK version 4?

Thank you very much.

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @rohitmande
    Hi,

    I think you need to change -resource HapMap to -resource:HapMap, known... Let us know if that works.

    -Sheila

    P.S. We are working on the GATK4 documentation, which should be out soon.

  • rohitmande, San Diego, CA (Member)

    @Sheila

    Just to clarify, should there be a space in between HapMap and known? And should there be a comma? Also, do there need to be any tags before I pass in the actual VCF?

    Thank you,

    Rohit

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @rohitmande
    Hi Rohit,

    Sorry I misled you in my first post. It seems you do not need the space between prior=15.0 and ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf. So, your command should work if you use prior=15.0,~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf.

    -Sheila

  • oskarv, Bergen (Member)

    I can't get it to work despite trying your example, @Sheila. Here's my command line:

    python gatk-launch VariantRecalibrator -resource:mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -O output.recal --tranches_file tranches.tranches --variant GVCF.g.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum --mode INDEL --reference Homo_sapiens_assembly38.fasta
    

    And here's the error message:

    A USER ERROR has occurred: No value found for tagged argument: resource:mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    If I remove the colon like so:

    -resource mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    It produces this error message:

    A USER ERROR has occurred: Couldn't read file file:///data/workspace/wdl_pipeline/mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz. Error was: It doesn't exist.
    

    And if I keep the colon but remove the comma before the vcf file like so:

    -resource:mills,VCF,known=false,training=true,truth=true,prior=12.0 /data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    It gives this error message:

    A USER ERROR has occurred: Invalid argument '/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz'.
    

    So I've tried just about everything that makes sense to me. What am I missing?

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @oskarv
    Hi,

    Sorry for the delay. The GATK4 beta documentation is here. The example command should help.

    -Sheila

  • oskarv, Bergen (Member)

    @Sheila
    My preliminary test worked! Your link points to "4.beta.2", which didn't work, but changing it to "4.beta.3" made it work.
    Thanks for the help, I'm looking forward to the first stable release of GATK 4!

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @oskarv
    Hi,

    Glad to hear it :)

    -Sheila

  • rourich (Member)

    Hi,

    I am using VariantRecalibrator (GATK version 4.0.1.2) and I am stuck on two command-line errors concerning the files I specified in the "resource" option: "Invalid argument" and "The file xxx doesn't exist".

    I found this link with some examples of VariantRecalibrator usage, but I think the first example ("Recalibrating SNPs in exome data") is wrong: GATK throws the same errors with it (it specifies the resource files the same way as in previous GATK versions).

    However, concerning the second example ("Allele-specific version of the SNP recalibration (beta)") I could run the tool. The difference is in the way you have to put the path to the resource file (you have to separate the options of the resource parameter and the path file by ":"). Perhaps there is an error in the documentation of the two examples at that link, which is why I am posting this message.

    Here is my command:

    gatk VariantRecalibrator \
        -R ../../referencia/hg38.fa \
        -V EX51-5717603_S1_raw_snps_indels.vcf \
        --resource hapmap,known=false,training=true,truth=true,prior=15.0:../../ftp_gatk/hapmap_3.3.hg38.vcf \
        --resource omni,known=false,training=true,truth=false,prior=12.0:../../ftp_gatk/1000G_omni2.5.hg38.vcf \
        --resource 1000G,known=false,training=true,truth=false,prior=10.0:../../ftp_gatk/1000G_phase1.snps.high_confidence.hg38.vcf \
        --resource dbsnp,known=true,training=false,truth=false,prior=2.0:../../ftp_gatk/dbsnp_146.hg38.vcf \
        -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
        --mode SNP \
        --output output.recal \
        --tranches-file output.tranches \
        --rscript-file output.plots.R

    Best regards

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @rourich
    Hi,

    I hope some of the suggestions in this thread will help.

    > However, concerning the second example ("Allele-specific version of the SNP recalibration (beta)") I could run the tool. The difference is in the way you have to put the path to the resource file (you have to separate the options of the resource parameter and the path file by ":").

    But, it looks like even when using the : you are still getting an error? Can you post the command that does work?

    Thanks,
    Sheila
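
For readers skimming the thread: the form rourich reports being able to run under GATK 4.0.1.2 joins the resource options and the file path with a colon. Below is a minimal shell sketch of that pattern, not an official recipe. The resource_arg helper is hypothetical (it is not part of GATK), the file names are placeholders, and the syntax may differ in other GATK 4 releases, so check the tool documentation for the version you have installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): builds one --resource value in the
# "name,known=...,training=...,truth=...,prior=...:path" form that was
# reported in this thread to run with GATK 4.0.1.2.
resource_arg() {
    # $1=name  $2=known  $3=training  $4=truth  $5=prior  $6=path to the VCF
    printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder file names; substitute your own resource VCFs.
HAPMAP=$(resource_arg hapmap false true true 15.0 hapmap_3.3.hg38.vcf)
DBSNP=$(resource_arg dbsnp true false false 2.0 dbsnp_146.hg38.vcf)

# Print the command instead of running it, so it can be inspected first.
# The flags are the ones used in rourich's command above.
echo gatk VariantRecalibrator \
    -R hg38.fa -V input.vcf \
    --resource "$HAPMAP" \
    --resource "$DBSNP" \
    -an QD -an MQ --mode SNP \
    --output output.recal --tranches-file output.tranches
```

Building each value in one place makes it harder to leave a stray space between the options and the path, which is what produced the "Invalid argument" errors earlier in the thread. Remove the echo once the printed command looks right.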
