GATK v4 Variant Recalibrator command line

rohitmande, San Diego, CA (Member)

Could someone please provide me with a sample command line to run Variant Recalibrator for GATK v4? I am running the tool using GATK 4 Alpha with the following command line:

~/gatk-protected/gatk-launch VariantRecalibrator -R ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/hg19/seq/hg19.fa -input Stromal-combined-New.vcf --resource hapmap,known=false,training=true,truth=true,prior=15.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf --resource omni,known=false,training=true,truth=true,prior=12.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/1000G_omni2.5.hg19.sites.vcf --resource 1000G,known=false,training=true,truth=false,prior=10.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/1000G_phase1.snps.high_confidence.hg19.sites.vcf --resource dbsnp,known=true,training=false,truth=false,prior=2.0 ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/dbsnp_138.hg19.vcf -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -tranchesFile Stromal-combined-New.tranches --rscriptFile Stromal-combined-New.R

and I get the following error:
A USER ERROR has occurred: Invalid argument '/home/galaxy/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf'.

The command syntax follows the same pattern as this
https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantrecalibration_VariantRecalibrator.php

My Java version is "1.8.0_131".

Has the syntax been changed for GATK version 4?

Thank you very much.

Answers

  • Sheila, Broad Institute (Member, Broadie, admin)

    @rohitmande
    Hi,

    I think you need to change -resource HapMap to -resource:HapMap, known... Let us know if that works.

    -Sheila

    P.S. We are working on the GATK4 documentation, which should be out soon.

  • rohitmande, San Diego, CA (Member)

    @Sheila

    Just to clarify, should there be a space in between HapMap and known? And should there be a comma? Also, do there need to be any tags before I pass in the actual VCF?

    Thank you,

    Rohit

  • Sheila, Broad Institute (Member, Broadie, admin)

    @rohitmande
    Hi Rohit,

    Sorry I misled you in my first post. It seems you do not need the space between prior=15.0 and ~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf. So, your command should work if you use prior=15.0,~/MiSeq/Bioinformatics/Archive/ReferenceFiles/GATK/hapmap_3.3.hg19.sites.vcf.

    -Sheila

  • oskarv, Bergen (Member)

    I can't get it to work despite trying your example, @Sheila. Here's my command line:

    python gatk-launch VariantRecalibrator -resource:mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -O output.recal --tranches_file tranches.tranches --variant GVCF.g.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum --mode INDEL --reference Homo_sapiens_assembly38.fasta
    

    And here's the error message:

    A USER ERROR has occurred: No value found for tagged argument: resource:mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    If I remove the colon like so:

    -resource mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    It produces this error message:

    A USER ERROR has occurred: Couldn't read file file:///data/workspace/wdl_pipeline/mills,VCF,known=false,training=true,truth=true,prior=12.0,/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz. Error was: It doesn't exist.
    

    And if I keep the colon but remove the comma before the vcf file like so:

    -resource:mills,VCF,known=false,training=true,truth=true,prior=12.0 /data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    

    It gives this error message:

    A USER ERROR has occurred: Invalid argument '/data/GRCH38-ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz'.
    

    So I've tried just about everything that makes sense to me. What am I missing?

  • Sheila, Broad Institute (Member, Broadie, admin)

    @oskarv
    Hi,

    Sorry for the delay. The GATK4 beta documentation is here. The example command should help.

    -Sheila

  • oskarv, Bergen (Member)

    @Sheila
    My preliminary test worked! Your link points to "4.beta.2", which didn't work, but changing it to "4.beta.3" made it work.
    Thanks for the help, I'm looking forward to the first stable release of GATK 4!

  • Sheila, Broad Institute (Member, Broadie, admin)

    @oskarv
    Hi,

    Glad to hear it!

    -Sheila

  • rourich (Member)

    Hi,

    I am using VariantRecalibrator (my GATK version is 4.0.1.2) and I've got stuck with two errors with the command line concerning the files which I've specified in the "resource" option: "Invalid argument" and "The file xxx doesn't exist".

    I've found this link with some examples of the VariantRecalibrator usage but I think that the first example ("Recalibrating SNPs in exome data") is wrong because GATK throws the same errors (the specification of the resource files is the same as in the previous GATK versions).

    However, I could run the tool using the second example ("Allele-specific version of the SNP recalibration (beta)"). The difference is in how you give the path to the resource file: you have to separate the options of the resource parameter from the file path with ":". Perhaps there was an error in the documentation of the two examples in the previous link. That's why I've sent this message.

    Here is my command:

    gatk VariantRecalibrator -R ../../referencia/hg38.fa -V EX51-5717603_S1_raw_snps_indels.vcf --resource hapmap,known=false,training=true,truth=true,prior=15.0:../../ftp_gatk/hapmap_3.3.hg38.vcf --resource omni,known=false,training=true,truth=false,prior=12.0:../../ftp_gatk/1000G_omni2.5.hg38.vcf --resource 1000G,known=false,training=true,truth=false,prior=10.0:../../ftp_gatk/1000G_phase1.snps.high_confidence.hg38.vcf --resource dbsnp,known=true,training=false,truth=false,prior=2.0:../../ftp_gatk/dbsnp_146.hg38.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff --mode SNP --output output.recal --tranches-file output.tranches --rscript-file output.plots.R

    Best regards

  • Sheila, Broad Institute (Member, Broadie, admin)

    @rourich
    Hi,

    I hope some of the suggestions in this thread will help.

    However, concerning the second example ("Allele-specific version of the SNP recalibration (beta)") I could run the tool. The difference is in the way you have to put the path to the resource file (you have to separate the options of the resource parameter and the path file by ":").

    But, it looks like even when using the : you are still getting an error? Can you post the command that does work?

    Thanks,
    Sheila

  • hkeward (Member)

    To future visitors to this thread, the format that it wants appears to be:

    --resource:hapmap,known=false,training=true,truth=true,prior=15 /path/to/hapmap.sites.vcf
    

    The beta docs specify two different formats, neither of which worked for me, but this does! Trial and error until it accepted it!

  • Katie, United States (Member)

    Thank you for the above!

  • olavur (Member)

    As already mentioned, the correct syntax is:

    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg38.sites.vcf.gz
    

    While the documentation says:

    --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.sites.vcf.gz
    

    Which does not work.

  • bshifaw (Member, Broadie, Moderator, admin)

    Correct, olavur.
    The release notes for GATK 4.1 mention the change in syntax for the --resource parameter. We are aware of the error in the tool documentation and have an issue ticket to fix it. Thanks.

  • wangchengshi (Member)
    /trainee/wes20190401/bin/gatk-4.1.0.0/gatk-4.1.0.0/gatk VariantRecalibrator \
    -R /trainee/ref/Homo_sapiens_assembly38.fasta \
    -V /trainee/ckzhu/wes/L01501/V100004251L01501/gatk/V100004251L01501.HC.vcf.gz \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:/trainee/ref/hapmap_3.3.hg38.vcf \
    --resource omni,known=false,training=true,truth=false,prior=12.0:/trainee/ref/1000G_omni2.5.hg38.vcf \
    --resource 1000G,known=false,training=true,truth=false,prior=10.0:/trainee/ref/1000G_phase1.snps.high_confidence.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:/trainee/ref/dbsnp_146.hg38.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    -O /trainee/ckzhu/wes/L01501/gatk/V100004251L01501.snp.recal \
    --tranches-file /trainee/ckzhu/wes/L01501/gatk/V100004251L01501.snp.tranches \
    --rscript-file /trainee/ckzhu/wes/L01501/gatk/V100004251L01501.snp.plots.R

    A USER ERROR has occurred: Couldn't read file file:///trainee/ref/hapmap,known=false,training=true,truth=true,prior=15.0:/trainee/ref/hapmap_3.3.hg38.vcf. Error was: It doesn't exist.

    How can I solve this problem? It has been mentioned above many times, but never resolved satisfactorily.
  • cnorman, United States (Member, Broadie, Dev)

    @wangchengshi As mentioned above, the syntax for specifying argument tags has changed (and the documentation was out of sync for a while, though it is now fixed). The tags must now be specified with the argument name, not with the argument value, like this:

    --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /trainee/ref/hapmap_3.3.hg38.vcf

    Note that the ":" and tags are listed with the argument name ("-resource"), not with the file name.
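
As a rough sketch of the point above, the shell function below assembles a SNP-mode command with the tags attached to the argument name and the VCF path passed as a separate value. The function name `vqsr_snp_cmd` and the `input.vcf.gz` file name are hypothetical; the resource file names and remaining flags are copied from the commands posted earlier in this thread. It only prints the command rather than running it, and it assumes a GATK release that uses the tagged-argument-name syntax.

```shell
# Sketch only: prints (does not run) a SNP-mode VariantRecalibrator command.
# $1 = directory holding the reference and resource VCFs (placeholder path)
# $2 = input VCF (placeholder name)
vqsr_snp_cmd() {
  ref_dir=$1
  input_vcf=$2
  # Tags (name, known/training/truth, prior) follow "--resource:" on the
  # argument name; the VCF path is a separate, space-delimited value.
  set -- gatk VariantRecalibrator \
    -R "$ref_dir/Homo_sapiens_assembly38.fasta" \
    -V "$input_vcf" \
    --resource:hapmap,known=false,training=true,truth=true,prior=15.0 "$ref_dir/hapmap_3.3.hg38.vcf" \
    --resource:omni,known=false,training=true,truth=false,prior=12.0 "$ref_dir/1000G_omni2.5.hg38.vcf" \
    --resource:1000G,known=false,training=true,truth=false,prior=10.0 "$ref_dir/1000G_phase1.snps.high_confidence.hg38.vcf" \
    --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 "$ref_dir/dbsnp_146.hg38.vcf" \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    -O output.snp.recal \
    --tranches-file output.snp.tranches \
    --rscript-file output.snp.plots.R
  # Join the arguments with spaces and print them for inspection.
  printf '%s\n' "$*"
}

vqsr_snp_cmd /trainee/ref input.vcf.gz
```

Printing first makes it easy to check that each `--resource:` tag group is followed by a space and then the file path before handing the same arguments to gatk.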

  • I found this bug in a WDL script downloaded from GitHub.
    Other parts of the script are correct (space separation); only this part uses colon separation.

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator, admin)
    edited September 23

    Hi @Chatchawit

    Can you please share the link to the WDL you are referring to?

    Please note that before GATK 4.1.1.0, that was the correct format for VariantRecalibrator. The format has only changed since GATK v4.1.1.0.

    So if the WDL you are referring to uses a version earlier than 4.1.1.0, it should be fine.

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator, admin)

    That WDL is using GATK 4.1.0.0

  • cnorman, United States (Member, Broadie, Dev)

    @bhanuGandham It does look like the WDL is inconsistent, though. The SNPsVariantRecalibratorCreateModel task uses the new syntax, but the SNPsVariantRecalibrator task still has the old syntax.

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator, admin)

    @Chatchawit and @cnorman sorry about that confusion.

    In GATK v4.1.0.0 and before, the format was:
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.sites.vcf.gz

    In GATK v4.1.1.0 and after, the format is:
    --resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg38.sites.vcf.gz

    It looks like both tasks, SNPsVariantRecalibratorCreateModel and SNPsVariantRecalibrator, are using the wrong syntax, since the WDL uses the default GATK v4.1.0.0 docker. Hence the syntax should be this:
    --resource hapmap,known=false,training=true,truth=true,prior=15:${hapmap_resource_vcf}

    @cnorman can you confirm that this makes sense? If so, I will make the correction in the WDL.

  • cnorman, United States (Member, Broadie, Dev)
    edited September 25

    @bhanuGandham Yeah, it's confusing. I think the new syntax was introduced in 4.1.0.0, not 4.1.1.0, and this WDL is using 4.1.0.0, so it should use the new syntax. There are three tasks in there that use VariantRecalibrator: IndelsVariantRecalibrator, SNPsVariantRecalibratorCreateModel, and SNPsVariantRecalibrator. The first two use the old syntax, and SNPsVariantRecalibrator uses a weird hybrid (colons everywhere). So I think all three are incorrect for version 4.1.0.0.
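
To make the two formats discussed above easier to compare, here is a small shell sketch that formats one resource argument in either style. The helper name `resource_arg` and its "old"/"new" labels are hypothetical, not part of GATK. Because the posts above disagree on which release introduced the change (4.1.0.0 or 4.1.1.0), the style is passed explicitly instead of being derived from a version number.

```shell
# Sketch: format one --resource argument in either style discussed here.
# $1 = style ("old" or "new"), $2 = comma-separated tags, $3 = resource VCF.
resource_arg() {
  case "$1" in
    old) printf '%s\n' "--resource $2:$3" ;;  # tags attached to the value, ":" before the path
    new) printf '%s\n' "--resource:$2 $3" ;;  # tags attached to the argument name, space before the path
    *)   echo "unknown style: $1" >&2; return 1 ;;
  esac
}

tags=hapmap,known=false,training=true,truth=true,prior=15.0
resource_arg old "$tags" hapmap_3.3.hg38.sites.vcf.gz
resource_arg new "$tags" hapmap_3.3.hg38.sites.vcf.gz
```

A script or WDL task that mixes both outputs, or that puts colons in both positions, would correspond to the inconsistent cases described in this thread.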

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator, admin)

    Hi @Chatchawit and @cnorman

    Thank you for bringing this to our notice and for the information.

    I have put in a request for the correction.
