Germline VQSR recommended settings

Hi, we've been looking at the new Best Practices pages and at the WDLs linked there. In particular, we looked at the settings for VariantRecalibrator in this WDL. We ran germline analyses on samples HG001 and HG002 with versions 4.beta.2 and The 4.beta.2 VariantRecalibrator used 3.7 Best Practices parameters, while the VariantRecalibrator used settings from the mentioned WDL. We noticed a decrease in SNP recall from to 4.beta.2 and an increase in INDEL recall from to 4.beta.2. For example, for sample HG001, the scores are:

Version    SNP Precision  SNP Recall  INDEL Precision  INDEL Recall
4.beta.2   0.999058       0.997499    0.993983         0.986496
           0.998427       0.985755    0.993613         0.993145
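
To put numbers on the shift described above, here is a minimal Python sketch that subtracts the 4.beta.2 row from the second row of the table. The name `second_row` and the helper `metric_deltas` are placeholders introduced for illustration, since the table does not show a version label for that row.

```python
# Scores for sample HG001 after VQSR, copied from the table above.
METRICS = ("SNP Precision", "SNP Recall", "INDEL Precision", "INDEL Recall")

beta2 = (0.999058, 0.997499, 0.993983, 0.986496)       # row labelled 4.beta.2
second_row = (0.998427, 0.985755, 0.993613, 0.993145)  # placeholder name: version label not shown


def metric_deltas(baseline, other):
    """Return {metric: other - baseline}, rounded to 6 decimals."""
    return {name: round(o - b, 6) for name, b, o in zip(METRICS, baseline, other)}


deltas = metric_deltas(beta2, second_row)
for name, delta in deltas.items():
    print(f"{name:16s} {delta:+.6f}")
```

With these inputs, SNP recall falls by about 0.0117 and INDEL recall rises by about 0.0066, while both precision values move by less than 0.001.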

Are these scores to be expected and if not are there other VariantRecalibrator settings that we should set for germline analysis?

Note: Except for VariantRecalibrator, the settings for all of the tools are the same. HaplotypeCaller was run with --interval-set-rule UNION --genotyping-mode DISCOVERY --emit-ref-confidence GVCF. Also, the precision and recall scores for the raw VCFs output by HaplotypeCaller/GenotypeGVCFs are close to identical between 4.beta.2 and for these samples. The exact command lines for VQSR are:

./gatk --java-options "-Xmx2048M" VariantRecalibrator --rscript-file snp_hg001.recal.R --tranches-file snp_hg001.tranches --output snp_hg001.recal --use-annotation QD --use-annotation MQRankSum --use-annotation FS --use-annotation DP --use-annotation ReadPosRankSum --use-annotation SOR --use-annotation MQ --variant hg001.vcf --resource dbsnp,prior=7,truth=false,training=false,known=true:dbsnp_137.b37.vcf --resource 1000G,prior=10,truth=true,training=true,known=false:1000G_phase1.snps.high_confidence.b37.vcf --resource omni,prior=12,truth=true,training=true,known=false:1000G_omni2.5.b37.vcf --resource hapmap,prior=15,truth=true,training=true,known=false:hapmap_3.3.b37.vcf --truth-sensitivity-tranche 100 --truth-sensitivity-tranche 99.95 --truth-sensitivity-tranche 99.9 --truth-sensitivity-tranche 99.8 --truth-sensitivity-tranche 99.6 --truth-sensitivity-tranche 99.5 --truth-sensitivity-tranche 99.4 --truth-sensitivity-tranche 99.3 --truth-sensitivity-tranche 99 --truth-sensitivity-tranche 98 --truth-sensitivity-tranche 97 --truth-sensitivity-tranche 90 --trust-all-polymorphic --reference human_g1k_v37_decoy.fasta --mode SNP --max-gaussians 6

./gatk --java-options "-Xmx2048M" VariantRecalibrator --rscript-file indel_hg001.recal.R --tranches-file indel_hg001.tranches --output indel_hg001.recal --use-annotation DP --use-annotation FS --use-annotation ReadPosRankSum --use-annotation MQRankSum --use-annotation QD --use-annotation SOR --variant hg001.vcf --resource dbsnp,prior=2,truth=false,training=false,known=true:dbsnp_137.b37.vcf --resource mills,prior=12,truth=true,training=true,known=false:Mills_and_1000G_gold_standard.indels.b37.sites.vcf --truth-sensitivity-tranche 100 --truth-sensitivity-tranche 99.95 --truth-sensitivity-tranche 99.9 --truth-sensitivity-tranche 99.5 --truth-sensitivity-tranche 99 --truth-sensitivity-tranche 97 --truth-sensitivity-tranche 96 --truth-sensitivity-tranche 95 --truth-sensitivity-tranche 94 --truth-sensitivity-tranche 93.5 --truth-sensitivity-tranche 93 --truth-sensitivity-tranche 92 --truth-sensitivity-tranche 91 --truth-sensitivity-tranche 90 --trust-all-polymorphic --reference human_g1k_v37_decoy.fasta --mode INDEL --max-gaussians 4

./gatk --java-options "-Xmx2048M" ApplyVQSR --output hg001.vcf --variant hg001.vcf --truth-sensitivity-filter-level 99.7 --tranches-file snp_hg001.tranches --reference human_g1k_v37_decoy.fasta --recal-file snp_hg001.recal --mode SNP

./gatk --java-options "-Xmx2048M" ApplyVQSR --output hg001.vcf --variant hg001.vcf --truth-sensitivity-filter-level 99.7 --tranches-file indel_hg001.tranches --reference human_g1k_v37_decoy.fasta --recal-file indel_hg001.recal --mode INDEL


  • Sheila, Broad Institute (Member, Broadie, Moderator)
    edited January 24


    Interesting. I am going to bring this up to the team, and someone will get back to you soon.


    EDIT: Can you explain what you mean by "Also the precision and recall scores for raw VCFs outputed by HaplotypeCaller/GenotypeGVCFs are close to identical between 4.beta.2 and for these samples."? Thanks.

  • Hi Sheila,

    What we mean is that the raw VCFs (after calling, before recalibration) generated by 4.beta.2 and give very similar results. That is what led us to believe that VQSR is the one causing the big difference in recall. For example, for the same HG001 sample mentioned above, the scores for raw VCFs are:

    Version    SNP Precision  SNP Recall  INDEL Precision  INDEL Recall
    4.beta.2   0.997819       0.999701    0.993438         0.993349
               0.997818       0.999716    0.993419         0.993488

    As you can see, the differences are minor compared to the ones in the table above.
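
The size of those raw differences can be checked with a short Python sketch. The helper name `max_abs_diff` is hypothetical, and `raw_second` is a placeholder for the second row, which carries no version label in the table.

```python
# Raw (pre-VQSR) scores for HG001, copied from the table above.
# Column order: SNP Precision, SNP Recall, INDEL Precision, INDEL Recall.
raw_beta2 = (0.997819, 0.999701, 0.993438, 0.993349)   # row labelled 4.beta.2
raw_second = (0.997818, 0.999716, 0.993419, 0.993488)  # second row, version label not shown


def max_abs_diff(a, b):
    """Largest absolute per-metric difference between two rows of scores."""
    return max(abs(x - y) for x, y in zip(a, b))


largest = round(max_abs_diff(raw_beta2, raw_second), 6)
print(f"largest raw difference: {largest:.6f}")
```

The largest raw difference is about 0.00014 (INDEL recall), far smaller than the roughly 0.0117 change in SNP recall seen after recalibration.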


  • Sheila, Broad Institute (Member, Broadie, Moderator)

    Hi Teodora,

    Okay. Thanks for the table. I will have to ask the team and get back to you.


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi @teodora_aleksic, sorry to be getting back to you so late. You mentioned you're running the VR with different settings between beta.2 and The first step here is to figure out whether it's the software version that causes the differences or the settings. Have you checked whether swapping those settings explains what you're seeing?

  • Hi @Geraldine_VdAuwera, we did experiment with the settings. What worked best for us in the end was to leave all parameters at their defaults for SNP VR, and for INDEL VR to leave everything at its default except max-gaussians, which we set to 4. The annotations we used for both are QD, MQRankSum, FS, DP, ReadPosRankSum and SOR. This gave us similar or slightly better results than 4.beta.2.

    You can look at our settings in more detail here.
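
As a rough illustration of the settings described in the last reply, the Python sketch below assembles partial VariantRecalibrator argument lists that set only the mode, the six annotations, and (for INDEL) max-gaussians. The flag spellings are copied from the command lines earlier in the thread. The helper name `build_vr_args` is hypothetical, and the inputs a real run would need (--variant, --resource, --reference, and the output and tranches files) are deliberately left out, so this is not a complete command.

```python
import shlex

# Annotations listed in the reply above, used for both SNP and INDEL modes.
ANNOTATIONS = ["QD", "MQRankSum", "FS", "DP", "ReadPosRankSum", "SOR"]


def build_vr_args(mode):
    """Build a partial VariantRecalibrator argument list for one mode.

    Inputs, resources, and output files are omitted on purpose.
    """
    if mode not in ("SNP", "INDEL"):
        raise ValueError(f"unknown mode: {mode}")
    args = ["./gatk", "VariantRecalibrator", "--mode", mode]
    for annotation in ANNOTATIONS:
        args += ["--use-annotation", annotation]
    if mode == "INDEL":
        # The only non-default tuning parameter mentioned in the reply.
        args += ["--max-gaussians", "4"]
    return args


for mode in ("SNP", "INDEL"):
    print(shlex.join(build_vr_args(mode)))
```

Keeping the arguments as a list makes it easy to diff the SNP and INDEL settings, or to compare them against the longer command lines in the original post.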
