I am using VQSR on a non-human species, and I get the error message below.
((ERROR MESSAGE: Invalid command line: No training set found! Please provide sets of known polymorphic loci marked with the training=true ROD binding tag. For example, -resource:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmapFile.vcf))

My script: ((java -d64 -Xmx48g -jar /home/mbxao2/R-drive/tools/GATK/GenomeAnalysisTK.jar -T VariantRecalibrator -R Gallus_gallus.Gallus_gallus-5.0.dna.chromosome.1.fa -input saudi.vcf -resource:dbsnp,known=true,training=true,truth=true,prior=2.0 /home/mbxao2/R-drive/known_sites_db/dbSNPs.vcf -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R))

The literature says to use dbsnp with known=true, which gave me the error above. When I change the dbsnp resource to known=false and prior=12.0, it runs without any error.

My question: is this change correct, or could it affect something at the next stage?

Thank you,


  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    Hi Ahmed,

    I don't think you will get correct results using just one resource file that is not high confidence. Can you please post the plots you get from VQSR? Also, how many samples are in your input VCF? Are they whole genome or whole exome? You may be better off using hard filtering if you don't have more known variation resource files. Have a look at this article for more information on hard filtering.


  • Hi Sheila,
    Thanks for replying. Unfortunately, I have only one plot (recalibrate_SNP.tranches.pdf), possibly because of a problem with R; please see the attached file.
    My VCF file contains only 5 samples, and the data are whole-genome sequences.



  • Thanks Geraldine. I will try with hard filtering.
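
For anyone following the hard-filtering suggestion, here is a minimal sketch of what that could look like with the same GATK 3 jar, reference and input VCF as in the script above. The output file names and the filter name are hypothetical placeholders. The thresholds are the generic SNP starting values commonly quoted in GATK's hard-filtering guidance, not values tuned for this chicken dataset, so treat them as assumptions to adjust after looking at your own annotation distributions. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: GATK 3 hard filtering for SNPs as an alternative to VQSR.
# Paths to the jar, reference and input VCF are taken from the original post.
GATK_JAR="/home/mbxao2/R-drive/tools/GATK/GenomeAnalysisTK.jar"
REF="Gallus_gallus.Gallus_gallus-5.0.dna.chromosome.1.fa"
INPUT_VCF="saudi.vcf"

# Hypothetical output names, chosen for illustration.
RAW_SNPS="raw_snps.vcf"
FILTERED_SNPS="filtered_snps.vcf"

# Assumed generic starting thresholds; tune them for your own data.
SNP_FILTER='QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'

# With DRY_RUN=1 (the default) each command is printed for inspection
# instead of being executed. The printed form drops shell quoting.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: extract only the SNPs from the call set.
run java -jar "$GATK_JAR" -T SelectVariants \
    -R "$REF" -V "$INPUT_VCF" -selectType SNP -o "$RAW_SNPS"

# Step 2: mark SNPs that fail the expression in the FILTER column.
# Failing records are kept in the output and tagged with the filter name.
run java -jar "$GATK_JAR" -T VariantFiltration \
    -R "$REF" -V "$RAW_SNPS" \
    --filterExpression "$SNP_FILTER" \
    --filterName "snp_hard_filter" \
    -o "$FILTERED_SNPS"
```

Running it as is prints the two commands; running it with DRY_RUN=0 executes them. Indels would need a separate pass with -selectType INDEL and their own thresholds.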

