To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at



I am working on dog genome and trying to use VQSR on my data.

Here is the command i have used:

java -Xmx4G -jar GenomeAnalysisTK.jar -R genome.fa -T VariantRecalibrator -input GATK-snp.vcf -resource:dbsnp,known=false,training=true,truth=true,prior=6.0 canFam3_SNP.vcf -mode SNP -recalFile output.recal -tranchesFile output.tranches -rscriptFile output.plots.R -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an Inbreed

  1. I have only dbSNP file as training set and i have set the options, known=true,training=false,truth=false,prior=6.0 in the command line as per the documentation. But that doesn't work and instead suggested to use known=false,training=true,truth=true,prior=6.0. What is the prior =6.0 here? is there any threshold for prior?

2.The above command produces empty tranches and recal file.

3.Even though the files are empty i have proceeded to ApplyRecalibration with the below command:

java -Xmx4G -jar GenomeAnalysisTK.jar -R genome.fa -T ApplyRecalibration -input GATK-snp.vcf --ts_filter_level 99.0 -tranchesFile output.tranches -recalFile output.recal -mode SNP -o recalibrated.filtered.vcf.

It gives the error:

ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

ERROR Name FeatureType Documentation
ERROR BCF2 VariantContext
ERROR VCF VariantContext
ERROR VCF3 VariantContext

Any help to fix these?


Best Answer


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi there,

    1. You have to specify at least one training set containing truth variants for VQSR to work. The prior is the prior likelihood that you assign to variants in the truth set. It represents the probability that a variant in that set is indeed true and not an artifact. The value depends mainly on how confident you are about the quality of the call set. See more discussion on this here.

    2. What was the console output? Did you get any warnings or error message?

    3. If the files are empty there is no point in running the next step, it will not work.

  • meharmehar Member

    Thanks. there seems to be error with -an Inbreed annotation. I have removed this and it works now. I have added the option -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 to the VariantRecalibrator along with the above command, followed by ApplyRecalibration. Now i have the recalibrated scores. Could you let me know how to interpret VQSLOD scores and the PASS or fail filter?

    Does it mean the higher the score, the variant is more reliable? or the other way?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    That is addressed in the documentation for the VQSR method. We are happy to answer detailed questions, but please read the method documentation before asking general questions.

  • meharmehar Member

    A most frequent question about the filtering parameters is, what are the ideal thresholds for filtering such as QUAL (quality of the SNP), Mapping quality(MQ) and the most frequent answer is, it depends on the dataset :)

    QUAL and MQ are the phred-scaled probability scores for the variant. Can we use QUAL > 40 and MQ>40 to get a good set of filtered variants irrespective of the dataset?

Sign In or Register to comment.