The current GATK version is 3.7-0

# VQSR

Member

Hi,

I am working on the dog genome and trying to use VQSR on my data.

Here is the command I used:

java -Xmx4G -jar GenomeAnalysisTK.jar -R genome.fa -T VariantRecalibrator -input GATK-snp.vcf -resource:dbsnp,known=false,training=true,truth=true,prior=6.0 canFam3_SNP.vcf -mode SNP -recalFile output.recal -tranchesFile output.tranches -rscriptFile output.plots.R -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an Inbreed

1. I have only a dbSNP file as a training set, and I set the options known=true,training=false,truth=false,prior=6.0 on the command line as per the documentation. That did not work; the tool instead suggested using known=false,training=true,truth=true,prior=6.0. What does prior=6.0 mean here? Is there a threshold for the prior?

2. The above command produces empty tranches and recal files.

3. Even though the files are empty, I proceeded to ApplyRecalibration with the command below:

java -Xmx4G -jar GenomeAnalysisTK.jar -R genome.fa -T ApplyRecalibration -input GATK-snp.vcf --ts_filter_level 99.0 -tranchesFile output.tranches -recalFile output.recal -mode SNP -o recalibrated.filtered.vcf

It gives the error:

ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

##### ERROR

Any help fixing these issues would be appreciated.

## Answers

• Cambridge, MA; Member, Administrator, Broadie

Hi there,

1. You have to specify at least one training set containing truth variants for VQSR to work. The prior is the prior likelihood that you assign to variants in the truth set. It represents the probability that a variant in that set is indeed true and not an artifact. The value depends mainly on how confident you are about the quality of the call set. See more discussion on this here.

2. What was the console output? Did you get any warnings or error messages?

3. If the files are empty, there is no point in running the next step; it will not work.
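To make the prior in point 1 concrete, here is a minimal sketch. It assumes the prior is Phred-scaled, which is consistent with the VQSR documentation quoting Q15 as 96.84% and Q12 as 93.69%. The function name is made up for illustration and is not part of GATK.

```python
def phred_to_probability(q: float) -> float:
    """Convert a Phred-scaled prior Q into the probability that a
    variant in the resource is real: P = 1 - 10^(-Q/10).

    Assumption: the `prior=` value on -resource is Phred-scaled.
    """
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # Print the implied probability for a few commonly seen prior values.
    for q in (2.0, 6.0, 10.0, 12.0, 15.0):
        print(f"prior={q:g} -> P(true) = {phred_to_probability(q):.4f}")
```

Under that assumption, prior=6.0 corresponds to roughly a 75% chance that a variant in the resource is real, a fairly modest level of confidence. There is no fixed threshold; the value should reflect how much you trust the resource.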

• Member

Thanks. There seems to be an error with the -an Inbreed annotation. I removed it and it works now. I also added the options -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 to the VariantRecalibrator command above, then ran ApplyRecalibration. Now I have the recalibrated scores. Could you let me know how to interpret the VQSLOD scores and the PASS or fail filter?

Does a higher score mean the variant is more reliable, or is it the other way around?

• Cambridge, MA; Member, Administrator, Broadie

That is addressed in the documentation for the VQSR method. We are happy to answer detailed questions, but please read the method documentation before asking general questions.
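For orientation while reading that documentation: VQSLOD is annotated as the log odds of a variant being true versus false under the trained model, so higher values indicate more reliable calls. ApplyRecalibration writes PASS in the FILTER column for variants at or above the cutoff implied by --ts_filter_level. The sketch below pulls both fields out of a VCF data line. The two records and the tranche filter label are invented for illustration and are not taken from this thread.

```python
def parse_vqsr_fields(vcf_line: str):
    """Return (FILTER value, VQSLOD as float or None) from one
    tab-separated VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    filter_value = cols[6]  # 7th VCF column is FILTER
    info = {}
    for item in cols[7].split(";"):  # 8th VCF column is INFO
        key, _, value = item.partition("=")
        info[key] = value
    vqslod = float(info["VQSLOD"]) if "VQSLOD" in info else None
    return filter_value, vqslod


# Hypothetical records, for illustration only.
records = [
    "chr1\t1000\t.\tA\tG\t250.0\tPASS\tQD=20.1;VQSLOD=5.32;culprit=MQ",
    "chr1\t2000\t.\tC\tT\t35.0\tVQSRTrancheSNP99.00to99.90\tQD=2.3;VQSLOD=-1.75;culprit=QD",
]

if __name__ == "__main__":
    for line in records:
        flt, score = parse_vqsr_fields(line)
        print(f"FILTER={flt} VQSLOD={score}")
```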

• Member

A frequent question about filtering parameters is what the ideal thresholds are for values such as QUAL (quality of the SNP) and mapping quality (MQ), and the most frequent answer is that it depends on the dataset.

QUAL and MQ are the phred-scaled probability scores for the variant. Can we use QUAL > 40 and MQ > 40 to get a good set of filtered variants irrespective of the dataset?

• Cambridge, MA; Member, Administrator, Broadie

Unfortunately, there is no absolute rule that will yield a good set of filtered variants irrespective of the dataset. Part of the problem is how you define a good set. Is it a very sensitive set or a very specific set? If you use very stringent quality filters, you will probably get a very specific set, but you will miss variants that are real despite having low scores. If you lower the filter thresholds to retrieve those variants, you also let in false positives.

That is the point of VQSR, to be able to identify patterns of covariation that are more informative than simply filtering on quality scores, and to fine-tune the filtering to achieve your desired compromise between sensitivity and specificity. But it is not perfect, and it is not possible to use with every dataset. In any case, you need to experiment with the settings to find what works for you.
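The trade-off described above can be illustrated with a toy calculation. The QUAL values and true/false labels below are invented purely for illustration; real data would need a truth set to label calls this way.

```python
# Toy call set: (QUAL, is_real). Values are invented for illustration only.
calls = [
    (12, False), (18, True), (25, False), (33, True),
    (41, True), (47, False), (60, True), (85, True),
]


def sensitivity_specificity(calls, threshold):
    """Keep calls with QUAL > threshold and report
    (sensitivity, specificity) against the toy labels."""
    tp = sum(1 for q, real in calls if real and q > threshold)
    fn = sum(1 for q, real in calls if real and q <= threshold)
    tn = sum(1 for q, real in calls if not real and q <= threshold)
    fp = sum(1 for q, real in calls if not real and q > threshold)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Sweep a few hard thresholds to see both metrics move in opposite directions.
    for threshold in (20, 40, 50):
        sens, spec = sensitivity_specificity(calls, threshold)
        print(f"QUAL > {threshold}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the threshold removes more false calls but also discards real variants with low scores, which is why no single cutoff suits every dataset.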
