The current GATK version is 3.7-0

# VQSR

Hi,

I am working on the dog genome and trying to run VQSR on my data.

Here is the command I used:

java -Xmx4G -jar GenomeAnalysisTK.jar -R genome.fa -T VariantRecalibrator -input GATK-snp.vcf -resource:dbsnp,known=false,training=true,truth=true,prior=6.0 canFam3_SNP.vcf -mode SNP -recalFile output.recal -tranchesFile output.tranches -rscriptFile output.plots.R -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an Inbreed

1. The only training resource I have is a dbSNP file. Following the documentation, I first set known=true,training=false,truth=false,prior=6.0 on the command line, but that did not work, and the suggestion was to use known=false,training=true,truth=true,prior=6.0 instead. What does prior=6.0 mean here? Is there a threshold for the prior?

2. The above command produces empty tranches and recal files.

3. Even though the files are empty, I proceeded to ApplyRecalibration with the command below:

java -Xmx4G -jar GenomeAnalysisTK.jar -R genome.fa -T ApplyRecalibration -input GATK-snp.vcf --ts_filter_level 99.0 -tranchesFile output.tranches -recalFile output.recal -mode SNP -o recalibrated.filtered.vcf

It gives the error:

ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

##### ERROR

Any help to fix these?

Hi there,

1. You have to specify at least one training set containing truth variants for VQSR to work. The prior is the prior likelihood that you assign to variants in the truth set. It represents the probability that a variant in that set is indeed true and not an artifact. The value depends mainly on how confident you are about the quality of the call set. See more discussion on this here.
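
To make the prior concrete: as far as I know, the `prior` value on a VQSR resource is expressed on the Phred scale, so it can be converted to a probability with the usual Phred formula. The sketch below assumes that reading; the function name is made up for illustration and is not part of GATK.

```python
def phred_to_probability(q: float) -> float:
    """Convert a Phred-scaled score q into the probability of being correct.

    Assumption: the VQSR resource prior is Phred-scaled, so
    P(true) = 1 - 10^(-q/10).
    """
    return 1.0 - 10.0 ** (-q / 10.0)


# Print the probability implied by a few example prior values.
for q in (2.0, 6.0, 10.0, 15.0):
    print(f"prior={q:>4}: P(true) = {phred_to_probability(q):.4f}")
```

Under that assumption, prior=6.0 corresponds to roughly a 75% chance that a variant in the resource is real, which is a moderate level of confidence rather than a hard threshold.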

2. What was the console output? Did you get any warnings or error message?

3. If the files are empty, there is no point in running the next step; it will not work.

Geraldine Van der Auwera, PhD

Thanks. There seems to be an error with the -an Inbreed annotation. I removed it and it works now. I also added the options -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 to the VariantRecalibrator command above, then ran ApplyRecalibration. Now I have the recalibrated scores. Could you let me know how to interpret the VQSLOD scores and the PASS or fail filter?

Does it mean that the higher the score, the more reliable the variant? Or the other way around?
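
One way to get a feel for this on your own output is to group the VQSLOD values by the FILTER column of the recalibrated VCF. My understanding is that VQSLOD is a log-odds score under the trained model, so higher values mean the variant looks more like the training set, and sites falling below the chosen tranche cutoff get a tranche name in FILTER instead of PASS. The sketch below is only an illustration under those assumptions: the function name is hypothetical, and the three records (including the tranche filter name) are made up, not real output.

```python
from collections import defaultdict


def vqslod_by_filter(vcf_lines):
    """Group VQSLOD values from the INFO column by the FILTER column."""
    groups = defaultdict(list)
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        filt, info = fields[6], fields[7]
        for entry in info.split(";"):
            if entry.startswith("VQSLOD="):
                groups[filt].append(float(entry.split("=", 1)[1]))
                break
    return dict(groups)


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tVQSLOD=4.21;culprit=MQ",
    "chr1\t200\t.\tC\tT\t35\tVQSRTrancheSNP99.00to99.90\tVQSLOD=-1.35;culprit=FS",
    "chr1\t300\t.\tG\tA\t80\tPASS\tVQSLOD=7.02;culprit=QD",
]

for filt, scores in vqslod_by_filter(example).items():
    print(filt, "n =", len(scores), "min =", min(scores), "max =", max(scores))
```

If the PASS group consistently shows higher VQSLOD values than the tranche-filtered groups in your own file, that matches the "higher is more reliable" reading.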

Geraldine Van der Auwera, PhD

A frequent question about filtering parameters is what the ideal thresholds are for values such as QUAL (quality of the SNP) and mapping quality (MQ), and the most frequent answer is that it depends on the dataset.

QUAL and MQ are Phred-scaled probability scores for the variant. Can we use QUAL > 40 and MQ > 40 to get a good set of filtered variants irrespective of the dataset?
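
For reference, this is what such a fixed cutoff would do mechanically. The sketch below is a hypothetical helper, not a GATK tool and not a recommendation for these thresholds: it keeps a record only when both the QUAL column and the MQ value in INFO exceed 40. The records are made up for illustration.

```python
def passes_thresholds(vcf_line, min_qual=40.0, min_mq=40.0):
    """Return True if the record's QUAL and INFO/MQ both exceed the cutoffs."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    if qual == ".":
        # Missing QUAL: cannot be evaluated, so treat it as failing.
        return False
    # Build a dict from key=value INFO entries; flag-only entries are skipped.
    info = dict(e.split("=", 1) for e in fields[7].split(";") if "=" in e)
    if "MQ" not in info:
        return False
    return float(qual) > min_qual and float(info["MQ"]) > min_mq


# Made-up records for illustration only.
records = [
    "\t".join(["chr1", "100", ".", "A", "G", "55.3", ".", "MQ=60.0;QD=12.1"]),
    "\t".join(["chr1", "200", ".", "C", "T", "35.0", ".", "MQ=58.2;QD=3.4"]),
    "\t".join(["chr1", "300", ".", "G", "A", "90.1", ".", "MQ=22.5;QD=15.0"]),
]

kept = [r for r in records if passes_thresholds(r)]
print(f"{len(kept)} of {len(records)} records pass QUAL > 40 and MQ > 40")
```

Whether the surviving set is "good" still depends on coverage, sample count and the other annotations, which is why a single cutoff is hard to apply across datasets.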