Best way to filter out common SNPs from a GATK output VCF file

rcholic, Denver, Member, Posts: 68

In my Picard/GATK pipeline, I already include the 1000G gold-standard and dbSNP files in my VQSR step, and I am wondering whether I should further filter the final VCF files. The two files I use are Mills_and_1000G_gold_standard.indels.hg19.vcf and dbsnp_137.hg19.vcf, both downloaded from the GATK resource bundle.

I recently came across the NHLBI exome sequencing data and the more complete 1000G variants.

These made me wonder whether I should use these available VCFs to further filter my VCF files and remove the common SNPs. If so, can I use the "--mask" parameter of GATK's VariantFiltration to do the filtering? The example below is copied from the documentation page:

    java -Xmx2g -jar GenomeAnalysisTK.jar \
       -R ref.fasta \
       -T VariantFiltration \
       -o output.vcf \
       --variant input.vcf \
       --filterExpression "AB < 0.2 || MQ0 > 50" \
       --filterName "Nov09filters" \
       --mask mask.vcf \
       --maskName InDel
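
For comparison, here is a minimal sketch of the same idea outside GATK. Note that "--mask" in VariantFiltration labels overlapping records in the FILTER column with the mask name; it does not delete them. The helper below, `drop_masked_sites`, is a hypothetical function (not part of GATK or Picard) that instead drops every record whose CHROM, POS, REF and ALT exactly match a record in a mask VCF. It assumes uncompressed, tab-delimited VCFs and does not split multi-allelic ALT fields, so treat it as an illustration rather than a replacement for a proper tool.

```shell
# Hypothetical helper, not a GATK command.
# Usage: drop_masked_sites mask.vcf input.vcf > output.vcf
# Keeps all header lines of input.vcf and drops records whose
# CHROM, POS, REF and ALT exactly match a record in mask.vcf.
drop_masked_sites() {
    mask_vcf="$1"
    input_vcf="$2"
    awk -F'\t' '
        FILENAME == ARGV[1] {
            # First file (the mask): remember every non-header site.
            if ($0 !~ /^#/) seen[$1 FS $2 FS $4 FS $5] = 1
            next
        }
        /^#/ { print; next }                 # input header lines pass through
        !(($1 FS $2 FS $4 FS $5) in seen)    # print records absent from the mask
    ' "$mask_vcf" "$input_vcf"
}
```

With the files from the question, this might be called as `drop_masked_sites dbsnp_137.hg19.vcf input.vcf > output.vcf`. Matching on alleles as well as position avoids discarding a novel variant that merely shares a coordinate with a known one.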