We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Inconsistency in picard CollectVariantCallingMetrics
I hope this is the right forum to ask about picard these days.
I run CollectVariantCallingMetrics on a one-sample vcf file and get both the *. variant_calling_summary_metrics and *. variant_calling_detail_metrics. I would expect these to be close to identical, when only one sample is used, however, most values are different.
Command:java -Djava.awt.headless=true -Xmx1500m -jar picard.jar CollectVariantCallingMetrics INPUT=${samplename}.final_variants.vcf.gz DBSNP=${DBSNP} OUTPUT=${samplename}.CollectVariantCallingMetrics
output:
METRICS CLASS picard.vcf.CollectVariantCallingMetrics$VariantCallingDetailMetrics
SAMPLE_ALIAS HET_HOMVAR_RATIO TOTAL_SNPS NUM_IN_DB_SNP NOVEL_SNPS FILTERED_SNPS PCT_DBSNP DBSNP_TITV
SAMPLENAME 1.515108 80563 79938 625 27362 0.992242 2.431405
METRICS CLASS picard.vcf.CollectVariantCallingMetrics$VariantCallingSummaryMetrics
TOTAL_SNPS NUM_IN_DB_SNP NOVEL_SNPS FILTERED_SNPS PCT_DBSNP DBSNP_TITV
84172 82347 1825 29284 0.978318 2.401504
picard version= 1.141
Best Answer
-
Sheila Broad Institute admin
@vang
Hi,Yes, Picard's latest version uses Java 1.8. When I tested this yesterday, I used a single sample VCF. I got consistent results in both outputs of CollectVariantCallingMetrics. However, my colleague tested it and said she got inconsistent results.
After looking into this, I think the inconsistency comes from homozygous reference sites in your VCF. The detail file does not include homozygous reference sites, but the summary file does include homozygous reference sites. No-call sites are excluded from both files.
I hope this clears things up.
-Sheila
Answers
Hi @vang, this is indeed the right place to ask about Picard tools.
I'm not sure what's going on here but we'll look into it. Have you tried using any other tools to evaluate which set of numbers might be the most accurate?
@vang
Hi,
I just tried with the latest version of Picard, and I get the exact same numbers in both output files. I'm not sure if it is indeed an issue/bug with the version you are using, but can you try again with the latest version?
Thanks,
Sheila
Hi Sheila
too bad, i just tried the 2.0.1 version and it gave the exact same output as version 1.141. So I still get inconsistency. Could we try on the same vcf file and compare results? Is the one you used publicly available?
Btw, did picard 2 move to java 8? I use to use java 7, but now it only works with version 8.
Thanks
@vang
Hi,
Yes, Picard's latest version uses Java 1.8. When I tested this yesterday, I used a single sample VCF. I got consistent results in both outputs of CollectVariantCallingMetrics. However, my colleague tested it and said she got inconsistent results.
After looking into this, I think the inconsistency comes from homozygous reference sites in your VCF. The detail file does not include homozygous reference sites, but the summary file does include homozygous reference sites. No-call sites are excluded from both files.
I hope this clears things up.
-Sheila
Hi Sheila,
I removed all homozygote refs with GATK's SelectVariants --excludeNonVariants and now everything fits perfectly. Great!
The results from picard and GATK VariantEval differ some. Using the same vcf and dbSNP file I get these TiTvRatio results:
GATK:
dbSNP: 2.41
Novel: 1.51
picard:
dbSNP: 2.431405
Novel: 1.753304
Are there any differences in the way the results are calculated in the two methods?
Thanks
@vang
Hi!
I suspect you used -comp for the dbsnp file you input to VariantEval. If you use --dbsnp "your dbsnp file", you should get the same number from both tools
-Sheila
Hmm, no. These are my parameters:
-R ucsc.hg19.fasta -T VariantEval --eval:set1 final_variants.vcf.gz --dbsnp dbsnp_138.hg19.vcf -o final_variants.vcf.gz.eval.grp
-Søren