VariantEval on MultiSample calling VCF

thomas_w Posts: 17 Member


I want to know what's the best way to use VariantEval to get statistics for each sample in a multisample VCF file. If I call it like this:

java -jar GenomeAnalysisTK.jar \
-R ucsc.hg19.fasta \
-T VariantEval \
-o multisample.eval.gatkreport \
--eval annotated.combined.vcf.gz \
--dbsnp dbsnp_137.hg19.vcf

where annotated.combined.vcf.gz is a VCF file that contains ~1 million variants for ~800 samples, I get statistics for all samples combined, e.g.

#:GATKTable:CompOverlap:The overlap between eval and comp sites
CompOverlap  CompRod  EvalRod  JexlExpression  Novelty  nEvalVariants  ...
CompOverlap  dbsnp    eval     none            all      471704         191147
CompOverlap  dbsnp    eval     none            known    280557         0
CompOverlap  dbsnp    eval     none            novel    191147         191147

But I would like to get one such entry per sample. Is there an easy way to do this?
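One approach that may help here, sketched under assumptions: GATK 3.x VariantEval has stratification modules, and stratifying by Sample should give one row per sample in each table. The flag combination below is an assumption, not a quote from the answer in this thread. It assumes your build provides the Sample stratifier and that CompOverlap is the only evaluator you need. The output file name is hypothetical. The script only prints the command so it can be checked before launching a long run.

```shell
#!/bin/sh
# Sketch only: prints the command instead of running it.
# Assumptions (not confirmed in this thread): this GATK 3.x build provides the
# Sample stratifier, and CompOverlap is the only evaluator wanted.
GATK_JAR="GenomeAnalysisTK.jar"
OUT="multisample.persample.eval.gatkreport"   # hypothetical output name

# -noST / -noEV switch off the standard stratifications and evaluators,
# so only the modules named with -ST / -EV are used.
set -- java -jar "$GATK_JAR" \
  -R ucsc.hg19.fasta \
  -T VariantEval \
  -o "$OUT" \
  --eval annotated.combined.vcf.gz \
  --dbsnp dbsnp_137.hg19.vcf \
  -noST -ST Sample -ST Novelty \
  -noEV -EV CompOverlap

echo "$@"
# Replace the echo line with:  "$@"   to launch the run.
```

Limiting the run to a few stratifications and evaluators is deliberate: with ~800 samples, every additional module multiplies the number of strata that have to be tracked, which can make the run much slower.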


Best Answer


  • thomas_w Posts: 17 Member

    Thanks, I'll give it a try! I already tried that one yesterday, but in combination with some other modules, and the estimated running time was something like 6 days. With your combination the running time seems to be reasonable.

    However, I would like to get some more information on the individual modules, but the links on the manual page don't work.

  • pdexheimer Posts: 526 Member, Dev

    Yeah, I don't think there's ever been really comprehensive documentation on them - they kind of fall into that low-priority class with ROD Codecs and VariantAnnotator annotations. I've had pretty good success figuring things out through a combination of source diving and experimentation, though that obviously takes some time and effort.

  • Geraldine_VdAuwera Posts: 9,344 Administrator, Dev

    That's right, unfortunately they've just not been a priority -- those links are placeholders for when we eventually get around to documenting them.

    Geraldine Van der Auwera, PhD
