Best workflow for VQSR when you eventually want individual sample exome VCFs?

Dear GATK!

We are running an exome sequencing project with between 30 and 50 exomes in total. For optimal variant quality score recalibration (VQSR), we should use as much data as possible in the VariantRecalibrator step. However, for downstream analysis we want an individual VCF for each exome, and UnifiedGenotyper has been run separately on each sample. Our plan was to feed all the data into VariantRecalibrator and then run ApplyRecalibration on each of the individual raw VCFs. But VariantRecalibrator takes only one VCF as input, right? So what would be the best workflow for this scenario? Could we run UnifiedGenotyper to create a combined multi-sample VCF for recalibration purposes only, and then apply the resulting recalibration to the individual VCFs? Or would the recal file produced that way somehow be invalid input for recalibrating the individual VCFs? Or is it better to run both variant calling and recalibration on a multi-sample VCF and split it by sample afterwards?


Lasse P
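For the last option (joint calling and recalibration, then splitting by sample), the split itself is simple column bookkeeping. The following is a minimal sketch in plain Python, not a GATK tool. It assumes an uncompressed, tab-delimited VCF, and the function name `split_vcf_by_sample` and its `drop_non_variant` option are hypothetical. Site-level columns such as QUAL, FILTER (including any VQSR filter status) and INFO annotations like AC/AN are copied unchanged, so they still describe the whole cohort rather than the single sample. For real data, GATK's SelectVariants tool can also subset a multi-sample VCF by sample.

```python
import re
from typing import Dict, Iterable, List, Optional

FIXED_COLS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def split_vcf_by_sample(lines: Iterable[str],
                        drop_non_variant: bool = True) -> Dict[str, List[str]]:
    """Hypothetical helper: split multi-sample VCF text into per-sample VCF lines.

    Site-level columns (QUAL, FILTER, INFO) are copied as-is, so cohort-level
    annotations are NOT recomputed for the single sample.
    """
    meta: List[str] = []
    samples: List[str] = []
    out: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)  # meta-information lines go to every output
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[FIXED_COLS:]
            for name in samples:
                # Header line keeps the fixed columns plus this one sample name
                out[name] = meta + ["\t".join(cols[:FIXED_COLS] + [name])]
        elif line:
            cols = line.split("\t")
            fmt_keys = cols[FIXED_COLS - 1].split(":")
            gt_idx: Optional[int] = fmt_keys.index("GT") if "GT" in fmt_keys else None
            for i, name in enumerate(samples):
                sample_field = cols[FIXED_COLS + i]
                if drop_non_variant and gt_idx is not None:
                    values = sample_field.split(":")
                    gt = values[gt_idx] if gt_idx < len(values) else "."
                    alleles = re.split(r"[/|]", gt)
                    if all(a in ("0", ".") for a in alleles):
                        continue  # hom-ref or no-call for this sample: skip site
                out[name].append("\t".join(cols[:FIXED_COLS] + [sample_field]))
    return out


# Tiny two-sample example (made-up records, for illustration only)
demo_vcf = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t0/0:12",
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t./.:0\t1/1:8",
])
per_sample = split_vcf_by_sample(demo_vcf.splitlines())
for sample, vcf_lines in per_sample.items():
    print(sample, len(vcf_lines) - 2, "variant site(s)")
```

Dropping hom-ref and no-call sites is a design choice for "individual exome VCFs" that list only the sample's own variants; pass `drop_non_variant=False` to keep every jointly called site.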