Single Sample VQSR

Hi all!

I've got a question concerning VQSR.

The situation is as follows:

  • I've got more than 100 single-sample VCFs.
  • Unfortunately, I won't be able to re-call the VCFs.
  • Merging the files into a single multi-sample VCF is, in my opinion, a bad idea due to the loss of the information stored in the INFO field.
  • Creating multi-sample VCFs with the help of 1000G would require re-calling or merging, so this is also not an option.

Therefore, more or less just to see what happens, I specified multiple inputs for the VariantRecalibrator walker and was able to produce a recal file and a tranches file. However, it's probably still a bad idea to use the recal file for recalibration, since it now contains multiple entries for the same variant (I assume this is because the same variant occurs in several of the single-sample VCFs?):

chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=1.9214;culprit=MQRankSum
chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=2.0305;culprit=MQ

I guess that during ApplyRecalibration it's not possible to decide which of these entries belongs to the variant in single-sample VCF X1. However, this would be crucial, since the entries show different VQSLOD values.
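To make the conflict concrete, here is a minimal Python sketch (not part of GATK) that scans recal-style lines and reports positions listed more than once with differing VQSLOD values. The helper names `parse_info` and `conflicting_sites` are made up for illustration, and the sketch assumes whitespace-separated columns with the INFO field last, as in the two lines above.

```python
from collections import defaultdict


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only entries map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def conflicting_sites(lines):
    """Map (chrom, pos) to its VQSLOD values for sites that appear
    more than once with different scores."""
    scores = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split()
        info = parse_info(cols[-1])  # assumption: INFO is the last column
        if "VQSLOD" in info:
            scores[(cols[0], int(cols[1]))].append(float(info["VQSLOD"]))
    return {site: vals for site, vals in scores.items() if len(set(vals)) > 1}


# The two entries quoted above, copied as they appear in the post.
recal_lines = [
    "chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=1.9214;culprit=MQRankSum",
    "chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=2.0305;culprit=MQ",
]

for site, vals in conflicting_sites(recal_lines).items():
    print(site, vals)
```

Running something like this over the whole recal file would at least show how many sites are affected and how far apart the scores are.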

So, in my opinion, it's probably not possible to use VQSR in my specific case. However, since I would really like to use it, I thought maybe one of you knows a way to use it despite these problems.

Thanks a lot!

