If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Single Sample VQSR

Hi all!

I've got a questions concerning the VQSR.

The situation is as follows:

  • I've got more than 100 Single Sample VCFs
  • Unfortunately I wont be able to re-call the VCFs
  • Merging the Files into a single Multi-Sample VCF is, in my opinion, a bad idea due to the loss of the information stored in the INFO field
  • Creating Multi-Sample VCFs with the help of 1000G would require re-calling or merging, so this also no option.

Therefore, more or less just to see what happens, I specified multiple inputs for the VariantRecalibrator Walker and was able to produce a recal and tranches file. However, its probably still a bad idea to use the recal file for Recalibration since now there are multiple entries for the same variant (this is most likely due to the same variant in multiple single-sample VCFs?)

chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=1.9214;culprit=MQRankSum
chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=2.0305;culprit=MQ

I guess during the ApplyRecalibration, its not possible to decide which entry for a variant in Single Sample VCF X1 is the correct one. However this would be crucial since the entries show different VQSLOD values.

So in my opinion, its probably not possible to use VQSR in my specific case. However, since I really would like to use it, I thought maybe one of you guys knows a possibility to use it despite all the problems.

Thanks a lot!


Best Answer


Sign In or Register to comment.