
Single Sample VQSR

Hi all!

I've got a question concerning VQSR.

The situation is as follows:

  • I've got more than 100 single-sample VCFs
  • Unfortunately I won't be able to re-call the VCFs
  • Merging the files into a single multi-sample VCF is, in my opinion, a bad idea due to the loss of the information stored in the INFO field
  • Creating multi-sample VCFs with the help of 1000G would require re-calling or merging, so this is also not an option.

Therefore, more or less just to see what happens, I specified multiple inputs for the VariantRecalibrator walker and was able to produce a recal file and a tranches file. However, it's probably still a bad idea to use the recal file for recalibration, since it now contains multiple entries for the same variant (most likely because the same variant occurs in several of the single-sample VCFs?):

chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=1.9214;culprit=MQRankSum
chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=2.0305;culprit=MQ

I guess that during ApplyRecalibration it's not possible to decide which entry is the correct one for a variant in single-sample VCF X1. However, this would be crucial, since the entries show different VQSLOD values.
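
To get a feeling for how many sites are affected, something like the following could be used. This is only a minimal sketch, not a GATK tool: the function names `parse_vqslod` and `conflicting_sites` are made up for illustration, and it assumes whitespace-separated records like the two entries above, with the INFO string as the last column. It groups the recal entries by site and reports every site that carries more than one VQSLOD value.

```python
from collections import defaultdict


def parse_vqslod(info):
    """Return the VQSLOD value from a semicolon-separated INFO string, or None."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "VQSLOD":
            return float(value)
    return None


def conflicting_sites(lines):
    """Map (chrom, pos) to its VQSLOD values for sites listed more than once."""
    by_site = defaultdict(list)
    for line in lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Assumption: INFO is the last column of each record.
        chrom, pos, info = fields[0], int(fields[1]), fields[-1]
        vqslod = parse_vqslod(info)
        if vqslod is not None:
            by_site[(chrom, pos)].append(vqslod)
    return {site: vals for site, vals in by_site.items() if len(vals) > 1}


# The two recal entries quoted in the post.
recal_lines = [
    "chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=1.9214;culprit=MQRankSum",
    "chr1 871334 . N . . END=871334;POSITIVE_TRAIN_SITE;VQSLOD=2.0305;culprit=MQ",
]
conflicts = conflicting_sites(recal_lines)
print(conflicts)  # {('chr1', 871334): [1.9214, 2.0305]}
```

On a real recal file the lines would be read from disk instead of the hard-coded list. The sketch only counts the conflicts; it does not say which of the entries belongs to which input VCF.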

So in my opinion, it's probably not possible to use VQSR in my specific case. However, since I would really like to use it, I thought maybe one of you knows a way to use it despite all these problems.

Thanks a lot!

