Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
MuTect2 and VQSR: anyway of calling VQSLOD for MuTect2 ?
Since my last question ( here ), I am trying to build a workflow which can process all my samples with a snakemake workflow. From the previous question testing on samples, I corrected the $line and also more variants "passed" the filtering step by realigning giving the Read Group. I hope that this post can help any person having trouble to run MuTect2 at first. Please correct me if there's anything wrong, I will describe in bullets the steps so far:
- Align using BWA MEM using FastQ reads --> output in SAM
- SAM to BAM --> using samtools
- Sort the BAM --> using Picard
- Remove duplicates from BAM --> using Picard
- Add Read Group to BAM --> using Picard
- Base Recalibration of BAM --> using BaseRecalibrator and PrintReads
- Indexing BAM --> using samtools
- Building a PON for BAM --> using MuTect2 in artifact_detection_mode on the normal samples
- Merge all the PON --> using CombineVariants
- Run MuTect2 --> Produce raw VCF files with variants that are "PASS"
Following the GATK Best-Practices workflow for Exome Datasets using MuTect2, I am at the last step of variant filtering except if I omitted a step or did a mistake. Following this thread, I assume that there are no further processing but to extract variants that have "PASS" in the VCF files.
I was reading a bit more on the forum when I stumbled upon the VariantRecalibrator for Haplotype Caller (HC). As I understood that this tool assign a new score called VQSLOD which is supposed to be a well-calibrated score that comes from training sets. So I was thinking:
- is there's anyway of calling this VQSLOD on the VCF output from MuTect2 ?
- Would this produce a more "confident" result to interpret during the analysis of the VCF files of MuTect2 ?
- Are you planning to include this option in GATK 4.0 ?
I know the HC is meant for SNPs and indels but being able to produce a VQSLOD for variants on MuTect2 by training sets would supposedly have a more confident result in term of specificity and sensitivity. The only way (as far as I have read) to produce the VQSLOD is by running first the HC on the samples using -ERC GVCF for each sample and then use GenotypeGVCFs on all GVCF files together. Also, I read that the argument for MuTect2 to produce the output in GVCF is not available.
Thank you for your time and effort !