MuTect2 and VQSR: anyway of calling VQSLOD for MuTect2 ?

alaabadrealaabadre FranceMember
edited April 2017 in Ask the GATK team

Hello GATK Team ! @Sheila @Geraldine_VdAuwera

Since my last question ( here ), I am trying to build a workflow which can process all my samples with a snakemake workflow. From the previous question testing on samples, I corrected the $line and also more variants "passed" the filtering step by realigning giving the Read Group. I hope that this post can help any person having trouble to run MuTect2 at first. Please correct me if there's anything wrong, I will describe in bullets the steps so far:

  • Align using BWA MEM using FastQ reads --> output in SAM
  • SAM to BAM --> using samtools
  • Sort the BAM --> using Picard
  • Remove duplicates from BAM --> using Picard
  • Add Read Group to BAM --> using Picard
  • Base Recalibration of BAM --> using BaseRecalibrator and PrintReads
  • Indexing BAM --> using samtools
  • Building a PON for BAM --> using MuTect2 in artifact_detection_mode on the normal samples
  • Merge all the PON --> using CombineVariants
  • Run MuTect2 --> Produce raw VCF files with variants that are "PASS"

Following the GATK Best-Practices workflow for Exome Datasets using MuTect2, I am at the last step of variant filtering except if I omitted a step or did a mistake. Following this thread, I assume that there are no further processing but to extract variants that have "PASS" in the VCF files.

I was reading a bit more on the forum when I stumbled upon the VariantRecalibrator for Haplotype Caller (HC). As I understood that this tool assign a new score called VQSLOD which is supposed to be a well-calibrated score that comes from training sets. So I was thinking:

  1. is there's anyway of calling this VQSLOD on the VCF output from MuTect2 ?
  2. Would this produce a more "confident" result to interpret during the analysis of the VCF files of MuTect2 ?
  3. Are you planning to include this option in GATK 4.0 ?

I know the HC is meant for SNPs and indels but being able to produce a VQSLOD for variants on MuTect2 by training sets would supposedly have a more confident result in term of specificity and sensitivity. The only way (as far as I have read) to produce the VQSLOD is by running first the HC on the samples using -ERC GVCF for each sample and then use GenotypeGVCFs on all GVCF files together. Also, I read that the argument for MuTect2 to produce the output in GVCF is not available.

Thank you for your time and effort !

Best regards,
Alaa

Best Answer

Answers

  • Hello Sheila

    I cannot find any information about tools for filtering mutect2 vcf result, please let us know:

    • Which software we could use for filtering mutect2 result?
    • Could we use Vcf-filter or linux script for that?

    Thanks

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Ahmadsam66
    Hi,

    Sorry for the delay. We have been away at a workshop.
    If you are using GATK3, you simply need to check the filter column. If you are using GATK4, there are two tools for filtering, FilterMutectCalls and FilterByOrientationBias.
    If you are using GATK4, have a look at the tutorial we presented in the UK here.

    -Sheila

Sign In or Register to comment.