MuTect2 downsampling (-dfrac) - numbers don't match?

papilio (Sweden, Member)

Hi - I have a pilot set of paired normal-tumor samples sequenced to ~300X. We are now downsampling them to find the minimum coverage needed to capture the SNPs/INDELs found in the original samples. But running with -dfrac 0.25, which is supposed to downsample the original samples to 25% of their depth, gave a higher number of sites: 2659 sites were detected in the original and 3321 with downsampling. Only very few of them overlap.

I also ran the downsampling a second time to get a "replicate" of it. The numbers roughly match, but still only some of the sites overlap.

What might have caused this discrepancy?

java -jar GATK.jar -R REFERENCE.fa -T MuTect2 -nct 8 -L INTERVAL.bed -I:tumor TUMOR.bam -I:normal NORMAL.bam -o OUTPUT.vcf -gt_mode DISCOVERY -stand_call_conf 10 --heterozygosity 0.00001 -dfrac 0.25
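
To put a number on "only very few of them overlap", the site lists of two runs can be compared directly. The following is a minimal sketch, not part of the original post: it assumes uncompressed VCF files and treats two records as the same site when CHROM, POS, REF and ALT all match. The function names and the file names in the usage line are hypothetical.

```shell
# Print CHROM, POS, REF, ALT for every record of an uncompressed VCF,
# sorted and de-duplicated. Header lines (starting with '#') are skipped.
vcf_sites() {
    grep -v '^#' "$1" | cut -f1,2,4,5 | LC_ALL=C sort -u
}

# Print the number of sites present in both VCFs.
shared_sites() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    vcf_sites "$1" > "$tmp_a"
    vcf_sites "$2" > "$tmp_b"
    # comm -12 keeps only lines common to both sorted inputs;
    # tr strips the padding that some wc implementations add.
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' '
    rm -f "$tmp_a" "$tmp_b"
}
```

Usage, with placeholder file names: `shared_sites ORIGINAL.vcf DOWNSAMPLED.vcf`. Running the same comparison between the two downsampled "replicates" shows how much of the discrepancy is run-to-run variation rather than an effect of the lower depth.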


  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)


    It is possible that multithreading is causing the issue. However, we do not recommend using the downsampling options in MuTect2 right now. They are a little finicky. I think the GATK4 version of MuTect2 will have more stable downsampling options.

    You can check if removing -nct 8 helps for now.
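
A sketch of that check, assuming the same placeholder inputs as the command in the question: the invocation is unchanged except that -nct 8 is dropped, so the run is single-threaded. The function name and the output file name OUTPUT.single_thread.vcf are hypothetical; the separate output name is only there so the earlier result is not overwritten.

```shell
# Same MuTect2 command as in the question, minus the multithreading flag.
# Wrapped in a function so nothing runs until it is called explicitly.
run_mutect2_single_threaded() {
    java -jar GATK.jar -R REFERENCE.fa -T MuTect2 \
        -L INTERVAL.bed \
        -I:tumor TUMOR.bam -I:normal NORMAL.bam \
        -o OUTPUT.single_thread.vcf \
        -gt_mode DISCOVERY -stand_call_conf 10 \
        --heterozygosity 0.00001 -dfrac 0.25
}
```

Calling `run_mutect2_single_threaded` twice (with a different output name each time) and comparing the two site lists would show whether the replicates agree once multithreading is out of the picture.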


