MuTect2 downsampling (-dfrac) - numbers don't match?

papilio, Sweden (Member)

Hi - I have pilot normal-tumor paired samples that were sequenced to ~300X. We are now downsampling to see what minimum coverage we need to capture the SNPs/INDELs found in the original samples. But running with -dfrac 0.25, which is supposed to downsample the original samples to 25% depth, gave a higher number of sites: 2659 sites were detected in the original and 3321 sites with downsampling. Only very few of them overlap.

I also ran the downsampling once more to get a "replicate" of it. The numbers roughly match, but still only some of the sites overlap.

What might have caused this discrepancy?

java -jar GATK.jar -R REFERENCE.fa -T MuTect2 -nct 8 -L INTERVAL.bed -I:tumor TUMOR.bam -I:normal NORMAL.bam -o OUTPUT.vcf -gt_mode DISCOVERY -stand_call_conf 10 --heterozygosity 0.00001 -dfrac 0.25

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @papilio
    Hi,

    It is possible that multithreading is causing the issue. However, we do not recommend using the downsampling options in MuTect2 right now. They are a little finicky. I think the GATK4 version of MuTect2 will have more stable downsampling options.

    You can check if removing -nct 8 helps for now.

    -Sheila
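
To check whether removing -nct 8 changes anything, it may help to quantify the overlap between two call sets directly. The following is only a rough sketch, not part of GATK: the function names vcf_sites and compare_vcfs and the file names in the usage comment are made up for illustration, and it assumes plain (uncompressed) VCFs where a site is identified by CHROM, POS, REF and ALT.

```shell
#!/usr/bin/env bash

# Print one line per variant site (CHROM, POS, REF, ALT) from an
# uncompressed VCF, skipping header lines, sorted and de-duplicated.
vcf_sites() {
    awk -F'\t' '!/^#/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' "$1" | LC_ALL=C sort -u
}

# Report how many sites are unique to each VCF and how many are shared.
compare_vcfs() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    vcf_sites "$1" > "$a"
    vcf_sites "$2" > "$b"
    # comm needs sorted input; both files were sorted with the same locale.
    echo "only_in_first=$(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')"
    echo "only_in_second=$(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')"
    echo "shared=$(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')"
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   compare_vcfs ORIGINAL.vcf DOWNSAMPLED.vcf
#   compare_vcfs DOWNSAMPLED_RUN1.vcf DOWNSAMPLED_RUN2.vcf
```

Comparing two runs made with identical settings and no -nct shows whether the calls are reproducible at all. Comparing a run with -nct 8 against one without it would show how much of the discrepancy tracks the multithreading.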
