Number of KEEP vs. REJECT calls

I'm wondering what people are seeing in terms of the number of KEEP vs. REJECT calls in a typical Illumina exome sequencing tumor/normal pair. Briefly, I'm using BFAST for alignment followed by the general GATK steps for clean-up; the data looks clean with no quality control issues. MuTect (with dbSNP and COSMIC for hg19), run with the default parameters (high-confidence mode, without a panel of normals), generally returns a total of ~600K mutations; on average, only 100 of these are marked as KEEP! I would expect this number to be higher. I'm currently re-running MuTect with the extended output turned on to see the detailed rejection criteria, but is there anything specific (i.e. something obvious that I am missing) that I should look closely at first to make sure that there is not a technical or processing error? I'm also setting up orthogonal analyses with SomaticSniper and JointSNVMix for comparison. Thanks!
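Once the extended output is available, one quick way to see which rejection criteria dominate is to tally the calls per reason. The following is only a minimal sketch: it assumes the call-stats file is tab-delimited, may begin with comment lines starting with `#`, and has columns named `judgement` and `failure_reasons` (with reasons comma-separated). Check those names against the header of your own file; the sample rows and the helper name `tally_calls` are made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_calls(handle, judgement_col="judgement", reasons_col="failure_reasons"):
    """Count judgements and per-reason rejections in a tab-delimited call table.

    The column names are assumptions; adjust them to match your file's header.
    """
    # Skip leading comment lines (assumed to start with '#') before the header row.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    judgements = Counter()
    reasons = Counter()
    for row in reader:
        judgement = row[judgement_col]
        judgements[judgement] += 1
        if judgement == "REJECT":
            # A rejected site may list several reasons, assumed comma-separated.
            for reason in (row.get(reasons_col) or "").split(","):
                reason = reason.strip()
                if reason:
                    reasons[reason] += 1
    return judgements, reasons


# Toy input for illustration only; these rows are not real data.
SAMPLE = (
    "## toy header line\n"
    "contig\tposition\tfailure_reasons\tjudgement\n"
    "chr1\t100\t\tKEEP\n"
    "chr1\t200\tfstar_tumor_lod,normal_lod\tREJECT\n"
    "chr2\t300\tfstar_tumor_lod\tREJECT\n"
)

if __name__ == "__main__":
    judgements, reasons = tally_calls(io.StringIO(SAMPLE))
    print(dict(judgements))
    for reason, count in reasons.most_common():
        print(reason, count)
```

For a real run, pass an open file handle instead of `io.StringIO(SAMPLE)`. If a single reason accounts for nearly all rejections, that is probably the first place to look for a processing problem.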

  • Thanks, @vyellapa. Good to know about the FPs in those regions; I will examine some of our data for similar effects. I have not looked at Strelka or Seurat; I'm getting ready to run JointSNVMix this week. There was much more documentation for running it than I was expecting! Cheers!

    Hi @vyellapa -- are you saying that you see more false positives when the coverage of the control (normal) is higher than that of the tumor? This is quite unexpected! In the reverse situation (higher tumor coverage than normal) I would expect more false positives, as there is more power to detect (tumor) than there is to reject (normal). The opposite situation should be very clean, or if anything show a slight decrease in sensitivity. I would be interested in hearing more about what you've seen.

    As for comparisons to Strelka and JointSNVMix, we compared to those methods in our publication -- but of course always good to verify in your own hands!
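The coverage argument above (power to detect in the tumor vs. power to reject in the normal) can be illustrated with a toy binomial calculation. This is only a sketch under simplifying assumptions: alt-read counts are modeled as Binomial(depth, allele fraction), sequencing error is ignored, and the read-count threshold is arbitrary. It is not MuTect's actual classifier, and `prob_at_least` is a hypothetical helper name. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def prob_at_least(k, depth, allele_fraction):
    """P(X >= k) for X ~ Binomial(depth, allele_fraction)."""
    below = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Chance of seeing at least 2 alt reads for a heterozygous germline
    # variant (allele fraction 0.5) in the normal, at several depths.
    # Failing to see them means the site cannot be rejected as germline.
    for depth in (4, 8, 40):
        print("normal depth", depth, prob_at_least(2, depth, 0.5))
```

In this toy model the probability of observing the germline allele in the normal rises with normal depth, which matches the expectation that a shallow normal leaves more germline sites unrejected.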

  • Hi Kristian, I had it in the reverse order. As you pointed out, it is hard to reject a mutation when the normal coverage is low. To reiterate the point, there would be more false positives if the control coverage is low. Thanks for pointing that out.

    I wanted to know if anyone has tried downsampling to overcome this.

    Does REJECT mean that the site was called by MuTect in standard mode and then rejected by the MuTect high-confidence filters? I am asking because I want to know how to refer to the calls that were rejected.

