Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
mutect2 - regions absent from bamoutput file
I'm running mutect2 (gatk3.7) on amplicon data of artificial sequencings that mimick tumour samples. (the issue was also noticed in real sequencings)
I notice that quite a few regions (>80% of targeted amplicons) are missing in the bamoutput file, compared to the bam file used as input for mutect2. This leads to false negative hits, even with allelic frequency of 10%. Those regions are marked as active in the --activeRegionOut file.
I have quite high depth sequencings (1000 - 2000 reads). It seems that these variants are usually within 20 bases from the end of reads (and covered by only one in the pair).
Command used (with gatk3.7):
-T MuTect2 -L regions.bed -R human_g1k_v37.fa -I:tumor recalibrated.bam --dbsnp 1000G_phase1.snps.high_confidence.b37.vcf -maxReadsInRegionPerSample 1000000 -o variant_call.vcf --activeRegionOut active_regions.igv --bamOutput bamOutput.bam
I have two questions:
1) are the regions absent because the region is not active, and if so, what would cause regions with variants not to be active?
2) If the regions are active, why is there no data in the bam files at these positions?
Below is an example of region with this issue. The insert is present at 10% in the recalibrated bam file but the region is absent from bamoutput file. Quality of the inserted base is not different to that of other bases.