Different variant calling results depending on the interval length using Mutect2


We are working with Mutect2 to call somatic variants (SNVs and indels) on targeted sequencing data. We recently noticed an issue with the detection of one specific variant: it is called when we use a small interval containing the variant, but it is not called when the interval is larger.

These are the two intervals:
chr14 104780203 104780225
chr14 104780078 104780226

The variant of interest is at genomic position chr14:104780214 (within both intervals), and we can see it in IGV with an AF of 66%, as shown in the image (sample_bam_igv.png).
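A quick way to double-check that the position really lies inside both intervals is sketched below. It is only a sketch: it assumes the interval files follow standard BED semantics (0-based start, end-exclusive), the helper name `pos_in_bed` is made up for illustration, and the two files (`small.bed`, `large.bed`) are recreated here from the coordinates listed above rather than taken from the real run.

```shell
# pos_in_bed CHROM POS BED: succeeds if the 1-based POS lies inside any BED record.
# Assumption: BED coordinates are 0-based start, end-exclusive.
pos_in_bed() {
  awk -v c="$1" -v p="$2" '
    $1 == c && p > $2 && p <= $3 { found = 1 }
    END { exit !found }
  ' "$3"
}

# Recreate the two intervals from the post in temporary files.
tmpdir=$(mktemp -d)
printf 'chr14\t104780203\t104780225\n' > "$tmpdir/small.bed"
printf 'chr14\t104780078\t104780226\n' > "$tmpdir/large.bed"

for bed in "$tmpdir/small.bed" "$tmpdir/large.bed"; do
  if pos_in_bed chr14 104780214 "$bed"; then
    echo "$(basename "$bed"): contains chr14:104780214"
  else
    echo "$(basename "$bed"): does NOT contain chr14:104780214"
  fi
done
```

If the interval files were instead written with 1-based inclusive coordinates, the start comparison would need to be `p >= $2`; the variant position is well inside both intervals either way.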

The commands we are using are the following (we also ran them with the latest GATK version, gatk-, and got the same result):
gatk- --java-options "-Xmx30g" Mutect2 --native-pair-hmm-threads 20 -R ~/Homo_sapiens.GRCh38.fa -I sample.bam -L interval1.bed -tumor sample --max-reads-per-alignment-start 0 -O sample_interval1.vcf -bamout sample_interval1.bam

gatk- --java-options "-Xmx30g" Mutect2 --native-pair-hmm-threads 20 -R ~/Homo_sapiens.GRCh38.fa -I sample.bam -L interval2.bed -tumor sample --max-reads-per-alignment-start 0 -O sample_interval2.vcf -bamout sample_interval2.bam

As a result:

-Interval1:
chr14 104780214 . C T . . DP=145;ECNT=1;POP_AF=5.000e-08;TLOD=269.13 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:49,96:0.661:0,0:49,96:28:150,150:60:26:0.657,0.626,0.662:0.027,0.026,0.947

-Interval2: Nothing detected
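
To confirm the presence or absence of the record in each output VCF without opening the files by hand, a small check along these lines could be used. This is a sketch: the helper name `has_variant` is hypothetical, and the two demo files are minimal stand-ins (a trimmed header plus, in one case, a shortened version of the record above), not the real Mutect2 outputs.

```shell
# has_variant VCF CHROM POS: succeeds if a non-header record exists at CHROM:POS.
has_variant() {
  awk -v c="$2" -v p="$3" '
    /^#/ { next }                      # skip header lines
    $1 == c && $2 == p { found = 1 }
    END { exit !found }
  ' "$1"
}

# Stand-in VCFs for demonstration only.
vcfdir=$(mktemp -d)
header='##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf "$header" > "$vcfdir/with_call.vcf"
printf 'chr14\t104780214\t.\tC\tT\t.\t.\tDP=145\n' >> "$vcfdir/with_call.vcf"
printf "$header" > "$vcfdir/without_call.vcf"

for vcf in "$vcfdir/with_call.vcf" "$vcfdir/without_call.vcf"; do
  if has_variant "$vcf" chr14 104780214; then
    echo "$(basename "$vcf"): record found"
  else
    echo "$(basename "$vcf"): record not found"
  fi
done
```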

IGV images of both bamouts are also provided (bamouts_igv.png).

We use the bamclipper software as a BAM preprocessing step to remove primers, and this may be interfering with the variant calling result.

We would really appreciate your help in understanding why we are getting these results.



  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @jmartin_incliva

    Would you please try the same with the latest version of GATK and let us know if the problem persists?

  • Thank you @bhanuGandham

    I have tried with GATK version and I got the same result as described in the post above: the variant is detected for interval1 but missed for interval2.

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @jmartin_incliva

    I brought this issue up with the dev team and this is what they have to say:

    Yes, there are some unpleasant pathologies in the gatk graph assembly, which powers both Mutect2 and HaplotypeCaller, that can lead to variants being called or missed depending on the intervals over which calls are being made. These situations are rare, but they do happen. Some of these issues have been fixed recently (even since the release of gatk 4.1), so if this is causing an issue for the user they can build the gatk master branch, or wait for the next release, and see if that fixes their particular case. If not, I can only say we are aware of these types of issues, and there is active work to fix as many of the pathologies causing them as possible.

    I hope this helps for now and sorry about the inconvenience.

  • davidben, Boston (Member, Broadie, Dev)

    @jmartin_incliva In the GATK master branch and in the upcoming release there's a decent chance that the command line flag --recover-all-dangling-branches would help. Unfortunately, this parameter increases the false discovery rate by 1-2%, which is unacceptable. We will make this the default, along with a couple of other enhancements, once we can eliminate these extra false positives.
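
For reference, the original command with this flag appended might look like the sketch below. It assumes a GATK build that includes `--recover-all-dangling-branches` (master or the upcoming release, as noted above) and that the launcher is on the PATH as plain `gatk`; the helper name and the output prefix are placeholders. The helper only prints the command, so it can be reviewed before running.

```shell
# Prints (does not run) the poster's Mutect2 command with the extra flag added.
# Assumptions: launcher is called `gatk`; output names are placeholders.
mutect2_recover_cmd() {
  bed="$1"      # interval file, e.g. interval2.bed
  prefix="$2"   # output prefix, e.g. sample_interval2_recover
  printf '%s ' \
    gatk --java-options '"-Xmx30g"' Mutect2 \
    --native-pair-hmm-threads 20 \
    -R '~/Homo_sapiens.GRCh38.fa' \
    -I sample.bam \
    -L "$bed" \
    -tumor sample \
    --max-reads-per-alignment-start 0 \
    --recover-all-dangling-branches \
    -O "${prefix}.vcf" \
    -bamout "${prefix}.bam"
  printf '\n'
}

mutect2_recover_cmd interval2.bed sample_interval2_recover
```

Given the 1-2% increase in false discovery rate mentioned above, any extra calls from such a run would be worth comparing against the output without the flag.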
