Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Mutect2, PIK3CA E545K not detected; low MQ reads being filtered out even if MQ filters are disabled

mack812mack812 SpainMember
edited August 29 in Ask the GATK team

Hi,

I am analyzing a dataset from a small panel of cancer hotspots (target-enrichment by amplicons, coverage > 1000x, single-end read technology).

Right now I am struggling to get propper results with Mutect2 from a commercial reference sample (with many validated known mutations). One of the mutations in the reference, which is validated to be present in the sample with 9% allelic frequence, gets consistently ignored by Mutect2. The mutation (PIK3CA E545K, one of the most frequent mutation hotspots in cancer) can be easily seen in the input bam with IGV.

When checking the bamout file I find that there are almost no reads in the region (less than 10) while the input bam had more than 2000x in depth. I am pretty sure that the problem is the mapping quality: this is a highly pseudogenized region (PIK3CA exon 10), and mapping quality of reads is around 15. Indeed the only few reads left in the bamout are those with higher MQ values.

Things I have tried: --alleles mode, --force-active,--disable-tool-default-read-filters, --f1r2-median-mq 5

This is an example command of what I am running (using GATK 4.1.2.0):

gatk --java-options "-Xmx${mem_alloc_4k}" \
  Mutect2 \
    -R $ref_fasta \
    -I $input_path/$tumor_bam \
    -germline-resource $gnomeAD_af \
    -L $intervals \
    --alleles $hotspots_vcf \
    --max-reads-per-alignment-start 0 \
    --f1r2-median-mq 5 \
    -O $outpath/${bam_basename}.unfiltered.vcf \
    -bamout $outpath/${bam_basename}.bamout.bam \
    --disable-tool-default-read-filters true

The other version substitutes the --alleles option with --force-active. I have checked and the region is present in the intervals_listfile and the mutation is correctly described in the .vcf passed with the --alleles argument. The rest of reference mutations in the reference sample get detected with alleleic frequences very close to the expected values.

Apart from --f1r2-median-mq, are there any additional "hidden" options removing reads with poor mapping quality? It seems like the MQ filter is still there

Answers

  • amjaddamjadd FinlandMember ✭✭

    @mack812 Try setting --max-reads-per-alignment-start 0 in Mutect2 command. It should fix this issue given my previous experience.

  • mack812mack812 SpainMember

    Thanks @amjadd but I already included that option as you can see in the command from my previous post.

  • 29043594952904359495 Member

    will this affect?

  • mack812mack812 SpainMember
    edited August 29

    Hi @2904359495,

    I added the --disable-tool-default-read-filters argument as a brute-force way of avoiding having any read filtered out (not planning to keep it once I solve the issue). Anyway, the rest of the validated mutations in the sample are correctly found so it does not seem to be causing any major mulfunction of Mutect2.

  • 29043594952904359495 Member

    @mack812 thanks a lot, in our national reference sample test, we can also see some variants lost

  • mack812mack812 SpainMember
    edited August 30

    By the way I just discovered yet another hidden option in M2 that could potentially be filtering out low MQ reads, which I found by running:

    gatk Mutect2 --showHidden 2>&1 | grep 'mapping'
    

    The option is --minimum-mapping-quality. From grep output:

    --minimum-mapping-quality:Integer
                                  Minimum mapping quality to keep (inclusive)  Default value: 20.
    

    However, including it in the command with value == 0 does not prevent reads mapping to PIK3CA Exon 10 from disappearing from the bamout. I have tried many different combinations with the other options but no luck in saving those reads from being removed.

    BTW, why are there 3 different options to filter by mapping quality in M2? so far I have --disable-read-filter MappingQualityReadFilter, --minimum-mapping-quality, and --f1r2-median-mq. Are maybe some of them legacy arguments not functional in the last release?

  • mack812mack812 SpainMember
    edited August 30

    As a final proof that the MQ is the cause of this issue, I have done the following:

    1.- Hack the input BAM file, extracting reads mapping to PIK3CA ex10 and changing the MQ value of every read to 99 by running:

    samtools view -h $tumor_bam "chr3:178936037-178936190" | awk -F"\t" '{OFS=FS} $5 = 99' | samtools view -1 > ${tumor_bam}_PIK3CA_ex10_MQ-99.bam
    
    samtools index ${tumor_bam}_PIK3CA_ex10_MQ-99.bam
    

    2.- After that, I use this mini-BAM with "fixed" MQs to run the following Mutect2 command, this time not removing any read filter:

    gatk --java-options "-Xmx${mem_alloc_4k}" \
      Mutect2 \
        -R $ref_fasta \
        -I $input_path/${tumor_bam}_PIK3CA_ex10_MQ-99.bam \
        -germline-resource $gnomeAD_af \
        -L $intervals \
        --max-reads-per-alignment-start 0 \
        -O $outpath/${bam_basename}.unfiltered.vcf \
        -bamout $outpath/${bam_basename}.bamout.bam
    

    Result: Not only this time the reads in PIK3CA ex10 have not dissapeared from the bamout (depth close to 2000x), but also the unfiltered.vcf file contains the PIK3CA E545K variant at an allele frequency reasonably close to the expected one (hg19 coordinates):

    chr3    178936091       .       G       A       .       .       DP=1924;ECNT=3;MBQ=33,32;MFRL=0,0;MMQ=99,99;MPOS=40;POPAF=7.30;TLOD=316.16      GT:AD:AF:
                    0/1:1771,150:0.078:1921:898,67:871,82:872,899,83,67
    

    3.- If I do the same without "fixing" the low MQ values (simply working with a mini-bam with the reads in the region and keeping their original MQs, command below), and then run the sameMutect2 command as above, the reads are again lost in the bamout and the variant is not called:

    samtools view -1 -h $tumor_bam "chr3:178936037-178936190" > ${tumor_bam}_PIK3CA_ex10.bam
    
    samtools index ${tumor_bam}_PIK3CA_ex10.bam
    

    So, please please let me know how to effectively switch-off the MQ read filter or, even better, how to reduce its cutoff value :)

    (FYI @2904359495 and @amjadd )

  • 29043594952904359495 Member

    @mack812 , I also do not know, but there is some materials about mapq, I do not know if helps
    http://www.acgt.me/blog/2014/12/16/understanding-mapq-scores-in-sam-files-does-37-42

Sign In or Register to comment.