If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.

MuTect2 Haplotypes

micknudsenmicknudsen DenmarkMember ✭✭


I am struggling to figure out, how MuTect2 calculates allele frequencies. Based on a previous problem [1] caused by non-deterministic calls when run on multiple CPU cores, I have decided to run MuTect2 in single-core mode. However, I still see some rather peculiar variant calls.

Here is an example (in fact, the same variant as in [1]):

java -Xmx16g -jar GenomeAnalysisTK.jar -T MuTect2 --reference_sequence hg19.fa --dbsnp dbsnp_138.hg19.vcf --cosmic hg19_cosmic_v54_120711.chr.vcf --input_file:normal normal.bam --input_file:tumor tumor.bam -L chr7 --out foo.vcf -bamout foo.bam

140453136 rs113488022 A T . PASS DB;ECNT=1;HCNT=14;MAX_ED=.;MIN_ED=.;NLOD=232.30;TLOD=232.93 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:1455,139:0.090:69:70:0.504:44626,4581:751:704 0/0:1063,0:0.00:0:0:.:33572,0:534:529

Here the reported allele frequency is AF=0.090, which is very close to 139/(139+1455)=0.087. However, if I restrict calling to a smaller part of the chromosome (2.5MB with the position roughly in the middle), I get the following:

java -Xmx16g -jar GenomeAnalysisTK.jar -T MuTect2 --reference_sequence hg19.fa --dbsnp dbsnp_138.hg19.vcf --cosmic hg19_cosmic_v54_120711.chr.vcf --input_file:normal normal.bam --input_file:tumor tumor.bam -L chr7:139404378-142048195 --out bar.vcf -bamout bar.bam

140453136 rs113488022 A T . PASS DB;ECNT=1;HCNT=22;MAX_ED=.;MIN_ED=.;NLOD=238.14;TLOD=234.88 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:1487,143:0.048:69:74:0.517:45473,4700:768:719 0/0:1089,0:0.00:0:0:.:34186,0:546:543

That is, the allele frequency is 0.048 (approximately half). However, still 143/(143+1487)=0.088.

Looking at the local reassemblies in the BAM files produced by the -bamout option, it appears that the activate region in the first case is chr7:140453084-140453266 (182bp), while in the second example it is only chr7:140453084-140453167 (83bp).

I have been experimenting with this issue for about a week now, and still the only solutionI have come up with is to run MuTect2 on a single core one chromosome at the time. Is the behavior above to be expected?

Michael Knudsen



Best Answer


Sign In or Register to comment.