We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

MuTect2 Haplotypes

micknudsenmicknudsen DenmarkMember ✭✭


I am struggling to figure out, how MuTect2 calculates allele frequencies. Based on a previous problem [1] caused by non-deterministic calls when run on multiple CPU cores, I have decided to run MuTect2 in single-core mode. However, I still see some rather peculiar variant calls.

Here is an example (in fact, the same variant as in [1]):

java -Xmx16g -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK.jar -T MuTect2 --reference_sequence hg19.fa --dbsnp dbsnp_138.hg19.vcf --cosmic hg19_cosmic_v54_120711.chr.vcf --input_file:normal normal.bam --input_file:tumor tumor.bam -L chr7 --out foo.vcf -bamout foo.bam

140453136 rs113488022 A T . PASS DB;ECNT=1;HCNT=14;MAX_ED=.;MIN_ED=.;NLOD=232.30;TLOD=232.93 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:1455,139:0.090:69:70:0.504:44626,4581:751:704 0/0:1063,0:0.00:0:0:.:33572,0:534:529

Here the reported allele frequency is AF=0.090, which is very close to 139/(139+1455)=0.087. However, if I restrict calling to a smaller part of the chromosome (2.5MB with the position roughly in the middle), I get the following:

java -Xmx16g -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK.jar -T MuTect2 --reference_sequence hg19.fa --dbsnp dbsnp_138.hg19.vcf --cosmic hg19_cosmic_v54_120711.chr.vcf --input_file:normal normal.bam --input_file:tumor tumor.bam -L chr7:139404378-142048195 --out bar.vcf -bamout bar.bam

140453136 rs113488022 A T . PASS DB;ECNT=1;HCNT=22;MAX_ED=.;MIN_ED=.;NLOD=238.14;TLOD=234.88 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:1487,143:0.048:69:74:0.517:45473,4700:768:719 0/0:1089,0:0.00:0:0:.:34186,0:546:543

That is, the allele frequency is 0.048 (approximately half). However, still 143/(143+1487)=0.088.

Looking at the local reassemblies in the BAM files produced by the -bamout option, it appears that the activate region in the first case is chr7:140453084-140453266 (182bp), while in the second example it is only chr7:140453084-140453167 (83bp).

I have been experimenting with this issue for about a week now, and still the only solutionI have come up with is to run MuTect2 on a single core one chromosome at the time. Is the behavior above to be expected?

Michael Knudsen

[1] http://gatkforums.broadinstitute.org/gatk/discussion/8352/inconsistent-calls-by-mutect2#latest


Best Answer


Sign In or Register to comment.