If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Mutect2 missed variant called by HaplotypeCaller
I am running GATK 3.5.0 with java version 1.8.0. I have two cell line samples that I paired with a promega baseline reference (its essentially a mixed germline sample) to run Mutect2 (which I am aware of is not a part of the Best Practices). I also ran the tumour sample a lone using the HaplotypeCaller and noticed a very clear ALK variant that was missed by Mutect2 but called by the HaplotypeCaller in both samples. Due to the nature of the cell line we also expected to see an ALK variant which is why it was detected.
What I find odd is that the local reassembly of Mutect2 seems to have discarded the variant as the bamout does not contain the variant (C > T) at loci chr2:29443695 whereas the HaplotypeCaller call does for both samples. I have read through the documentation and the specifics of the local reassembly and would be very interested in knowing at what stage this occurs and your suggestions on what can be done.
I will be trying GATK v.4.0 as well as some of the things mentioned here https://software.broadinstitute.org/gatk/documentation/article?id=1235 in the meantime I would be very greatful if someone could look into this. I will be posting the updates on my new tests as well. See details below on various metrics and IGV screenshots.
The chemistry is a DNA capture Kapa hyperplus kit, 75 paired end reads.
- Entire ALK covered up to 80X
- Mean/min coverage 1013/378
- BWA bam shows 50% allele frequency
HaplotypeCaller line Sample 945
- chr2 29443695 . G T 8496.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=5.863;ClippingRankSum=-0.368;DP=601;ExcessHet=3.0103;FS=0.536;MLEAC=1;MLEAF=0.500;MQ=62.46;MQRankSum=1.113;QD=14.21;ReadPosRankSum=0.502;SOR=0.76GT:AD:DP:GQ:PL 0/1:300,298:598:99:8525,0,8240
- Entire ALK covered up to 80x
- Mean/min coverage 523/204
- BWA bam shows 49% allele frequency
HaplotypeCaller line Sample 946
- chr2 29443695 . G T 5056.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.569;ClippingRankSum=-0.212;DP=397;ExcessHet=3.0103;FS=2.133;MLEAC=1;MLEAF=0.500;MQ=63.61;MQRankSum=-1.274;QD=13.00;ReadPosRankSum=0.063;SOR=0.595 GT:AD:DP:GQ:PL 0/1:199,190:389:99:5085,0,5319
Promega control sample
- Same control sample used as pair for both 945 and 946 using Mutect
- Coverage around ALK region ~200+
Please see IGV images of the various cases below. The --bamout (run together with disabling optimization and forcing output) command was run with a 500bp padding downstream and upstream of the target location that contains the variant (i.e the actual padding upstream and downstream the actual variant at loci 29443695 will be slighly more than 500bp). I also ran mutect with the adjust 500bp but included all the targets in chr2 without adding any padding on any other targets other than the one that contains the variant.
Sample945_bwaBAM - Bam output from BWA
Sample946_bwaBAM - Bam output from BWA
Thank you very much and I look forward hearing your thoughts on this