# Why does HaplotypeCaller not call my SNV mutation?

Member Posts: 4

Hi Guys,
I am using HaplotypeCaller to call mutations in some patient samples. I know that one of the samples carries a point mutation in MSH2 intron 5, near the splice site. However, this region is also covered by the Mills-1000G known indel file. No matter what I try, HaplotypeCaller does not call the SNV and always calls the indel instead. The BAM file was realigned and recalibrated using the Mills-1000G gold standard indel file.
The position is chr2:47641560.
The snapshot is attached.
I am using GATK 3.0 in GVCF mode on a single sample.
Note that there is 158X coverage at this position: 72 reads show the alternate base T, 72 show a deletion, and 84 show the reference base A.
MuTect picks it up easily, although it rejects it with the reason nearby-gap-event.
Any input is highly appreciated. Thanks in advance.
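
For readers following along, the allele fractions implied by the counts above can be worked out with a few lines of Python. This is only a sketch: it assumes the 158X depth is the denominator for the base counts and that the deletion-spanning reads are tallied separately by the viewer, neither of which is stated in the post. The helper name `allele_fraction` is made up for illustration.

```python
def allele_fraction(count: int, depth: int) -> float:
    """Fraction of reads at a site that carry a given allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return count / depth


# Counts quoted in the post for chr2:47641560.
# Assumption: 158 is the total base-level depth used as the denominator.
depth = 158
base_counts = {"A (reference)": 84, "T (alternate)": 72}

for allele, n in base_counts.items():
    print(f"{allele}: {n}/{depth} = {allele_fraction(n, depth):.1%}")
```

Under that assumption the T allele sits at roughly 46% of reads, which is in the range expected for a heterozygous site.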

• Member Posts: 4

By the way, the lower panel in the IGV snapshot shows the BAM file written by -outputbam. Clearly, HaplotypeCaller does not use the reads carrying the T allele for local assembly.
Also, this position is well known for colon cancer risk predisposition.

Hmm, what do the qualities look like? Can you post the gVCF record you get?

Geraldine Van der Auwera, PhD

• Member Posts: 4

Hi Geraldine,
I have attached the gVCF output for your reference.

Thanks. This looks pretty messy and I don't know the best way to deal with it, but I'll ask the team for some help. You might need to wait until after the weekend for an answer though.

Geraldine Van der Auwera, PhD

Hi there, we're not sure what's going on so we need a snippet of your bam file to debug this locally. Can you please upload a bug report to our FTP server? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

Geraldine Van der Auwera, PhD

• Member Posts: 4

Hello Geraldine:
I uploaded a file named msh2-intron.tgz to the GSA FTP server. See details below.
Thanks for your effort on this!
Zheng

Thanks for the test data. I've put this in the bug tracker; I can't guarantee we'll be able to look at this very soon as we are very busy right now, but I'll let you know in this thread when we know more.

Geraldine Van der Auwera, PhD

Sorry for the very late reply; your question dropped to the bottom of our priority queue by accident. I've had the devs look at your data; they don't think the SNP looks real, because there's a lot of PCR error in the data and none of the reads with the single mismatch actually read through the repeat structure. To get GATK to call a SNP here, if it is actually real, you'd need to have cleaner data.

Geraldine Van der Auwera, PhD
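
The "reads through the repeat structure" criterion mentioned above can be illustrated with a small Python sketch. It is not GATK's implementation; it only shows one plausible way to count how many reads supporting an alternate base also span a repeat end to end. The `Read` type, both function names, the one-base flank requirement, and all coordinates are hypothetical toy values, not taken from the actual data.

```python
from typing import NamedTuple


class Read(NamedTuple):
    start: int         # 1-based inclusive alignment start (hypothetical)
    end: int           # 1-based inclusive alignment end (hypothetical)
    base_at_site: str  # base the read shows at the site of interest


def spans_repeat(read: Read, repeat_start: int, repeat_end: int, flank: int = 1) -> bool:
    """True if the read covers the whole repeat plus `flank` bases on each side."""
    return read.start <= repeat_start - flank and read.end >= repeat_end + flank


def count_spanning_support(reads, alt_base, repeat_start, repeat_end):
    """Return (reads supporting alt_base, how many of those span the repeat)."""
    supporting = [r for r in reads if r.base_at_site == alt_base]
    spanning = [r for r in supporting if spans_repeat(r, repeat_start, repeat_end)]
    return len(supporting), len(spanning)


# Toy data with made-up coordinates: a repeat occupying positions 1000-1026.
REPEAT_START, REPEAT_END = 1000, 1026
reads = [
    Read(950, 1010, "T"),   # supports T but ends inside the repeat
    Read(1005, 1080, "T"),  # supports T but starts inside the repeat
    Read(940, 1040, "A"),   # spans the repeat, shows the reference base
]

supporting, spanning = count_spanning_support(reads, "T", REPEAT_START, REPEAT_END)
print(f"{supporting} reads support T; {spanning} of them span the repeat")
```

In this toy case both T-supporting reads stop or start inside the repeat, so none of them anchors the mismatch on both sides, which is the situation described in the reply.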

• Norway, Member Posts: 5

Hi Geraldine,
I have a similar question.
I am also unable to call a variant at the same position, in the MSH2 gene at chr2:47641560 (c.942+3A>T).
It is a known pathogenic mutation ( https://www.ncbi.nlm.nih.gov/clinvar/variation/36580/ ).
This variant has also been mentioned in the literature as one that fails to be called ( http://www.sciencedirect.com/science/article/pii/S152515781630143X ); the authors describe it as "located in the 3′ end of exon 5 in a difficult-to-sequence homopolymer stretch of 27 adenines".
So I am OK with the fact that my variant calling pipeline does not call this variant, even though T makes up 42% of the reads and most of them pass the Phred quality score threshold. We have verified this variant in this sample by Sanger sequencing.
(Attached images 1_forward and 1_reverse; purple: forward strand, green: reverse strand.)
BUT..........
Right next to this position, at chr2:47641561 (c.942+4A>T), a variant is called for the same sample, even though T is only 3% (4/121) at that position.
(Attached images 2_forward and 2_reverse; purple: forward strand, green: reverse strand.)

I cannot understand the reason...

CAN YOU EXPLAIN IT??

Thanks & Happy Christmas

Ashish