The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ` ) each to make a code block.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# I am not sure why HaplotypeCaller does not call my SNV mutation?

Member Posts: 4

Hi Guys,
I am using HaplotypeCaller to call mutations for some of patient samples. I know that at MSH2 intron5 near splicing site there is a point mutation in one of the sample. however, this is also a region in Mills-1000g known indel file. Not matter how I try, Haplotypecaller can NOT call the SNV but always call the indel. the bam file is realigned and recalibrated using mills-1000g indel gold standard file.
the position is chr2 47641560
the snapshot is attached.
I am using 3.0 and gvcf mode on one single sample.
notice there are 158X coverage at this position, 72 show allele base T, 72 are deletion. 84 show reference base A.
mutect can easily pick it up though the reject reason is nearby-gap-event.
Any input is highly appreciated. Thanks in advance.

Tagged:

• Member Posts: 4

btw, the lower panel in the igv snapshot shows the bam files by -outputbam. clearly haplotypecaller doesn't use the allele reads for local assembly.
also the position is a well-known position for colon cancer risk predisposition.

Hmm, what do the qualities look like? Can you post the gVCF record you get?

Geraldine Van der Auwera, PhD

• Member Posts: 4

Hi Gerald,
I attached the gvcf output for your reference.

Thanks. This looks pretty messy and I don't know what is the best way to deal with it, but I'll ask the team for some help. Might need to wait until after the weekend for an answer though.

Geraldine Van der Auwera, PhD

Hi there, we're not sure what's going on so we need a snippet of your bam file to debug this locally. Can you please upload a bug report to our FTP server? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

Geraldine Van der Auwera, PhD

• Member Posts: 4

Hello Geraldine:
I uploaded a file names msh2-intron.tgz onto gsa ftp. see detail below.
Thanks for your effort on this!
Zheng

220 ProFTPD 1.3.3g Server (Broad Institute of MIT and Harvard) [69.173.80.251]
230 Anonymous access granted, restrictions apply
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> put msh2-intron.tgz
local: msh2-intron.tgz remote: msh2-intron.tgz
227 Entering Passive Mode (140,163,99,30,68,216)
150 File status okay; about to open data connection.
100% |**************************************************************************************************************************************************************| 406 KiB 38.77 MiB/s 00:00 ETA
226 Transfer complete, closing data connection.
416015 bytes sent in 00:01 (335.73 KiB/s)
ftp>

Thanks for the test data. I've put this in the bug tracker; I can't guarantee we'll be able to look at this very soon as we are very busy right now, but I'll let you know in this thread when we know more.

Geraldine Van der Auwera, PhD

Sorry for the very late reply, your question dropped to the bottom of our priority queue by accident. I've had the devs look at your data; they don't think the SNP looks real, because there's a lot of PCR error in the data and none of the reads with the single mismatch actually reads through the repeat structure. To get GATK to call a SNP here, if it is actually real, you'd need to have cleaner data.

Geraldine Van der Auwera, PhD

• NorwayMember Posts: 5

Hi Geraldine,
I have a similar question,
I am also unable to call variant on the same position, In MSH2 gene at ch2:47641560 (c.942+3A>T)
It is a known pathogenic mutation. ( https://www.ncbi.nlm.nih.gov/clinvar/variation/36580/ )
and this variant have been mention in literature for not getting called ( http://www.sciencedirect.com/science/article/pii/S152515781630143X ), they say that "located in the 3′ end of exon 5 in a difficult-to-sequence homopolymer stretch of 27 adenines"
so, I am OK with the fact that my VC pipeline is not calling this variant. (though T being 42% and most of them passing the phread-quality score). We have verified this variants in this sample through Sanger sequencing.
attached images 1_forward, & 1_reverse), (Purple:forward strand, green:reverse strand).
BUT..........
just next to this position at ch2:47641561 (c.942+4A>T), a variant is called for the same sample, though T is only 3% (4/121) at that position.
(attached images 2_forward & 2_reverse, purple:forward strand, green:reverse strand).

I could not understand the reason...

CAN YOU EXPLAIN IT??

Thanks & Happy Chrismas

Ashish