We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Why is this variant not called?

robin@lindinglab.org[email protected] DenmarkMember
edited August 2015 in Ask the GATK team

Hi there,

I've been playing around with RNA-seq data and came across what seems to be a high quality single nucleotide variant but that isn't detected by HaplotypeCaller.

I followed the Calling variants in RNAseq method, except for the alignment which was carried out by tophat2 (and looks good). Then I used MarkDuplicates, ReorderSam, SplitNCigarReads, BaseRecalibrator, PrintReads and finally HaplotypeCaller.

When I look at the reads that come out of the PrintReads steps with IGV, I can see a lot of high quality reads with the variant sequence (see attachment).

However the HaplotypeCaller doesn't report this site as variant in the VCF output. I first thought it was filtered but setting stand_call_conf and stand_emit_conf to 0.0, and adding EMIT_ALL_SITES doesn't make a difference on this site (although I can see lots of low quality sites appear as expected).

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I Cet1_2.OS9_rg.bam -dontUseSoftClippedBases -stand_call_conf 0.0 -stand_emit_conf 0.0 -o Cet1_2.OS9_all_sites.vcf -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -out_mode EMIT_ALL_SITES

This site is called in a replicate experiment (same cell line, different sample prep) and is assigned a very high quality (QD = 25.34, DP=1298). The two samples look very similar and I can't understand what's going on and why the variant isn't called?

Any idea on how to tackle this issue would be welcome.


Best Answer


Sign In or Register to comment.