Why is this variant not called?
I've been playing around with RNA-seq data and came across what seems to be a high quality single nucleotide variant but that isn't detected by HaplotypeCaller.
I followed the Calling variants in RNAseq method, except for the alignment which was carried out by tophat2 (and looks good). Then I used MarkDuplicates, ReorderSam, SplitNCigarReads, BaseRecalibrator, PrintReads and finally HaplotypeCaller.
When I look at the reads that come out of the PrintReads steps with IGV, I can see a lot of high quality reads with the variant sequence (see attachment).
However the HaplotypeCaller doesn't report this site as variant in the VCF output. I first thought it was filtered but setting stand_call_conf and stand_emit_conf to 0.0, and adding EMIT_ALL_SITES doesn't make a difference on this site (although I can see lots of low quality sites appear as expected).
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I Cet1_2.OS9_rg.bam -dontUseSoftClippedBases -stand_call_conf 0.0 -stand_emit_conf 0.0 -o Cet1_2.OS9_all_sites.vcf -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -out_mode EMIT_ALL_SITES
This site is called in a replicate experiment (same cell line, different sample prep) and is assigned a very high quality (QD = 25.34, DP=1298). The two samples look very similar and I can't understand what's going on and why the variant isn't called?
Any idea on how to tackle this issue would be welcome.