HaplotypeCaller silently dropping reads
I have generated a BAM file containing 1442 heterozygous SNVs built from real reads, where each variant is covered by exactly 30 reads and every read has an MQ of 20 or more. I ran HaplotypeCaller from GATK 3.4-46 over it, and happily it called every variant correctly. However, it did not report the correct read depth (DP field) for those variants. Tallying the variants by reported DP, I get the following:
26 reads: 6/1442
27 reads: 44/1442
28 reads: 167/1442
29 reads: 448/1442
30 reads: 777/1442
So just under half of the variants (665 of 1442, about 46%) are missing at least one read. On inspecting the -bamout output, the dropped reads appear to be those with a couple of base errors. The reads have not been realigned inside HaplotypeCaller. The summary at the end of the HaplotypeCaller log file says "0 reads were filtered out".
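For anyone wanting to reproduce the tally above, here is a minimal sketch of how it can be computed from the output VCF. It assumes the depth is read from the `DP=` entry in the INFO column of each record; the function names `tally_dp` and `fraction_below` and the file name `calls.vcf` are made up for illustration and are not part of GATK.

```python
from collections import Counter


def tally_dp(vcf_lines):
    """Count how many variant records report each INFO DP value.

    Assumption: depth is taken from the 'DP=' entry in the INFO column
    (8th tab-separated field). Records without it are skipped.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        for entry in fields[7].split(";"):
            if entry.startswith("DP="):
                counts[int(entry[3:])] += 1
                break
    return counts


def fraction_below(counts, expected_depth):
    """Fraction of tallied variants whose DP is below the expected depth."""
    total = sum(counts.values())
    below = sum(n for dp, n in counts.items() if dp < expected_depth)
    return below / total if total else 0.0


# Hypothetical usage ('calls.vcf' is a placeholder file name):
# with open("calls.vcf") as handle:
#     counts = tally_dp(handle)
# for dp in sorted(counts):
#     print(f"{dp} reads: {counts[dp]}/{sum(counts.values())}")
# print(fraction_below(counts, 30))
```

Applied to the counts listed above, `fraction_below(counts, 30)` gives 665/1442, which is where the "just under half" figure comes from.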
So my question is whether there is a way to get HaplotypeCaller to output the real read depth over each variant, excluding only the reads counted as filtered in the summary at the end, without affecting the variant calls at all.