If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Less unfiltered reads than filtered ones
we are working with GATK 3.2.2 and focus on an optimization of the SNP and indel calling in the context of our data. Thereby, we also determine the frequency of a certain variant. However, we came across some strange results in this case.
In the vcf files there is of course the DP in the info field, representing the unfiltered depth, and the DP in the format field, representing the filtered depth. Yet, we also calculated the depth by ourselves in the bam file and looked it up in the IGV.
One especially strange case is the following:
For a sample GATK calls a deletion. The unfiltered DP (info field) is DP=644. The filtered DP (format field) is DP=505. It consists of 438 reads with the original genotype and 67 reads containing the variant. However, in our raw bam file and in the IGV things are different. There we see 499 reads in total. 425 containing a C (reference), one with an N and 73 with a deletion. We checked and the results are reproduceable.
Furthermore, we see similar differences as well. There are many cases in which the number of reads containing the variant or the reference allele as we count it and as it is observable in the IGV is smaller than the reported number of filtered reads.
How can this happen? I'm really looking forward to your answers.