GATK 3.6: MIN_DP much lower than DP
I'm analysing a set of bacterial isolates, some of which are (almost) identical to the reference and some of which are very different. Even though the identical isolates have good coverage (80x), I end up filtering out a lot of the SNPs for those isolates due to lack of depth (cutoff of 10). I was wondering whether this is due to the way the g.vcf files are used.
Below is a typical part of the g.vcf file for one of the identical isolates:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
REF 1 . A <NON_REF> . . END=296981 GT:DP:GQ:MIN_DP:PL 0:82:99:8:0,252
The average depth (DP) is 82, but the lowest depth (MIN_DP) in that region of 300 kb is 8. If any of the other samples in the same analysis has a SNP in this region, what will the DP be for sample1 at that SNP? Will it be 82 or 8?
If it is 8, every SNP in that region will be hard filtered for sample1, even though the actual coverage in that region (and most likely at that SNP) is a lot higher. How can I avoid discarding all that data for samples that are highly similar to the reference used?
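To make the two scenarios concrete, here is a minimal Python sketch that parses the reference block quoted above and applies a depth cutoff of 10 to each candidate value. The function names (`parse_block`, `passes_depth_filter`) are made up for illustration, and I am assuming the filter passes a genotype when depth >= 10. It only shows the consequence of each possibility; it does not show which value GATK actually reports.

```python
# The header and record are copied from the g.vcf excerpt in this post.
HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1"
RECORD = "REF 1 . A <NON_REF> . . END=296981 GT:DP:GQ:MIN_DP:PL 0:82:99:8:0,252"

DEPTH_CUTOFF = 10  # assumption: genotypes with depth >= 10 pass the filter


def parse_block(header: str, record: str) -> dict:
    """Split one whitespace-separated gVCF reference block into named fields."""
    columns = header.lstrip("#").split()
    fields = dict(zip(columns, record.split()))
    # Pair each FORMAT key (GT, DP, ...) with the sample's value.
    sample = dict(zip(fields["FORMAT"].split(":"), fields["sample1"].split(":")))
    # INFO holds key=value pairs separated by ';'.
    info = dict(
        item.split("=", 1) for item in fields["INFO"].split(";") if "=" in item
    )
    return {
        "start": int(fields["POS"]),
        "end": int(info["END"]),
        "dp": int(sample["DP"]),
        "min_dp": int(sample["MIN_DP"]),
    }


def passes_depth_filter(depth: int, cutoff: int = DEPTH_CUTOFF) -> bool:
    """Return True when the depth meets the (assumed) hard-filter cutoff."""
    return depth >= cutoff


block = parse_block(HEADER, RECORD)
print("block:", block["start"], "to", block["end"])
print("passes if DP (82) is reported:   ", passes_depth_filter(block["dp"]))
print("passes if MIN_DP (8) is reported:", passes_depth_filter(block["min_dp"]))
```

In other words, the same block of roughly 300 kb either passes everywhere or fails everywhere, depending on which of the two numbers ends up as the per-site DP for sample1.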
I'm using GATK version 3.6-44-ge7d1cd2.