GATK 3.6: MIN_DP much lower than DP
I'm analysing a set of bacterial isolates, some of which are (almost) identical to the reference and some of which are very different. Even though the identical isolates have good coverage (80x), I end up filtering out a lot of the SNP calls for those isolates because of insufficient depth (cutoff of 10). I was wondering whether this is due to the way the g.vcf files are used.
Below is a typical part of the g.vcf file for one of the identical isolates:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
REF 1 . A <NON_REF> . . END=296981 GT:DP:GQ:MIN_DP:PL 0:82:99:8:0,252
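To make the two depth values in that record explicit, here is a minimal Python sketch that splits a single-sample GVCF reference-block line into its columns, reads the block end from the END= INFO key, and pairs the FORMAT keys with the sample values. The function name parse_gvcf_block is hypothetical (it is not part of GATK), and it assumes exactly one sample column and no other INFO keys of interest.

```python
def parse_gvcf_block(line):
    """Return (chrom, start, end, format_dict) for a single-sample GVCF record.

    Hypothetical helper for illustration; assumes whitespace-separated columns
    and exactly one sample column after FORMAT.
    """
    cols = line.split()
    chrom, start = cols[0], int(cols[1])
    # INFO may carry END=<last position covered by the block>; without it the
    # record describes a single site.
    end = start
    for entry in cols[7].split(";"):
        if entry.startswith("END="):
            end = int(entry[len("END="):])
    keys = cols[8].split(":")    # e.g. GT, DP, GQ, MIN_DP, PL
    values = cols[9].split(":")  # the matching values for sample1
    return chrom, start, end, dict(zip(keys, values))


record = "REF\t1\t.\tA\t<NON_REF>\t.\t.\tEND=296981\tGT:DP:GQ:MIN_DP:PL\t0:82:99:8:0,252"
chrom, start, end, fmt = parse_gvcf_block(record)
print(chrom, start, end, end - start + 1)             # REF 1 296981 296981
print("DP =", fmt["DP"], "MIN_DP =", fmt["MIN_DP"])   # DP = 82 MIN_DP = 8
```

A single record therefore spans 296,981 positions but carries only one DP and one MIN_DP for the whole stretch.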
The average depth (DP) is 82, but the lowest depth (MIN_DP) in that ~300 kb region is 8. If any of the other samples in the same analysis have a SNP in this region, what will be the DP for sample1 at that SNP? Will it be 82 or 8?
If it is 8, every SNP in that region for sample1 will be hard filtered, even though the actual coverage in that region (and most likely at that SNP) is a lot higher. How can I prevent discarding all that data for samples that are highly similar to the reference?
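The consequence of the two possibilities can be sketched as follows: a SNP called in another sample falls inside sample1's reference block, and the block's DP and MIN_DP are each compared against the depth cutoff of 10. This does not show which value GATK actually assigns (that is the open question); block_depths, passes and the SNP position 150000 are hypothetical, and "passes" assumes the filter removes calls with depth below 10.

```python
MIN_DEPTH = 10  # hard-filter cutoff from the question; assumed to drop depth < 10


def block_depths(block, pos):
    """Return (DP, MIN_DP) of a reference block covering pos, else None (hypothetical helper)."""
    start, end, fmt = block
    if not (start <= pos <= end):
        return None
    return int(fmt["DP"]), int(fmt["MIN_DP"])


def passes(depth, cutoff=MIN_DEPTH):
    """True if a call with this depth survives the hard filter."""
    return depth >= cutoff


# sample1's block from the g.vcf excerpt: positions 1..296981
block = (1, 296981, {"GT": "0", "DP": "82", "GQ": "99", "MIN_DP": "8", "PL": "0,252"})
snp_pos = 150000  # hypothetical position of a SNP found in another sample

dp, min_dp = block_depths(block, snp_pos)
print("passes if DP is used:", passes(dp))          # True
print("passes if MIN_DP is used:", passes(min_dp))  # False
```

Because one block summarises the whole region, whichever value is used applies to every SNP inside it, which is why the choice matters so much here.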
I'm using GATK version 3.6-44-ge7d1cd2.