GATK resource bundles scattered_calling_intervals exclude small contigs

Hi there,

I was just going over some HaplotypeCaller and VQSR results generated using your Best Practices Cromwell workflows, and found that the scattered_calling_intervals files you provide (which those workflows operate over) do not cover the whole genome. For hg38, chrM and all of the alt/unplaced contigs are excluded. For b37, chrY is also excluded.

https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/b37
https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/

This seems like a fairly major bug that would cause people running your best practices to lose a good number of potentially important variants.
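
For anyone who wants to check this against their own copy of the files, here is a minimal sketch that reports which contigs in the sequence dictionary have no intervals at all. It assumes the files are standard Picard interval lists (a SAM-style header with @SQ lines, then tab-separated contig, start, end, strand, name). The function names and the tiny example with made-up contig lengths are mine, not part of GATK or the bundle.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_interval_list(lines: Iterable[str]) -> Tuple[List[str], Set[str]]:
    """Return (contigs named in @SQ header lines, contigs with >= 1 interval)."""
    dict_contigs: List[str] = []
    covered: Set[str] = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: only @SQ records carry contig names (SN: tag).
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        dict_contigs.append(field[3:])
            continue
        # Data line: the first column is the contig name.
        covered.add(fields[0])
    return dict_contigs, covered


def missing_contigs(interval_lists: Iterable[Iterable[str]]) -> List[str]:
    """Contigs present in any header but absent from every interval, in header order."""
    order: Dict[str, None] = {}
    covered: Set[str] = set()
    for lines in interval_lists:
        dict_contigs, file_covered = parse_interval_list(lines)
        for contig in dict_contigs:
            order.setdefault(contig, None)
        covered |= file_covered
    return [contig for contig in order if contig not in covered]


# Hypothetical two-contig example (lengths are invented for illustration).
example = [
    "@HD\tVN:1.5\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chrM\tLN:200",
    "chr1\t1\t500\t+\tshard-a",
]
print(missing_contigs([example]))
```

To run it on real files, pass one open file handle per interval list, for example `missing_contigs(open(p) for p in paths)`.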

Answers

  • oneillkza Member

    Thank you for your reply, @AdelaideR

    It would definitely be helpful if that document were easy to find (for example, linked from the bundle page) and more explicit. The implication is that only low-complexity regions such as centromeres are filtered out, which I don't think most people would expect to include genic regions.

    It's also frustrating that your WDL workflow (haplotypecaller-gvcf-gatk4.wdl) wants these in the form of 50 directories, each with its own Picard interval list file containing around 10 intervals, plus a text file listing the path to each of these files (and those paths themselves have to be set up relative to the execution environment).
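
    In case it helps anyone else stuck on this, a rough sketch of how that text file could be regenerated for your own environment. It assumes one .interval_list file per shard directory under a common root, and that the workflow only needs one path per line. The function names and the optional path prefix are hypothetical, not anything the WDL defines.

```python
from pathlib import Path
from typing import List


def list_interval_files(root: Path, prefix: str = "") -> List[str]:
    """Collect every *.interval_list under root, sorted, as prefix + relative path."""
    paths = sorted(root.rglob("*.interval_list"))
    return [prefix + p.relative_to(root).as_posix() for p in paths]


def write_manifest(root: Path, manifest: Path, prefix: str = "") -> int:
    """Write one interval-list path per line to manifest; return the number written."""
    entries = list_interval_files(root, prefix)
    manifest.write_text("".join(entry + "\n" for entry in entries))
    return len(entries)
```

    Calling `write_manifest(Path("scattered_calling_intervals"), Path("intervals.txt"), prefix="/data/intervals/")` would then write paths rooted wherever the execution environment mounts the files (both paths here are placeholders).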

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @oneillkza I agree that a readme file of some type would be helpful for the resource bundle. We have bounced around a few ideas about how best to maintain the document among our diverse teams. I will pass along the comment about the WDL workflow to see where adjustments can be made to streamline this process.