Can I use arbitrary genomic intervals in GenomicDBImport?
I'm trying to make variant calls from ~1500 WGS samples. I have successfully run HaplotypeCaller by splitting the "wgs_calling_regions.hg38.interval_list" file that comes with the GATK resource bundle into one file per chromosome and passing each sub-file as the value of the "-L" parameter. Now I need to merge the gvcf files using GenomicDBImport under GATKv220.127.116.11. With gvcf files from ~1500 samples, the merge needs a large-memory machine (I tried 128GB, which was still not enough) and the run takes many days. Our local cluster only allows 72 hours for any computing job. Does anybody have a solution for this? I've been thinking of using the genomic intervals in the "wgs_calling_regions.hg38.interval_list" file; however, the longest interval is 141,414,040bp, on chr2 (97489619 238903659 + . intersection ACGTmer). So my question is: can I further break the larger single intervals into multiple smaller ones and run GenomicDBImport on each of them?
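To make the question concrete, here is a minimal sketch of the kind of splitting I have in mind. The function name `split_interval` and the 10 Mb chunk size are my own illustrative choices, not anything from GATK, and the coordinates are treated as 1-based inclusive, as in an interval_list file. It only computes the sub-intervals; whether GenomicDBImport handles records that span the chunk boundaries correctly is exactly what I'm asking.

```python
def split_interval(contig, start, end, max_len):
    """Split a 1-based, inclusive interval into consecutive pieces
    of at most max_len bases each (hypothetical helper, not a GATK API)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    pieces = []
    pos = start
    while pos <= end:
        # The last piece may be shorter than max_len.
        piece_end = min(pos + max_len - 1, end)
        pieces.append((contig, pos, piece_end))
        pos = piece_end + 1
    return pieces


# The longest interval from the post, cut into chunks of an assumed 10 Mb.
pieces = split_interval("chr2", 97489619, 238903659, 10_000_000)

# Print each piece in the contig:start-end form accepted by -L.
for contig, s, e in pieces:
    print(f"{contig}:{s}-{e}")
```

Each printed line could then be passed as a separate "-L" value to its own job, so that every job covers a much smaller region than the original interval.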