Can I use arbitrary genomic intervals in GenomicDBImport?
I need to call variants from ~1500 WGS samples. I have successfully run HaplotypeCaller by splitting the "wgs_calling_regions.hg38.interval_list" file, which comes with the GATK resource bundle, into one file per chromosome and passing each sub-file as the value of HaplotypeCaller's "-L" parameter. Now I need to merge the GVCF files using GenomicDBImport under GATKv18.104.22.168. With GVCF files from ~1500 samples, the merge needs a large amount of memory (I tried 128GB, which was still not enough) and would run for many days, but our local cluster limits any computing job to 72 hours. Does anybody have a solution for this? I've been thinking of using the genomic intervals in the "wgs_calling_regions.hg38.interval_list" file; however, the longest interval is 141,414,040bp, on chr2 (97489619 238903659 + . intersection ACGTmer). So my question is: can I break the larger single intervals into multiple smaller ones and run GenomicDBImport on each of them?
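To make the idea concrete, here is a minimal Python sketch of the splitting step I have in mind. It is only an illustration, not a GATK tool: the function names `split_interval` and `split_interval_list` and the 50 Mb chunk size are my own assumptions. It also assumes the interval_list body is tab-separated with 1-based inclusive coordinates and that header lines start with "@". Whether GenomicDBImport gives equivalent results on intervals cut this way is exactly what I'm asking.

```python
from typing import Iterable, Iterator, List, Tuple


def split_interval(contig: str, start: int, end: int,
                   max_len: int) -> Iterator[Tuple[str, int, int]]:
    """Yield consecutive sub-intervals (1-based, inclusive) of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    pos = start
    while pos <= end:
        chunk_end = min(pos + max_len - 1, end)
        yield contig, pos, chunk_end
        pos = chunk_end + 1


def split_interval_list(lines: Iterable[str], max_len: int) -> List[str]:
    """Split every interval in an interval_list; header lines are kept unchanged."""
    out: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            out.append(line)  # sequence dictionary / header line
            continue
        fields = line.split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        extra = fields[3:]  # strand and name columns, copied to every chunk
        for c, s, e in split_interval(contig, start, end, max_len):
            out.append("\t".join([c, str(s), str(e)] + extra))
    return out


if __name__ == "__main__":
    # The long chr2 interval from the post, cut into pieces of at most 50 Mb
    # (the chunk size is an arbitrary choice for illustration).
    example = ["chr2\t97489619\t238903659\t+\t.\tintersection\tACGTmer"]
    for row in split_interval_list(example, 50_000_000):
        print(row)
```

Each output line could then be written to its own file and submitted as a separate cluster job, so that no single job has to cover the whole 141 Mb interval.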