Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Minimum depth requirement when creating gVCFs files?
I am interested in creating genome-wide gVCFs from various sources, including exome samples. The rationale is that it is not completely clear what the target region actually is all the time, and it would be good to keep some flexibility about what I capture.
Nevertheless, these single sample gVCF files can become quite large, but most of the lines in the file are used to store very low depth data ( DP <= 2) which is probably not critical for downstream analysis. I would be interested to have, in the HaplotypeCaller gVCF data generation process, an additional parameter that sets as missing regions with, say, 2 reads or less. That would considerably reduce the storage at little cost in terms of discarded information.
Does such a parameter exist already? Would it make sense at all to add this? I am not sure but right now it seems to me it might be useful.