Speed up variant detection by splitting the genome into chromosomes
I have really deep (150x coverage) data on which I need to perform variant detection. Which of these two options is more effective for speeding it up:
1. I run the whole dataset in one go and use the -nt and -nct options wherever possible.
2. I split the BAM files into 3 or 4 sets of chromosomes and run those sets in parallel (with lower -nt and -nct values).
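To make option 2 concrete, here is a minimal Python sketch of one way to form the chromosome sets. It balances the sets by total length so the parallel jobs finish at roughly the same time, then builds one HaplotypeCaller command per set. It assumes the GATK3 command-line syntax implied by the -nt/-nct options. The file names and the toy chromosome lengths are hypothetical, and `partition_chromosomes` and `haplotypecaller_command` are illustrative helpers, not part of GATK.

```python
import heapq
from typing import Dict, List


def partition_chromosomes(lengths: Dict[str, int], n_sets: int) -> List[List[str]]:
    """Greedily split chromosomes into n_sets groups of roughly equal total length.

    Longest chromosomes are placed first, each into the currently lightest
    group (the longest-processing-time heuristic).
    """
    # Heap of (total_length, group_index); the lightest group is popped first.
    heap = [(0, i) for i in range(n_sets)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(n_sets)]
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return groups


def haplotypecaller_command(bam: str, ref: str, chroms: List[str],
                            out_vcf: str, nct: int) -> List[str]:
    """Build a GATK3-style HaplotypeCaller command limited to the given chromosomes.

    Only -nct is passed because HaplotypeCaller in GATK3 uses -nct, not -nt.
    """
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "HaplotypeCaller",
           "-R", ref, "-I", bam, "-o", out_vcf, "-nct", str(nct)]
    for chrom in chroms:
        cmd += ["-L", chrom]  # one -L interval per chromosome in this set
    return cmd


if __name__ == "__main__":
    # Hypothetical toy lengths, for illustration only.
    toy_lengths = {"chrA": 100, "chrB": 90, "chrC": 60, "chrD": 50, "chrE": 40}
    sets = partition_chromosomes(toy_lengths, 2)
    print(sets)  # [['chrA', 'chrD', 'chrE'], ['chrB', 'chrC']]
    for i, chroms in enumerate(sets):
        print(" ".join(haplotypecaller_command("sample.bam", "ref.fasta",
                                               chroms, f"set{i}.vcf", nct=4)))
```

Each printed command could then be launched as a separate job; the greedy assignment is not guaranteed to be optimal, but it usually keeps the sets close in size.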
If I go with option 2, can I merge the VCF files from all the parallel runs (each covering different chromosomes) right after running HaplotypeCaller? Is that the recommended way to make sure I don't end up with a variant set that is too small for recalibration (which is the problem I am facing right now)?
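On the merge step: when each parallel run covers a disjoint set of chromosomes, combining the outputs amounts to keeping a single header and putting the records back into reference contig order. The Python sketch below illustrates only that idea, on plain-text (uncompressed) VCF content. It assumes all inputs come from the same sample(s) and the same reference, so their headers match. `merge_disjoint_vcfs` is a hypothetical helper, not a GATK API; in practice a dedicated tool such as Picard's GatherVcfs would normally be used.

```python
from typing import List


def merge_disjoint_vcfs(vcf_texts: List[str], contig_order: List[str]) -> str:
    """Merge VCFs that cover disjoint sets of chromosomes into one VCF.

    Header lines are taken from the first input only. Each contig is assumed
    to come from exactly one input and to be position-sorted already, so a
    stable sort by contig rank keeps positions in order within each contig.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header: List[str] = []
    records: List[str] = []
    for i, text in enumerate(vcf_texts):
        for line in text.splitlines():
            if not line:
                continue
            if line.startswith("#"):
                if i == 0:  # keep the header of the first file only
                    header.append(line)
            else:
                records.append(line)
    # The first tab-separated column of a VCF data line is CHROM.
    records.sort(key=lambda rec: rank[rec.split("\t", 1)[0]])
    return "\n".join(header + records) + "\n"
```

Because the chromosome sets can interleave (for example chr1 and chr3 in one run, chr2 in another), simply concatenating the files would leave the contigs out of order; the sort by contig rank addresses that.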