Hi GATK experts,
I have generated individual gVCFs for 384 samples using HaplotypeCaller. These gVCF files were then run through CombineGVCFs to create one combined gVCF file per chromosome. I am using GATK 4 tools, so only one gVCF file can be given as input to GenotypeGVCFs. Do I need to run the intermediate per-chromosome gVCFs through CombineGVCFs again to create a single all-chromosome gVCF, or is there a tool that can concatenate gVCF files? I created the intermediate per-chromosome files to save time, because each chromosome took 4-5 days to run and there are 12 chromosomes in total. I am not sure whether running these intermediate files through CombineGVCFs again would be any faster, or whether it would take as long as combining the gVCFs without splitting them by chromosome in the first place.
I thought about using CatVariants but realised it only takes VCF files as input.
My samples have mixed ploidies (2, 4 and 6), so I can't use the GenomicsDB option.
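The workflow described above can be sketched roughly as follows. This is only an illustration: the exact commands were not given in the post, so the helper functions, file names, reference path and chromosome label are hypothetical placeholders. The flags shown (`-ERC GVCF`, `--sample-ploidy`, `-V`, `-L`) are standard GATK 4 options. The script only builds and prints the command lines; it does not run GATK.

```python
from typing import List


def haplotypecaller_cmd(ref: str, bam: str, ploidy: int, out_gvcf: str) -> List[str]:
    """Build a per-sample HaplotypeCaller command in gVCF mode.

    Paths are placeholders; ploidy is set per sample (2, 4 or 6 here).
    """
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-O", out_gvcf,
        "-ERC", "GVCF",
        "--sample-ploidy", str(ploidy),
    ]


def combine_gvcfs_cmd(ref: str, gvcfs: List[str], chrom: str, out_gvcf: str) -> List[str]:
    """Build a CombineGVCFs command restricted to one chromosome with -L."""
    cmd = ["gatk", "CombineGVCFs", "-R", ref, "-L", chrom, "-O", out_gvcf]
    for gvcf in gvcfs:
        # One -V argument per input sample gVCF.
        cmd += ["-V", gvcf]
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    ref = "reference.fasta"
    samples = {"sampleA": 2, "sampleB": 4, "sampleC": 6}

    for name, ploidy in samples.items():
        print(" ".join(haplotypecaller_cmd(ref, f"{name}.bam", ploidy, f"{name}.g.vcf.gz")))

    gvcfs = [f"{name}.g.vcf.gz" for name in samples]
    print(" ".join(combine_gvcfs_cmd(ref, gvcfs, "chr01", "combined.chr01.g.vcf.gz")))
```

Repeating the CombineGVCFs step once per chromosome gives the 12 intermediate files mentioned above.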
The doc says "dbSNP is not used in any way for the calculations themselves. --dbsnp binds reference ordered data". Does this mean that the determination of whether a locus is a variant is not influenced by whether that variant is present in dbSNP? What does "--dbsnp binds reference ordered data" mean?
Also, why isn't there a --indel option?