GenotypeGVCFs - Sites missing
I'm trying to call genotypes at a set of given positions in ~5,000 samples. To do so, I first generate a GVCF file for each sample separately with HaplotypeCaller, restricted to the region around a gene. I then merge the single-sample GVCF files into batches of 1,000 and genotype ~180 sites using GenotypeGVCFs with the flag --includeNonVariantSites. Everything runs without errors, but in the end I noticed that ~20 sites were missing from the final VCF file. When I looked at these sites in the single-sample files in detail, I found that there are simply no genotypes called for some samples at some positions.
I attached a small example. When I run the following command, the output file contains no variant:
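For anyone who wants to reproduce the check for missing sites, this is roughly how the requested positions can be compared against the final VCF. It is only a sketch: the function name `missing_sites` is made up for illustration, and it assumes the target sites are given as a BED file (0-based, half-open intervals), that the VCF is uncompressed, and that every genotyped position appears as its own record.

```shell
# Print every requested position (1-based, tab-separated chrom and pos)
# that has no record in the VCF.
# Hypothetical usage: missing_sites region.bed out.vcf
missing_sites() {
    awk '
        # First file (the VCF): remember each chrom:pos that has a record.
        FILENAME == ARGV[1] {
            if ($0 !~ /^#/) seen[$1 ":" $2] = 1
            next
        }
        # Second file (the BED): expand each 0-based half-open interval
        # into 1-based positions and report the ones never seen.
        $0 !~ /^(#|track|browser)/ && NF >= 3 {
            for (p = $2 + 1; p <= $3; p++)
                if (!(($1 ":" p) in seen)) print $1 "\t" p
        }
    ' "$2" "$1"
}
```

Each output line is a chromosome and a 1-based position that was requested but is absent from the VCF, which makes it easy to go back to the single-sample files for exactly those positions.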
java -jar GenomeAnalysisTK.jar \
    -R hg19.fa \
    -L region.bed --includeNonVariantSites \
    -T GenotypeGVCFs \
    --variant wrong.gvcf --variant correct.gvcf \
    -o out.vcf
The same command produces the expected output when I run it with only "correct.gvcf" as input, but not when I run it with only "wrong.gvcf". I noticed that wrong.gvcf might show a variant at this exact position.
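To see what each single-sample file actually contains at the problematic position, something like the following can be used. Again this is just a sketch: the function name `gvcf_records_at` is invented here, and it assumes an uncompressed, tab-separated GVCF in which reference blocks carry an END= entry in the INFO column, while other records span the length of their REF allele.

```shell
# Print all records of a GVCF that cover a given position.
# Hypothetical usage: gvcf_records_at <chrom> <pos> wrong.gvcf
gvcf_records_at() {
    awk -v chrom="$1" -v pos="$2" -F '\t' '
        /^#/        { next }          # skip header lines
        $1 != chrom { next }          # skip other contigs
        {
            # Default span: the REF allele; reference blocks override it via END=.
            end = $2 + length($4) - 1
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^END=/) end = substr(info[i], 5) + 0
            if ($2 + 0 <= pos + 0 && pos + 0 <= end) print
        }
    ' "$3"
}
```

Running it on both wrong.gvcf and correct.gvcf for the same position shows whether the site falls inside a reference block, inside a variant record that starts upstream (for example a deletion), or is not covered at all.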