Difference in GenotypeGVCFs-generated VCF after consolidation with GenomicsDBImport vs. CombineGVCFs
I had a set of 81 GVCFs that I consolidated in two ways, first with GenomicsDBImport and then with CombineGVCFs, and I ran GenotypeGVCFs in both cases. For GenomicsDBImport, I ran the command per contig, then ran GenotypeGVCFs on each database to get a per-contig VCF, and finally used Picard GatherVcfs to produce the final VCF. The commands I used are written below:
java -Xmx90g -jar gatk-package-126.96.36.199-local.jar GenomicsDBImport -R water_buffalo_re_arranged_chrom_ref_genome.fa --TMP_DIR ./tmp --sample-name-map sample_names_map_new.txt --reader-threads 2 --genomicsdb-workspace-path "$contig" -L "$contig"
java -Xmx8G -XX:ConcGCThreads=1 -jar gatk-package-188.8.131.52-local.jar GenotypeGVCFs -R /water_buffalo_re_arranged_chrom_ref_genome.fa -new-qual -V gendb://"$contig" -O "$contig"_variants.vcf.gz
java -jar picard.jar GatherVcfs INPUT=list.txt OUTPUT=Final_med_buffalo_variants_81_samples.vcf.gz
java -Xmx200g -XX:ConcGCThreads=1 -jar gatk-package-184.108.40.206-local.jar CombineGVCFs -R water_buffalo_re_arranged_chrom_ref_genome.fa --variant All_gvcf_gz.list -O combined_81.g.vcf.gz
java -Xmx8G -XX:ConcGCThreads=1 -jar gatk-package-220.127.116.11-local.jar GenotypeGVCFs -R water_buffalo_re_arranged_chrom_ref_genome.fa -new-qual -V combined_81.g.vcf.gz -O Final_variants_81_samples_using_CombineGVCF.vcf.gz
The final VCF should be the same in both cases. Unfortunately, it was not. On running bcftools isec, I found that some variants were present in only one VCF and some only in the other. What could be the reason for this discrepancy?
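For reference, here is a rough sketch of the kind of site-level comparison described above. It is not the bcftools isec command I ran; it uses only standard shell tools, and the helper names `site_keys` and `compare_sites` are made up for illustration. It assumes that a site is identified by the plain text of its CHROM, POS, REF and ALT columns, so the same variant written differently (for example a multi-allelic record split or ordered another way) would be counted as a difference.

```shell
# Print one sorted, de-duplicated CHROM:POS:REF:ALT key per VCF record.
# gzip -cdf decompresses gzip/bgzip input and passes plain text through unchanged.
site_keys() {
    gzip -cdf "$1" \
        | awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' \
        | LC_ALL=C sort -u
}

# Report how many site keys are private to each VCF and how many are shared.
# (Hypothetical helper; the string comparison does no allele normalization.)
compare_sites() {
    a_keys=$(mktemp)
    b_keys=$(mktemp)
    site_keys "$1" > "$a_keys"
    site_keys "$2" > "$b_keys"
    # comm needs both inputs sorted in the same collation, hence LC_ALL=C.
    echo "only_in_first $(LC_ALL=C comm -23 "$a_keys" "$b_keys" | wc -l | tr -d ' ')"
    echo "only_in_second $(LC_ALL=C comm -13 "$a_keys" "$b_keys" | wc -l | tr -d ' ')"
    echo "shared $(LC_ALL=C comm -12 "$a_keys" "$b_keys" | wc -l | tr -d ' ')"
    rm -f "$a_keys" "$b_keys"
}
```

With the two outputs above this would be called as `compare_sites Final_med_buffalo_variants_81_samples.vcf.gz Final_variants_81_samples_using_CombineGVCF.vcf.gz`. If the private counts drop sharply after normalizing both files the same way, the discrepancy may be one of representation rather than of calling.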
Kindly let me know if you need more information.