The frontline support team will be unavailable to answer questions on April 15th and 17th 2019. We will be back soon after. Thank you for your patience and we apologize for any inconvenience!
Difference in GenotypeGVCFs generated VCF after consolidation with GenomicsDBimport and CombineGVCF
I had a set of total 81 GVCFs that I first consolidated using GenomicsDBimport and then using CombineGVCF and then GenotypeGVCF was run in both cases. For GenomicsDBimport, I ran the command per contig and then I ran GenotypeGVCF on each database to get the final VCF file. Then I used Picard GatherGVCF to make the final VCF. The commands I used are wriiten below:
java -Xmx90g -jar gatk-package-184.108.40.206-local.jar GenomicsDBImport -R water_buffalo_re_arranged_chrom_ref_genome.fa --TMP_DIR ./tmp --sample-name-map sample_names_map_new.txt --reader-threads 2 --genomicsdb-workspace-path "$contig" -L "$contig"
java -Xmx8G -XX:ConcGCThreads=1 -jar gatk-package-220.127.116.11-local.jar GenotypeGVCFs -R /water_buffalo_re_arranged_chrom_ref_genome.fa -new-qual -V gendb://"$contig" -O "$contig"_variants.vcf.gz
java -jar picard.jar GatherVcfs INPUT=list.txt OUTPUT=Final_med_buffalo_variants_81_samples.vcf.gz
java -Xmx200g -XX:ConcGCThreads=1 -jar gatk-package-18.104.22.168-local.jar CombineGVCFs -R water_buffalo_re_arranged_chrom_ref_genome.fa --variant All_gvcf_gz.list -O combined_81.g.vcf.gz
java -Xmx8G -XX:ConcGCThreads=1 -jar gatk-package-22.214.171.124-local.jar GenotypeGVCFs -R water_buffalo_re_arranged_chrom_ref_genome.fa -new-qual -V combined_81.g.vcf.gz -O Final_variants_81_samples_using_CombineGVCF.vcf.gz
The final VCF in both the cases should be the same. Unfortunately, it was not. On running bcftools isec, I found that some variants were common to one VCF and some were in other. What could be the reason behind this discrepancy?
Kindly let me know if you need more information.