GenotypeGVCFs tool gives different output depending on the order of input GVCFs?

serhat_t (Turkey, Member)
edited August 2017 in Ask the GATK team

Hi,
I have been using the GATK GenotypeGVCFs tool (versions 3.5, 3.7 and 4.0). I have noticed that the output changes slightly depending on the order of the input GVCFs: the total number of variants in the output VCF differs. For example, with everything else kept constant, the following two commands produce slightly different VCFs.

1)
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant sample1.g.vcf --variant sample2.g.vcf -o output.vcf
2)
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant sample2.g.vcf --variant sample1.g.vcf -o output.vcf

I have observed this in GATK versions 3.5 and 3.7. GATK 4 for some reason does not work with multiple GVCFs, which I describe in a separate question. No parallelization is applied at all. Does anyone have any idea what is going on?
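To pin down which records differ rather than only comparing totals, the two outputs can be reduced to sets of variant keys and diffed. The following is a minimal sketch, not part of GATK: the function names are hypothetical, it assumes the two runs were written to separate files (for example output_order1.vcf and output_order2.vcf instead of the shared output.vcf above), and it compares only CHROM, POS, REF and ALT, so differences in genotypes or annotations would not show up.

```python
import gzip


def variant_keys(lines):
    """Collect (CHROM, POS, REF, ALT) for every non-header VCF record."""
    keys = set()
    for line in lines:
        # Skip blank lines and both kinds of header line ("##..." and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        keys.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return keys


def read_vcf_keys(path):
    """Read variant keys from a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return variant_keys(handle)


def diff_keys(keys_a, keys_b):
    """Return (records only in A, records only in B), each sorted."""
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)


# Example usage (file names are assumptions):
# only_1, only_2 = diff_keys(read_vcf_keys("output_order1.vcf"),
#                            read_vcf_keys("output_order2.vcf"))
# for key in only_1:
#     print("only in run 1:", *key)
# for key in only_2:
#     print("only in run 2:", *key)
```

Listing the records unique to each run would show whether the discrepancy is concentrated at particular sites, which is more useful to report than the difference in totals alone.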

Thanks a lot.
