GenotypeGVCFs variant IDs

Hi there,

I am trying to use GenotypeGVCFs to perform joint genotyping on 16 samples. Each sample was sequenced twice, on two different machines, so I actually have 32 readsets. I called variants on each readset with HaplotypeCaller to produce GVCFs, and I am now trying to combine these into a single multi-sample VCF that contains information for all variant loci across the cohort. However, because the two readsets for each sample carry the same sample name, GenotypeGVCFs seems to collapse them, and only 16 samples are recorded in my output VCF. I tried naming the inputs in the format --variant:name input1.g.vcf with both GenotypeGVCFs and CombineGVCFs, but got the same result: half the samples are missing from the output. I know it is possible to do this with CombineVariants, but that tool will not take GVCF input. Is it possible to specify names for the variant inputs when using GenotypeGVCFs?

I appreciate your help, many thanks in advance.


  • Sheila, Broad Institute (Member, Broadie, Moderator)


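    The sample names need to be specified in the read groups (the SM tag) of the input BAMs, since that is where HaplotypeCaller takes the sample name it writes to each GVCF. However, the easiest thing to do now is to manually edit the sample names in the GVCFs so that the two runs of each sample have different names. As long as the sample names are different in the GVCFs, GenotypeGVCFs will process them as different samples.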
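For 32 files, editing by hand is tedious, so the renaming suggested above could be scripted. The following is only a minimal sketch, assuming each GVCF is uncompressed plain text and holds exactly one sample; the function name `rename_single_sample` and the file names and sample names in the usage comment are hypothetical, chosen for illustration. It rewrites the sample column of the `#CHROM` header line and leaves every other line untouched.

```python
from typing import Iterable, Iterator


def rename_single_sample(lines: Iterable[str], new_name: str) -> Iterator[str]:
    """Yield VCF lines with the single sample column of the #CHROM line renamed.

    Hypothetical helper: assumes a single-sample, tab-delimited (G)VCF.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Nine fixed columns (CHROM .. FORMAT) followed by one sample column.
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column in #CHROM line")
            fields[9] = new_name
            yield "\t".join(fields) + "\n"
        else:
            # Meta-information lines and records pass through unchanged.
            yield line


# Usage sketch (file and sample names are assumptions):
# with open("sample01_machineB.g.vcf") as src, \
#         open("sample01_machineB.renamed.g.vcf", "w") as dst:
#     dst.writelines(rename_single_sample(src, "sample01_machineB"))
```

If the GVCFs are compressed and indexed, the renamed files would need to be recompressed and reindexed before being passed to GenotypeGVCFs. A dedicated tool such as `bcftools reheader --samples` may be a more convenient way to do the same thing.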


