
GenotypeGVCFs variant IDs

Hi there,

I am trying to use GenotypeGVCFs to perform joint genotyping on 16 samples. Each sample was sequenced twice, on two different machines, so I actually have 32 read sets. I called variants for each read set using HaplotypeCaller, producing GVCFs, and am now trying to combine these into a single multi-sample VCF containing information for all variant loci across the cohort. However, since both runs of a sample carry the same sample name, GenotypeGVCFs seemingly collapses them, so only 16 samples are recorded in my output VCF. I tried specifying variant names in the format --variant:name input1.g.vcf with both GenotypeGVCFs and CombineGVCFs but had the same result: half the samples are missing from the output. I know it is possible to do this using CombineVariants, but that tool will not take GVCF input. Is it possible to specify names for the variants when using GenotypeGVCFs?

I appreciate your help, many thanks in advance.

Answers

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @Caoimhe
    Hi,

    The sample names come from the read groups (the SM tag), so that is where they need to be specified. However, the easiest thing to do now is to manually edit the sample names in the GVCFs. As long as the sample names are different in the GVCFs, GenotypeGVCFs will process them as different samples.

    -Sheila
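
For anyone who would rather script the header edit suggested above than open each file by hand, here is a minimal sketch in Python. It assumes each GVCF contains exactly one sample, so the sample name is the tenth tab-separated field of the #CHROM header line. The function names, the file paths and the "_run2" suffix are made up for illustration and are not part of GATK.

```python
import gzip


def rename_sample_in_header(lines, new_name):
    """Yield VCF lines unchanged, except for the sample name in the #CHROM line.

    Assumption: the file is a single-sample GVCF, i.e. the #CHROM line has the
    nine fixed columns (CHROM .. FORMAT) followed by exactly one sample column.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column in #CHROM line")
            fields[9] = new_name  # replace the single sample name
            line = "\t".join(fields) + "\n"
        yield line


def rename_gvcf(in_path, out_path, new_name):
    """Copy a GVCF to out_path (uncompressed text) with the sample renamed."""
    # gzip.open is used for inputs ending in .gz; other inputs are read as plain text.
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "wt") as dst:
        dst.writelines(rename_sample_in_header(src, new_name))
```

A hypothetical call would be rename_gvcf("sample01_run2.g.vcf", "sample01_run2.renamed.g.vcf", "sample01_run2"). The output is written uncompressed, and an index built for the original file should not be reused for the renamed copy.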
