Genotyping known variants with GenotypeGVCFs
Hi Geraldine, Sheila, and others,
Now that a fix for the "GGA" feature in HaplotypeCaller has been made in a recent GATK 4 release, I was wondering about the prospects for a GENOTYPE_GIVEN_ALLELES mode for GenotypeGVCFs. There understandably seems to be a significant amount of interest in this sort of capability, e.g. here and here.
To be more concrete, I'm thinking about something that would function like the following example:
Given the following known variants of interest, specified in a VCF file:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
20	10000694	.	G	A	.	.	.
20	10001661	.	T	C	.	.	.
...GenotypeGVCFs would take an input gVCF, like the simplified single-sample example below:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1
20	10000694	.	G	T,<NON_REF>	.	.	.	GT:DP:GQ	0/1:29:99
20	10000695	.	G	<NON_REF>	.	.	END=10001999	GT:DP:GQ	0/0:0:0
...and produce something like the following as output:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1
20	10000694	.	G	A,<NON_REF>	.	.	.	GT	0/2
20	10001661	.	T	C,<NON_REF>	.	.	.	GT	./.
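A rough Python sketch of the per-site logic implied by this example may make the intended semantics more concrete. It is not GATK code: the names GvcfRecord, find_covering and genotype_known_site are hypothetical, records are assumed to be already parsed, only a single sample and SNV-style known sites are handled, and treating a reference block with DP 0 as a no-call (and a covered reference block as 0/0) is an assumption. Any sample allele that is neither the known REF nor the known ALT is mapped to <NON_REF>, which is how the sample's G/T call becomes 0/2 above.

```python
# Hypothetical sketch of "genotype given alleles" on a single-sample gVCF.
# Assumptions (NOT GATK behavior): records are pre-parsed, one sample,
# SNV-only known sites, and a reference block with DP == 0 is a no-call.
import re
from dataclasses import dataclass
from typing import List, Optional

NON_REF = "<NON_REF>"


@dataclass
class GvcfRecord:
    chrom: str
    pos: int
    ref: str
    alts: List[str]             # ALT alleles, including <NON_REF>
    gt: str                     # sample GT, e.g. "0/1"
    dp: int                     # sample DP
    end: Optional[int] = None   # INFO END= for reference blocks


def find_covering(records: List[GvcfRecord], chrom: str, pos: int) -> Optional[GvcfRecord]:
    """Return the record starting at pos, or the reference block spanning it."""
    for rec in records:
        if rec.chrom != chrom:
            continue
        if rec.pos == pos or (rec.end is not None and rec.pos <= pos <= rec.end):
            return rec
    return None


def genotype_known_site(rec: Optional[GvcfRecord], ref: str, alt: str) -> str:
    """Re-express the sample's GT against the output alleles [ref, alt, <NON_REF>]."""
    if rec is None:
        return "./."                       # site not covered by any record
    if rec.end is not None:                # reference block
        return "./." if rec.dp == 0 else "0/0"
    if rec.ref != ref:
        return "./."                       # representation mismatch; no normalization here
    sample_alleles = [rec.ref] + rec.alts
    out = []
    for idx in re.split(r"[/|]", rec.gt):
        if idx == ".":
            out.append(".")
        elif sample_alleles[int(idx)] == ref:
            out.append("0")
        elif sample_alleles[int(idx)] == alt:
            out.append("1")
        else:
            out.append("2")                # any other allele becomes <NON_REF>
    return "/".join(out)                   # phasing is not preserved


# The example records from this post.
gvcf = [
    GvcfRecord("20", 10000694, "G", ["T", NON_REF], "0/1", 29),
    GvcfRecord("20", 10000695, "G", [NON_REF], "0/0", 0, end=10001999),
]
known = [("20", 10000694, "G", "A"), ("20", 10001661, "T", "C")]

for chrom, pos, ref, alt in known:
    gt = genotype_known_site(find_covering(gvcf, chrom, pos), ref, alt)
    print(chrom, pos, ref, f"{alt},{NON_REF}", gt)
```

Run on the example records, this reproduces the two genotypes in the desired output (0/2 and ./.). A real tool would also need allele normalization, multi-sample handling, and a proper GQ/PL recomputation rather than simply re-indexing GT.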
With GATK 3, I believe there was a partial solution involving use of -allSites -L sites.vcf, but from the current GenotypeGVCFs source code, it looks like the -allSites option is now ignored and unsupported (and is also omitted from the GATK 4 docs).
Should I be writing my own tools to genotype known variants from the gVCFs, or can we expect an official GenotypeGVCFs feature in the not-too-distant future? Or is there some straightforward way of doing this currently (aside from using UnifiedGenotyper or HaplotypeCaller with BAM file input and GENOTYPE_GIVEN_ALLELES) that I have overlooked?
Thanks very much in advance,