Support for gVCF

Hi team,

Will piping into gvcftools be supported soon for the gVCF format? There is a quick fix to make GATK work with it: ... It would just be a matter of getting it integrated into the main branch.



  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hey Konrad,

    gVCF adheres to the standard 4.1 format, so I'm not quite sure why those 2 patches are needed (and in fact I think they would produce technically incorrect results). This should just work as is (but I could be wrong so feel free to correct me).

  • csaunders (Member)

    Hi Eric and Konrad,

    The modifications to GATK are described on the gvcftools site as follows:
    "Note that this branch of the GATK source has been slightly modified to print out two additional items at all non-variant sites (1) genotype quality (tag: GQ) and (2) unfiltered allele count (tag: AD)."

    These are required to distinguish confident reference calls from no-calls, which is critical in a clinical pipeline.
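
    To illustrate why GQ at non-variant sites matters, here is a minimal sketch of how a downstream consumer might separate confident reference calls from no-calls. It is not taken from gvcftools or GATK: the function names, the single-sample tab-separated layout, and the GQ cutoff of 30 are all assumptions made for the example.

```python
# Hypothetical cutoff for illustration only; not a gvcftools or GATK default.
MIN_GQ = 30


def parse_sample(format_field, sample_field):
    """Pair the FORMAT keys of a VCF record with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def classify_site(vcf_line, min_gq=MIN_GQ):
    """Classify a single-sample VCF record.

    Returns "variant", "confident-reference", or "no-call".
    Assumes the standard column order:
    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    if alt != ".":
        return "variant"

    sample = parse_sample(fields[8], fields[9])
    genotype = sample.get("GT", ".")
    gq = sample.get("GQ", ".")

    # Without a genotype or a GQ value, a reference site cannot be told
    # apart from a site where nothing could be called.
    if genotype in (".", "./.") or gq == ".":
        return "no-call"
    return "confident-reference" if float(gq) >= min_gq else "no-call"


if __name__ == "__main__":
    records = [
        "chr1\t1000\t.\tA\t.\t.\tPASS\t.\tGT:GQ:AD\t0/0:60:25",
        "chr1\t1001\t.\tC\t.\t.\tPASS\t.\tGT:GQ:AD\t0/0:5:2",
        "chr1\t1002\t.\tG\t.\t.\t.\t.\tGT\t0/0",
    ]
    for record in records:
        print(record.split("\t")[1], classify_site(record))
```

    The third record carries no GQ tag, as at a non-variant site without the modification described above, so the sketch has no basis for calling it confident reference.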

    Eric -- How specifically does this modification produce incorrect results? I'd be happy to correct it if this is the case.

    Konrad -- I hope the GATK team will correct me if I'm misrepresenting this, but I believe the hesitation in providing GQ at non-variant sites in GATK is that there are segments of the genome which are not properly assembled (such as the Mullikin fosmid examples), and in this case the GQ computed for a site is not accurate. I believe by a simple extension of this argument, we should never report GQ for any SNPs in the genome either, because these also occasionally occur in improperly assembled regions. Perhaps Eric or others from the GATK team could clarify this? Is the distinction that only variant sites are subject to VQSR?



