Support for gVCF

Hi team,

Will piping into gvcftools (https://sites.google.com/site/gvcftools/home) be supported soon for the gVCF format? There is a quick fix to make GATK work with it: https://github.com/ctsa/gatk/compare/master...gvcf ... It'd just be a matter of getting it integrated into the main branch.

Thanks!
-Konrad

Answers

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hey Konrad,

    gVCF adheres to the standard VCF 4.1 format, so I'm not quite sure why those two patches are needed (in fact, I think they would produce technically incorrect results). This should just work as is, but I could be wrong, so feel free to correct me.

  • csaunders (Member)

    Hi Eric and Konrad,

    The modifications to GATK are described on the gvcftools site (https://sites.google.com/site/gvcftools/home/configuration-and-analysis/gatk_to_gvcf-usage) as follows:
    """
    Note that this branch of the GATK source has been slightly modified to print out two additional items at all non-variant sites (1) genotype quality (tag: GQ) and (2) unfiltered allele count (tag: AD).
    """

    These are required to distinguish confident reference calls from no-calls, which is critical in a clinical pipeline.
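As a rough illustration of why GQ matters at non-variant sites, here is a minimal Python sketch that classifies a single-sample, non-variant VCF record using its GT and GQ fields. The function name `classify_nonvariant_site`, the category labels, and the `min_gq=30` threshold are all hypothetical choices made for this example; this is not how gvcftools or GATK implement the distinction.

```python
def classify_nonvariant_site(vcf_line, min_gq=30):
    """Classify one single-sample, non-variant VCF record.

    Returns "confident_ref", "low_confidence_ref", "no_call",
    or "indeterminate" (genotype present but no GQ to judge it by).
    min_gq is an illustrative threshold, not a recommended value.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "no_call"  # no FORMAT/sample columns at all

    # Pair the FORMAT keys (column 9) with the sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    gt = sample.get("GT", "./.")
    if "." in gt:
        return "no_call"  # genotype explicitly missing

    alleles = gt.replace("|", "/").split("/")
    if any(allele != "0" for allele in alleles):
        raise ValueError("expected a non-variant (homozygous reference) record")

    gq = sample.get("GQ", ".")
    if gq == ".":
        # Without GQ a 0/0 genotype cannot be told apart from a weak call.
        return "indeterminate"

    return "confident_ref" if float(gq) >= min_gq else "low_confidence_ref"


# Made-up records for illustration only.
strong = "chr1\t100\t.\tA\t.\t.\tPASS\t.\tGT:GQ:AD\t0/0:99:35"
weak = "chr1\t101\t.\tC\t.\t.\tPASS\t.\tGT:GQ:AD\t0/0:5:2"
no_gq = "chr1\t102\t.\tG\t.\t.\tPASS\t.\tGT\t0/0"

for record in (strong, weak, no_gq):
    print(classify_nonvariant_site(record))
```

The third record shows the case at issue: with only GT reported, a homozygous-reference genotype carries no information about how well supported it is.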

    Eric -- How specifically does this modification produce incorrect results? I'd be happy to correct it if this is the case.

    Konrad -- I hope the GATK team will correct me if I'm misrepresenting this, but I believe the hesitation in providing GQ at non-variant sites in GATK is that there are segments of the genome which are not properly assembled (such as the Mullikin fosmid examples), and in this case the GQ computed for a site is not accurate. I believe by a simple extension of this argument, we should never report GQ for any SNPs in the genome either, because these also occasionally occur in improperly assembled regions. Perhaps Eric or others from the GATK team could clarify this? Is the distinction that only variant sites are subject to VQSR?

    Best,

    -Chris
