Is VariantAnnotator compatible with the GVCF workflow?

hrbigelow (San Francisco, Member)

Hi,

In the Best Practices for Germline SNP & Indel Discovery in Whole Genome and Exome Sequence (https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS&p=2), the steps are listed as:

  1. PRE-PROCESSING
    Map to reference with BWA mem - Mark duplicates with Picard - Base quality score recalibration

  2. VARIANT DISCOVERY
    Generate GVCF per-sample with HaplotypeCaller - Perform joint genotyping - Filter variants

  3. CALLSET REFINEMENT
    (Optional) Refine genotypes - Annotate variants - Evaluate callset

After step 2, we have a multi-sample VCF file. Step 3 suggests using VariantAnnotator to annotate the variants, but according to the tool documentation VariantAnnotator accepts VCF files (not GVCF files) and seems to assume they are single-sample VCF files.

Thanks,

Henry

Answers

  • Sheila (Broad Institute; Member, Broadie)

    @hrbigelow
    Hi Henry,

    The joint genotyping step with GenotypeGVCFs outputs a multi-sample VCF, not a GVCF. The GVCFs are intermediate files not meant to be used in final analysis. You can read more about them here. You can also read more about step 2 here.

    VariantAnnotator can accept multi-sample VCFs, not just single-sample VCFs :smile:

    -Sheila

  • hrbigelow (San Francisco, Member)

    Hi Sheila,

    Thanks for the response. Yes, I'm aware that GenotypeGVCFs outputs a multi-sample VCF. I asked about the possibility of annotating GVCFs because it would be much more parallelizable to annotate on a single-sample basis.

    But in any case, I might suggest updating the documentation page.

    The first command-line example says:

    "Annotate a VCF with dbSNP IDs and depth of coverage for each sample"

    and then shows a command line with a single "-I input.bam" argument. It would be much clearer to have an example command line showing multiple input BAM files, to emphasize that the tool works per sample. Also, I didn't find anywhere in the docs an explanation of how VariantAnnotator associates each BAM file with a particular sample in the VCF file. I suppose it is the @RG SM: field or some such?

    Also, I think the VariantAnnotator page at https://software.broadinstitute.org/gatk/documentation/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php
    gives the wrong link in the first section; it says:

    This tool is designed to annotate variant calls based on their context (as opposed to functional annotation). Various annotation modules are available; see the for a complete list.

    But, the link simply links to the same page.

    Thanks,

    Henry

    Issue · GitHub (by Sheila): Issue Number 1495, State: open

  • Sheila (Broad Institute; Member, Broadie)

    @hrbigelow
    Hi Henry,

    1) Got it. Yes, you can run VariantAnnotator on GVCFs, as they are valid VCFs.

    2) Hmm. We usually like to stick to the bare bones minimum for example commands, but I will bring it up to the team. Indeed, the SM field tells the tool what sample is in each BAM file.

    3) I will make a note to fix the link.

    Thanks for the suggestions.
    Sheila
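
The SM-based association confirmed above can be checked before running the tool. The following is a minimal sketch, not GATK's own code: it assumes you have already dumped each BAM header to text (for example with `samtools view -H`) and copied the `#CHROM` line from the VCF. The function names, file names, and sample IDs are hypothetical and only for illustration.

```python
# Sketch: check which VCF sample columns each BAM's @RG SM tags correspond to.
# All function names, file names, and sample IDs here are illustrative assumptions.

def read_group_samples(sam_header_text):
    """Return the set of SM values found on @RG lines of a SAM/BAM header."""
    samples = set()
    for line in sam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SM":
                samples.add(value)
    return samples


def vcf_sample_names(chrom_header_line):
    """Return sample names from a VCF '#CHROM' line (the columns after FORMAT)."""
    columns = chrom_header_line.rstrip("\n").split("\t")
    return columns[9:]


def match_bams_to_vcf(bam_headers, chrom_header_line):
    """Map each BAM name to the VCF samples that its @RG SM tags name."""
    vcf_samples = set(vcf_sample_names(chrom_header_line))
    return {
        bam: sorted(read_group_samples(header) & vcf_samples)
        for bam, header in bam_headers.items()
    }


# Made-up headers standing in for real `samtools view -H` output.
HEADER_A = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@RG\tID:lane1\tSM:NA12878\tPL:ILLUMINA\n"
    "@RG\tID:lane2\tSM:NA12878\tPL:ILLUMINA\n"
)
HEADER_B = "@RG\tID:lane3\tSM:NA12891\tPL:ILLUMINA\n"
CHROM_LINE = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878\tNA12891"

matches = match_bams_to_vcf({"a.bam": HEADER_A, "b.bam": HEADER_B}, CHROM_LINE)
print(matches)
```

A BAM that maps to an empty list here has no read group whose SM matches a sample column in the VCF, which would be worth resolving before annotating with per-sample annotations such as depth of coverage.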
