Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

Using Variant Annotator

delangel Posts: 71 GSA Member mod
edited December 2012 in Methods and Workflows

[Image: 2 SNPs with significant strand bias]

[Image: Several SNPs with excessive coverage]

For a complete, detailed argument reference, refer to the GATK document page here.

Introduction

In addition to true variation, variant callers emit a number of false positives. Some of these false positives can be detected and rejected by various statistical tests. VariantAnnotator provides a way of annotating variant calls in preparation for executing these tests.

[Image: Description of the haplotype score annotation]

Examples of Available Annotations

The list below is not comprehensive. Please use the --list argument to get a list of all possible annotations available. Also, see the FAQ article on understanding the Unified Genotyper's VCF files for a description of some of the more standard annotations.

Note that technically the VariantAnnotator does not require reads (from a BAM file) to run; if no reads are provided, only those Annotations which don't use reads (e.g. Chromosome Counts) will be added. But most Annotations do require reads. When running the tool we recommend that you add the -L argument with the variant rod to your command line for efficiency and speed.
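As a rough sketch of the recommendation above, the following prints a VariantAnnotator command that supplies reads and uses the variant file itself as the -L interval source. The file names (ref.fasta, reads.bam, input.vcf, output.vcf), the jar location, and the function name build_va_cmd are placeholders chosen for illustration, and the arguments follow the GATK 2.x style used elsewhere in this thread; check them against the documentation for your version before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints a VariantAnnotator command line instead of running it.
# ref.fasta and GenomeAnalysisTK.jar are placeholder paths (assumptions).

build_va_cmd() {
    local vcf="$1"   # variant file to annotate, also passed to -L
    local bam="$2"   # reads; most annotations need them
    local out="$3"   # annotated output VCF

    echo java -jar GenomeAnalysisTK.jar \
        -T VariantAnnotator \
        -R ref.fasta \
        -I "$bam" \
        -V "$vcf" \
        -L "$vcf" \
        -A FisherStrand -A HaplotypeScore \
        -o "$out"
}

# Inspect the command before running it for real.
build_va_cmd input.vcf reads.bam output.vcf
```

Passing the same file to both -V and -L restricts traversal to the sites that actually need annotating, which is where the speed-up comes from.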


Comments

  • Kurt Posts: 110 Member ✭✭✭

    Broken link under the 2nd image: "For a complete, detailed argument reference, refer to the GATK document page here." points to http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_annotator_VariantAnnotator.html, which no longer exists.

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    Fixed, thanks for reporting it.

    Geraldine Van der Auwera, PhD

  • lorania Posts: 9 Member

    I can't see any images in Firefox?

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    @lorania, the image links are temporarily broken. We'll try to get them back up soon.

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    Whoops, sorry for the delay -- the image links are fixed now.

    Geraldine Van der Auwera, PhD

  • sirmark Posts: 4 Member

    I don't understand which walker I must use for these annotation options...

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    @sirmark, there are three walkers that can add almost any of these annotations to the data. UnifiedGenotyper and HaplotypeCaller can add most of them during the initial variant calling run. VariantAnnotator can add any of them as a separate annotation run. Please see the documentation for each of these walkers for details on how to specify which annotations you want added and which ones can be used.

    Geraldine Van der Auwera, PhD

  • tommycarstensen Posts: 49 Member

    The advice "When running the tool we recommend that you add the -L argument with the variant rod to your command line for efficiency and speed." is so useful that it almost belongs in the documentation: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_VariantAnnotator.html

    My expected run time went from 4 weeks to 2 hours.

    Thanks for posting this advice in bold.

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    The -L argument is indeed very useful at many stages; we'll try to highlight these uses better. Thanks for pointing this out.

    Geraldine Van der Auwera, PhD

  • mike Posts: 103 Member

    Hi,

    I thought HaplotypeCaller could add annotations to the variants, as Geraldine pointed out in this thread: "there are three walkers that can add almost any of these annotations to the data. UnifiedGenotyper and HaplotypeCaller can add most of them during the initial variant calling run. VariantAnnotator can add any of them as a separate annotation run."

    However, after calling with HaplotypeCaller, one of my variant callsets failed at the VQSR VariantRecalibrator step. I used GATK version 2.5-2, and the error message is below:

    ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator

    But sometimes my HaplotypeCaller-derived callsets do pass the VQSR step without any problem. Any suggestions? Do I have to run VariantAnnotator when it fails, or is there some other issue?

    Thanks and best

    Mike

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    Hi Mike,

    Are you working with a small callset? This can happen when your callset is too small and the only sites in the training set are ones without the HS annotation, even though other sites in your callset may have the HS annotation.

    Geraldine Van der Auwera, PhD

  • mike Posts: 103 Member

    Hi, Geraldine:

    Thanks for the input. Yes, it is fairly small, only 17 samples. How do I solve the issue? Do I have to use VariantAnnotator?

    Thx again

    Mike

  • mike Posts: 103 Member

    Also, in my VQSR VariantRecalibrator step I used these options: -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff

    Since there is no HaplotypeScore annotation in my callset, would it make sense to simply drop the -an HaplotypeScore option from my VariantRecalibrator step? Would VQSR be OK with that? If so, I would not have to run VariantAnnotator.

    Which way is better: run VariantAnnotator, or skip it and leave out the -an HaplotypeScore option in the VariantRecalibrator step? I hope that removing -an HaplotypeScore does not affect the VQSR model much.

    Thanks for your advice in advance!

    Best

    Mike

  • Kurt Posts: 110 Member ✭✭✭

    You don't use HaplotypeScore when running VQSR on HaplotypeCaller calls (see the FAQs):

    Important notes about annotations

    Some of these annotations might not be the best for your particular dataset. For example, InbreedingCoeff is a population level statistic that requires at least 10 samples in order to be calculated. If your study design has more than 10 samples then it is recommended to be included.

    Depth of coverage (the DP annotation invoked by Coverage) should not be used when working with hybrid capture datasets since there is extreme variation in the depth to which targets are captured! In whole genome experiments this variation is indicative of error but that is not the case in capture experiments.

    Additionally, the UnifiedGenotyper produces a statistic called the HaplotypeScore which should be used for SNPs. This statistic isn't necessary for the HaplotypeCaller because that mathematics is already built into the likelihood function itself when calling full haplotypes.

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    Mike, the issue of dealing with small callsets has been addressed many times on this forum and in the documentation. Please search for existing answers. You are welcome to ask further questions if there are details in those answers that you have trouble with, but we cannot take the time to explain everything from scratch to each person.

    And @Kurt makes a very good point about the HS annotation; you don't actually need to use this annotation if you are calling variants with HaplotypeCaller.

    Geraldine Van der Auwera, PhD
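The annotation choices discussed in the comments above can be sketched as a small helper that assembles the -an arguments for VariantRecalibrator. This is only an illustration: the function name vqsr_annotations and its two parameters are hypothetical, the base list is the one from mike's command, and the two rules encoded (HaplotypeScore only for UnifiedGenotyper calls, InbreedingCoeff only with at least 10 samples) are taken from the notes Kurt quoted. It does not cover the other arguments VariantRecalibrator needs.

```shell
#!/usr/bin/env bash
# Sketch only: prints the -an arguments, it does not run VariantRecalibrator.

vqsr_annotations() {
    local caller="$1"     # "UnifiedGenotyper" or "HaplotypeCaller"
    local nsamples="$2"   # number of samples in the callset
    local annots="QD MQRankSum ReadPosRankSum FS MQ"
    local args=""
    local a

    # HaplotypeScore is only meaningful for UnifiedGenotyper calls.
    if [ "$caller" = "UnifiedGenotyper" ]; then
        annots="$annots HaplotypeScore"
    fi

    # InbreedingCoeff is a population-level statistic needing >= 10 samples.
    if [ "$nsamples" -ge 10 ]; then
        annots="$annots InbreedingCoeff"
    fi

    for a in $annots; do
        args="$args -an $a"
    done
    printf '%s\n' "${args# }"   # strip the leading space
}

# Example: a 17-sample HaplotypeCaller callset.
vqsr_annotations HaplotypeCaller 17
```

Depth of coverage is left out of the sketch entirely, since the quoted notes advise against it for hybrid capture data.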

  • Lavinia Posts: 37 Member

    Hi, I'm just using VariantAnnotator and I must be missing something as my additional annotations are not being added. E.g. I've asked for BaseCounts, I can see it in the new vcf header

    INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">

    but nowhere else in the file. My command is

    java -jar /mnt/storage/system/usr/local/GenomeAnalysisTK-2.3-9/GenomeAnalysisTK.jar -R /mnt/storage/shared/genomes/hg19/gatk/gatk.ucsc.hg19.fasta -T VariantAnnotator \
    -V UnifiedGenotyper.output.J204.snps.raw.vcf --dbsnp /mnt/storage/shared/genomes/hg19/gatk/dbsnp_132.hg19.vcf -o VariantAnnotator.output.J204.snps.raw.vcf \
    -A AlleleBalance -A BaseCounts -A BaseQualityRankSumTest -A ChromosomeCounts -A DepthOfCoverage -A FisherStrand -A GCContent -A HaplotypeScore -A IndelType -A MappingQualityRankSumTest -A MappingQualityZeroFraction -A ReadPosRankSumTest \
    -A RMSMappingQuality -A QualByDepth -A AlleleBalanceBySample -A DepthPerAlleleBySample -nt 20 -L truseq_exome_targeted_regions.hg19.bed.chr.bed

    Any advice appreciated, thanks.

  • Lavinia Posts: 37 Member

    Having just read some other posts, I tried using the same annotations with UnifiedGenotyper, some of which work, e.g. I can see BaseCounts is annotated. But I still don't understand why the VariantAnnotator command didn't work.

  • pdexheimer Posts: 299 Member, GSA Collaborator ✭✭✭

    The INFO line you posted is for BaseQRankSum, not BaseCounts. My suspicion is that the BaseCounts annotation needs the BAM file to actually make the counts. Unless I'm missing them somewhere, it doesn't look like you supplied the BAMs on your VariantAnnotator command line, but they would necessarily have been passed into UnifiedGenotyper.

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    Lavinia, please be a little more thoughtful with your posts. As @pdexheimer points out, you're asking about BaseCounts but then post a header line about an annotation that is completely unrelated apart from having "Base" in the name. In addition to that, if you think about it, how could BaseCounts be annotated if you don't pass in a bam file with the actual bases? We're happy to help out, but our resources are limited and this is really something you should think through a little more before posting.

    Geraldine Van der Auwera, PhD

  • Lavinia Posts: 37 Member

    Sorry, that was a mistake with grepping the VCF, I must have just grepped for Base rather than BaseCounts. But your point is correct, I must have misread something and had got it into my head that VariantAnnotator was to add additional annotation to a VCF, not to the BAM files, sorry about that.

  • Geraldine_VdAuwera Posts: 5,276 Administrator, GSA Member admin

    No, you're correct to assume that VariantAnnotator is for annotating a vcf; the point is that the tool cannot compute the base counts annotation if it does not see the read data. That is why it works with UG but not with VA.

    Geraldine Van der Auwera, PhD
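The mix-up in this exchange (an annotation declared in the header but absent from the records, plus a grep for "Base" that also matched BaseQRankSum) can be avoided by matching the exact INFO key in the data lines only. The helper below is a minimal sketch using standard awk; the function name count_annotated is made up for illustration, and it assumes a plain-text, tab-delimited VCF with INFO in column 8.

```shell
#!/usr/bin/env bash
# Count data records whose INFO column carries exactly the given key.
# Header lines (starting with #) are ignored, so a declaration in the
# header alone yields 0.

count_annotated() {
    local key="$1"
    local vcf="$2"
    awk -F'\t' -v key="$key" '
        /^#/ { next }                        # skip header lines
        {
            n = split($8, fields, ";")       # INFO entries are ;-separated
            for (i = 1; i <= n; i++) {
                split(fields[i], kv, "=")    # kv[1] is the annotation key
                if (kv[1] == key) { hits++; break }
            }
        }
        END { print hits + 0 }               # prints 0 when nothing matched
    ' "$vcf"
}

# Usage (file name taken from the command posted above):
# count_annotated BaseCounts VariantAnnotator.output.J204.snps.raw.vcf
```

A result of 0 for a key that does appear in the header suggests the tool could not compute that annotation, for example because no reads were supplied.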
