The current GATK version is 3.3-0


Using Variant Annotator

Posts: 71GATK Developer mod
edited March 27

[Image: 2 SNPs with significant strand bias]

[Image: Several SNPs with excessive coverage]

For a complete, detailed argument reference, refer to the GATK document page here.

Introduction

In addition to true variation, variant callers emit a number of false positives. Some of these false positives can be detected and rejected by various statistical tests. VariantAnnotator annotates variant calls in preparation for running these tests.

[Image: Description of the haplotype score annotation]

Examples of Available Annotations

The list below is not comprehensive. Please use the --list argument to get a list of all possible annotations available. Also, see the FAQ article on understanding the Unified Genotyper's VCF files for a description of some of the more standard annotations.

Note that technically the VariantAnnotator does not require reads (from a BAM file) to run; if no reads are provided, only those Annotations which don't use reads (e.g. Chromosome Counts) will be added. But most Annotations do require reads. When running the tool we recommend that you add the -L argument with the variant rod to your command line for efficiency and speed.
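As a rough illustration of the recommendation above, here is a minimal sketch of a VariantAnnotator run that supplies reads with -I and restricts traversal to the variant sites by passing the same VCF to -L. The jar location and all file names are hypothetical placeholders, and the two -A annotations are picked only as examples; the script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical paths and file names; substitute your own.
GATK_JAR=GenomeAnalysisTK.jar
REF=reference.fasta
BAM=sample.bam
VCF=calls.vcf

# -I supplies the reads that most annotations need.
# -L pointed at the same VCF limits processing to the variant sites.
CMD="java -jar $GATK_JAR -T VariantAnnotator -R $REF -I $BAM -V $VCF -L $VCF -A QualByDepth -A FisherStrand -o calls.annotated.vcf"

# Print the command for review; replace echo with eval to actually run it.
echo "$CMD"
```

Leaving out -I would still run, but only annotations that do not need reads (such as Chromosome Counts) would be added.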

Post edited by Sheila on

• Posts: 196Member ✭✭✭

Broken link under the 2nd image: "For a complete, detailed argument reference, refer to the GATK document page here." directs to here; http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_annotator_VariantAnnotator.html which no longer exists

Fixed, thanks for reporting it.

Geraldine Van der Auwera, PhD

• Posts: 9Member

I can't see any images in Firefox?

@lorania, the image links are temporarily broken. We'll try to get them back up soon.

Geraldine Van der Auwera, PhD

Whoops, sorry for the delay -- the image links are fixed now.

Geraldine Van der Auwera, PhD

• Posts: 4Member

I don't understand which walker I must use for these annotation options...

@sirmark, there are three walkers that can add almost any of these annotations to the data. UnifiedGenotyper and HaplotypeCaller can add most of them during the initial variant calling run. VariantAnnotator can add any of them as a separate annotation run. Please see the documentation for each of these walkers for details on how to specify which annotations you want added and which ones can be used.

Geraldine Van der Auwera, PhD

• United KingdomPosts: 269Member ✭✭

The advice "When running the tool we recommend that you add the -L argument with the variant rod to your command line for efficiency and speed." is so useful that it almost belongs in the documentation:

My expected run time went from 4 weeks to 2 hours.

Thanks for posting this advice in bold.

The -L argument is indeed very useful at many stages; we'll try to highlight these uses better. Thanks for pointing this out.

Geraldine Van der Auwera, PhD

• Posts: 103Member

Hi,

I thought HaplotypeCaller could add annotations to the variants, as pointed out by Geraldine in this thread: "there are three walkers that can add almost any of these annotations to the data. UnifiedGenotyper and HaplotypeCaller can add most of them during the initial variant calling run. VariantAnnotator can add any of them as a separate annotation run."

However, after calling with HaplotypeCaller, one of my variant callsets failed at the VQSR VariantRecalibrator step. I used GATK version 2.5-2, and the error message is below:

ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator

But sometimes my HaplotypeCaller-derived variant callsets pass the VQSR step without any problem. Any suggestions on this? Do I have to use VariantAnnotator when it fails, or is there some other issue?

Thanks and best

Mike

Hi Mike,

Are you working with a small callset? This can happen when your callset is too small and the only sites in the training set are ones without the HS annotation, even though other sites in your callset may have the HS annotation.

Geraldine Van der Auwera, PhD

• Posts: 103Member

Hi, Geraldine:

Thx for the input. Yes, it is relatively not so big, only 17 samples. How do I solve the issue? Have to use VariantAnnotator?

Thx again

Mike

• Posts: 103Member

Also, in my VQSR VariantRecalibrator step, I used these options: -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff

Since there is no HaplotypeScore annotation in my callset, does it make sense to just take out the -an HaplotypeScore option in my VQSR VariantRecalibrator step? Will VQSR be OK with that? If this works, I would not have to run VariantAnnotator.

Which way is better: run VariantAnnotator, or skip it and drop the -an HaplotypeScore option from my VQSR VariantRecalibrator step? I hope taking out -an HaplotypeScore does not have much impact on the VQSR model.

Best

Mike

• Posts: 196Member ✭✭✭

You don't use HaplotypeScore when running VQSR on HaplotypeCaller calls (see the FAQs):

Some of these annotations might not be the best for your particular dataset. For example, InbreedingCoeff is a population level statistic that requires at least 10 samples in order to be calculated. If your study design has more than 10 samples then it is recommended to be included.

Depth of coverage (the DP annotation invoked by Coverage) should not be used when working with hybrid capture datasets since there is extreme variation in the depth to which targets are captured! In whole genome experiments this variation is indicative of error but that is not the case in capture experiments.

Additionally, the UnifiedGenotyper produces a statistic called the HaplotypeScore which should be used for SNPs. This statistic isn't necessary for the HaplotypeCaller because that mathematics is already built into the likelihood function itself when calling full haplotypes.

Mike, the issue of dealing with small callsets has been addressed many times on this forum and in the documentation. Please search for existing answers. You are welcome to ask further questions if there are details in those answers that you have trouble with, but we cannot take the time to explain everything from scratch to each person.

And @Kurt makes a very good point about the HS annotation; you don't actually need to use this annotation if you are calling variants with HaplotypeCaller.
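For a HaplotypeCaller callset, the change discussed above amounts to dropping one -an entry. This small sketch rebuilds the -an arguments from the list quoted earlier in the thread, skipping HaplotypeScore. The variable names are illustrative only, and the rest of the VariantRecalibrator command (inputs, training resources, output files) is assumed to stay as it was and is not shown.

```shell
#!/bin/sh
# Annotation list quoted earlier in the thread.
ANNOTATIONS="QD HaplotypeScore MQRankSum ReadPosRankSum FS MQ InbreedingCoeff"

AN_ARGS=""
for a in $ANNOTATIONS; do
    # HaplotypeScore is not needed for HaplotypeCaller calls, so skip it.
    if [ "$a" = "HaplotypeScore" ]; then
        continue
    fi
    AN_ARGS="$AN_ARGS -an $a"
done
AN_ARGS=${AN_ARGS# }   # trim the leading space

# These arguments would be appended to the VariantRecalibrator command line.
echo "$AN_ARGS"
```

Whether InbreedingCoeff should also stay in the list depends on the sample count, as the quoted FAQ notes.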

Geraldine Van der Auwera, PhD

• Posts: 37Member

Hi, I'm just using VariantAnnotator and I must be missing something as my additional annotations are not being added. E.g. I've asked for BaseCounts, I can see it in the new vcf header

##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">

but nowhere else in the file. My command is

java -jar /mnt/storage/system/usr/local/GenomeAnalysisTK-2.3-9/GenomeAnalysisTK.jar -R /mnt/storage/shared/genomes/hg19/gatk/gatk.ucsc.hg19.fasta -T VariantAnnotator \
-V UnifiedGenotyper.output.J204.snps.raw.vcf --dbsnp /mnt/storage/shared/genomes/hg19/gatk/dbsnp_132.hg19.vcf -o VariantAnnotator.output.J204.snps.raw.vcf \
-A AlleleBalance -A BaseCounts -A BaseQualityRankSumTest -A ChromosomeCounts -A DepthOfCoverage -A FisherStrand -A GCContent -A HaplotypeScore -A IndelType -A MappingQualityRankSumTest -A MappingQualityZeroFraction -A ReadPosRankSumTest \
-A RMSMappingQuality -A QualByDepth -A AlleleBalanceBySample -A DepthPerAlleleBySample -nt 20 -L truseq_exome_targeted_regions.hg19.bed.chr.bed

• Posts: 37Member

Having just read some other posts, I tried using the same annotations with UnifiedGenotyper, some of which work, e.g. I can see BaseCounts is annotated. But I still don't understand why the VariantAnnotator command didn't work.

• Posts: 458Member, GSA Collaborator ✭✭✭✭

The INFO line you posted is for BaseQRankSum, not BaseCounts. My suspicion is that the BaseCounts annotation needs the BAM file to actually make the counts. Unless I'm missing them somewhere, it doesn't look like you supplied the BAMs on your VariantAnnotator command line, but they would necessarily have been passed into UnifiedGenotyper.

Lavinia, please be a little more thoughtful with your posts. As @pdexheimer points out, you're asking about BaseCounts but then post a header line about an annotation that is completely unrelated apart from having "Base" in the name. In addition to that, if you think about it, how could BaseCounts be annotated if you don't pass in a bam file with the actual bases? We're happy to help out, but our resources are limited and this is really something you should think through a little more before posting.

Geraldine Van der Auwera, PhD

• Posts: 37Member

Sorry, that was a mistake with grepping the VCF, I must have just grepped for Base rather than BaseCounts. But your point is correct, I must have misread something and had got it into my head that VariantAnnotator was to add additional annotation to a VCF, not to the BAM files, sorry about that.

No, you're correct to assume that VariantAnnotator is for annotating a vcf; the point is that the tool cannot compute the base counts annotation if it does not see the read data. That is why it works with UG but not with VA.
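To avoid the kind of grep mix-up described above, a check can match the exact INFO key and count header definitions and annotated records separately. The following is a minimal sketch using standard grep and awk; the function name count_info_key is made up for this example.

```shell
#!/bin/sh
# count_info_key VCF KEY
# Prints "<header definitions> <annotated records>" for an exact INFO key.
count_info_key() {
    vcf=$1
    key=$2
    # Header definitions look like: ##INFO=<ID=KEY,...
    header=$(grep -c "^##INFO=<ID=$key," "$vcf" || true)
    # In data lines, INFO is column 8 and its entries are separated by ';'.
    records=$(awk -F'\t' -v k="$key" '
        !/^#/ {
            n = split($8, f, ";")
            for (i = 1; i <= n; i++)
                if (f[i] == k || index(f[i], k "=") == 1) { c++; break }
        }
        END { print c + 0 }' "$vcf")
    echo "$header $records"
}
```

For example, `count_info_key VariantAnnotator.output.J204.snps.raw.vcf BaseCounts` would report a header count of 1 and a record count of 0 if the annotation was declared but never computed, which is the situation described in this exchange.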

Geraldine Van der Auwera, PhD

• Posts: 20Member

Hello,

How does VariantAnnotator work with HaplotypeCaller for annotations that require the BAMs? For example, I see cases where the HaplotypeCaller-annotated FS score is different from the score produced by VariantAnnotator. I think this makes sense because HC does not rely directly upon the mapped reads. Does this mean that VariantAnnotator should not be considered reliable for annotations that require read/BAM-level information for HaplotypeCaller-produced VCFs? Will you be deprecating this functionality going forward?

Hi @lmose, sorry for the late reply. You're correct that when using VA for annotations that require seeing the BAM, you may see differences due to the HC's reassembly/realignment process. You should be able to use the -bamout functionality to have HC write out the realigned regions to a BAM file, which would then be appropriate as an input to VA.
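The two-step workflow suggested above could look roughly like the sketch below: HaplotypeCaller writes its realigned reads with -bamout, and VariantAnnotator then reads that BAM instead of the original one. The jar location and all file names are hypothetical, FisherStrand is used only as an example annotation, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical jar location and file names; substitute your own.
GATK="java -jar GenomeAnalysisTK.jar"
REF=reference.fasta
BAM=sample.bam

# Step 1: call variants and write the reassembled/realigned reads via -bamout.
HC_CMD="$GATK -T HaplotypeCaller -R $REF -I $BAM -o hc.calls.vcf -bamout hc.realigned.bam"

# Step 2: annotate the HC calls using the realigned reads, not the original BAM.
VA_CMD="$GATK -T VariantAnnotator -R $REF -I hc.realigned.bam -V hc.calls.vcf -L hc.calls.vcf -A FisherStrand -o hc.calls.annotated.vcf"

# Print both commands for review; replace echo with eval to run them.
echo "$HC_CMD"
echo "$VA_CMD"
```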

Geraldine Van der Auwera, PhD

• United KingdomPosts: 269Member ✭✭
edited November 2014

I tried running VariantAnnotator on a samtools generated VCF using the original BAM files and these annotations:

-A Coverage -A FisherStrand -A HaplotypeScore -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A InbreedingCoeff -A ChromosomeCounts -A GenotypeSummaries


Afterwards I was able to run VariantRecalibrator for the SNPs, but I got this error message for the INDELs:

##### ERROR MESSAGE: Bad input: Values for MQRankSum annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator


When using these VariantRecalibrator annotations:

-an DP -an QD -an FS -an MQRankSum -an ReadPosRankSum


I just noticed the documentation on MappingQualityRankSumTest linking to this question from November 2012:

http://gatkforums.broadinstitute.org/discussion/1842/variant-annotator-not-annotating-mappingqualityranksumtest-and-readposranksumtest-for-indels


Is the recommendation still to "run your calls through the UG in GENOTYPE_GIVEN_ALLELES mode"?

I have the same problem with FisherStrand.

Post edited by tommycarstensen on

Hi,

The issue with using Variant Annotator for MappingQualityRankSumTest is that it needs the actual reads (present in the bam file) that are not present in the input vcf file. You can read more about how MappingQualityRankSumTest works here: https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_MappingQualityRankSumTest.php
Because that crucial information is not present in the input to Variant Annotator, Variant Annotator cannot annotate MappingQualityRankSumTest. This is the case for any annotation that needs the reads for the calculation.

Yes, the best recommendation for this case is to use GENOTYPE_GIVEN_ALLELES mode.
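A sketch of what such a run might look like, assuming the existing callset is passed through -alleles and the original BAM through -I. The jar location and file names are hypothetical, and -glm BOTH is an assumption made here so that both SNPs and indels are considered; the script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical jar location and file names; substitute your own.
GATK="java -jar GenomeAnalysisTK.jar"
REF=reference.fasta
BAM=sample.bam
VCF=samtools.calls.vcf

# Re-genotype the sites listed in the existing VCF so that UnifiedGenotyper
# computes its annotations directly from the reads.
GGA_CMD="$GATK -T UnifiedGenotyper -R $REF -I $BAM -alleles $VCF -gt_mode GENOTYPE_GIVEN_ALLELES -glm BOTH -L $VCF -o regenotyped.vcf"

# Print the command for review; replace echo with eval to actually run it.
echo "$GGA_CMD"
```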

-Sheila

Hi,

I made a mistake. You knew all of what I posted originally! The problem here is that RankSumTest annotations do not work in Variant Annotator.

-Sheila

• Posts: 44GATK Developer mod

@Sheila, the link to DepthOfCoverage in the main article is broken.