HaplotypeCaller/VariantAnnotator: no allele balance tag for any SNPs

Version 3.1.1. Human normal samples.
I couldn't find the AlleleBalance and AlleleBalanceBySample tags in my VCF outputs. The tags are not present for even a single variant.
I tried HaplotypeCaller with -all, and directly with -A AlleleBalance or -A AlleleBalanceBySample.
I also tried VariantAnnotator with -all, -A AlleleBalance, or -A AlleleBalanceBySample.
Any help will be appreciated.
Answers
How can I get the AB (AlleleBalance) annotation into the VCF output?
@dvelayutham
Hi,
Can you please post your command line? How did you look for the annotation? Can you post an example of the biallelic heterozygous SNP from your VCF?
Thanks,
Sheila
Hi
$GATK3_1 -T HaplotypeCaller -R $genome --dbsnp $dbsnp -I input.bam -stand_call_conf 50 -stand_emit_conf 30 -o out.vcf -nct 12 -log vcf.log -L $interval -pcrModel AGGRESSIVE -XL $blacklist -PF $performance.log -A AlleleBalance -A AlleleBalanceBySample
and an example SNP from that output is
hr1 6257826 rs2294714 T C 5002.44 AC=2;AF=0.500;AN=4;BaseQRankSum=2.877;DB;DP=342;FS=3.136;MLEAC=2;MLEAF=0.500;Q=59.92;MQ0=0;MQRankSum=2.078;QD=14.63;ReadPosRankSum=0.826 GT:AD:GQ:PL 0/1:79,90:99:2995,0,2453 0/1:103,70:99:2036,0,3253
Thanks
@dvelayutham
Hi,
Can you please check if the annotation is defined in the header of the VCF?
Thanks,
Sheila
Hi there,
I can report the same problem as above: VariantAnnotator will not report AlleleBalance or AlleleBalanceBySample, even though the annotation is defined in the VCF header.
Yep, in fact "AB" is present in the header, and so are ABHet and ABHom.
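For anyone who wants to confirm the same symptom (AB declared in the header but missing from the records), the VCF can be scanned directly. This is a minimal sketch in plain Python, not part of GATK. The function name `ab_status` and the demo lines are made up for illustration, and it assumes an uncompressed, tab-delimited VCF.

```python
import re

# Matches header declarations such as ##INFO=<ID=ABHet,...> or ##FORMAT=<ID=AB,...>
HEADER_RE = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def ab_status(vcf_lines, keys=("AB", "ABHet", "ABHom")):
    """Return (keys declared in the header, record count, records carrying any key)."""
    wanted = set(keys)
    declared = set()
    n_records = 0
    n_annotated = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            match = HEADER_RE.match(line)
            if match and match.group(2) in wanted:
                declared.add(match.group(2))
            continue
        if not line or line.startswith("#"):
            continue  # blank line or the #CHROM column header
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        n_records += 1
        # INFO keys are the part before "=" (flags such as DB have no "=")
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        format_keys = set(fields[8].split(":")) if len(fields) > 8 else set()
        if wanted & (info_keys | format_keys):
            n_annotated += 1
    return sorted(declared), n_records, n_annotated


# Made-up example reproducing the symptom: AB is declared but never written.
demo = [
    '##FORMAT=<ID=AB,Number=1,Type=Float,Description="Allele balance">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr1\t100\t.\tT\tC\t50\t.\tDP=20;MQ=60\tGT:AD\t0/1:9,11",
]
print(ab_status(demo))
```

On a real file, something like `ab_status(open("out.vcf"))` should work the same way. A non-empty first element together with a zero in the last position is the situation described in this thread.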
@jamie_a @dvelayutham
Hi,
It turns out some of the annotations are not quite working right now. We are working on a new document that lists which annotations work and which ones do not work. This should be out by next week.
AB is one of the annotations I have found to show up in the header but not in the annotations.
We are working on a fix for this.
-Sheila
Has this problem been fixed? I am seeing the same issue in GATK version 3.4-46.
Hi,
Just an update on the issue @smgogarten observed: it was not related to the GATK version or to the issue in this article.
Regards,
Kurt
@smgogarten @Kurt
Hi,
I think (hopefully!) if you upgrade to the latest version 3.5, you will find that you get all the annotations you request. Let us know if something is not working in the latest version.
Thanks,
Sheila
I should have been a little more careful about this.
I called my variants using HaplotypeCaller, which takes about 3 days per sample.
Now I need the AB values for each sample. Is it possible to calculate the AB fields and add them to the VCF file after variant calling?
I found a Python script online, https://gist.github.com/mjclark/1057839, for the job, but it's failing, which I think has to do with how VCFs are handled in GATK vs. this program.
Thanks,
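If re-running a GATK tool is not an option, the per-sample value can in principle be derived from the AD field that HaplotypeCaller already writes. The sketch below is an assumption-laden illustration, not GATK's implementation: it takes AB to be ref / (ref + alt) for heterozygous calls at biallelic sites, which is my reading of the AlleleBalanceBySample documentation, and the function name `add_ab` is hypothetical. The sample columns in the demo come from the record posted earlier in this thread; the INFO column is abridged and the FILTER value is a placeholder.

```python
def add_ab(vcf_line):
    """Return the record with a per-sample AB value appended to FORMAT.

    AB is assumed to be ref / (ref + alt), computed only for 0/1 genotypes
    with exactly two AD values; every other sample gets ".".
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if "GT" not in keys or "AD" not in keys or "AB" in keys:
        return "\t".join(fields)  # nothing to compute, or AB already present
    gt_i, ad_i = keys.index("GT"), keys.index("AD")
    samples = []
    for sample in fields[9:]:
        values = sample.split(":")
        # VCF allows trailing fields to be dropped; pad so AB lines up with its key
        values += ["."] * (len(keys) - len(values))
        ab = "."
        alleles = sorted(values[gt_i].replace("|", "/").split("/"))
        depths = values[ad_i].split(",")
        if alleles == ["0", "1"] and len(depths) == 2 and all(d.isdigit() for d in depths):
            ref, alt = int(depths[0]), int(depths[1])
            if ref + alt > 0:
                ab = f"{ref / (ref + alt):.3f}"
        samples.append(":".join(values + [ab]))
    fields[8] += ":AB"
    return "\t".join(fields[:9] + samples)


record = (
    "chr1\t6257826\trs2294714\tT\tC\t5002.44\t.\tDP=342\tGT:AD:GQ:PL\t"
    "0/1:79,90:99:2995,0,2453\t0/1:103,70:99:2036,0,3253"
)
print(add_ab(record))
```

If the header does not already declare AB as a FORMAT field, a matching `##FORMAT` line would need to be added as well for the output to remain a valid VCF.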
@everestial007
Hi,
You should be able to use VariantAnnotator.
-Sheila
Thank you @Sheila !
But the documentation link for VariantAnnotator isn't working.
@everestial007
Hi,
Perhaps try again? It is working for me now. The website sometimes does not work, but when you try again in a few minutes, it does work!
-Sheila
What do you mean by "doesn't work"? Does the page never load, is there an error message, or does the page load part but not all the content?
@Sheila @Geraldine_VdAuwera
It's working now. Previously the link would open a new page but would not display any content.
Thanks, it's working!
@Geraldine_VdAuwera
I was trying to get the AlleleBalanceBySample information into my VCF, but for some reason I cannot find the option to create it with VariantAnnotator.
I have a VCF file with all my samples, and I have AD for each sample. However, I would like to have a field, per sample, indicating the proportion of each allele (in case of heterozygosity), so that I can filter out variants that are false positives (per sample).
With AlleleBalanceBySample I was able to filter by AB in the FORMAT field for each sample. (https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_AlleleBalanceBySample.php)
Is there another way to do this now?
Thanks,
Ana Marques
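Once an AB value is present in the FORMAT field, the per-sample filtering described above could look roughly like this. It is only a sketch: the function name `samples_failing_ab` is hypothetical, the 0.25 to 0.75 window is an arbitrary illustration rather than a recommended threshold, and the record is made up.

```python
def samples_failing_ab(header_line, record_line, low=0.25, high=0.75):
    """Return the names of samples whose AB value lies outside [low, high].

    Samples with no AB value (".", or a dropped trailing field) are skipped,
    since AB is only expected on heterozygous calls.
    """
    names = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if "AB" not in keys:
        return []
    ab_i = keys.index("AB")
    failing = []
    for name, sample in zip(names, fields[9:]):
        values = sample.split(":")
        if len(values) <= ab_i or values[ab_i] == ".":
            continue
        if not low <= float(values[ab_i]) <= high:
            failing.append(name)
    return failing


# Made-up header and record: S2 is a het call with a skewed allele balance.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
record = (
    "chr1\t100\t.\tT\tC\t50\t.\t.\tGT:AD:AB\t"
    "0/1:79,90:0.467\t0/1:18,2:0.900\t0/0:30,0:."
)
print(samples_failing_ab(header, record))
```

What to do with the flagged samples (for example, setting their genotype to missing) is a separate decision that this sketch leaves open.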
@armarkes
Hi Ana,
Did you input the BAM file when you ran VariantAnnotator?
Please also post the exact command you ran and the exact version of GATK you are using.
Thanks,
Sheila
I am using GATK version 3.4.
What I was trying to say is that the previous documentation says we can annotate a VCF with AlleleBalanceBySample information using VariantAnnotator. However, I don't find this option when running the VariantAnnotator tool.
By the way, I also don't understand how to supply my BAM files: I have one BAM file per sample, and my VCF is an output of the GenotypeGVCFs tool after calling with HaplotypeCaller. Should I split it by sample again?
Thanks for helping.
@Sheila
I was now able to do this step with the following command:
java -jar /GenomeAnalysisTK.jar -T VariantAnnotator -R hg19.fa -I BAM.list -V ALL_samples_onlySNP_filter.vcf --dbsnp dbsnp_138.hg19.vcf -alwaysAppendDbsnpId -A AlleleBalanceBySample -o ALL_samples_onlySNP_annotated.vcf
I am sorry, I did not understand the AlleleBalance step at first: it did not work for my INDEL VCF, but it worked with SNPs.
Why does this function not work with INDELs? For most subjects, my INDEL variants do not show the 0.5 expected for heterozygotes (it is normally about 0.2). Does this mean the variant is false? Or do you recommend a different minimum allele frequency threshold for INDELs?
Thanks a lot.
Ana Rita
@armarkes
Hi Ana,
Ah, yes. The annotation is not available for indels currently. Have a look at the documentation.
-Sheila
@Sheila
Hi Sheila,
I saw in the documentation that AlleleBalance or AlleleBalanceBySample can't be calculated for indels and searched the forums to see if that is still true (in case the documentation is out of date) and found this thread. Is that still the case? If so, do you know the reason why? It doesn't seem to me that the formula contains anything that is not available for indels, unless I'm missing something. With some exome capture kits that have low median coverage, we have a high rate of indels with low allele balance, so it would be very helpful to have this annotation specifically for indels in order to use for filtering. Even if GATK maintains the disclaimer that it is experimental and results should be interpreted with caution, it would be nice if we could "experiment" with it.
Thanks,
Andrew
Hi @andrewo,
Could you figure out what you need using the related DepthPerAlleleBySample annotation? I suspect the reason this annotation isn't available for indels is that representation introduces ambiguity to the interpretation.
Hi @shlee,
Thanks for the reply. AlleleBalanceBySample could be helpful for manual correction of genotypes for individual samples and I could potentially use DepthPerAlleleBySample for this (by computing AlleleBalanceBySample annotations on the fly). However it seems that implementing the AlleleBalance annotation in VQSR would be much more powerful since this appears to be a systemic problem with this dataset, and I would prefer that over doing anything manual. One of the things I wanted to experiment with is adding ABHet to the suggested variant-level annotations so GATK can also learn what is a good allele balance from the positive variants, so we could get rid of the others. I can't think of what is ambiguous about the interpretation of AlleleBalance for indels compared to SNPs -- can you explain?
An alternative might be using allele count (AC) annotation, but it seems that with the variable depth in exome sequencing that might not work -- it only makes sense as a fraction of the depth. If you have any other ideas I'm open to them. (Sorry, just realizing I maybe should have opened a new thread.)
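To make the "compute it from DepthPerAlleleBySample" idea concrete, here is a rough sketch of a site-level, ABHet-like value pooled over heterozygous samples. It rests on assumptions: the pooled ref / (ref + alt) definition is my understanding of ABHet, not a copy of GATK's code, and the function name `pooled_het_ab` is hypothetical. It reads only GT and AD, so it runs on indel records too, but the caveat raised above about indel representation still applies to how the AD counts themselves should be interpreted.

```python
def pooled_het_ab(format_field, sample_fields):
    """Pooled ref / (ref + alt) over 0/1 samples at a biallelic site.

    Returns None when no heterozygous sample has usable AD counts.
    """
    keys = format_field.split(":")
    if "GT" not in keys or "AD" not in keys:
        return None
    gt_i, ad_i = keys.index("GT"), keys.index("AD")
    ref_total = 0
    alt_total = 0
    for sample in sample_fields:
        values = sample.split(":")
        if len(values) <= max(gt_i, ad_i):
            continue  # trailing fields dropped, e.g. a bare "./."
        alleles = sorted(values[gt_i].replace("|", "/").split("/"))
        depths = values[ad_i].split(",")
        if alleles != ["0", "1"] or len(depths) != 2:
            continue  # only biallelic heterozygous calls contribute
        if not all(d.isdigit() for d in depths):
            continue
        ref_total += int(depths[0])
        alt_total += int(depths[1])
    total = ref_total + alt_total
    return ref_total / total if total else None


# Genotype columns from the SNP record posted earlier in this thread.
samples = ["0/1:79,90:99:2995,0,2453", "0/1:103,70:99:2036,0,3253"]
print(pooled_het_ab("GT:AD:GQ:PL", samples))
```

Because the value is a fraction of the observed depth, it does not depend on absolute coverage the way a raw allele count would. Whether it is informative as a variant-level annotation for filtering is a question this sketch does not answer.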
Hi @andrewo,
I'll ask someone on the team to follow up with you.
@andrewo As far as I can remember it's just that the indel case was not implemented at the time, and this annotation hasn't been developed any further due to lack of interest on our side. To be frank it hasn't even been ported to GATK4. It might still get ported in the future (or we would accept a pull-request if someone wants to take a stab at it) but considering it hasn't bubbled up as being worthwhile for us in the past, I wouldn't recommend holding your breath. In terms of filtering, we've put a lot of our eggs in the deep learning basket, and we have a prototype tool that is intended to replace VQSR that is performing much better, especially on indels. I realize that doesn't help you solve your right-now problem of course... Unfortunately we have limited resources, considering all the work that needs to be done, and so we have to prioritize our efforts quite brutally.