HaplotypeCaller / VariantAnnotator: no allele balance tag for any SNP

Version 3.1.1. Human normal samples.

I couldn't find the AlleleBalance and AlleleBalanceBySample tags in my VCF outputs. The tags are not present for even a single variant.
I tried HaplotypeCaller with -all, and directly with -A AlleleBalance or -A AlleleBalanceBySample.
I also tried VariantAnnotator with -all, with -A AlleleBalance, and with -A AlleleBalanceBySample.

Any help will be appreciated.

Answers

  • dvelayuthamdvelayutham MilanMember

    How can I get the "AB" (AlleleBalance) annotation in the VCF output?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @dvelayutham

    Hi,

    Can you please post your command line? How did you look for the annotation? Can you post an example of the biallelic heterozygous SNP from your VCF?

    Thanks,
    Sheila

  • dvelayuthamdvelayutham MilanMember

    Hi
    $GATK3_1 -T HaplotypeCaller -R $genome --dbsnp $dbsnp -I input.bam -stand_call_conf 50 -stand_emit_conf 30 -o out.vcf -nct 12 -log vcf.log -L $interval -pcrModel AGGRESSIVE -XL $blacklist -PF $performance.log -A AlleleBalance -A AlleleBalanceBySample

    and an example SNP from the same output is:
    hr1 6257826 rs2294714 T C 5002.44 AC=2;AF=0.500;AN=4;BaseQRankSum=2.877;DB;DP=342;FS=3.136;MLEAC=2;MLEAF=0.500;Q=59.92;MQ0=0;MQRankSum=2.078;QD=14.63;ReadPosRankSum=0.826 GT:AD:GQ:PL 0/1:79,90:99:2995,0,2453 0/1:103,70:99:2036,0,3253

    Thanks
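
For readers who want to check the record above by hand, here is a minimal Python sketch that derives a per-sample allele balance from the AD values already present in the FORMAT column. It assumes the balance is defined as ref / (ref + alt) read depth, which should be checked against the AlleleBalanceBySample documentation for your GATK version. The function name `sample_allele_balance` is hypothetical and is not part of GATK.

```python
def sample_allele_balance(format_keys, sample_field):
    """Return ref/(ref+alt) from one sample column's AD entry, or None.

    Assumption: allele balance is the reference read fraction.
    Only biallelic AD values (exactly two depths) are handled.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ad = values.get("AD", ".")
    if "." in ad:
        return None  # missing depth information
    depths = [int(x) for x in ad.split(",")]
    if len(depths) != 2 or sum(depths) == 0:
        return None
    return depths[0] / sum(depths)


# FORMAT and sample columns copied from the example record in the post above.
fmt = "GT:AD:GQ:PL"
samples = ["0/1:79,90:99:2995,0,2453", "0/1:103,70:99:2036,0,3253"]
for s in samples:
    print(round(sample_allele_balance(fmt, s), 3))
# prints 0.467 and 0.595
```

Both samples are heterozygous with a balance near 0.5, so this record is the kind of site where an AB annotation would normally be expected.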

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @dvelayutham
    Hi,

    Can you please check if the annotation is defined in the header of the VCF?

    Thanks,
    Sheila

  • dvelayuthamdvelayutham MilanMember

    Yes, in fact "AB" is present in the header, as well as ABHet and ABHom.

  • smgogartensmgogarten University of WashingtonMember

    Has this problem been fixed? I am seeing the same issue in GATK version 3.4-46.

  • Hi,

    Just an update on the issue @smgogarten observed: it was not related to the GATK version or to the issue discussed in this thread.

    Regards,

    Kurt

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @smgogarten @Kurt
    Hi,

    I think (hopefully!) if you upgrade to the latest version 3.5, you will find that you get all the annotations you request. Let us know if something is not working in the latest version.

    Thanks,
    Sheila

  • everestial007everestial007 GreensboroMember

    I should have been a little more careful about this.
    I called my variants using HaplotypeCaller, which takes about 3 days per sample.
    Now I need the AB values for each sample. Is it possible to calculate and add the AB fields to the VCF file after variant calling?
    I found a Python script online (https://gist.github.com/mjclark/1057839) for the job, but it's failing, which I think has to do with how VCFs are handled in GATK vs. this program.

    Thanks,
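
As a rough illustration of adding AB after variant calling without re-running the caller, the sketch below appends an AB key to the FORMAT column of a single tab-separated VCF data line, using only the AD values already in the file. It is not the linked gist and not GATK's implementation. The ref / (ref + alt) definition, the restriction to biallelic 0/1 genotypes, and the names `AB_HEADER` and `add_ab_to_record` are all assumptions made for this example.

```python
# Header line that would need to be added before the #CHROM line (hypothetical wording).
AB_HEADER = ('##FORMAT=<ID=AB,Number=1,Type=Float,'
             'Description="Ref/(ref+alt) read fraction for biallelic het calls, from AD">')


def add_ab_to_record(line):
    """Append an AB FORMAT value to every sample column of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return "\t".join(cols)  # no sample columns
    keys = cols[8].split(":")
    if "GT" not in keys or "AD" not in keys or "AB" in keys:
        return "\t".join(cols)  # nothing to compute, or AB already present
    gt_i, ad_i = keys.index("GT"), keys.index("AD")
    cols[8] += ":AB"
    for i in range(9, len(cols)):
        fields = cols[i].split(":")
        # VCF allows trailing fields to be dropped; pad so AB lands in the right slot.
        fields += ["."] * (len(keys) - len(fields))
        alleles = set(fields[gt_i].replace("|", "/").split("/"))
        depths = fields[ad_i].split(",")
        ab = "."
        if alleles == {"0", "1"} and len(depths) == 2 and all(d.isdigit() for d in depths):
            ref, alt = int(depths[0]), int(depths[1])
            if ref + alt > 0:
                ab = f"{ref / (ref + alt):.2f}"
        cols[i] = ":".join(fields + [ab])
    return "\t".join(cols)
```

To use it on a whole file, lines starting with `#` would be copied unchanged (with `AB_HEADER` written just before the `#CHROM` line) and every other line passed through `add_ab_to_record`. Multi-allelic sites and non-het genotypes get a missing value (`.`) in this sketch.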

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @everestial007
    Hi,

    You should be able to use VariantAnnotator.

    -Sheila

  • everestial007everestial007 GreensboroMember

    Thank you @Sheila !
    But the documentation link for VariantAnnotator isn't working.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @everestial007
    Hi,

    Perhaps try again? It is working for me now. The website sometimes does not work, but when you try again in a few minutes, it does work!

    -Sheila

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    What do you mean by "doesn't work"? Does the page never load, is there an error message, or does the page load only part of the content?

  • everestial007everestial007 GreensboroMember

    @Sheila @Geraldine_VdAuwera
    It's working now. Previously the link would open a new page but would not display any content.

    Thanks, it's working!

  • armarkesarmarkes LisbonMember

    @Geraldine_VdAuwera

    I was trying to get the AlleleBalanceBySample information into my VCF, but for some reason I cannot find the option to create it with VariantAnnotator.
    I have a VCF file with all my samples, and I have AD for each sample. However, I would like to have a per-sample field indicating the proportion of each allele (for heterozygous calls), so that I can filter out false-positive variants per sample.

    With AlleleBalanceBySample I was able to filter by AB in the FORMAT field for each sample. (https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_AlleleBalanceBySample.php)

    Is there another way to do this now?

    Thanks,
    Ana Marques
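
One way to picture the per-sample filtering described above is a small Python check on each sample's GT and AD values. This is only a sketch: the 0.25 and 0.75 cutoffs are arbitrary placeholders, not thresholds recommended anywhere in this thread, the balance is assumed to be ref / (ref + alt), and `classify_het` is a hypothetical helper name.

```python
def classify_het(gt, ad, min_ab=0.25, max_ab=0.75):
    """Return 'ok', 'unbalanced', or 'not_assessed' for one sample's genotype.

    gt: genotype string such as "0/1"; ad: AD string such as "40,10".
    min_ab and max_ab are placeholder thresholds, not recommended values.
    """
    alleles = set(gt.replace("|", "/").split("/"))
    depths = ad.split(",")
    if alleles != {"0", "1"} or len(depths) != 2:
        return "not_assessed"  # only biallelic heterozygous calls are checked
    if not all(d.isdigit() for d in depths):
        return "not_assessed"  # missing or malformed depths
    ref, alt = int(depths[0]), int(depths[1])
    if ref + alt == 0:
        return "not_assessed"
    ab = ref / (ref + alt)
    return "ok" if min_ab <= ab <= max_ab else "unbalanced"


print(classify_het("0/1", "79,90"))  # ok
print(classify_het("0/1", "40,10"))  # unbalanced
print(classify_het("1/1", "0,30"))   # not_assessed
```

Because the check reads only GT and AD, it applies to SNPs and indels alike. Whether the result is meaningful for indels is a separate question.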

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @armarkes
    Hi Ana,

    Did you input the BAM file when you ran VariantAnnotator?

    Please also post the exact command you ran and the exact version of GATK you are using.

    Thanks,
    Sheila

  • armarkesarmarkes LisbonMember

    I am using GATK version 3.4.

    What I was trying to say is that the previous documentation states that we can annotate a VCF with AlleleBalanceBySample information using VariantAnnotator. However, I don't find this option when running the VariantAnnotator tool.

    By the way, I also don't understand how I can provide my BAM file if I have one BAM file per sample and my VCF is an output of the GenotypeGVCFs tool after calling with HaplotypeCaller. Should I split the VCF by sample again?

    Thanks for helping.

  • armarkesarmarkes LisbonMember

    @Sheila

    Now I was able to do this step with this command:
    java -jar /GenomeAnalysisTK.jar -T VariantAnnotator -R hg19.fa -I BAM.list -V ALL_samples_onlySNP_filter.vcf --dbsnp dbsnp_138.hg19.vcf -alwaysAppendDbsnpId -A AlleleBalanceBySample -o ALL_samples_onlySNP_annotated.vcf

    I am sorry, I did not understand the AlleleBalance step at first, because it did not work for my INDEL VCF, but it worked with SNPs.

    Why does this function not work with INDELs? My heterozygous INDEL variants do not show the expected 0.5 allele balance for most subjects (it is normally around 0.2). Does this mean the variant is false? Or do you recommend a different minimum allele frequency threshold for INDELs?

    Thanks a lot.
    Ana Rita

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @armarkes
    Hi Ana,

    Ah, yes. The annotation is not available for indels currently. Have a look at the documentation.

    -Sheila

  • @Sheila
    Hi Sheila,
    I saw in the documentation that AlleleBalance or AlleleBalanceBySample can't be calculated for indels and searched the forums to see if that is still true (in case the documentation is out of date) and found this thread. Is that still the case? If so, do you know the reason why? It doesn't seem to me that the formula contains anything that is not available for indels, unless I'm missing something. With some exome capture kits that have low median coverage, we have a high rate of indels with low allele balance, so it would be very helpful to have this annotation specifically for indels in order to use for filtering. Even if GATK maintains the disclaimer that it is experimental and results should be interpreted with caution, it would be nice if we could "experiment" with it.
    Thanks,
    Andrew

  • shleeshlee CambridgeMember, Broadie, Moderator

    Hi @andrewo,

    Could you figure out what you need using the related DepthPerAlleleBySample annotation? I suspect the reason this annotation isn't available for indels is that representation introduces ambiguity to the interpretation.

  • Hi @shlee,
    Thanks for the reply. AlleleBalanceBySample could be helpful for manual correction of genotypes for individual samples and I could potentially use DepthPerAlleleBySample for this (by computing AlleleBalanceBySample annotations on the fly). However it seems that implementing the AlleleBalance annotation in VQSR would be much more powerful since this appears to be a systemic problem with this dataset, and I would prefer that over doing anything manual. One of the things I wanted to experiment with is adding ABHet to the suggested variant-level annotations so GATK can also learn what is a good allele balance from the positive variants, so we could get rid of the others. I can't think of what is ambiguous about the interpretation of AlleleBalance for indels compared to SNPs -- can you explain?
    An alternative might be using allele count (AC) annotation, but it seems that with the variable depth in exome sequencing that might not work -- it only makes sense as a fraction of the depth. If you have any other ideas I'm open to them. (Sorry, just realizing I maybe should have opened a new thread.)
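
To make the "on the fly" idea concrete, here is a hedged Python sketch of a site-level value in the spirit of ABHet, computed from per-sample AD values. It assumes ABHet is the pooled reference read fraction across heterozygous samples (sum of ref reads divided by sum of ref + alt reads); that definition should be verified against the AlleleBalance documentation. The function `site_ab_het` is hypothetical and not part of GATK.

```python
def site_ab_het(genotypes):
    """Pooled ref/(ref+alt) over heterozygous samples at one site, or None.

    genotypes: iterable of (gt, ref_depth, alt_depth) tuples, one per sample.
    Assumption: only biallelic 0/1 calls contribute to the value.
    """
    ref_total = 0
    depth_total = 0
    for gt, ref, alt in genotypes:
        if set(gt.replace("|", "/").split("/")) == {"0", "1"}:
            ref_total += ref
            depth_total += ref + alt
    return ref_total / depth_total if depth_total else None


# AD values from the two het samples in the example record earlier in this thread.
print(round(site_ab_het([("0/1", 79, 90), ("0/1", 103, 70)]), 3))  # 0.532
```

Nothing in this calculation is specific to SNPs, so the same function runs on indel records. It does not address how indel representation might affect the AD counts themselves.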

    [Linked GitHub issue: number 2813, filed by shlee, state closed, closed by vdauwera]

  • shleeshlee CambridgeMember, Broadie, Moderator

    Hi @andrewo,

    I'll ask someone on the team to follow up with you.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @andrewo As far as I can remember it's just that the indel case was not implemented at the time, and this annotation hasn't been developed any further due to lack of interest on our side. To be frank it hasn't even been ported to GATK4. It might still get ported in the future (or we would accept a pull-request if someone wants to take a stab at it) but considering it hasn't bubbled up as being worthwhile for us in the past, I wouldn't recommend holding your breath. In terms of filtering, we've put a lot of our eggs in the deep learning basket, and we have a prototype tool that is intended to replace VQSR that is performing much better, especially on indels. I realize that doesn't help you solve your right-now problem of course... Unfortunately we have limited resources, considering all the work that needs to be done, and so we have to prioritize our efforts quite brutally.
