Error with VariantAnnotator

I am having a problem with VariantAnnotator.

I am running the command:
java1.7 -jar /usr/local/packages/GATK3/GenomeAnalysisTK.jar \
-R /projects4/ruth/Burkholderia/cutadapt/cenocepacia/B_cenocepacia_J2315.fasta \
-T VariantAnnotator \
-I N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted.RG.bam \
-o output2.vcf \
-V N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted_GATK.vcf \
-A AlleleBalance \
-A BaseCounts \
-A Coverage \
-A FisherStrand \
-A GenotypeSummaries \
-A LowMQ \
-A RMSMappingQuality \
-A AlleleBalanceBySample

The .vcf file was made using GATK HaplotypeCaller.

The error I get is:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 0
at org.broadinstitute.gatk.tools.walkers.annotator.AlleleBalanceBySample.annotateWithPileup(AlleleBalanceBySample.java:127)
at org.broadinstitute.gatk.tools.walkers.annotator.AlleleBalanceBySample.annotate(AlleleBalanceBySample.java:113)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateGenotypes(VariantAnnotatorEngine.java:420)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:216)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:192)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:312)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:85)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR ------------------------------------------------------------------------------------------

Interestingly, the error does not occur when I run exactly the same command with a vcf file created using samtools (and the same bam file).

I have installed the most up-to-date version of GATK (3.4-46) in case that was causing the error, but that does not seem to be the answer.

Any suggestions about what is causing this error would be greatly appreciated.




  • Sheila (Broad Institute; Member, Broadie admin)
    edited November 2015

    Hi Ruth,

    I just ran the same command on some of my own test files and it ran with no error. Can you please try validating your bam file and VCF file? http://broadinstitute.github.io/picard/command-line-overview.html#ValidateSamFile



  • Ruth (Member)

    Thanks for getting back to me.

    You are right, the vcf file does fail validation - with the error:

    ERROR MESSAGE: File /projects4/ruth/Burkholderia/cutadapt/cenocepacia/bowtie/N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted_GATK.vcf fails strict validation: one or more of the ALT allele(s) for the record at position B_cenocepacia_J2315.fasta:5487 are not observed at all in the sample genotypes

    However, I am a bit confused about this, as I created the vcf file using GATK and the command:
    gatk3 \
    -T HaplotypeCaller \
    -R /projects4/ruth/Burkholderia/cutadapt/cenocepacia/B_cenocepacia_J2315.fasta \
    -I N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted.RG.bam \
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
    -o N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted_GATK.vcf

    This appeared to run without any error, so why would this vcf fail validation?



  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah, that's actually a development oversight -- ValidateVariants was not updated to interpret the <NON_REF> allele correctly in gVCFs.

    So this validation error does not explain the problem you're seeing. What happens if you run the gVCF through GenotypeGVCFs then run VariantAnnotator on that?

    Ultimately, the gVCF file is not meant to be used as an end product, so it is possible that VariantAnnotator is also choking on it for some reason that we don't yet understand. It would be especially helpful if you could narrow down the error to a particular record or subset of records. If the problem is linked to the gVCF format, running on just a tiny region should still reproduce the error.
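
A minimal sketch of one way to cut a VCF down to a tiny region for this kind of test. The helper name `subset_vcf`, the contig names, and the demo records are all made up for illustration; only standard `awk` is used, and the result would be passed to VariantAnnotator through -V as in the original command.

```shell
# Hypothetical helper (not a GATK tool): print the header plus every record
# whose POS lies inside contig:start-end (1-based, inclusive).
# Caveat: a gVCF reference block that starts before the window but extends
# into it (via END=) is not picked up by this simple POS test.
subset_vcf() {
    local vcf=$1 contig=$2 start=$3 end=$4
    awk -F'\t' -v c="$contig" -v s="$start" -v e="$end" \
        '/^#/ || ($1 == c && $2 >= s && $2 <= e)' "$vcf"
}

# Made-up four-record gVCF so the sketch runs on its own.
demo_gvcf=$(mktemp)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    printf 'chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199\tGT:DP:GQ\t0/0:30:90\n'
    printf 'chr1\t200\t.\tC\tT,<NON_REF>\t500\t.\t.\tGT:DP:GQ\t1/1:20:60\n'
    printf 'chr1\t5487\t.\tG\t<NON_REF>\t.\t.\tEND=5487\tGT:DP:GQ\t0/0:2:3\n'
    printf 'chr2\t150\t.\tT\t<NON_REF>\t.\t.\tEND=150\tGT:DP:GQ\t0/0:25:72\n'
} > "$demo_gvcf"

# Keep only chr1:150-250; the header lines are always retained.
subset=$(subset_vcf "$demo_gvcf" chr1 150 250)
printf '%s\n' "$subset"
```

Halving the window repeatedly until the error disappears is a simple way to home in on the offending record.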

  • Ruth (Member)


    Yes, it seems to work OK if I run GenotypeGVCFs first. I just put one --variant in (which was my vcf file).
    However, running this loses a lot of information, as my vcf file now has far fewer lines (57985 as opposed to 554164).

    Is there any way to fix this that does not involve shortening my vcf file?



  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @Ruth, the final VCF is expected to be much shorter than the GVCF file -- this is completely normal and not a cause for concern. If this does not make sense to you, please read the explanation of the workflow that is given in the Best Practices documentation.

    You can have GenotypeGVCFs emit all sites including non-variant sites if that's what you want (see the tool documentation for available arguments), but it will take up a lot more space and may not be useful, depending on what you want to analyze.
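
To see where the extra lines in a gVCF come from, a rough sketch like the following counts records whose only ALT allele is the `<NON_REF>` placeholder (reference-confidence records) separately from records that carry a real alternate allele. The helper name and the demo records are invented for illustration, and the exact numbers on a real file will depend on how HaplotypeCaller was run.

```shell
# Hypothetical helper: tally gVCF records by whether ALT is only <NON_REF>.
count_gvcf_records() {
    awk -F'\t' '
        /^#/              { next }            # skip header lines
        $5 == "<NON_REF>" { ref++; next }     # reference-confidence record
                          { alt++ }           # record with a real ALT allele
        END { printf "reference_only\t%d\nwith_alt_allele\t%d\n", ref, alt }
    ' "$1"
}

# Made-up gVCF: three reference-only records and one candidate variant.
demo_counts=$(mktemp)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    printf 'chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199\tGT:DP:GQ\t0/0:30:90\n'
    printf 'chr1\t200\t.\tC\tT,<NON_REF>\t500\t.\t.\tGT:DP:GQ\t1/1:20:60\n'
    printf 'chr1\t201\t.\tG\t<NON_REF>\t.\t.\tEND=450\tGT:DP:GQ\t0/0:28:81\n'
    printf 'chr1\t451\t.\tT\t<NON_REF>\t.\t.\tEND=451\tGT:DP:GQ\t0/0:2:3\n'
} > "$demo_counts"

counts=$(count_gvcf_records "$demo_counts")
printf '%s\n' "$counts"
```

Only the records in the second category can become variant calls after genotyping, which is consistent with the final VCF being much shorter than the gVCF.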

  • Ruth (Member)


    My aim is to generate a complete genome sequence for each of my samples. I don't want to assume the reference sequence for all positions that are not SNVs, so I wanted to generate a gVCF file, so that I can then run the VariantAnnotator on every position, so I know which non-SNV positions are low confidence as well as SNV positions. Then I plan to run FastaAlternateReferenceMaker with a SNP mask on all sites that fail specific filters identified by the VariantAnnotator, whether they are SNVs or not.

    Thank you for pointing me to the best practices, as by reading through, I have realised that perhaps HaplotypeCaller with the option --emitRefConfidence BP_RESOLUTION is more appropriate for my question. However, this produces the same error when I put the output into VariantAnnotator.

    Do you know of any way I can annotate a vcf file with positions for the whole genome?



  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah, I see, that makes sense. Then what you need is definitely to run GenotypeGVCFs with -allSites output (be sure to check the argument name, I may have misspelled it). Non-variant sites will include the RGQ annotation which gives you an estimate of reference genotype confidence that you can use to filter and then mask out low-confidence ref sites.
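
As a hedged sketch of the filtering step described above, the snippet below lists sites whose RGQ falls under a threshold so they can be masked later. It assumes a single-sample, all-sites VCF in which RGQ appears in the FORMAT column; the helper name `low_rgq_to_bed`, the threshold of 20, the BED output, and the demo records are all illustrative choices, so check the FastaAlternateReferenceMaker documentation for the mask formats it actually accepts.

```shell
# Hypothetical helper: emit BED intervals (0-based start, half-open end) for
# sites whose RGQ is below min_rgq. Assumes one sample in column 10.
low_rgq_to_bed() {
    local vcf=$1 min_rgq=$2
    awk -F'\t' -v OFS='\t' -v min="$min_rgq" '
        /^#/ { next }
        {
            n = split($9, keys, ":")      # FORMAT keys, e.g. GT:AD:DP:RGQ
            split($10, vals, ":")         # matching values for the sample
            for (i = 1; i <= n; i++) {
                if (keys[i] == "RGQ" && vals[i] != "." && vals[i] + 0 < min + 0) {
                    print $1, $2 - 1, $2
                }
            }
        }
    ' "$vcf"
}

# Made-up all-sites VCF: three non-variant sites and one variant site.
demo_allsites=$(mktemp)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    printf 'chr1\t10\t.\tA\t.\t.\t.\tDP=20\tGT:AD:DP:RGQ\t0/0:20,0:20:60\n'
    printf 'chr1\t11\t.\tC\t.\t.\t.\tDP=3\tGT:AD:DP:RGQ\t0/0:3,0:3:6\n'
    printf 'chr1\t12\t.\tG\tT\t400\t.\tDP=25\tGT:AD:DP:GQ\t1/1:0,25:25:75\n'
    printf 'chr1\t13\t.\tT\t.\t.\t.\tDP=0\tGT:AD:DP:RGQ\t./.:0,0:0:0\n'
} > "$demo_allsites"

# Threshold of 20 is arbitrary; variant sites (no RGQ) are left untouched
# here and would need their own filters.
mask=$(low_rgq_to_bed "$demo_allsites" 20)
printf '%s\n' "$mask"
```

Variant records carry GQ rather than RGQ, so they pass through this helper unmasked and would be handled by whatever variant filters are applied separately.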

  • Ruth (Member)

    That is exactly what I need. Thank you very much for your help.
