
ERROR : IndexOutOfBoundsException: Index: 3, Size: 3 in getAlternateAllele, when genotyping variant

I'm currently trying to use GATK to call variants from human RNA-seq data.

So far, I've managed to call variants in all my samples following the GATK Best Practices guidelines (using HaplotypeCaller in DISCOVERY mode on each sample separately).

But I'd like to go further and genotype, in every sample, each variant found in at least one sample.
The aim is to distinguish, for each variant, samples where that variant is absent (homozygous for the reference allele) from samples where the site is not covered (and therefore not genotyped).
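
As a rough illustration of the distinction described above, here is a minimal Python sketch. It assumes the usual VCF GT encoding, where `0/0` is homozygous reference and `./.` is a no-call. The function name and the three labels are hypothetical and chosen only for this example.

```python
def classify_genotype(gt):
    """Classify a VCF GT string (e.g. '0/0', './.', '0|1').

    Returns one of three hypothetical labels:
      'not_genotyped' - at least one allele is missing ('.')
      'hom_ref'       - every allele is the reference allele ('0')
      'carries_alt'   - at least one alternate allele was called
    """
    # Phased ('|') and unphased ('/') genotypes are treated the same way here.
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "not_genotyped"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    return "carries_alt"
```

A site that is missing from a sample's VCF altogether would need to be handled separately, since there is no GT string to classify in that case.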

To do so, I first used CombineVariants to merge the variants from all my samples and create the list of variants to be genotyped, ${ALLELES}.vcf.

I then tried to regenotype my samples with HaplotypeCaller using the GENOTYPE_GIVEN_ALLELES mode and the same settings as before.
My command is the following:

java -jar ${GATKPATH}/GenomeAnalysisTK.jar -T HaplotypeCaller -R ${GENOMEFILE}.fa -I ${BAMFILE_CALIB}.bam
--genotyping_mode GENOTYPE_GIVEN_ALLELES -alleles ${ALLELES}.vcf -out_mode EMIT_ALL_SITES
-dontUseSoftClippedBases -stand_emit_conf 20 -stand_call_conf 20
-o ${SAMPLE}_genotypes_all_variants.vcf
-mbq 25 -L ${CDNA_BED}.bed --dbsnp ${DBSNP}.vcf

In doing so I invariably get the same error after calling 0.2% of the genome.

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.rangeCheck(
at java.util.ArrayList.get(
at htsjdk.variant.variantcontext.VariantContext.getAlternateAllele(
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.engine.CommandLineGATK.main(

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions
ERROR MESSAGE: Index: 3, Size: 3
ERROR ------------------------------------------------------------------------------------------

Because the problem seemed to originate from getAlternateAllele, I tried setting --max_alternate_alleles to 2 or 10, without success.
I also checked my ${ALLELES}.vcf file for malformed alternate alleles in the region where GATK crashes (chr 1, somewhere after 78 Mb), but I couldn't identify any (I searched for alternate alleles that would not match the extended regular expression '[ATGC,]+').
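
One caveat about the check described above: an unanchored pattern such as '[ATGC,]+' matches as soon as a single character qualifies, so it can miss alleles that are only partly malformed. A stricter version of the same check could look like the Python sketch below, which tests each comma-separated ALT allele in full. It also tallies how many ALT alleles each record carries, on the assumption (not a diagnosis) that an index running past the end of an allele list makes multi-allelic records worth a closer look. The function name is hypothetical.

```python
import re
from collections import Counter

# A "plain" allele here means one or more of A, C, G, T and nothing else.
VALID_ALLELE = re.compile(r"^[ACGT]+$")


def scan_alt_alleles(lines):
    """Scan VCF text lines and return (suspicious, alt_counts).

    suspicious: list of (chrom, pos, alt) for records whose ALT column
                contains anything other than comma-separated ACGT strings
                (empty alleles, symbolic alleles, lowercase, N, '*', ...).
    alt_counts: Counter mapping number of ALT alleles -> number of records.
    """
    suspicious = []
    alt_counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            # Too few columns to hold an ALT field: report the raw text.
            suspicious.append((fields[0], -1, line.rstrip("\n")))
            continue
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        alleles = alt.split(",")
        alt_counts[len(alleles)] += 1
        if any(not VALID_ALLELE.match(a) for a in alleles):
            suspicious.append((chrom, pos, alt))
    return suspicious, alt_counts
```

It could be run on the merged file with something like `with open("alleles.vcf") as fh: bad, counts = scan_alt_alleles(fh)`, where the file name is a placeholder. Records flagged this way are not necessarily invalid VCF (symbolic alleles are legal, for instance); they are just candidates to inspect near the crash position.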

I would be grateful for any help you can provide.


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    This could be a bug. Unfortunately, GGA (GENOTYPE_GIVEN_ALLELES) mode does not work well with HC (HaplotypeCaller) at the moment; we need to fix it. For what you're trying to do, you could try the reference confidence / GVCF-based workflow that is now Best Practice for DNAseq. We haven't yet validated it for RNAseq, but in principle it should work. Alternatively, if you have a reasonable number of samples, you may be able to run HC in regular mode on all your samples together, using your list of sites as an interval list (passed to the -L argument) to limit the analysis to just those sites of interest.
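
For the second suggestion, the merged VCF of sites has to be turned into intervals. The Python sketch below shows one way this might be done, assuming a position-sorted VCF and a BED file as the target format (0-based, half-open coordinates). It is only an illustration: the function name is hypothetical, and depending on the GATK version the VCF itself may be accepted directly by -L, in which case no conversion is needed.

```python
def vcf_sites_to_bed(lines):
    """Convert VCF records to merged BED intervals.

    Each record becomes an interval covering its REF allele:
    start = POS - 1 (BED is 0-based), end = start + len(REF).
    Overlapping or directly adjacent intervals on the same chromosome
    are merged. Assumes the input is sorted by chromosome and position.
    Returns a list of (chrom, start, end) tuples.
    """
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1
        end = start + len(ref)
        prev = intervals[-1] if intervals else None
        if prev and prev[0] == chrom and start <= prev[2]:
            # Extend the previous interval instead of adding a new one.
            intervals[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            intervals.append((chrom, start, end))
    return intervals
```

Each tuple can then be written out as a tab-separated line to produce a file for -L.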
