Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

ERROR : IndexOutOfBoundsException: Index: 3, Size: 3 in getAlternateAllele, when genotyping variant

Hi,
I'm currently trying to use GATK to call variants from Human RNA seq data

So far, I've managed to do variant calling in all my samples following the GATK best practice guidelines. (using HaplotypeCaller in DISCOVERY mode on each sample separately)

But I'd like to go further and try to get the genotype in every sample, of each variant found in at least one sample.
This, to differentiate for each variant, samples where that variant is absent (homozygous for reference allele) from samples where it is not covered (and therefore note genotyped).

To do so, I've first used CombineVariants to merge variants from all my samples and to create the list of variants to be genotype ${ALLELES}.vcf

I then try to regenotype my samples with HaplotypeCaller using the GENOTYPE_GIVEN_ALLELES mode and the same settings as before:
my command is the following:

******java -jar ${GATKPATH}/GenomeAnalysisTK.jar -T HaplotypeCaller -R ${GENOMEFILE}.fa -I ${BAMFILE_CALIB}.bam
--genotyping_mode GENOTYPE_GIVEN_ALLELES -alleles ${ALLELES}.vcf -out_mode EMIT_ALL_SITES
-dontUseSoftClippedBases -stand_emit_conf 20 -stand_call_conf 20
-o ${SAMPLE}_genotypes_all_variants.vcf
-mbq 25 -L ${CDNA_BED}.bed --dbsnp ${DBSNP}.vc**f


In doing so I invariably get the same error after calling 0.2% of the genome.

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at htsjdk.variant.variantcontext.VariantContext.getAlternateAllele(VariantContext.java:845)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:248)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:1059)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:221)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Index: 3, Size: 3
ERROR ------------------------------------------------------------------------------------------

because the problem seemed to originate from getAlternateAllele, I tried to play with --max_alternate_alleles by setting it to 2 or 10, without success.
I also checked my ${ALLELES}.vcf file to look for malformed Alternate alleles in the region where the GATK crashes (Chr 1, somewhere after 78Mb) , but I couldn't identify any... (I searched for Alternate alles that would not match the following extended regexpr '[ATGC,]+')

I would be grateful for any help you can provide.
Thanks.

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there,

    This could be a bug. Unfortunately the GGA mode does not work well with HC at the moment, we need to fix it. For what you're trying to do, you could try to use the reference confidence / GVCF-based workflow that is now Best-Practice for DNAseq. We haven't yet validated it for RNAseq, but in principle it should work. Alternatively, if you have a reasonable number of samples, you may be able to run HC in regular mode on all your samples together, using your list of sites as interval list (passed to the -L argument) in order to limit analysis to just those sites of interest.

Sign In or Register to comment.