Error in UnifiedGenotyper when calling a haploid genome

gilgi Member
edited July 2012 in Ask the GATK team

Dear GATK team,

I tried calling SNPs and indels for a haploid genome, according to the instructions:

java -jar GenomeAnalysisTK.jar -R fasta_file.fasta -pnrm POOL -T UnifiedGenotyper -I my_merged_recal_realigned.bam -o snps.raw.vcf --sample_ploidy 1 --genotype_likelihoods_model POOLBOTH

But I am getting the error below.

Am I missing something?

.
.
.
INFO 15:19:40,055 TraversalEngine - chr06:254172 3.72e+06 11.6 m 3.1 m 30.6% 37.7 m 26.2 m
INFO 15:19:44,943 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 0
at org.broadinstitute.sting.gatk.walkers.genotyper.PoolIndelGenotypeLikelihoods.getLikelihoodOfConformation(PoolIndelGenotypeLikelihoods.java:198)
at org.broadinstitute.sting.gatk.walkers.genotyper.PoolGenotypeLikelihoods.calculateACConformationAndUpdateQueue(PoolGenotypeLikelihoods.java:553)
at org.broadinstitute.sting.gatk.walkers.genotyper.PoolGenotypeLikelihoods.computeLikelihoods(PoolGenotypeLikelihoods.java:512)
at org.broadinstitute.sting.gatk.walkers.genotyper.PoolIndelGenotypeLikelihoods.add(PoolIndelGenotypeLikelihoods.java:171)
at org.broadinstitute.sting.gatk.walkers.genotyper.PoolIndelGenotypeLikelihoods.add(PoolIndelGenotypeLikelihoods.java:65)
at org.broadinstitute.sting.gatk.walkers.genotyper.PoolGenotypeLikelihoodsCalculationModel.getLikelihoods(PoolGenotypeLikelihoodsCalculationModel.java:242)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:277)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:190)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:350)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:117)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:94)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.0-0-g4c0ffd4):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: 0
ERROR ------------------------------------------------------------------------------------------

Answers

  • delangel Broad Institute Member ✭✭

    This seems to be a genuine bug - just to help us narrow it down, do you see it when you specify -glm POOLSNP and/or -glm POOLINDEL as well?
    Is it one single sample or multiple samples you are calling simultaneously?

  • gilgi Member

    Thank you so much for the quick reply!

    I have a merged BAM of multiple samples (each sample is an individual haploid strain, but I still need to use the "pool" commands, right?)

    I tried -glm POOLSNP and it works fine.
    -glm POOLINDEL gave the same error as POOLBOTH.

    Please let me know if you want me to check more things on my side.

  • delangel Broad Institute Member ✭✭

    Oh, I see you're using 2.0-0. If you update to the latest published version (I think we're at 2.0-23), does the problem still happen? We fixed several UnifiedGenotyper-related bugs in the meantime.

  • gilgi Member

    I downloaded 2.0-23 and I am still getting the error:

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.0-23-ge9a19be):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: 0
    ERROR ------------------------------------------------------------------------------------------

    Besides the error, just to make sure: conceptually, is it OK to use the "POOL" options even though my data isn't pooled but rather individual strains?

  • delangel Broad Institute Member ✭✭

    Yes - the "POOL" naming is historical. The original motivation was to call pools, and only later did we realize we could use the same modules for generalized ploidy calling. In fact, we'll be simplifying the arguments in a future release.

  • gilgi Member

    OK, thanks a lot. I thought so, but wanted to be sure.
    So do you think that currently there is a bug in calling indels for haploid genomes?

  • delangel Broad Institute Member ✭✭

    Yes, but it's not present all the time - I suspect it's a corner case the code is not handling correctly. At the site where the error happens, do you have coverage in all samples? Does it happen only if you set -ploidy 1, but not with a larger value? (Even if larger values are nonsensical in your application, it may help us understand the problem.)

  • delangel Broad Institute Member ✭✭

    Also, if you specify "-maxAlleles 1" does the problem still happen?

  • gilgi Member

    Thanks for all the help!
    I checked, and I don't have coverage in all samples at the site where it happens.
    I tried, and it also happens if I set -ploidy 2.
    It also happens when I add -maxAlleles 1.

  • delangel Broad Institute Member ✭✭
    Accepted Answer

    I put in a potential fix in the latest GATK - not sure if it'll solve your problem but can you pls download the latest and check? thanks

  • gilgi Member

    Thanks a lot!!! This is working!
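
For anyone hitting a similar crash, the variations requested in this thread (-glm POOLSNP / POOLINDEL / POOLBOTH, a larger ploidy, -maxAlleles 1) can be generated systematically rather than typed by hand. Below is a minimal Python sketch that only builds and prints the command lines; it does not run GATK. The base arguments are copied from the original post and the flag spellings are taken as written in the thread (with --sample_ploidy used for the ploidy setting). The helper names and the diag.*.vcf output naming are assumptions made for illustration, not anything GATK prescribes.

```python
import shlex

# Base command copied from the original post; the file names are the
# poster's placeholders and should be replaced with real paths.
BASE_ARGS = [
    "java", "-jar", "GenomeAnalysisTK.jar",
    "-R", "fasta_file.fasta",
    "-pnrm", "POOL",
    "-T", "UnifiedGenotyper",
    "-I", "my_merged_recal_realigned.bam",
]


def build_command(glm, ploidy=1, max_alleles=None, out_prefix="diag"):
    """Return the argument list for one diagnostic run (hypothetical helper)."""
    # Encode the varied settings in the output name so runs do not overwrite
    # each other (the naming scheme is an assumption).
    out_name = f"{out_prefix}.{glm}.ploidy{ploidy}"
    args = BASE_ARGS + ["--sample_ploidy", str(ploidy), "-glm", glm]
    if max_alleles is not None:
        # Flag spelling taken from the thread.
        args += ["-maxAlleles", str(max_alleles)]
        out_name += f".maxAlleles{max_alleles}"
    return args + ["-o", out_name + ".vcf"]


def diagnostic_commands():
    """One command per variation asked about in the thread."""
    commands = [build_command(glm) for glm in ("POOLSNP", "POOLINDEL", "POOLBOTH")]
    commands.append(build_command("POOLINDEL", ploidy=2))
    commands.append(build_command("POOLINDEL", max_alleles=1))
    return commands


if __name__ == "__main__":
    for cmd in diagnostic_commands():
        # shlex.join quotes arguments safely for copy-pasting into a shell.
        print(shlex.join(cmd))
```

Running each printed command in turn and noting which ones crash reproduces the narrowing-down done above: in this thread POOLSNP succeeded, while POOLINDEL failed regardless of the ploidy and -maxAlleles settings that were tried.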
