
NullPointerException in HaplotypeCaller for RNAseq

Maehler | Norway | Posts: 5 | Member

Hi,

I was trying to call variants in RNAseq data using GATK 3.0 when I got the following stack trace:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:421)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:395)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:872)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2014-03-10-gf78001a):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Here are the command line arguments:

Program Args: -T HaplotypeCaller -I in.bam -R ref.fa -o raw.snps.indels.vcf -nct 8 -recoverDanglingHeads -dontUseSoftClippedBases -stand_call_conf 20 -stand_emit_conf 20

As you can see, I got the error above from one of the nightly builds. Before that I also tried version 3.0-0-g6bad1c6, and this produced the exact same error. What's curious about this is that it didn't fail in the same place each time. I did this on 20 samples, and for the first run, 15 of the samples failed with this error. One of the samples failed after 7 minutes, so I decided to try that one again to see if I could reproduce it, but it went past the point (both in time and genomic position) where it failed the first time.

I decided to download a nightly build (version nightly-2014-03-10-gf78001a) and see if this had been fixed, but again, 15 of the samples failed. However, the set of failing samples was not the same as with the other version.

The reads were aligned using STAR, and prior to this step I ran SplitNCigarReads and IndelRealigner.

Thanks, Niklas

Comments

  • ebanks | Posts: 678 | GATK Developer, mod

    Hi there,

    Given your stack trace, I'm not sure that this is directly related to RNA-seq. However it is definitely a bug that we would like to fix. If you are able to help us by uploading some data that reproduces the issue we'd appreciate it. Instructions for doing this are detailed here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Maehler | Norway | Posts: 5 | Member

    Tomorrow I'll try to see if I can find an example that fails every time. As I said, it seems a bit random. I'll get back to you.

  • ebanks | Posts: 678 | GATK Developer, mod

    That's probably because you are running with -nct. Turn that off to get deterministic behavior.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
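
To make the rerun deterministic as suggested above, the only change needed is dropping the `-nct 8` pair from the original arguments. A minimal Python sketch of that edit follows; the `drop_nct` helper is hypothetical (not part of GATK), and the argument string is copied from the original post.

```python
import shlex

# Argument string copied from the original post.
ORIGINAL_ARGS = (
    "-T HaplotypeCaller -I in.bam -R ref.fa -o raw.snps.indels.vcf -nct 8 "
    "-recoverDanglingHeads -dontUseSoftClippedBases "
    "-stand_call_conf 20 -stand_emit_conf 20"
)


def drop_nct(arg_string):
    """Return the arguments as a list with any '-nct N' pair removed."""
    cleaned = []
    skip_next = False
    for arg in shlex.split(arg_string):
        if skip_next:
            # This token is the thread count that followed -nct.
            skip_next = False
            continue
        if arg == "-nct":
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


print(" ".join(drop_nct(ORIGINAL_ARGS)))
```

All other options are passed through untouched, so the single-threaded run differs from the failing one only in threading.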

  • Maehler | Norway | Posts: 5 | Member

    I'm afraid this can take a while. When not using -nct, it is expected to finish in about 6 days. Let's just hope that the estimate is way off or that it fails early. :)

  • ebanks | Posts: 678 | GATK Developer, mod

    You might want to consider looking into using something like Queue to help "scatter" your jobs.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Maehler | Norway | Posts: 5 | Member

    I ran a sample that previously failed for two different versions of GATK, not using -nct, and this time I did not get the error. I'll continue to run some tests to see if I can find any region that can reproduce this.

  • KStamm | Posts: 26 | Member

    Update: the error still exists in version 3.1-1. I'm doing whole genomes, paired end; this is one sample with -nct 4.

    INFO  14:07:57,616 ProgressMeter -     1:121479521        4.34e+08  111.7 m       15.0 s      4.2%        44.2 h    42.3 h
    INFO  14:08:57,616 ProgressMeter -     1:121482063        4.34e+08  112.7 m       15.0 s      4.2%        44.6 h    42.7 h
    INFO  14:09:33,053 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.NullPointerException
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:443)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:417)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
            at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
            at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
            at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
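
If you want to narrow a failing run down to a small region for a bug report, one approach is to take the last locus the ProgressMeter reported before the stack trace and rerun on a window around it with `-L`. The Python sketch below only illustrates that idea: the helper names, the regular expression and the 100 kb padding are assumptions made for this example, and with `-nct` the crash position is only approximately the last reported locus.

```python
import re

# Matches progress lines such as:
# INFO  14:08:57,616 ProgressMeter -     1:121482063        4.34e+08 ...
PROGRESS_RE = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")

# Two progress lines copied from the log above.
SAMPLE_LOG = """\
INFO  14:07:57,616 ProgressMeter -     1:121479521        4.34e+08  111.7 m       15.0 s      4.2%        44.2 h    42.3 h
INFO  14:08:57,616 ProgressMeter -     1:121482063        4.34e+08  112.7 m       15.0 s      4.2%        44.6 h    42.7 h
"""


def last_reported_locus(log_text):
    """Return (contig, position) from the last ProgressMeter line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last


def interval_around(locus, padding=100_000):
    """Build a 'contig:start-stop' interval string padded around a locus."""
    contig, pos = locus
    start = max(1, pos - padding)  # GATK intervals are 1-based
    return f"{contig}:{start}-{pos + padding}"


locus = last_reported_locus(SAMPLE_LOG)
print(locus, interval_around(locus))
```

The resulting string can be passed to `-L` to restrict a test run to that window.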
    
  • Maehler | Norway | Posts: 5 | Member

    I have been unable to reproduce this without the -nct option; everything runs fine without it. As a workaround, I put some effort into learning how Queue works and parallelized my workflow with scatter-gather instead.
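
For readers without Queue, the scatter half of the idea can be approximated by hand: run one single-threaded HaplotypeCaller job per contig with `-L`, then merge the per-contig VCFs afterwards. The Python sketch below is not Queue and not what was actually run in this thread; the function names, the `GenomeAnalysisTK.jar` path and the output file naming are assumptions, and the gather step is left out.

```python
def contigs_from_fai(fai_lines):
    """Extract contig names (first tab-separated column) from .fai index lines."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def scatter_commands(contigs, bam="in.bam", ref="ref.fa",
                     jar="GenomeAnalysisTK.jar"):
    """Build one HaplotypeCaller command per contig, without -nct."""
    commands = []
    for contig in contigs:
        commands.append([
            "java", "-jar", jar,          # jar path is an assumption
            "-T", "HaplotypeCaller",
            "-I", bam,
            "-R", ref,
            "-L", contig,                 # restrict this job to one contig
            "-o", f"raw.snps.indels.{contig}.vcf",  # hypothetical naming
            "-recoverDanglingHeads",
            "-dontUseSoftClippedBases",
            "-stand_call_conf", "20",
            "-stand_emit_conf", "20",
        ])
    return commands


# Example .fai lines: name, length, offset, line bases, line width.
example_fai = ["1\t249250621\t52\t60\t61\n", "2\t243199373\t253404903\t60\t61\n"]
for cmd in scatter_commands(contigs_from_fai(example_fai)):
    print(" ".join(cmd))
```

Each command list can be handed to `subprocess.run` or written into a cluster job script; because no job uses `-nct`, each one should behave deterministically.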
