# NullPointerException in HaplotypeCaller for RNAseq

NorwayMember

Hi,

I was trying to call variants in RNAseq data using GATK 3.0 when I got the following stack trace:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NullPointerException
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2014-03-10-gf78001a):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------


Here are the command line arguments:

Program Args: -T HaplotypeCaller -I in.bam -R ref.fa -o raw.snps.indels.vcf -nct 8 -recoverDanglingHeads -dontUseSoftClippedBases -stand_call_conf 20 -stand_emit_conf 20


As you can see, I got the error above from one of the nightly builds. Before that I also tried version 3.0-0-g6bad1c6, and this produced the exact same error. What's curious about this is that it didn't fail in the same place each time. I did this on 20 samples, and for the first run, 15 of the samples failed with this error. One of the samples failed after 7 minutes, so I decided to try that one again to see if I could reproduce it, but it went past the point (both in time and genomic position) where it failed the first time.

I decided to download a nightly build (version nightly-2014-03-10-gf78001a) and see if this had been fixed, but again, 15 of the samples failed. However, it was not the same set of samples that failed as with the other version.

The reads were aligned using STAR, and prior to this step I ran SplitNCigarReads and IndelRealigner.

Thanks,
Niklas

Hi there,

Given your stack trace, I'm not sure that this is directly related to RNA-seq. However, it is definitely a bug that we would like to fix. If you can upload some data that reproduces the issue, we'd appreciate it. Instructions for doing this are detailed here:

• NorwayMember

Tomorrow I'll try to see if I can find an example that fails every time. As I said, it seems a bit random. I'll get back to you.

That's probably because you are running with -nct. Turn that off to get deterministic behavior.

• NorwayMember

I'm afraid this can take a while. When not using -nct, it is expected to finish in about 6 days. Let's just hope that the estimate is way off or that it fails early.

You might want to consider looking into using something like Queue to help "scatter" your jobs.

• NorwayMember

I ran a sample that previously failed for two different versions of GATK, not using -nct, and this time I did not get the error. I'll continue to run some tests to see if I can find any region that can reproduce this.

• Member

Update: the error still exists in version 3.1-1. I'm running paired-end whole genomes; this is one sample with -nct 4.

INFO  14:07:57,616 ProgressMeter -     1:121479521        4.34e+08  111.7 m       15.0 s      4.2%        44.2 h    42.3 h
INFO  14:08:57,616 ProgressMeter -     1:121482063        4.34e+08  112.7 m       15.0 s      4.2%        44.6 h    42.7 h
INFO  14:09:33,053 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NullPointerException
hoods(PairHMMLikelihoodCalculationEngine.java:443)
hoods(PairHMMLikelihoodCalculationEngine.java:417)
385)
:222)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
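
Since the crash does not seem to happen at the same position on every run, it may help to record the last locus the ProgressMeter reported before each failure and compare those across runs. Here is a minimal Python sketch, assuming the log layout matches the ProgressMeter lines above. The function name `last_progress_locus` is hypothetical and not part of GATK.

```python
import re

# Matches ProgressMeter lines such as:
#   INFO  14:08:57,616 ProgressMeter -     1:121482063        4.34e+08 ...
# and captures the contig name and the position.
PROGRESS = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_progress_locus(log_text):
    """Return (contig, position) from the last ProgressMeter line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last


SAMPLE_LOG = """\
INFO  14:07:57,616 ProgressMeter -     1:121479521        4.34e+08  111.7 m       15.0 s      4.2%        44.2 h    42.3 h
INFO  14:08:57,616 ProgressMeter -     1:121482063        4.34e+08  112.7 m       15.0 s      4.2%        44.6 h    42.7 h
INFO  14:09:33,053 GATKRunReport - Uploaded run statistics report to AWS S3
"""

if __name__ == "__main__":
    print(last_progress_locus(SAMPLE_LOG))
```

If the reported loci from several failed runs cluster in one region, that region is a good candidate for a small test case; if they are scattered, that is more consistent with a threading problem than with a data problem.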

• NorwayMember

I have been unable to reproduce this without the -nct option; everything runs fine without it. As a workaround, I put some effort into learning how Queue works and parallelized my workflow with scatter-gather instead.
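
For anyone who wants the scatter idea without setting up Queue, here is a rough Python sketch of a manual scatter. It is not how Queue does it. It only builds one single-threaded HaplotypeCaller command per contig, reusing the arguments from the original post without -nct and restricting each job with -L. The jar name, the contig list, and the output naming are assumptions for illustration; the gather step (combining the per-contig VCFs) is not shown.

```python
import shlex

# Arguments from the original post, with -nct removed so each job is single-threaded.
BASE_ARGS = [
    "-T", "HaplotypeCaller",
    "-I", "in.bam",
    "-R", "ref.fa",
    "-recoverDanglingHeads",
    "-dontUseSoftClippedBases",
    "-stand_call_conf", "20",
    "-stand_emit_conf", "20",
]


def scatter_commands(contigs, jar="GenomeAnalysisTK.jar", out_prefix="raw.snps.indels"):
    """Build one command per contig; jar and out_prefix are assumed names."""
    commands = []
    for contig in contigs:
        out_vcf = f"{out_prefix}.{contig}.vcf"
        # -L restricts the run to a single contig (interval).
        cmd = ["java", "-jar", jar] + BASE_ARGS + ["-L", contig, "-o", out_vcf]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical contig names; use the ones in your reference.
    for cmd in scatter_commands(["1", "2", "3"]):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each printed command can then be submitted as a separate job to whatever scheduler is available, so the parallelism comes from independent processes rather than from threads inside one GATK run.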

• montrealMember

I have the exact same problem on DNA data, and I use -nct 15.

I'll try with -nct 1.