java.lang.NullPointerException with haplotypecaller gvcf mode

mmterpstra (Netherlands; Member)
edited March 2015 in Ask the GATK team

I got the following error when running HaplotypeCaller on one of the cluster nodes, for a single sample out of several. When I ran it on the head node the error went away, but I don't know why.

Does this error sound familiar? If so, please link me to the fix or discussion. If not, please do not spend too much time on it.

This is the error log:

INFO  21:49:26,894 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:26,898 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22 
INFO  21:49:26,898 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  21:49:26,903 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  21:49:26,909 HelpFormatter - Program Args: -T HaplotypeCaller -R human_g1k_v37.fasta --dbsnp dbsnp_138.b37.vcf -I RCC-ER.bam -stand_call_conf 10.0 -stand_emit_conf 30.0 -o RCC-ER.g.vcf -nct 8 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 
INFO  21:49:26,915 HelpFormatter - Executing as [email protected] on Linux 3.0.101-0.7.17-default amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15. 
INFO  21:49:26,915 HelpFormatter - Date/Time: 2015/03/03 21:49:26 
INFO  21:49:26,915 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:26,916 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:27,200 GenomeAnalysisEngine - Strictness is SILENT 
INFO  21:49:27,354 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  21:49:27,366 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:49:27,436 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07 
INFO  21:49:27,477 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
INFO  21:49:27,810 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 48 processors available on this machine 
INFO  21:49:27,901 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  21:49:28,234 GenomeAnalysisEngine - Done preparing for traversal 
INFO  21:49:28,235 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
INFO  21:49:28,236 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
INFO  21:49:28,237 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output 
INFO  21:49:28,237 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output 
INFO  21:49:28,445 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
INFO  21:49:28,447 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode

INFO  21:49:51,883 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file 
INFO  21:49:51,884 VectorLoglessPairHMM - Using vectorized implementation of PairHMM 
INFO  21:49:55,536 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
    at java.lang.String.checkBounds(String.java:374)
    at java.lang.String.<init>(String.java:314)
    at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
    at htsjdk.samtools.BAMRecord.decodeReadName(BAMRecord.java:331)
    at htsjdk.samtools.BAMRecord.getReadName(BAMRecord.java:220)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingGraph.addRead(ReadThreadingGraph.java:585)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:178)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:117)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:169)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:1163)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:1000)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:221)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Best Answers


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hmm, this looks like the program choked on one particular read, but no idea why. I don't remember seeing this before, sorry.

  • mmterpstra (Netherlands; Member)
    edited March 2015

    Thanks for answering,

    After closer inspection I see it with multiple samples... Can you suggest a way of debugging?
    # If I get a good answer, I will flag the answer above as sufficient :P

    More details:
    I'm using bwa version 0.7.10 and picard version 1.102, by the way.

    In short, this is the workflow:

    This seems to work:
    The regular HaplotypeCaller on all samples after BQSR PrintReads runs fine (at the moment...)

  • ryanabashbash (Oak Ridge National Laboratory; Member)

    I second what @Kurt said. We run it with "-nct 1"; we see relatively frequent crashes otherwise. The process sometimes makes it to completion, so as an alternative you could keep submitting the parallelized job until it succeeds.
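
The two workarounds mentioned here (resubmitting the multithreaded job until it completes, or running with "-nct 1") could be combined in a small wrapper. The following is only a sketch and not something posted in this thread: the function names, the retry count, and the `GenomeAnalysisTK.jar` location are assumptions, while the HaplotypeCaller arguments are copied from the log in the original post.

```python
import subprocess

# Arguments copied from the "Program Args" line of the log above.
# The jar path is an assumption; adjust it to your installation.
GATK_CMD = [
    "java", "-jar", "GenomeAnalysisTK.jar",
    "-T", "HaplotypeCaller",
    "-R", "human_g1k_v37.fasta",
    "--dbsnp", "dbsnp_138.b37.vcf",
    "-I", "RCC-ER.bam",
    "-stand_call_conf", "10.0",
    "-stand_emit_conf", "30.0",
    "-o", "RCC-ER.g.vcf",
    "-nct", "8",
    "--emitRefConfidence", "GVCF",
    "--variant_index_type", "LINEAR",
    "--variant_index_parameter", "128000",
]


def with_nct(args, nct):
    """Return a copy of args with the -nct value replaced, or -nct appended."""
    out = list(args)
    if "-nct" in out:
        i = out.index("-nct")
        # Slice assignment replaces the value, or appends it if -nct was last.
        out[i + 1:i + 2] = [str(nct)]
    else:
        out += ["-nct", str(nct)]
    return out


def run_with_retries(args, max_attempts=3, runner=subprocess.call):
    """Run args up to max_attempts times, then once more with -nct 1.

    Returns (exit_code, attempts_made). `runner` is injectable so the
    retry logic can be exercised without launching a real process.
    """
    attempts = 0
    for _ in range(max_attempts):
        attempts += 1
        if runner(args) == 0:
            return 0, attempts
    # Every multithreaded attempt failed: fall back to a single thread.
    attempts += 1
    return runner(with_nct(args, 1)), attempts
```

Calling `run_with_retries(GATK_CMD)` would launch the job up to three times with the original thread count and then once with `-nct 1`. Note that blind retries only hide the underlying problem, so it is worth keeping the log of every failed attempt.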

  • mmterpstra (Netherlands; Member)

    I asked our internal support by mail: the cause could be missing filesystem mounts, because the out-of-memory killer killed the GPFS daemon.
    Cluster complexity at its best ;_;

    Sorry for the complaint => maybe GATK could check for this and report it as an error
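
Until something like that exists in the tool itself, a job script could run its own pre-flight check before launching GATK. This is a minimal sketch under stated assumptions: the `preflight` function and its messages are hypothetical, and the mount point you pass in (for example wherever GPFS is mounted on your cluster) is site-specific and not given in this thread.

```python
import os


def preflight(paths, mount_points=()):
    """Return a list of problems found; an empty list means all checks passed.

    paths        -- input files the job needs (reference, BAM, dbSNP VCF, ...)
    mount_points -- directories that must be mounted filesystems
    """
    problems = []
    for mp in mount_points:
        # os.path.ismount is False for a plain directory or a missing path.
        if not os.path.ismount(mp):
            problems.append("not mounted: %s" % mp)
    for p in paths:
        if not os.path.isfile(p):
            problems.append("missing file: %s" % p)
        elif not os.access(p, os.R_OK):
            problems.append("unreadable file: %s" % p)
    return problems
```

A job script could call `preflight(["human_g1k_v37.fasta", "dbsnp_138.b37.vcf", "RCC-ER.bam"], mount_points=[...])` and exit with a clear message if the list is not empty. This only catches a mount that is already gone when the job starts; it would not detect a daemon that is killed while HaplotypeCaller is running.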

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Oh interesting -- thanks for letting us know!
