java.lang.NullPointerException with HaplotypeCaller GVCF mode

mmterpstra (Netherlands; Member)
edited March 2015 in Ask the GATK team

I got the following error when running on one of the cluster nodes, on a single sample out of multiple. When rerun on the head node the error went away, but I don't know why.

Does this error sound familiar? If so, link me to the fix / discussion. If not, please do not spend too much time on it.

This is the error log:

INFO  21:49:26,894 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:26,898 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22 
INFO  21:49:26,898 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  21:49:26,903 HelpFormatter - For support and documentation go to 
INFO  21:49:26,909 HelpFormatter - Program Args: -T HaplotypeCaller -R human_g1k_v37.fasta --dbsnp dbsnp_138.b37.vcf -I RCC-ER.bam -stand_call_conf 10.0 -stand_emit_conf 30.0 -o RCC-ER.g.vcf -nct 8 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 
INFO  21:49:26,915 HelpFormatter - Executing as mterpstra@targetgcc09-mgmt on Linux 3.0.101-0.7.17-default amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15. 
INFO  21:49:26,915 HelpFormatter - Date/Time: 2015/03/03 21:49:26 
INFO  21:49:26,915 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:26,916 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:27,200 GenomeAnalysisEngine - Strictness is SILENT 
INFO  21:49:27,354 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  21:49:27,366 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:49:27,436 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07 
INFO  21:49:27,477 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
INFO  21:49:27,810 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 48 processors available on this machine 
INFO  21:49:27,901 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  21:49:28,234 GenomeAnalysisEngine - Done preparing for traversal 
INFO  21:49:28,235 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
INFO  21:49:28,236 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
INFO  21:49:28,237 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output 
INFO  21:49:28,237 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output 
INFO  21:49:28,445 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
INFO  21:49:28,447 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode

INFO  21:49:51,883 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file 
INFO  21:49:51,884 VectorLoglessPairHMM - Using vectorized implementation of PairHMM 
INFO  21:49:55,536 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
    at java.lang.String.checkBounds(
    at java.lang.String.<init>(
    at htsjdk.samtools.util.StringUtil.bytesToString(
    at htsjdk.samtools.BAMRecord.decodeReadName(
    at htsjdk.samtools.BAMRecord.getReadName(
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$
    at java.util.concurrent.Executors$
    at java.util.concurrent.FutureTask$Sync.innerRun(
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Best Answers


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hmm, this looks like the program choked on one particular read, but I have no idea why. I don't remember seeing this before, sorry.
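
    Since the stack trace dies inside htsjdk while decoding a read name, one quick way to rule out a corrupt BAM is Picard's ValidateSamFile. A minimal sketch, assuming the standalone per-tool jars of Picard 1.x (newer Picard releases bundle it as java -jar picard.jar ValidateSamFile I=... instead):

        # Check the BAM for malformed records (e.g. truncated read names),
        # since the crash happens while a read name is being decoded.
        java -jar ValidateSamFile.jar \
            INPUT=RCC-ER.bam \
            MODE=SUMMARY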

  • mmterpstra (Netherlands; Member)
    edited March 2015

    Thanks for answering.

    After closer inspection I see it with multiple samples... Can you suggest a way of debugging?
    # If there's a good answer, I will flag the answer above as sufficient :P

    More details:
    I'm using bwa version 0.7.10 and Picard version 1.102, by the way.

    In short, this is the workflow:

    This seems to work:
    The regular HaplotypeCaller on all samples after BQSR and PrintReads runs fine (at the moment...); see the sketch below.
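
    A sketch of the BQSR → PrintReads → HaplotypeCaller sequence this refers to, in GATK 3.x syntax, reusing the reference, BAM, and dbSNP files from the failing command above; recal.table and RCC-ER.recal.bam are illustrative names:

        # Base quality score recalibration, then write the recalibrated BAM
        java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
            -R human_g1k_v37.fasta -I RCC-ER.bam \
            -knownSites dbsnp_138.b37.vcf -o recal.table
        java -jar GenomeAnalysisTK.jar -T PrintReads \
            -R human_g1k_v37.fasta -I RCC-ER.bam \
            -BQSR recal.table -o RCC-ER.recal.bam
        # "Regular" HaplotypeCaller (no -ERC GVCF), which reportedly runs fine
        java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
            -R human_g1k_v37.fasta -I RCC-ER.recal.bam \
            --dbsnp dbsnp_138.b37.vcf -o RCC-ER.vcf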

  • ryanabashbash (Oak Ridge National Laboratory; Member)

    I second what @Kurt said. We run it with "-nct 1"; we see relatively frequent crashes otherwise. The process sometimes makes it to completion, so as an alternative you could keep submitting the parallelized job until it succeeds.
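
    For reference, the failing command from the log, rerun single-threaded as suggested (same arguments as in the Program Args line above, with -nct reduced to 1, and assuming the standard GenomeAnalysisTK.jar launcher):

        # Identical to the logged invocation except -nct 1
        java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
            -R human_g1k_v37.fasta --dbsnp dbsnp_138.b37.vcf -I RCC-ER.bam \
            -stand_call_conf 10.0 -stand_emit_conf 30.0 \
            -o RCC-ER.g.vcf -nct 1 \
            --emitRefConfidence GVCF \
            --variant_index_type LINEAR --variant_index_parameter 128000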

  • mmterpstra (Netherlands; Member)

    I mailed the internal support people: the crashes could be due to missing filesystem mounts, caused by the out-of-memory killer killing the GPFS daemon.
    Cluster complexity at its best ;_;

    Sorry for the complaint => maybe GATK could check for this condition and report it as an error.
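
    If this failure mode recurs, a pre-flight sanity check along these lines might catch it earlier; the /gpfs mount point is an assumption for illustration:

        # Confirm the GPFS filesystem is actually mounted on the node
        # (/gpfs is an assumed path) ...
        mountpoint -q /gpfs || echo "GPFS not mounted on $(hostname)"
        # ... and look for traces of the OOM killer in the kernel log
        dmesg | grep -i -E "out of memory|killed process"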

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Oh interesting -- thanks for letting us know!
