java.lang.NullPointerException with HaplotypeCaller in GVCF mode

mmterpstra (Netherlands; Member)
edited March 2015 in Ask the GATK team

I got the following error when running HaplotypeCaller on one of the cluster nodes, for a single sample out of several. On the head node the error went away, but I don't know why.

Does this error sound familiar? If so, link me to the fix / discussion. If not, please do not spend too much time on it.

This is the error log:

INFO  21:49:26,894 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:26,898 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22 
INFO  21:49:26,898 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  21:49:26,903 HelpFormatter - For support and documentation go to 
INFO  21:49:26,909 HelpFormatter - Program Args: -T HaplotypeCaller -R human_g1k_v37.fasta --dbsnp dbsnp_138.b37.vcf -I RCC-ER.bam -stand_call_conf 10.0 -stand_emit_conf 30.0 -o RCC-ER.g.vcf -nct 8 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 
INFO  21:49:26,915 HelpFormatter - Executing as mterpstra@targetgcc09-mgmt on Linux 3.0.101-0.7.17-default amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15. 
INFO  21:49:26,915 HelpFormatter - Date/Time: 2015/03/03 21:49:26 
INFO  21:49:26,915 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:26,916 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:49:27,200 GenomeAnalysisEngine - Strictness is SILENT 
INFO  21:49:27,354 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  21:49:27,366 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:49:27,436 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07 
INFO  21:49:27,477 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
INFO  21:49:27,810 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 48 processors available on this machine 
INFO  21:49:27,901 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  21:49:28,234 GenomeAnalysisEngine - Done preparing for traversal 
INFO  21:49:28,235 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
INFO  21:49:28,236 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
INFO  21:49:28,237 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output 
INFO  21:49:28,237 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output 
INFO  21:49:28,445 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
INFO  21:49:28,447 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode

INFO  21:49:51,883 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file 
INFO  21:49:51,884 VectorLoglessPairHMM - Using vectorized implementation of PairHMM 
INFO  21:49:55,536 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
    at java.lang.String.checkBounds(
    at java.lang.String.<init>(
    at htsjdk.samtools.util.StringUtil.bytesToString(
    at htsjdk.samtools.BAMRecord.decodeReadName(
    at htsjdk.samtools.BAMRecord.getReadName(
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$
    at java.util.concurrent.Executors$
    at java.util.concurrent.FutureTask$Sync.innerRun(
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Best Answers


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hmm, this looks like the program choked on one particular read, but no idea why. I don't remember seeing this before, sorry.

  • mmterpstra (Netherlands; Member)
    edited March 2015

    Thanks for answering.

    After closer inspection I see it with multiple samples... Can you suggest a way of debugging?
    # If it's a good answer, I will flag the answer above as sufficient :P

    More details:
    I'm using bwa version 0.7.10 and picard version 1.102, btw.

    In short, this is the workflow:

    This seems to work:
    The regular HaplotypeCaller on all samples after BQSR PrintReads runs fine (at the moment...)

  • ryanabashbash (Oak Ridge National Laboratory; Member)

    I second what @Kurt said. We run it with "-nct 1"; we see relatively frequent crashes otherwise. The process sometimes makes it to completion, so as an alternative you could keep submitting the parallelized job until it succeeds.
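
    The resubmit-until-it-succeeds idea, with "-nct 1" as a last resort, could be scripted roughly as below. This is only a sketch, not anything shipped with GATK: the helper names `with_nct` and `run_with_fallback`, the attempt limit, and the assumption that a failed run exits with a non-zero status are all illustrative choices.

```python
import subprocess


def with_nct(args, nct):
    """Return a copy of args with the value after -nct replaced (or -nct appended)."""
    args = list(args)
    if "-nct" in args and args.index("-nct") + 1 < len(args):
        args[args.index("-nct") + 1] = str(nct)
    else:
        args += ["-nct", str(nct)]
    return args


def run_with_fallback(args, max_attempts=3, runner=subprocess.call):
    """Retry the command as given; if every attempt fails, try once with -nct 1.

    `runner` must return the exit status (0 = success). Returns the number of
    runs it took and the argument list that finally succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(args) == 0:
            return attempt, args
    single = with_nct(args, 1)  # hypothetical fallback: single-threaded run
    if runner(single) == 0:
        return max_attempts + 1, single
    raise RuntimeError("command failed even with -nct 1")


# Usage (the jar path is an assumption; the remaining arguments would be the
# ones shown in the "Program Args" log line):
# run_with_fallback(["java", "-jar", "GenomeAnalysisTK.jar",
#                    "-T", "HaplotypeCaller", "-I", "RCC-ER.bam", "-nct", "8"])
```

    Note that blind retries only paper over the crash; they don't explain it, so it is still worth keeping the log of each failed attempt.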

  • mmterpstrammterpstra NetherlandsMember

    I asked our internal cluster support: it could be caused by missing filesystem mounts, because the out-of-memory killer was killing the GPFS daemon.
    Cluster complexity at its best ;_;

    Sorry for the complaint => maybe GATK could check for this and report it as an error.
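
    Until something like that exists in the tool, a job script could do a crude pre-flight check that the inputs are actually readable on the node before launching. A minimal sketch, assuming the file names from the log above; `unreadable_inputs` is a made-up helper, and it only catches a mount that is already missing at launch, not one that disappears mid-run.

```python
import os


def unreadable_inputs(paths):
    """Return the paths that cannot be opened and read on this node right now."""
    bad = []
    for path in paths:
        try:
            # Actually read a byte: a stale or missing mount can fail on
            # open() or on the first read.
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError:
            bad.append(path)
    return bad


# Usage (file names taken from the Program Args line; adjust paths as needed):
# missing = unreadable_inputs(["human_g1k_v37.fasta", "dbsnp_138.b37.vcf", "RCC-ER.bam"])
# if missing:
#     raise SystemExit("inputs not readable on this node: %s" % ", ".join(missing))
```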

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Oh interesting -- thanks for letting us know!
