
GATK 3.7 HaplotypeCaller NullPointerException in removeAltAllelesIfTooManyGenotypes

HaplotypeCaller in GATK 3.7 (3.7-0-g56f2c1a) is throwing a NullPointerException in some cases. See below for log output from a failing run.

It looks to me like the call to .get() on the practicalAlleleCountForPloidy HashMap must be returning null for some reason (and the unboxing into an int is then causing the NullPointerException):

Given that the immediately preceding call is to practicalAlleleCountForPloidy.putIfAbsent(), either the key for the given ploidy must already be in the HashMap with value null or the calculation from GenotypeLikelihoodCalculators.computeMaxAcceptableAlleleCount(ploidy, maxGenotypeCountToEnumerate) is returning null.
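
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not GATK's code: the class, the `lookup` helper, and the values are hypothetical stand-ins. It shows that auto-unboxing a null returned by `HashMap.get()` throws a NullPointerException, and that in single-threaded use `putIfAbsent` treats a key mapped to null as absent and overwrites it.

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxingNpeSketch {
    // Hypothetical stand-in for the lookup; not GATK's actual method.
    static int lookup(Map<Integer, Integer> cache, int ploidy) {
        // get() returns an Integer; returning it as int auto-unboxes,
        // which throws NullPointerException when the Integer is null.
        return cache.get(ploidy);
    }

    // Returns true if the lookup fails with a NullPointerException.
    static boolean throwsNpe(Map<Integer, Integer> cache, int ploidy) {
        try {
            lookup(cache, ploidy);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = new HashMap<>();
        cache.put(2, null);                       // key present, value null
        System.out.println(throwsNpe(cache, 2));  // true
        cache.putIfAbsent(2, 6);                  // a null mapping counts as absent
        System.out.println(throwsNpe(cache, 2));  // false
    }
}
```

If this sketch reflects the real code path, a key that merely holds a null value would be repaired by the preceding `putIfAbsent` call when only one thread is involved, so a null after that call needs some other explanation.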

A quick scan of the code does not indicate any obvious problems here. I'll see if I can add some debug printing and re-run on the problematic data to clarify the situation.

['-T', 'HaplotypeCaller', '--no_cmdline_in_header', '-R', u'/keep/d527a0b11143ebf18be6c52ff6c09552+2163/hs37d5.fa', '-I', u'/keep/c5e28ac0e8014f6117792f83e031aea8+21780/20643_7.cram', '-L', u'/keep/85abb468fc85aece80e33396c48fb7d0+94/hs37d5.dict.159_of_200.interval_list', '-A', 'StrandAlleleCountsBySample', '-A', 'StrandBiasBySample', '-nct', '4', '--emitRefConfidence', 'GVCF', '--variant_index_type', 'LINEAR', '--variant_index_parameter', '128000', '-o', u'/tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz', '-l', 'INFO']
 INFO  13:31:41,104 HelpFormatter - --------------------------------------------------------------------------------
 INFO  13:31:41,110 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-g56f2c1a, Compiled 2017/01/03 11:50:40
 INFO  13:31:41,110 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
 INFO  13:31:41,110 HelpFormatter - For support and documentation go to
 INFO  13:31:41,111 HelpFormatter - [Tue Jan 03 13:31:41 UTC 2017] Executing on Linux 3.13.0-85-generic amd64
 INFO  13:31:41,111 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14
 INFO  13:31:41,118 HelpFormatter - Program Args: -T HaplotypeCaller --no_cmdline_in_header -R /keep/d527a0b11143ebf18be6c52ff6c09552+2163/hs37d5.fa -I /keep/c5e28ac0e8014f6117792f83e031aea8+21780/20643_7.cram -L /keep/85abb468fc85aece80e33396c48fb7d0+94/hs37d5.dict.159_of_200.interval_list -A StrandAlleleCountsBySample -A StrandBiasBySample -nct 4 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o /tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz -l INFO
 INFO  13:31:41,125 HelpFormatter - Executing as [email protected] on Linux 3.13.0-85-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14.
 INFO  13:31:41,126 HelpFormatter - Date/Time: 2017/01/03 13:31:41
 INFO  13:31:41,126 HelpFormatter - --------------------------------------------------------------------------------
 INFO  13:31:41,126 HelpFormatter - --------------------------------------------------------------------------------
 WARN  13:31:41,135 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values  for --variant_index_type and --variant_index_parameter
 WARN  13:31:41,136 GATKVCFUtils - Creating Tabix index for /tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz, ignoring user-specified index type and parameter
 INFO  13:31:41,178 GenomeAnalysisEngine - Strictness is SILENT
 INFO  13:31:41,910 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
 INFO  13:31:41,920 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
 INFO  13:31:43,684 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.76
 INFO  13:31:44,363 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
 INFO  13:31:44,401 IntervalUtils - Processing 15618872 bp from intervals
 INFO  13:31:44,422 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 40 processors available on this machine
 INFO  13:31:44,528 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
 INFO  13:31:45,093 GenomeAnalysisEngine - Done preparing for traversal
 INFO  13:31:45,094 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining
 INFO  13:31:45,094 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime
 INFO  13:31:45,097 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
 INFO  13:31:45,097 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
 WARN  13:31:45,278 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
 INFO  13:31:45,425 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
 INFO  13:31:45,427 PairHMM - Performance profiling for PairHMM is disabled because the program is being run with multiple threads (-nct>1) option
 Profiling is enabled only when running in single thread mode

 Using AVX accelerated implementation of PairHMM
 INFO  13:31:50,403 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
 INFO  13:31:50,403 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
 ##### ERROR --
 ##### ERROR stack trace
     at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
     at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(
     at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$
     at java.util.concurrent.Executors$
     at java.util.concurrent.ThreadPoolExecutor.runWorker(
     at java.util.concurrent.ThreadPoolExecutor$
 ##### ERROR ------------------------------------------------------------------------------------------
 ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-g56f2c1a):
 ##### ERROR
 ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
 ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
 ##### ERROR Visit our website and forum for extensive documentation and answers to
 ##### ERROR commonly asked questions
 ##### ERROR
 ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
 ##### ERROR ------------------------------------------------------------------------------------------

Best Answer


  • jrandall · Member

    This error cannot be immediately reproduced. I've re-run GATK with an identical command line and input data several times and it has run successfully to completion each time. I'll keep trying, as we saw this in one out of ~1000 runs today, so perhaps it occurs with low frequency.

    Or maybe it was just cosmic rays.

    Still, it would be good not to unbox an Integer into an int without guarding against the NullPointerException that is thrown if that Integer is null.
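
One way to guard the unboxing is to check for null explicitly and fail with a descriptive error. This is only a sketch under assumptions: `requireCount`, the class name, and the error message are hypothetical and are not taken from the GATK source.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeLookupSketch {
    // Hypothetical helper: reads the cached value as an Integer first,
    // so a missing or null entry is reported instead of being unboxed.
    static int requireCount(Map<Integer, Integer> cache, int ploidy) {
        Integer count = cache.get(ploidy);
        if (count == null) {
            throw new IllegalStateException(
                "no allele count cached for ploidy " + ploidy);
        }
        return count; // safe to unbox: count is known to be non-null
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = new HashMap<>();
        cache.put(2, 6);
        System.out.println(requireCount(cache, 2)); // 6
    }
}
```

A check like this does not remove the underlying cause; it only turns an opaque NullPointerException into a message that names the ploidy involved.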

  • jrandall · Member

    Looking again at the code, I don't see where the concurrent access to the (non-thread-safe) practicalAlleleCountForPloidy HashMap is synchronized. It seems possible that multiple threads are entering the call to putIfAbsent at roughly the same time, or that one could be getting the value while another is writing it.

    Seems like the entire removeAltAllelesIfTooManyGenotypes method should be marked as synchronized or else a ConcurrentHashMap used in place of the non thread-safe HashMap.
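
As a rough illustration of the second option, the sketch below caches values in a `ConcurrentHashMap` and reads them through `computeIfAbsent`, which is atomic on that class and returns the stored value directly. Everything here is an assumption for illustration: the class, `cachedCount`, and the placeholder `expensiveCount` computation are hypothetical and do not reproduce GATK's `computeMaxAcceptableAlleleCount` or the eventual patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentCacheSketch {
    // Thread-safe cache in place of a plain HashMap (hypothetical name).
    private static final Map<Integer, Integer> CACHE = new ConcurrentHashMap<>();

    // Placeholder computation; NOT GATK's real calculation.
    static int expensiveCount(int ploidy) {
        return ploidy * 3;
    }

    // Computes the value once per key and returns it in a single atomic step.
    static int cachedCount(int ploidy) {
        return CACHE.computeIfAbsent(ploidy, ConcurrentCacheSketch::expensiveCount);
    }

    // Hammers the cache from several threads; returns the number of wrong results.
    static int countMismatches(int threads, int lookupsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger mismatches = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < lookupsPerThread; i++) {
                    int ploidy = 1 + (i % 8);
                    if (cachedCount(ploidy) != expensiveCount(ploidy)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(4, 10_000)); // 0
    }
}
```

Marking the whole method as synchronized would also serialize access, at the cost of making every caller wait on a single lock; the concurrent map keeps the locking local to the cache.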

    Issue · Github
    by Geraldine_VdAuwera

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I got confirmation from a dev that your proposed solution should work. Will let you know when it's patched in the nightly.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    The fix for this is in; it should be available in tomorrow's nightly.

  • sespiritu · Toronto, ON, Canada · Member

    Hi Geraldine,

    Will the current GATK v3.7 be updated with this patch, or will it just remain in the nightly?


  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin
    We're planning to do a patch release soon; probably in the next couple of weeks. We're looking at getting a few other minor fixes in while we're at it.
  • rernst · UMC Utrecht · Member

    Is there an ETA for this patch release? Thanks.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin
    Ah, we ended up accumulating a number of other patches and fixes, so we decided to hold out for a proper release (since the nightly builds are available as a workaround for those in a rush). Sorry we forgot to update the thread. We're waiting on one more feature to cut the 3.8 release -- ETA should be a couple of weeks, tops.