NullPointerException in HaplotypeCaller 4.0.1.1

Dear GATK team,
I am calling variants using HaplotypeCaller on both WGS data from a normal tissue sample and RNA seq data from tumor tissue. Settings for HC are slightly different for the RNA seq data but the problem only arises when running HC on the WGS data. We are following Best Practices.
I am using GATK version 4.0.1.1 with Oracle JDK 1.8.0_144 (Java HotSpot(TM) 64-Bit Server VM); I also tried OpenJDK 64-Bit Server VM v1.8.0_161.
I am running in the WDL/Cromwell setup with scatter-gather, so, as you can see in the command below, I am not using --native-pair-hmm-threads (I saw in some previous posts that the old -nct could produce errors).

It could be related to memory, so I tried changing the Java settings, for example from -Xmx4g to -Xms8000m, which I saw used here: https://github.com/gatk-workflows/gatk4-germline-snps-indels/blob/master/haplotypecaller-gvcf-gatk4.hg38.wgs.inputs.json. It doesn't make any difference; the error is still produced. I also tried deleting the GC limits (-XX:GCTimeLimit and -XX:GCHeapFreeLimit). Should I try something else? The -Duser.country setting is there to avoid confusion between ',' and '.' in floats: our server is set to Danish (for no reason), which uses commas for decimals.

This is my command (some of the paths have been abbreviated for clarity):

$gatk4.0.1.1 --java-options "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4g -Duser.country=en_US.UTF-8 -Duser.language=en_US.UTF-8" HaplotypeCaller \
-R $longpath/gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta \
-O Normal-056-WGS.vcf.gz \
-I $longpath/call-GatherBamFiles_normal/execution/Normal-056-WGS.bam \
--max-alternate-alleles 3 \
--contamination-fraction-to-filter 0.00172 \
--read-filter OverclippedReadFilter \
--standard-min-confidence-threshold-for-calling 30 \
-L $longpath/gatk-legacy-bundles/b37/scattered_wgs_intervals/scatter-50/temp_0024_of_50/scattered.interval_list

Stacktrace (sorry for the long paths but hopefully only the last part is important):

Using GATK jar /services/tools/gatk/4.0.1.1/gatk-package-4.0.1.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4g -Duser.country=en_US.UTF-8 -Duser.language=en_US.UTF-8 -jar /services/tools/gatk/4.0.1.1/gatk-package-4.0.1.1-local.jar HaplotypeCaller -R /home/projects/dp_00005/apps/bonkolab_cromwell/tmp_wdir/Sample_021-056/cromwell-executions/WGS_normal_RNAseq_tumor_SNV_wf/4da5f4da-0dfc-4d55-a3bb-865eb51d6838/call-HaplotypeCaller_normal/shard-23/inputs/home/databases/gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta -O Normal-056-WGS.vcf.gz -I /home/projects/dp_00005/apps/bonkolab_cromwell/tmp_wdir/Sample_021-056/cromwell-executions/WGS_normal_RNAseq_tumor_SNV_wf/4da5f4da-0dfc-4d55-a3bb-865eb51d6838/call-HaplotypeCaller_normal/shard-23/inputs/home/projects/dp_00005/apps/bonkolab_cromwell/tmp_wdir/Sample_021-056/cromwell-executions/WGS_normal_RNAseq_tumor_SNV_wf/4da5f4da-0dfc-4d55-a3bb-865eb51d6838/call-GatherBamFiles_normal/execution/Normal-056-WGS.bam --max-alternate-alleles 3 --contamination-fraction-to-filter 0.00172 --read-filter OverclippedReadFilter --standard-min-confidence-threshold-for-calling 30 -L /home/projects/dp_00005/apps/bonkolab_cromwell/tmp_wdir/Sample_021-056/cromwell-executions/WGS_normal_RNAseq_tumor_SNV_wf/4da5f4da-0dfc-4d55-a3bb-865eb51d6838/call-HaplotypeCaller_normal/shard-23/inputs/home/databases/gatk-legacy-bundles/b37/scattered_wgs_intervals/scatter-50/temp_0024_of_50/scattered.interval_list
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/home/projects/dp_00005/apps/bonkolab_cromwell/tmp_wdir/Sample_021-056/cromwell-executions/WGS_normal_RNAseq_tumor_SNV_wf/4da5f4da-0dfc-4d55-a3bb-865eb51d6838/call-HaplotypeCaller_normal/shard-23/execution/tmp.47DEgM
11:29:03.730 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/services/tools/gatk/4.0.1.1/gatk-package-4.0.1.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
11:29:03.961 INFO  HaplotypeCaller - ------------------------------------------------------------
11:29:03.962 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.1.1
11:29:03.963 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
11:29:03.963 INFO  HaplotypeCaller - Executing as [email protected] on Linux v3.10.0-514.10.2.el7.x86_64 amd64
11:29:03.963 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_144-b01
11:29:03.963 INFO  HaplotypeCaller - Start Date/Time: March 19, 2018 11:29:03 AM CET
11:29:03.963 INFO  HaplotypeCaller - ------------------------------------------------------------
11:29:03.963 INFO  HaplotypeCaller - ------------------------------------------------------------
11:29:03.964 INFO  HaplotypeCaller - HTSJDK Version: 2.14.1
11:29:03.964 INFO  HaplotypeCaller - Picard Version: 2.17.2
11:29:03.964 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
11:29:03.964 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:29:03.964 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:29:03.964 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:29:03.964 INFO  HaplotypeCaller - Deflater: IntelDeflater
11:29:03.964 INFO  HaplotypeCaller - Inflater: IntelInflater
11:29:03.964 INFO  HaplotypeCaller - GCS max retries/reopens: 20
11:29:03.964 INFO  HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
11:29:03.965 INFO  HaplotypeCaller - Initializing engine
11:29:04.807 INFO  IntervalArgumentCollection - Processing 40724607 bp from intervals
11:29:04.833 INFO  HaplotypeCaller - Done initializing engine
11:29:04.863 INFO  HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
11:29:05.604 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/services/tools/gatk/4.0.1.1/gatk-package-4.0.1.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
11:29:05.618 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/services/tools/gatk/4.0.1.1/gatk-package-4.0.1.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
11:29:05.682 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
11:29:05.683 INFO  IntelPairHmm - Available threads: 1
11:29:05.683 INFO  IntelPairHmm - Requested threads: 4
11:29:05.683 WARN  IntelPairHmm - Using 1 available threads, but 4 were requested
11:29:05.683 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
11:29:05.759 INFO  ProgressMeter - Starting traversal
11:29:05.765 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
11:29:06.858 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.001355153
11:29:06.859 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.023359896
11:29:06.859 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.06 sec
11:29:06.860 INFO  HaplotypeCaller - Shutting down engine
[March 19, 2018 11:29:06 AM CET] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2041511936
java.lang.NullPointerException
    at java.util.Collections$UnmodifiableMap.<init>(Collections.java:1446)
    at java.util.Collections.unmodifiableMap(Collections.java:1433)
    at org.broadinstitute.hellbender.tools.walkers.genotyper.StandardCallerArgumentCollection.getSampleContamination(StandardCallerArgumentCollection.java:89)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:141)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:566)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:218)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:295)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:277)

Thank you so much for your help!

  • Nanna

Answers

  • shlee (Cambridge): Member, Broadie, Moderator, admin

    Hi @nanna,

    Given you say,

    "Settings for HC are slightly different for the RNA seq data but the problem only arises when running HC on the WGS data."

    Perhaps this relates to the issue noted at the end of https://gatkforums.broadinstitute.org/gatk/discussion/5273/. WGS data is much denser than RNA-Seq data. Can you try running the HaplotypeCaller command on the same data outside of the WDL/Cromwell context, e.g. locally, and see if you run into the same issue? Thanks.

  • nanna: Member

    Hi @Shlee

    Yes, our WGS data is about 30x coverage, so pretty dense, but we do divide it into 50 shards according to the scattered_calling_intervals in the bundle, so that should help a bit. Every shard fails, though; no difference there.

    I have tried executing the script file generated by Cromwell directly from the command line, thus more or less bypassing the Cromwell context for that call, but I can't take the data off our server due to patient privacy considerations.
    I also created and ran a separate script outside the Cromwell context that does not use the -L option at all but keeps the remaining options, so no shards. Of course this is much denser and would take a long time, but I think it confirms that the problem is not related to Cromwell.

    Unfortunately, both attempts still produce the same error. Is it possible that something is wrong with the BAM file? We did run ValidateSamFile and found no errors, so I feel like I am running out of options.

    I also looked at the thread you linked to, but the only thing I took from it was not to use multithreading. Did I miss a point?

    Thank you so much for the help!

  • shlee (Cambridge): Member, Broadie, Moderator, admin

    Hi @nanna,

    Sorry you are experiencing this problem. Thankfully, your run errors immediately. Also, thanks for confirming you get the same error outside of Cromwell. You are using GATK version 4.0.1.1. Searching the forum with:

    java.lang.NullPointerException
    at java.util.Collections$UnmodifiableMap.

    We get a known bug noted here and resolved here.
    It appears that the --contamination-fraction-to-filter argument is buggy in your version of GATK4. GATK v4.0.2.0 and v4.0.2.1 contain the fix, and you can download these at https://github.com/broadinstitute/gatk/releases.
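
For anyone landing here with a similar trace, the search described above can be reduced to two lines of the log: the exception itself and the first stack frame that belongs to GATK rather than to the JDK. The following is only a minimal sketch under that assumption. The function name trace_search_key is hypothetical and not part of GATK, and it assumes the log is supplied on standard input.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of GATK): read a log on standard input and
# print the first exception line plus the first org.broadinstitute.* frame.
trace_search_key() {
  awk '
    # A bare exception line such as "java.lang.NullPointerException".
    !seen && /^[A-Za-z0-9_.]+(Exception|Error)(:.*)?$/ { print; seen = 1; next }
    # The first GATK frame after it; JDK frames (java.util.*) are skipped.
    seen && /^[ \t]*at org\.broadinstitute\./ {
      sub(/^[ \t]+/, "")
      print
      exit
    }
  '
}
```

Fed the log from the original post, this should yield the NullPointerException line followed by the StandardCallerArgumentCollection.getSampleContamination frame, which already points at the contamination argument rather than at memory or the BAM file.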

  • nanna: Member

    Hi @shlee

    I tried running the script without --contamination-fraction-to-filter and it ran without complaint! I will install the newest patched release and try again. I now realize that even though we also used this input parameter for the RNA sample, the contamination calculated there was 0.0, which didn't produce any error.

    Thank you so much for your help, and hopefully next time I will find the answers on GitHub right away!
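
Until the patched release is installed, the workaround above (dropping the flag) could be automated in a wrapper script. This is a rough sketch, not an official recipe. The helper names version_lt and contamination_args are hypothetical, the 4.0.2.0 threshold is taken from the reply above, and GNU sort with the -V option is assumed to be available.

```shell
#!/usr/bin/env bash

# Succeed (exit 0) when version $1 is strictly older than version $2.
# Uses GNU sort's -V (version sort); its availability is an assumption.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: print the contamination argument only when the GATK
# version is 4.0.2.0 or newer, the first release named in this thread as fixed.
# $1 = GATK version string, $2 = contamination fraction (e.g. 0.00172).
contamination_args() {
  if version_lt "$1" "4.0.2.0"; then
    echo "GATK $1 predates 4.0.2.0; omitting --contamination-fraction-to-filter" >&2
    return 0
  fi
  printf '%s %s\n' "--contamination-fraction-to-filter" "$2"
}
```

A wrapper might capture the output, for example extra_args="$(contamination_args 4.0.1.1 0.00172)", and append $extra_args unquoted to the HaplotypeCaller command. Note that omitting the flag means the contamination estimate is not applied at all, so this only keeps the run from crashing; upgrading remains the actual fix.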

  • shlee (Cambridge): Member, Broadie, Moderator, admin

    Great to hear all is running fine now @nanna.
