java.lang.NullPointerException in HaplotypeCaller when generating gVCF with gatk 4.0.11.0

I am running gatk 4.0.11.0 (the latest release) on aligned whole-exome sequencing reads from TCGA to generate gVCF files. After the gVCF file has been written, gatk crashes with a NullPointerException. I get this exception only when I generate a gVCF, not a regular VCF, from the exact same input. I also get it with different reference genomes and different input bam files. The generated gVCF looks okay, but it is still strange that the software crashes. Do you have any suggestions?

Here is how I run gatk and the relevant console output:

$ gatk HaplotypeCaller -R ../../hg38.canonical_chromosomes/hg38.canonical_chromosomes.fa -I C828.TCGA-EB-A3XB-10B-01D-A23B-08.1_gdc_realn.sorted.bam --emit-ref-confidence GVCF -O C828.TCGA-EB-A3XB-10B-01D-A23B-08.1_gdc_realn.sorted.bam.genomic.hg38_canonical_chromosomes.vcf.gz

Using GATK jar /home/pfiziev/software/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/pfiziev/software/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar HaplotypeCaller -R ../../hg38.canonical_chromosomes/hg38.canonical_chromosomes.fa -I C828.TCGA-EB-A3XB-10B-01D-A23B-08.1_gdc_realn.sorted.bam --emit-ref-confidence GVCF -O C828.TCGA-EB-A3XB-10B-01D-A23B-08.1_gdc_realn.sorted.bam.genomic.hg38_canonical_chromosomes.vcf.gz
11:17:56.245 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/pfiziev/software/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
11:17:57.952 INFO HaplotypeCaller - ------------------------------------------------------------
11:17:57.953 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.11.0
11:17:57.953 INFO HaplotypeCaller - For support and documentation go to
11:17:57.953 INFO HaplotypeCaller - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
11:17:57.953 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_161-b14
11:17:57.953 INFO HaplotypeCaller - Start Date/Time: November 8, 2018 11:17:56 AM PST
11:17:57.953 INFO HaplotypeCaller - ------------------------------------------------------------
11:17:57.954 INFO HaplotypeCaller - ------------------------------------------------------------
11:17:57.954 INFO HaplotypeCaller - HTSJDK Version: 2.16.1
11:17:57.954 INFO HaplotypeCaller - Picard Version: 2.18.13
11:17:57.955 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:17:57.955 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:17:57.955 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:17:57.955 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:17:57.955 INFO HaplotypeCaller - Deflater: IntelDeflater
11:17:57.955 INFO HaplotypeCaller - Inflater: IntelInflater
11:17:57.955 INFO HaplotypeCaller - GCS max retries/reopens: 20
11:17:57.955 INFO HaplotypeCaller - Requester pays: disabled
11:17:57.955 INFO HaplotypeCaller - Initializing engine
11:17:58.487 INFO HaplotypeCaller - Done initializing engine
11:17:58.489 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
11:17:58.499 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
11:17:58.499 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
11:17:58.512 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/pfiziev/software/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
11:17:58.514 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/pfiziev/software/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
11:17:58.571 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
11:17:58.572 INFO IntelPairHmm - Available threads: 56
11:17:58.572 INFO IntelPairHmm - Requested threads: 4
11:17:58.572 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
11:17:58.682 INFO ProgressMeter - Starting traversal
11:17:58.683 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
11:18:03.297 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
11:18:03.297 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

14:10:58.160 INFO ProgressMeter - chrY:27588245 173.0 10830630 62608.0
14:11:08.290 INFO ProgressMeter - chrY:37023245 173.2 10862080 62728.5
14:11:18.292 INFO ProgressMeter - chrY:46002245 173.3 10892010 62840.9
14:11:28.292 INFO ProgressMeter - chrY:54567245 173.5 10920560 62945.1
14:11:32.729 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:11:32.729 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:11:33.219 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 2.404777391
14:11:33.219 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 255.64041847700003
14:11:33.219 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 417.38 sec
14:11:33.219 INFO HaplotypeCaller - Shutting down engine
[November 8, 2018 2:11:33 PM PST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 173.62 minutes.
Runtime.totalMemory()=12706119680
java.lang.NullPointerException
at org.broadinstitute.hellbender.engine.AssemblyRegion.getReference(AssemblyRegion.java:443)
at org.broadinstitute.hellbender.engine.AssemblyRegion.getAssemblyRegionReference(AssemblyRegion.java:464)
at org.broadinstitute.hellbender.engine.AssemblyRegion.getAssemblyRegionReference(AssemblyRegion.java:450)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.createReferenceHaplotype(AssemblyBasedCallerUtils.java:149)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.referenceModelForNoVariation(HaplotypeCallerEngine.java:682)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:521)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:240)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:291)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

Answers

  • bhanuGandham (Member, Administrator, Broadie, Moderator)

    @pfiziev

    Please diagnose your bam file according to the instructions provided here.

    Regards
    Bhanu

  • pfiziev (Member)
    Accepted Answer

    Hello Bhanu,

    Thank you very much for your reply! I ran ValidateSamFile and even though it showed some minor issues with the input bam files, the problem turned out to be elsewhere.

    The input sequencing reads were aligned against GRCh38.d1.vd1 (which I found out later), and I was using hg38 downloaded from UCSC as the reference in HaplotypeCaller. As far as I understand, these two should contain the same DNA sequence for the standard human chromosomes. With that in mind, I removed all reads that map to non-standard chromosomes from the input bam file, and I removed all non-standard chromosomes (e.g. chrUn_ and _random sequences) from hg38 before running gatk, but I was still getting the above NullPointerException. Now that I have switched to GRCh38.d1.vd1 as the reference for HaplotypeCaller, I no longer get the exception. All of this makes me think that the gatk code relies somewhere on the reference being the exact same file that was used to align the reads. Using a different file crashes the software even though hg38 should have the same DNA sequence and the same chromosome IDs as GRCh38.d1.vd1 for all standard chromosomes. I'm not sure whether this is considered a bug, but it may be indicative of other problems in the software. Also, a more descriptive error message would be very helpful for diagnosing issues like this in the future.

    Thanks,
    Petko
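
For readers who want to check for a reference mismatch before starting a multi-hour run, the following is a minimal sketch that compares the contigs declared in a BAM header against those in a reference index. It assumes the header text came from `samtools view -H` and the index text from the `.fai` file written by `samtools faidx`. The function names and the toy contigs are hypothetical, invented only for illustration. It is not confirmed that this check would have flagged the crash in this thread, since the two references are described as agreeing on the standard chromosomes; it only reports differences in contig names and lengths.

```python
def parse_sam_header_contigs(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:chr1, LN:1000.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_fai_contigs(fai_text):
    """Return {contig_name: length} from the text of a FASTA .fai index."""
    contigs = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        # The first two tab-separated columns of a .fai line are name and length.
        name, length = line.split("\t")[:2]
        contigs[name] = int(length)
    return contigs


def compare_contigs(bam_contigs, ref_contigs):
    """Report contigs found on only one side and contigs whose lengths differ."""
    shared = set(bam_contigs) & set(ref_contigs)
    return {
        "only_in_bam": sorted(set(bam_contigs) - set(ref_contigs)),
        "only_in_reference": sorted(set(ref_contigs) - set(bam_contigs)),
        "length_mismatch": {
            name: (bam_contigs[name], ref_contigs[name])
            for name in sorted(shared)
            if bam_contigs[name] != ref_contigs[name]
        },
    }


# Toy inputs (invented values, not taken from hg38 or GRCh38.d1.vd1).
toy_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:2000\n"
    "@SQ\tSN:chrUn_toy_decoy\tLN:500\n"
)
toy_fai = "chr1\t1000\t6\t60\t61\nchr2\t1999\t1030\t60\t61\n"

report = compare_contigs(
    parse_sam_header_contigs(toy_header), parse_fai_contigs(toy_fai)
)
print(report)
```

Note that removing reads from a bam file does not by itself remove the corresponding `@SQ` lines from its header, so the header can still list contigs that the trimmed reference no longer contains.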
