The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.
Register now for the upcoming GATK Best Practices workshop, Feb 20-22 in Leuven, Belgium. Open to all comers! More info and signup at http://bit.ly/2i4mGxz

# HaplotypeCaller LikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods NullPointerException

LondonMember Posts: 19

java.lang.NullPointerException
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708) at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.ThreadPoolExecutorWorker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Program Args: -nct 5 -T HaplotypeCaller -R ../ref/ref.fa -L chr10 -I <48 bam files> -o chr10.raw.vcf Tagged: ## Best Answer • Administrator, Dev Posts: 10,993 admin Answer ✓ And yes, failed runs get logged via phone home; it informs us of which tools are failing and how (user error or code exception). Geraldine Van der Auwera, PhD ## Answers • LondonMember Posts: 19 Do these get logged anyway via phone home? • Administrator, Dev Posts: 10,993 admin Hi @AdminTim, First, a message from your friendly neighborhood tech support monkeys: we do generally appreciate when a question actually includes a sentence that briefly describes the context of the problem. Beyond the informative content, it feels more like we're being addressed as fellow humans, not machines. Thank you for keeping this in mind in future. Now to your problem: can you tell me which version of the program you're using, what kind of data you're running on (such as seq platform and organism if not human), and whether your input files have been validated? Geraldine Van der Auwera, PhD • Administrator, Dev Posts: 10,993 admin Answer ✓ And yes, failed runs get logged via phone home; it informs us of which tools are failing and how (user error or code exception). Geraldine Van der Auwera, PhD • LondonMember Posts: 19 edited October 2013 Hi Geraldine, Thanks for replying, TBH I was doing my best to give a quick bug report before trying a different tack rather than specifically asking for help, apologies for my terseness - no disrespect intended. The version in use is v2.7-4-g6f46d11 Illumina reads for macaque fascicularis avg 100x coverage exome sequencing. There are 48 bam files generated from bwa (0.5.9) against the MMUL_1 reference. They have (added) headers/read groups and pass validation with only a few read-pair and quite a few nucleotide missmatch count warnings. The bam files have been through indel realigner with no reference variants (I'm planning to bootstrap this). I was running multiple HaplotypeCaller's in parallel by giving a per-chromosome location to each process each using the command line above. I was getting very long predicted HaplotypeCaller run times for some chromosomes (> 2w) and these stayed high even after several days of execution. I've just completed running the same per-chromosome approach using UnifiedGenotyper and it has completed over night with no errors using variations on the following bash script: for c in 13 14 15 16 17 18 19 20 X; do (java -Xmx2g -jar ~/sw/GenomeAnalysisTK.jar -nt 3 -T UnifiedGenotyper -R ../ref/MMUL1.fa -L chrc -nda --max_alternate_alleles 10 -stand_call_conf 50.0 -stand_emit_conf 10.0 -glm BOTH for file in ../bam/*.ir.sorted.realigned.bam; do echo -n "-I $file "; done -o chr$c.ir.raw.vcf &> chrc.ir.raw.vcf.log &); done  I would like to be able to run HaplotypeCaller on these if possible, but would need to find a way to reduce the run time to 2-3 days and find out what I'm doing wrong to get the exception. I was using the default stand_x_conf values (30,30?) for HaplotypeCaller rather than those I used for Unified. Many thanks again for your great support! Tim Post edited by Geraldine_VdAuwera on • Administrator, Dev Posts: 10,993 admin Hi Tim, No worries, I just get cranky sometimes I understand the impulse to be brief, but the fact is if you're too brief, I can't do anything with your report, so there's no point even posting it at all. Your follow up is much more informative, thank you! On the issue of long runtimes with HC, you have some options: parallelize (multithreading and scatter-gather, which we recommend doing with Queue rather than with a shell script), decrease the max alternate alleles to consider, decrease minPruning, use an exome targets list (big wins there), and use ReduceReads. Not sure why you're getting that exception though; if your files pass validation you should be fine. If you can isolate the error to a snippet file we'd be happy to debug it locally. Geraldine Van der Auwera, PhD • LondonMember Posts: 19 Hi Geraldine, I'll re-check the files and see if I can narrow it down. Re Queue, I had a go at a script but realised I couldn't run scatter/gather over three machines (no lsf/grid) so resorted to split by chromosome and run different subsets on different boxes using bash. Is there a simple way to do multi-machine S/G without lsf/grid? Thanks Tim • Administrator, Dev Posts: 10,993 admin Sadly no, not that I know of. We will try to address that problem in the future, but for now you're SOL if you don't have a cluster. Re: the bug, I was reminded that we've seen some similar issues pop up non-reproducibly when HC is used with multithreading. We typically don't see it happen because we always scatter-gather HC jobs with Queue, and Queue retries any jobs that failed, so for non-persistent errors it's a non-issue for us. But others have reported it. We'd like to fix it but it's really hard to debug something that doesn't reproduce predictably. Geraldine Van der Auwera, PhD • LondonMember Posts: 19 Ok - I'll let you know if I can narrow it down. BTW I commented on another post and wonder if it's missed your 'net' I'm keen to know the answer as it has an impact on how important it is for me to run HaplotypeCaller: gatk.vanillaforums.com/discussion/comment/9979/#Comment_9979 Thanks Tim • Administrator, Dev Posts: 10,993 admin OK, good luck. Re: your other question, it's in my net, but I'm not 100% sure myself so I've asked the main developers of HC to comment. They're pretty busy so it might take a little longer than usual to get an answer. Geraldine Van der Auwera, PhD • LondonMember Posts: 19 I get a few errors like this from bam file validation are these a problem?: ERROR: Record 84670094, Read name FCC1DH9ACXX:8:1113:19552:36547#0, Mate unmapped flag does not match read unmapped flag of mate ERROR: Record 75490280, Read name FCC1DK7ACXX:7:2116:20766:6076#0, Mate unmapped flag does not match read unmapped flag of mate ERROR: Record 89986869, Read name FCD1J9YACXX:5:1306:6763:84738#0, Mate unmapped flag does not match read unmapped flag of mate Thanks Tim • Administrator, Dev Posts: 10,993 admin Hi Tim, As I recall those errors are minor and shouldn't have any effect, but you can always try running Picard's FixMates on your file. Geraldine Van der Auwera, PhD • UKMember Posts: 4 hello, I have same error message as the Tim and I can reproduce the error. My GATK is v2.7.2 (SW update is out of my control here) and I ran through grid cluster with each node 8 cores. I'm working on a cow whole genome with reference sequence from Ensembl. I ran initial variant call with v2.6.4 of UnifiedGenotyper and HaplotypeCaller, and used the subset of indel calls for v2.7.2 of indel realignment and proceeded to BQSR. UnifiedGenotyper finished ok but I got the error on my second round of HaplotypeCaller run. I resubmitted the job and got same error at similar position; INFO 16:37:44,916 HelpFormatter - -------------------------------------------------------------------------------- INFO 16:37:44,918 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-2-g6bda569, Compiled 2013/08/28 16:30:29 .. INFO 16:37:44,922 HelpFormatter - Program Args: -T HaplotypeCaller -R ref.fa -I large.rln.recal.bam -I small.rln.recal.bam --dbsnp Bos_taurus.sorted.ms_filter.vcf -l INFO -o both.rln.recal.vcf -nct 8 -dcov 200 -baqGOP 30.0 INFO 16:37:44,922 HelpFormatter - Date/Time: 2013/11/05 16:37:44 ... WARN 08:52:58,711 DiploidExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 14:433749 has 7 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument INFO 08:53:58,007 ProgressMeter - 14:457413 1.50e+09 16.2 h 39.0 s 56.1% 29.0 h 12.7 h INFO 08:54:32,884 GATKRunReport - Uploaded run statistics report to AWS S3 ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR stack trace java.lang.NullPointerException at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(LikelihoodCalculationEngine.java:379) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(LikelihoodCalculationEngine.java:353) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:338) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:205) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:778) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:140) at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegionsTraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.ThreadPoolExecutorWorker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569): When I ran the picard ValidateSamFile, I got "NM tag ... does not match reality" and "..Mate not found for paired end" error. Is this crucial? I'll run picard check on initial bam file and get back. thanks, Mikyung • UKMember Posts: 4 hello again, I'd add that the sequencing was done on MiSeq paired end and aligned with BWA, duplicate removed by Picard. Picard's ValidateSameFile with ignoring INVALID_TAG_NM and MATE_NOT_FOUND found no other errors on the input bam files. I'm trying to run the HaplotypeCaller for third time, to see if I get same error again. thanks, Mikyung • Administrator, Dev Posts: 10,993 admin Hi @mjang, If you can produce a file snippet that reproduces the error, we'd be happy to have a look at it and try to debug locally. Geraldine Van der Auwera, PhD • SeattleMember Posts: 3 hello, I'm running 32 mouse exomes that have been mapped with bwa and have successfully had variants called using unified genotyper. However when trying to run haplotype caller, i get the same error message as above after 40h of a ~100h run. Full error message is: ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR stack trace java.lang.NullPointerException at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(LikelihoodCalculationEngine.java:379) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(LikelihoodCalculationEngine.java:353) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:338) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:205) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:778) at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:140) at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegionsTraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

##### ERROR ------------------------------------------------------------------------------------------

Any help would be great.

Thanks

Andrew

• UKMember Posts: 4

hello Geraldine,

0) original log file(s)

1) [chr14 part (where HC crashed) of] Bam file

2) [chr14 part of] ref fasta file (.fa.gz)

3) [chr14 part of] known variant file (.vcf.gz)

4) [chr14 part of] variant output file(s) up until (.vcf.gz)

thanks,
Mikyung

• UKMember Posts: 4

hello Geraldine and team,
While I was waiting for the answers, I tried my third run of HC and it finished this time.
This time the estimated runtime stayed >47h and finished after 50.1h.
At the two previous failed instances, estimated runtime were 29h and 33h each and died at 16h and 18.4h each. I might run another round of indel realignment to variant calling. I'll be back if I have error again.

thanks,
Mikyung

• SeattleMember Posts: 3

@mjang said:
hello Geraldine and team,
While I was waiting for the answers, I tried my third run of HC and it finished this time.
This time the estimated runtime stayed >47h and finished after 50.1h.
At the two previous failed instances, estimated runtime were 29h and 33h each and died at 16h and 18.4h each. I might run another round of indel realignment to variant calling. I'll be back if I have error again.

thanks,
Mikyung

thanks for posting this... i'm going to re-run my hc analysis too..

did you change anything before your third run?

It seems that these are further instances of a transient bug that we haven't yet nailed down because we can't reproduce it locally. To deal with this, we recommend scatter-gathering your jobs so that if one region fails due to the bug, you can just restart that and not the entire job. You can do this failry easily with Queue.

Geraldine Van der Auwera, PhD

• SeattleMember Posts: 3

@Geraldine_VdAuwera said:
It seems that these are further instances of a transient bug that we haven't yet nailed down because we can't reproduce it locally. To deal with this, we recommend scatter-gathering your jobs so that if one region fails due to the bug, you can just restart that and not the entire job. You can do this failry easily with Queue.

• LondonMember Posts: 19

I ran the HC without threading (no -nct or nt) and splitting up the file based on my exome sequenced regions and running those in parallel (without using Queue at the moment) and with that setup I didn't get any exceptions. Might be worth putting a note against the HC -nt option that it is buggy?

Thanks

Tim