HaplotypeCaller stopping midway without an error, probably RAM-related

I'm running HaplotypeCaller on a series of samples using a while loop in a bash script, and for some samples it stops partway through the file. My command was:
java -Xmx18g -jar $Gpath/GenomeAnalysisTK.jar \
    -nct 8 \
    -l INFO \
    -R $ref \
    -log $log/$plate.$prefix.HaplotypeCaller.log \
    -T HaplotypeCaller \
    -I $bam/$prefix.realign.bam \
    --emitRefConfidence GVCF \
    -variant_index_type LINEAR \
    -variant_index_parameter 128000 \
    -o $gvcf/$prefix.GATK.gvcf.vcf
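The loop itself isn't shown in the post. A minimal sketch of one, assuming a hypothetical list file with one BAM prefix per line and the same variables as the command above. It keeps going when a sample fails but prints the prefixes whose java process exited non-zero, so they can be rerun. `run_hc` and `process_samples` are illustrative names, not GATK tools.

```shell
# Sketch only (bash). Assumes $Gpath, $ref, $log, $plate, $bam and $gvcf are
# set as in the original command.

# Run HaplotypeCaller for one sample; the function returns java's exit status.
run_hc() {
    local prefix=$1
    java -Xmx18g -jar "$Gpath/GenomeAnalysisTK.jar" \
        -nct 8 \
        -l INFO \
        -R "$ref" \
        -log "$log/$plate.$prefix.HaplotypeCaller.log" \
        -T HaplotypeCaller \
        -I "$bam/$prefix.realign.bam" \
        --emitRefConfidence GVCF \
        -variant_index_type LINEAR \
        -variant_index_parameter 128000 \
        -o "$gvcf/$prefix.GATK.gvcf.vcf"
}

# Read one prefix per line from stdin and print each prefix whose run failed.
process_samples() {
    local prefix
    while IFS= read -r prefix; do
        [ -z "$prefix" ] && continue    # skip blank lines
        # stdin comes from /dev/null so nothing inside the loop can
        # swallow the remaining lines of the sample list
        if ! run_hc "$prefix" < /dev/null; then
            echo "$prefix"
        fi
    done
}

# Usage (hypothetical file names):
#   process_samples < samples.txt > failed_samples.txt
```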

Most samples completed and their output looks good, but for some I only have a truncated gVCF file with no index. The log for one of those looks like this:

INFO  17:25:15,289 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:25:15,291 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO  17:25:15,291 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  17:25:15,291 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  17:25:15,294 HelpFormatter - Program Args: -nct 8 -l INFO -R /home/owens/ref/Gasterosteus_aculeatus.BROADS1.73.dna.toplevel.fa -log /home/owens/SB/C31KCACXX05.log/C31KCACXX05.sb1Pax102L-S2013.Hap
INFO  17:25:15,296 HelpFormatter - Executing as [email protected] on Linux 3.2.0-63-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_17-b02.
INFO  17:25:15,296 HelpFormatter - Date/Time: 2014/06/10 17:25:15
INFO  17:25:15,296 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:25:15,296 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:25:15,722 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:25:15,892 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO  17:25:15,898 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  17:25:15,942 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
INFO  17:25:15,948 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO  17:25:15,993 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 12 processors available on this machine  
INFO  17:25:16,097 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  17:25:16,114 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:25:16,114 ProgressMeter -        Location processed.active regions  runtime per.1M.active regions completed total.runtime remaining
INFO  17:25:16,114 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO  17:25:16,116 HaplotypeCaller - All sites annotated with PLs force to true for reference-model confidence output
INFO  17:25:16,278 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO  17:25:46,116 ProgressMeter - scaffold_1722:1180        1.49e+05   30.0 s        3.3 m      0.0%        25.6 h    25.6 h
INFO  17:26:46,117 ProgressMeter - scaffold_279:39930        1.37e+07   90.0 s        6.0 s      3.0%        50.5 m    49.0 m
INFO  17:27:16,118 ProgressMeter - scaffold_139:222911        2.89e+07  120.0 s        4.0 s      6.3%        31.7 m    29.7 m
INFO  17:27:46,119 ProgressMeter - scaffold_94:517387        3.89e+07    2.5 m        3.0 s      8.5%        29.2 m    26.7 m
INFO  17:28:16,121 ProgressMeter - scaffold_80:591236        4.06e+07    3.0 m        4.0 s      8.9%        33.6 m    30.6 m
INFO  17:28:46,123 ProgressMeter - groupXXI:447665        6.07e+07    3.5 m        3.0 s     13.3%        26.4 m    22.9 m
INFO  17:29:16,395 ProgressMeter -  groupV:8824013        7.25e+07    4.0 m        3.0 s     17.6%        22.7 m    18.7 m
INFO  17:29:46,396 ProgressMeter - groupXIV:11551262        9.93e+07    4.5 m        2.0 s     24.0%        18.7 m    14.2 m
WARN  17:29:52,732 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at groupX:1516679 has 8 alternate alleles so only the top alleles
INFO  17:30:19,324 ProgressMeter - groupX:14278234        1.15e+08    5.1 m        2.0 s     27.9%        18.1 m    13.0 m
INFO  17:30:49,414 ProgressMeter - groupXVIII:5967453        1.46e+08    5.6 m        2.0 s     33.0%        16.8 m    11.3 m
INFO  17:31:19,821 ProgressMeter - groupXI:15030145        1.63e+08    6.1 m        2.0 s     38.5%        15.7 m     9.7 m
INFO  17:31:50,192 ProgressMeter - groupVI:5779653        1.96e+08    6.6 m        2.0 s     43.8%        15.0 m     8.4 m
INFO  17:32:20,334 ProgressMeter - groupXVI:18115788        2.13e+08    7.1 m        1.0 s     50.1%        14.1 m     7.0 m
INFO  17:32:50,335 ProgressMeter - groupVIII:4300439        2.50e+08    7.6 m        1.0 s     55.1%        13.7 m     6.2 m
INFO  17:33:30,336 ProgressMeter - groupXIII:2378126        2.89e+08    8.2 m        1.0 s     63.1%        13.0 m     4.8 m
INFO  17:34:02,099 GATKRunReport - Uploaded run statistics report to AWS S3

It seems to get partway through (63.1% in this run) and then just stop. I think it's a memory issue, because when I increased the RAM available to Java the problem happened less often. However, I can't figure out why some samples work and others don't: nothing else on the machine is using RAM, and the largest BAM files aren't the ones failing. It's also strange that there doesn't seem to be any error message. Any insight into why this happens and how to avoid it would be appreciated.
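When a run ends without printing anything, two checks still work: the exit status of the java process, and whether the output index was written. A sketch of both in bash. The function names are hypothetical, and the `.idx` suffix is an assumption about how GATK 3 names the index of an uncompressed `.vcf` output.

```shell
# Sketch (bash): checks that do not rely on an error message being printed.

# 1) Describe an exit status. Bash reports a process killed by a signal as
#    128 + the signal number, so 137 means SIGKILL (signal 9).
describe_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        local sig=$((status - 128))
        # kill -l maps a signal number to its name, e.g. 9 -> KILL
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "failed with status $status"
    fi
}

# 2) Treat a gVCF as complete only if it is non-empty and its index exists.
#    Assumption: the index is named "<output>.idx"; adjust the suffix if
#    your GATK version names it differently.
gvcf_complete() {
    local vcf=$1
    [ -s "$vcf" ] && [ -s "$vcf.idx" ]
}
```

Run `describe_status $?` immediately after the java command. If it reports SIGKILL, `dmesg | grep -i -E 'out of memory|killed process'` shows whether the kernel's out-of-memory killer ended the process, which would leave no Java error at all. By contrast, exhausting the heap set by `-Xmx` normally prints `java.lang.OutOfMemoryError` on stderr.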


  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)



    You can try decreasing the -nct value; however, this should not be happening. Perhaps your platform is not supported by GATK?


  • Greg_Owens (Member)

    Hi Sheila,

    IndelRealigner seems to run fine on my system. It's Ubuntu 12.04; I'm not sure whether that's a platform GATK supports.


  • Greg_Owens (Member)

    I reran the failed samples with a lower -nct value. Some worked and others didn't, so it seems rather random. I noticed that when a run failed it did print an error message; it just wasn't saved to the log file, which I had expected -log to do. The error was:

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
            at java.lang.String.checkBounds(String.java:374)
            at java.lang.String.<init>(String.java:314)
            at net.sf.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
            at net.sf.samtools.BAMRecord.decodeReadName(BAMRecord.java:331)
            at net.sf.samtools.BAMRecord.getReadName(BAMRecord.java:220)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingGraph.addRead(ReadThreadingGraph.java:543)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:163)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:112)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:168)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:961)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:825)
            at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
            at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
            at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
            at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
            at java.util.concurrent.FutureTask.run(FutureTask.java:166)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:722)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------
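As seen in this thread, the crash report reached the terminal but not the `-log` file. One way to keep it is to copy everything the command prints into a file with `tee` while still returning java's own exit status instead of tee's. A bash sketch, where `run_logged` and the console-log filename are hypothetical:

```shell
# Sketch (bash): run a command, append everything it prints (stdout and
# stderr) to a file while still showing it on the terminal, and return the
# command's own exit status.
run_logged() {
    local logfile=$1
    shift
    "$@" 2>&1 | tee -a "$logfile"
    # PIPESTATUS[0] is the status of "$@", not of tee
    return "${PIPESTATUS[0]}"
}
```

For example, `run_logged "$log/$plate.$prefix.HaplotypeCaller.console.log" java -Xmx18g -jar "$Gpath/GenomeAnalysisTK.jar"` followed by the rest of the arguments from the original command.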

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)


    Hi Greg,

    This looks like a potential problem with your BAM files. You should validate them with Picard's ValidateSamFile; you can read about it here: http://picard.sourceforge.net/command-line-overview.shtml#ValidateSamFile


  • Greg_Owens (Member)

    All my samples eventually worked when rerun, and I'm not sure what that means.
    When I validated my BAM files, they all had errors on a few records, mainly:
    - Mate CIGAR string does not match CIGAR string of mate
    - Mate alignment does not match alignment start of mate
    - Mate negative strand flag does not match read negative strand flag of mate

    I don't think these BAM errors were causing my trouble, because the same BAM files eventually worked without any changes.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)


    I guess it was a problem with -nct then.
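If the failures really are tied to -nct, the reruns described earlier in the thread can be automated by retrying each failed sample with progressively fewer threads. A bash sketch, where `run_hc_nct` is a hypothetical function you would write to run the original command with the given prefix and `-nct` value; the thread counts tried are illustrative:

```shell
# Sketch (bash): rerun one sample with fewer HaplotypeCaller threads until it
# succeeds. run_hc_nct is hypothetical: it takes a sample prefix and an -nct
# value, runs the GATK command with them and returns java's exit status.
run_with_fallback() {
    local prefix=$1
    local nct
    for nct in 8 4 2 1; do
        if run_hc_nct "$prefix" "$nct" < /dev/null; then
            echo "$prefix succeeded with -nct $nct"
            return 0
        fi
        echo "$prefix failed with -nct $nct" >&2
    done
    return 1    # every thread count failed
}
```

Stepping down to `-nct 1` means the last attempt runs single-threaded, which is slower but removes multithreading as a variable.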


  • barak (Member)

    My 2 cents.
    We have had some problems with HC on our cluster. We tweaked some Java (JVM) memory settings, as suggested here: http://sourceforge.net/p/picard/wiki/Main_Page/#q-i-got-the-error-javaioioexception-map-failed-how-can-i-fix-the-problem

    That is, adding this JVM option seems to help: -XX:MaxDirectMemorySize=4G.

    We will post an update if this solves the issue.
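One detail when trying this: JVM options such as -Xmx and -XX:MaxDirectMemorySize only take effect when they come before -jar; anything after the jar file name is passed to GATK as an argument. Note also that MaxDirectMemorySize caps off-heap direct buffers, not the heap set by -Xmx. A bash sketch that keeps the JVM options in one array; `JAVA_OPTS` and `run_gatk` are hypothetical names, and the 4G value is the one quoted above:

```shell
# Sketch (bash): keep JVM options in one array so they always land before
# -jar. Options placed after the jar file name go to GATK, not the JVM.
JAVA_OPTS=(-Xmx18g -XX:MaxDirectMemorySize=4G)

# Run GATK with the shared JVM options; all arguments go to GATK.
# $Gpath is the GATK directory from the original command.
run_gatk() {
    java "${JAVA_OPTS[@]}" -jar "$Gpath/GenomeAnalysisTK.jar" "$@"
}
```

Usage: `run_gatk -T HaplotypeCaller -nct 8 -R "$ref"` and so on, with the remaining arguments from the original command.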
