GATK 2.6.2 Exceptions with HaplotypeCaller and -nct

aeonsim (Posts: 64; Member)

When I run the HaplotypeCaller in 2.6.2 with -nct, I get a number of crashes. Is -nct currently supported for the HaplotypeCaller, or is it still experimental? With multithreading I'm not sure exactly where the error occurs, and it's a pretty big BAM. If needed, I can try to narrow it down further and create a subset BAM...


../jre1.7.0_25/bin/java -jar ../GenomeAnalysisTK-2.6-2-ge03a5e9/GenomeAnalysisTK.jar -R ../refs/bosTau6.lic.fa -T HaplotypeCaller -I ../Chr15.ir.bam -bamout Chr15.bam -o Chr15.vcf.gz -L Chr15 -nct 5 -rf BadCigar

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.NullPointerException
        at net.sf.samtools.SAMRecordCoordinateComparator.fileOrderCompare(SAMRecordCoordinateComparator.java:82)
        at net.sf.samtools.SAMRecordCoordinateComparator.compare(SAMRecordCoordinateComparator.java:43)
        at net.sf.samtools.SAMRecordCoordinateComparator.compare(SAMRecordCoordinateComparator.java:41)
        at java.util.TimSort.countRunAndMakeAscending(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.Arrays.sort(Unknown Source)
        at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:203)
        at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
        at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:170)
        at org.broadinstitute.sting.gatk.io.storage.SAMFileWriterStorage.addAlignment(SAMFileWriterStorage.java:94)
        at org.broadinstitute.sting.gatk.io.stubs.SAMFileWriterStub.addAlignment(SAMFileWriterStub.java:307)
        at org.broadinstitute.sting.utils.haplotypeBAMWriter.HaplotypeBAMWriter.writeHaplotype(HaplotypeBAMWriter.java:310)
        at org.broadinstitute.sting.utils.haplotypeBAMWriter.HaplotypeBAMWriter.writeHaplotypesAsReads(HaplotypeBAMWriter.java:285)
        at org.broadinstitute.sting.utils.haplotypeBAMWriter.CalledHaplotypeBAMWriter.writeReadsAlignedToHaplotypes(CalledHaplotypeBAMWriter.java:87)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:733)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:138)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.6-2-ge03a5e9):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

and with more threads allowed:


../jre1.7.0_25/bin/java -jar ../GenomeAnalysisTK-2.6-2-ge03a5e9/GenomeAnalysisTK.jar -R ../refs/bosTau6.lic.fa -T HaplotypeCaller -I ../Chr15.ir.bam -bamout Chr15.bam -o Chr15.vcf.gz -L Chr15 -nct 16 -rf BadCigar

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeLo(Unknown Source)
        at java.util.TimSort.mergeAt(Unknown Source)
        at java.util.TimSort.mergeCollapse(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.Arrays.sort(Unknown Source)
        at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:203)
        at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
        at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:170)
        at org.broadinstitute.sting.gatk.io.storage.SAMFileWriterStorage.addAlignment(SAMFileWriterStorage.java:94)
        at org.broadinstitute.sting.gatk.io.stubs.SAMFileWriterStub.addAlignment(SAMFileWriterStub.java:307)
        at org.broadinstitute.sting.utils.haplotypeBAMWriter.HaplotypeBAMWriter.writeReadAgainstHaplotype(HaplotypeBAMWriter.java:196)
        at org.broadinstitute.sting.utils.haplotypeBAMWriter.CalledHaplotypeBAMWriter.writeReadsAlignedToHaplotypes(CalledHaplotypeBAMWriter.java:103)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:733)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:138)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.6-2-ge03a5e9):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Comparison method violates its general contract!
##### ERROR ------------------------------------------------------------------------------------------

Answers

  • aeonsim (Posts: 64; Member)

    Note that these may be due to a bug in the handling of -bamout with -nct. If I remove -bamout, the job appears to continue running without exceptions.

  • Geraldine_VdAuwera (Posts: 6,423; Administrator, GATK Developer)

    Hmm, I'm not sure -- let me pass this on to the team.

    Geraldine Van der Auwera, PhD

  • aeonsim (Posts: 64; Member)

    Any chance of the two options being made compatible in the future? The bamout option was very useful for seeing exactly what was happening with the indels, while -nct makes it much simpler to get the HC to run at a decent rate without having to deal with 10000+ subfiles that all need to be merged.

    Both together would be ideal.

  • Mark_DePristo (Posts: 153; Administrator, GATK Developer)

    I'm curious how you are using bamout. It's not efficiently implemented -- it's really more of a debugging tool -- so I suspect that bamout must be slowing down the caller even without multiple threads. Is that not your experience? Or does it just not matter, given that you can more easily understand what the HC is doing? Would some other type of output work better?

    It's entirely possible to make the bamout option work with multiple threads. It's just a bit complex, since the reads could be coming out of order. I'll throw it in JIRA.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
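
The stack traces above show worker threads from the NanoScheduler all reaching the same writer through addAlignment, and the reply above notes that reads can arrive out of order. One plausible reading, which is an assumption here and not something confirmed in this thread, is that the writer's sort buffer gets modified by several threads at once. The Java sketch below (Java 8 or later) shows the general idea of serializing access to a shared buffering writer. `BufferingWriter`, `writeFromThreads`, and the integer "positions" are hypothetical stand-ins for illustration only; none of this is GATK or samtools code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerializedWriterSketch {

    // Hypothetical stand-in for a writer that buffers records and sorts them
    // before output. This is NOT GATK or samtools code.
    static final class BufferingWriter {
        private final List<Integer> buffer = new ArrayList<>();

        // synchronized: only one worker thread touches the buffer at a time.
        synchronized void add(int position) {
            buffer.add(position);
        }

        // Returns a sorted copy, taken while no other thread can add.
        synchronized List<Integer> sortedSnapshot() {
            List<Integer> copy = new ArrayList<>(buffer);
            Collections.sort(copy);
            return copy;
        }
    }

    // Each worker adds its own interleaved set of positions, so records reach
    // the writer in no particular order.
    static List<Integer> writeFromThreads(int threads, int perThread)
            throws InterruptedException {
        BufferingWriter writer = new BufferingWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    writer.add(i * threads + offset);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return writer.sortedSnapshot();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> sorted = writeFromThreads(5, 1000);
        System.out.println(sorted.size() + " records, first=" + sorted.get(0)
                + ", last=" + sorted.get(sorted.size() - 1));
    }
}
```

Run as is, this prints `5000 records, first=0, last=4999`. Locking every add is the simplest option but also makes the writer a bottleneck, which fits the remark above that bamout is more of a debugging tool than a performance feature.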

  • aeonsim (Posts: 64; Member)

    Hi Mark, I've been using bamout to help our more biologically oriented staff see exactly what the HC has done with the reads when making the call. Their last validation step is simply to check the BAM for the population to make sure it makes sense given the genotype supplied by GATK.

    It lets them look at bigger indels where the input BAM is ambiguous, look at regions with multiple indels and check for compensatory mutations (i.e., two frameshifts canceling each other out), and spot other error types that VQSR and other filtering have problems dealing with.

    Also, if you're discussing the possible impact of an indel, it's nice to be able to put up a clear image from IGV showing the indel, where it fits in the genome, and what is nearby.

  • modi2020 (Posts: 15; Member)

    Hi Geraldine, I am trying to run the HaplotypeCaller on version GenomeAnalysisTK-2.4-9. I used the command below:

    java -Xmx6g -jar /home//GenomeAnalysisTK.jar -T HaplotypeCaller -nct 5 -R equcab2.fa -I /home/cleaned.sorted.bam1 -I /home/cleaned.sorted.bam2 -I /home/cleaned.sorted.bam3 -stand_call_conf 20 -stand_emit_conf 10.0 -o output.raw.snps.indels.vcf

    When I run the command, I receive the following error:

    INFO 14:12:21,883 HelpFormatter - Date/Time: 2013/08/21 14:12:21
    INFO 14:12:21,883 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:12:21,883 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:12:21,952 GenomeAnalysisEngine - Strictness is SILENT
    INFO 14:12:22,057 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
    INFO 14:12:22,064 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 14:12:22,091 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
    INFO 14:12:22,115 MicroScheduler - Running the GATK in parallel mode with 5 total threads, 5 CPU thread(s) for each of 1 data thread(s), of 8 processors available on this machine
    INFO 14:12:22,739 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Invalid command line: Argument nct has a bad value: The analysis HaplotypeCaller currently does not support parallel execution with nct. Please run your analysis without the nct option.
    ERROR ------------------------------------------------------------------------------------------

    I am sure the documentation says that the HaplotypeCaller does support parallel execution; see http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HaplotypeCaller.html

    What do you think the problem may be?

    Thank you

  • Geraldine_VdAuwera (Posts: 6,423; Administrator, GATK Developer)

    Well, the documentation on the website is for version 2.6. I would have to check, but I believe version 2.4 wasn't yet capable of running the HC multithreaded. I recommend updating to the latest version to take advantage of the most recent performance improvements.

    Geraldine Van der Auwera, PhD
