A question about running HaplotypeCaller with intervals

Hi,

I have a question about running HaplotypeCaller with intervals on exome-seq data.
Here is the command I used:
java -jar gatk-package-4.0.6.0-local.jar HaplotypeCaller -R /espresso/share/genomes/hg38/genome.fa -I recal_reads.bam -O variants.g.vcf -ERC GVCF -L capture.bed

However, when I ran the command, I got the following message:
17:13:14.439 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.0.6.0/gatk-package-4.0.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
17:13:14.591 INFO HaplotypeCaller - ------------------------------------------------------------
17:13:14.591 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.6.0
17:13:14.591 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
17:13:14.591 INFO HaplotypeCaller - Executing as ... on Linux v2.6.32-431.29.2.el6.x86_64 amd64
17:13:14.592 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_121-b13
17:13:14.592 INFO HaplotypeCaller - Start Date/Time: July 16, 2018 5:13:14 PM EDT
17:13:14.592 INFO HaplotypeCaller - ------------------------------------------------------------
17:13:14.592 INFO HaplotypeCaller - ------------------------------------------------------------
17:13:14.592 INFO HaplotypeCaller - HTSJDK Version: 2.16.0
17:13:14.592 INFO HaplotypeCaller - Picard Version: 2.18.7
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:13:14.593 INFO HaplotypeCaller - Deflater: IntelDeflater
17:13:14.593 INFO HaplotypeCaller - Inflater: IntelInflater
17:13:14.593 INFO HaplotypeCaller - GCS max retries/reopens: 20
17:13:14.593 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
17:13:14.593 INFO HaplotypeCaller - Initializing engine
17:13:15.037 INFO FeatureManager - Using codec BEDCodec to read file file:///capture.bed
17:13:16.883 INFO IntervalArgumentCollection - Processing 64190747 bp from intervals
17:13:17.009 INFO HaplotypeCaller - Shutting down engine
[July 16, 2018 5:13:17 PM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2041053184
java.lang.NullPointerException
at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:325)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:202)
at java.util.Arrays.sort(Arrays.java:1312)
at java.util.Arrays.sort(Arrays.java:1506)
at java.util.ArrayList.sort(ArrayList.java:1454)
at java.util.Collections.sort(Collections.java:141)
at org.broadinstitute.hellbender.utils.IntervalUtils.sortAndMergeIntervals(IntervalUtils.java:459)
at org.broadinstitute.hellbender.utils.IntervalUtils.getIntervalsWithFlanks(IntervalUtils.java:956)
at org.broadinstitute.hellbender.utils.IntervalUtils.getIntervalsWithFlanks(IntervalUtils.java:971)
at org.broadinstitute.hellbender.engine.MultiIntervalLocalReadShard.<init>(MultiIntervalLocalReadShard.java:59)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.makeReadShards(AssemblyRegionWalker.java:195)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:175)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:133)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

I did not see any error, but it seems HaplotypeCaller did not run and there is no output.
I would really appreciate any help you can give.

Thank you!

Best,
Siyu

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @liusiyu93
    Hi Siyu,

    Perhaps this thread will help.

    -Sheila

  • vortexing (Member)
    edited March 14
    I'm now getting this same error. I read that other thread, and apart from perhaps the Java version, I don't see that any of the suggestions are relevant to what I'm working on. I'm not trying to use -nct, etc.

    I'm using GATK/4.0.10.0, Java HotSpot(TM) 64-Bit Server VM v1.8.0_181-b13

    I have gotten this error every time I add an interval file in BED format to any of the GATK tools (in this case BaseRecalibrator, but the same error comes up for ApplyBQSR and HaplotypeCaller as well).

    ```
    19:58:10.729 INFO FeatureManager - Using codec BEDCodec to read file file://intervals.bed
    19:58:11.085 INFO BaseRecalibrator - Shutting down engine
    [March 13, 2019 7:58:11 PM PDT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=4761059328
    java.lang.NullPointerException
    at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:325)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:202)
    at java.util.Arrays.sort(Arrays.java:1312)
    at java.util.Arrays.sort(Arrays.java:1506)
    at java.util.ArrayList.sort(ArrayList.java:1462)
    at java.util.Collections.sort(Collections.java:141)
    at org.broadinstitute.hellbender.utils.IntervalUtils.sortAndMergeIntervals(IntervalUtils.java:455)
    at org.broadinstitute.hellbender.utils.IntervalUtils.getIntervalsWithFlanks(IntervalUtils.java:953)
    at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:225)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:174)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:155)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeIntervals(GATKTool.java:445)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:640)
    at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:50)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    ```

    This is what I'm running:

    ```
    gatk --java-options "-Xms4g" BaseRecalibrator \
    -R Homo_sapiens_assembly38.fasta \
    -I merged.bam \
    -O recal_data.csv \
    --known-sites ${dbSNP_vcf} \
    --known-sites ${known_indels_hg38} \
    --known-sites ${known_indels_Mills} \
    --intervals intervals.bed \
    --interval-padding 100
    ```


    I have tried to make sure my BED file is sorted in the same order as my reference FASTA via:

    ```sort -k1,1V -k2,2n -k3,3n intervals-unsorted.bed >intervals.bed```

    That did not help. Thoughts?
  • SkyWarrior, Turkey (Member ✭✭✭)

    Can you sort your bed using bedtools and try again?

  • vortexing (Member)
    I did, verified the correct sort order, and still the exact same error. :( Any other wisdom?
  • shlee, Cambridge (Member, Broadie ✭✭✭✭✭)

    Hi @vortexing,

    Please try with BedToIntervalList or with IntervalListTools. These tools implicitly allow for sorting based on your reference sequence dictionary.
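
    To illustrate what "sorting based on your reference sequence dictionary" means, here is a rough sketch. It is not what BedToIntervalList or IntervalListTools do internally. It assumes a tab-delimited BED file and a Picard-style `.dict` file, and the function name `sort_bed_by_dict` is made up for this example. The idea is that contig order is taken from the `@SQ` lines of the dictionary, which a version sort such as `sort -k1,1V` does not necessarily reproduce.

    ```shell
    # Hypothetical helper (not part of GATK or Picard).
    # Usage: sort_bed_by_dict <in.bed> <reference.dict>
    # Orders BED records by the contig order of the dictionary, then by start and end.
    # Records on contigs missing from the dictionary are silently dropped.
    sort_bed_by_dict() {
        awk -F'\t' '
            # First file (.dict): give each contig a rank in order of appearance.
            NR == FNR {
                if ($1 == "@SQ") {
                    for (i = 2; i <= NF; i++) {
                        if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n
                    }
                }
                next
            }
            # Second file (BED): prefix every known-contig record with its rank.
            ($1 in rank) { print rank[$1] "\t" $0 }
        ' "$2" "$1" |
            sort -t "$(printf '\t')" -k1,1n -k3,3n -k4,4n |  # rank, start, end
            cut -f2-                                          # drop the rank column
    }
    ```

    It could be called as `sort_bed_by_dict intervals-unsorted.bed Homo_sapiens_assembly38.dict > intervals.bed`. This only reorders records; it does not check coordinates against contig lengths.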

  • vortexing (Member)
    I actually did that earlier but ran into a different snafu, and I've blanked on what that was. Also, I'm realizing I had run this pipeline, with these tools on these data, via Cromwell using the Broad's Docker container, and this problem did not exist. It only arises on our local cluster. Is there something I could check to see why these BED/data problems were not an issue in the Docker container but are locally?
  • vortexing (Member)
    This was what it was doing locally when I used BedToIntervalList:
    ```
    Exception in thread "main" picard.PicardException: Start on sequence 'chr10' was past the end: 133797422 < 135043782
    at picard.util.BedToIntervalList.doWork(BedToIntervalList.java:159)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
    ```
    This was with a BED file taken directly from Illumina for one of their panels, and the Homo_sapiens_assembly38.dict from the GATK bundle.
  • shlee, Cambridge (Member, Broadie ✭✭✭✭✭)

    Hmm @vortexing,

    What version of the tool are you using and can you post your command?

    The error message comes from this code. It appears the start position of a bed interval is past the end of the contig it maps to. Is it possible you have mismatched your reference?

    If the above does not help us solve the issue, I think the discussion in https://github.com/broadinstitute/picard/pull/1292 is worth looking into. The fixes in this PR were merged about a week ago, so the changes should be in the next Picard release.

  • vortexing (Member)
    edited March 19
    Version was: picard/2.18.1-Java-1.8.0_121, so multiple versions behind.

    Command was the basics.
    ```
    java -jar picard.jar BedToIntervalList \
    I=input.bed \
    O=list.interval_list \
    SD=reference_sequence.dict
    ```
    Should we update and try to see if this discussed fix was the problem? I'm worried as to why I suddenly have this bed file sorting problem to begin with b/c with this (broadinstitute/gatk:4.0.4.0) docker container I had no problems but now with my local cluster and GATK/4.0.10.0-Python-3.6.6 the problem arises.
  • shlee, Cambridge (Member, Broadie ✭✭✭✭✭)
    edited March 20

    @vortexing, it looks like Picard v2.18.29 was just released this morning. Can you try it and see if it solves the issue? Also, can you pull out your chr10 dictionary line, e.g. with `cat reference_sequence.dict | grep 'SN:chr10'`, and post it here? We should confirm the contig length, which is given by the LN field of the dictionary. Apologies for the delay in following up; I was out today for a jury duty summons.
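
    One caveat with a plain `grep 'SN:chr10'`: it may also match any other contig whose name merely starts with `chr10`, such as alternate contigs. A minimal sketch of an exact-match lookup follows, assuming a tab-delimited Picard-style `.dict` file. The function name `contig_length` is hypothetical.

    ```shell
    # Hypothetical helper (not part of GATK or Picard).
    # Usage: contig_length <reference.dict> <contig>
    # Prints the LN value of the @SQ line whose SN field equals <contig> exactly;
    # prints nothing if the contig is not in the dictionary.
    contig_length() {
        awk -F'\t' -v want="SN:$2" '
            $1 == "@SQ" {
                hit = 0; len = ""
                for (i = 2; i <= NF; i++) {
                    if ($i == want)   hit = 1              # exact name match only
                    if ($i ~ /^LN:/)  len = substr($i, 4)  # strip the LN: prefix
                }
                if (hit) { print len; exit }
            }
        ' "$1"
    }
    ```

    For instance, `contig_length reference_sequence.dict chr10` can be compared against the `133797422` reported in the BedToIntervalList error above.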

  • SkyWarrior, Turkey (Member ✭✭✭)

    One thing that I noticed: you are using hg38 to map your reads, but are you sure that your capture intervals are compatible with hg38? Some of the hg38 contigs are actually shorter than their hg19/b37 counterparts.
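
    A quick way to test for this kind of build mismatch is to compare every BED record against the contig lengths in the sequence dictionary. The following is only a sketch under assumptions: a tab-delimited BED file, a Picard-style `.dict` file, and a made-up function name `bed_out_of_bounds`. Because BED coordinates are 0-based and half-open, an end equal to the contig length is still valid, so only ends greater than `LN` are flagged.

    ```shell
    # Hypothetical helper (not part of GATK or Picard).
    # Usage: bed_out_of_bounds <in.bed> <reference.dict>
    # Prints each offending BED record prefixed with the reason it was flagged.
    bed_out_of_bounds() {
        awk -F'\t' '
            # First file (.dict): record the length of every contig.
            NR == FNR {
                if ($1 == "@SQ") {
                    name = ""; len = ""
                    for (i = 2; i <= NF; i++) {
                        if ($i ~ /^SN:/) name = substr($i, 4)
                        if ($i ~ /^LN:/) len  = substr($i, 4)
                    }
                    if (name != "") ln[name] = len
                }
                next
            }
            # Second file (BED): skip header-style and incomplete lines.
            NF < 3 || /^(#|track|browser)/ { next }
            # Contig name not present in the dictionary at all.
            !($1 in ln) { print "unknown_contig\t" $0; next }
            # Interval end lies beyond the end of the contig.
            ($3 + 0) > (ln[$1] + 0) { print "past_end\t" $0 }
        ' "$2" "$1"
    }
    ```

    Running `bed_out_of_bounds intervals.bed Homo_sapiens_assembly38.dict` and getting no output would mean every interval at least fits on its contig. It would not prove the coordinates are from the right build, since hg19 intervals can fit inside hg38 contigs and still point at the wrong sequence.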

  • vortexing (Member)
    Crap. None of the BED files I'd tested had been lifted over from hg19. ARGH. It still worries me that it WASN'T erroring out when run via WDL but does in local bash. Let me see if the current versions of the tools I have will work with the correct genome for the BED file and report back. Sorry to have thrown this red herring into the mix!! I guess it's good to have discovered this, though it would have been better if the route had been less circuitous.