Error messages when running StructuralVariationDiscoveryPipelineSpark

Dear GATK developers, I am running the following command to perform SV calling with StructuralVariationDiscoveryPipelineSpark:

nohup /home/simone/software/gatk-4.0.3.0/gatk --java-options "-Xmx100G" StructuralVariationDiscoveryPipelineSpark \
-I /home/simone/Project_Nebbiolo/10x_Genomics/n423_SVs_validation_alignment/outs/n423.bam \
-R Neb71.PB.Primary.plus.Haplotigs_merged.2bit \
--contig-sam-file n423_SVs_GATK_SV_contigs.sam \
--aligner-index-image Neb71.PB.Primary.plus.Haplotigs_merged.fasta.img \
--kmers-to-ignore BadKmers.out \
-O n423_GATK_SVs.vcf &

and I keep getting these ERROR messages:

19/05/27 08:12:34 ERROR TaskSchedulerImpl: Lost executor driver on localhost: Executor heartbeat timed out after 143709 ms
19/05/27 16:41:48 ERROR Executor: Exception in task 15.0 in stage 8.0 (TID 7111)
19/05/27 16:41:48 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 7102,5,main]
19/05/27 16:54:19 ERROR Executor: Exception in task 31.0 in stage 8.0 (TID 7127)
19/05/27 16:41:48 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 7099,5,main]
19/05/27 17:46:42 ERROR Executor: Exception in task 26.0 in stage 8.0 (TID 7122)
19/05/27 19:05:49 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker for task 7121,5,main]
19/05/27 18:46:10 ERROR Executor: Exception in task 20.0 in stage 8.0 (TID 7116)
19/05/27 19:34:32 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker for task 7108,5,main]
19/05/27 19:48:22 ERROR Executor: Exception in task 32.0 in stage 8.0 (TID 7128)
19/05/27 19:48:22 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker for task 7128,5,main]
19/05/27 20:10:58 ERROR Executor: Exception in task 33.0 in stage 8.0 (TID 7129)
19/05/27 20:10:58 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker for task 7129,5,main]
19/05/27 20:10:58 ERROR TaskSetManager: Task 7 in stage 8.0 failed 1 times; aborting job
19/05/27 20:11:08 ERROR ShutdownHookManager: ShutdownHookManger shutdown forcefully.
19/05/27 20:11:08 ERROR Utils: Uncaught exception in thread pool-7-thread-1

I would really appreciate any help you could provide on how to solve these issues.

Thank you in advance.
Luciano

Answers

  • shuangBroad (Member, Broadie, Dev)

    Hi Luciano,

    Thanks for reporting this problem.
    I'm happy to help.

    Is it true that you are running the Spark pipeline on a local machine?
    If so, is the BAM you are studying a 30X WGS BAM?
    Would you let us know how long it took for the task to fail?
    And if possible, could you post/attach the full stack trace, please?

    Thanks!

    Steve

  • lcalderon (Member)
    Hi Steve, thanks for answering back. These are my answers to your questions:

    Is it true that you are running the Spark pipeline on a local machine?
    - Yes that is true.
    If so, is the BAM you are studying a 30X WGS bam?
    - No, it is actually a 23X coverage bam file of 10X Genomics data.
    Would you let us know how long it took for the task to fail?
    - The first error message appeared 4 hours after the start, and it took 12 hours for the run to actually stop.
    And if possible, can you post/attach the full stack trace please?
    - Here attached is the trace after error messages started to appear.

    Thanks for your collaboration.

    Luciano
  • shuangBroad (Member, Broadie, Dev)

    Thanks for answering all these questions, Luciano.

    I suspected memory when you first reported the problem, and the time it took for the pipeline to flounder points further in that direction (the pipeline usually finishes in under an hour, though of course that depends on the resources available to the machine).
    Now looking at the stack trace you attached, lines 1922-1923 say

    19/05/27 19:48:22 ERROR Executor: Exception in task 32.0 in stage 8.0 (TID 7128)
    java.lang.OutOfMemoryError: Java heap space
    

    So it was an out of memory problem.

    Now, if you are keen to know why, here's some explanation: there's a nice feature we integrated into the pipeline that greatly boosts our sensitivity on long insertions, but it is unfortunately demanding on memory. If we are allowed an excuse, SV algorithms are typically resource hungry. ;)

    Now onto solutions.
    Is it possible for you to access a real Spark cluster with more memory available? I know you have assigned 100GB already.

  • lcalderon (Member)
    Hi Steve, regarding the proposed solution:

    Is it possible for you to access a real Spark cluster, which has larger memory available? I know you have assigned 100GB already.
    - We even tried assigning 250GB to the process, which is the maximum capacity of our cluster, and obtained the same result. So maybe the only solution is to go for a real Spark cluster. We don't have local access to such a resource; do you have any online platform to recommend?

    Thanks
    Luciano
  • shuangBroad (Member, Broadie, Dev)

    Let me answer in two parts:

    Spark local mode

    Considering that you are running the Spark pipeline locally, which we haven't tested thoroughly during development, I'd suggest limiting the number of executors with --num-executors and the number of cores per executor with --executor-cores (more detail here).

    Specifically, to run GATK, you can do

    <usual gatk commands> \
    -- \
    --spark-runner LOCAL \
    --num-executors <YOUR_VALUE> \
    --executor-cores <YOUR_VALUE> \
    --executor-memory <YOUR_VALUE> \
    --conf spark.yarn.executor.memoryOverhead=<YOUR_VALUE>
    

    Spark on the cloud

    We use Google Dataproc for developing the pipeline. The GATK repo's Readme has a section explaining how to run on Dataproc.
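    For concreteness, here is a sketch of what a Dataproc submission could look like. The bucket and cluster names are hypothetical placeholders, not values from this thread, and the exact staging requirements (e.g. for the aligner index image) may differ, so check the Readme. The script only assembles and prints the command so it can be reviewed before a real run:

```shell
#!/bin/sh
# Sketch: submitting the SV pipeline to a Google Dataproc cluster.
# CLUSTER and BUCKET are hypothetical placeholders, not real resources;
# the input file names are likewise illustrative.
CLUSTER=my-dataproc-cluster
BUCKET=gs://my-bucket

# Assemble the command instead of executing it, so the inputs can be
# staged in the bucket and the cluster created before anything runs.
CMD="gatk StructuralVariationDiscoveryPipelineSpark \
  -I $BUCKET/sample.bam \
  -R $BUCKET/reference.2bit \
  --aligner-index-image $BUCKET/reference.fasta.img \
  --kmers-to-ignore $BUCKET/BadKmers.out \
  -O $BUCKET/sample_SVs.vcf \
  -- \
  --spark-runner GCS \
  --cluster $CLUSTER"

echo "$CMD"
```

    Once the files are in the bucket and the cluster exists, run the printed command (or replace the final echo with an eval of "$CMD").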

    The machine specification we use for development is quite high (as expected, to speed up dev work), but in the meantime let me try and see how low the machine-spec requirements can go.
    I will let you know.

  • shuangBroad (Member, Broadie, Dev)

    Hi Luciano,

    I ran an experiment on Google Dataproc with an HG38 WGS 30X BAM, and the pipeline finished in about 100 minutes. So it should not take that long to process your BAM if you use a Spark cluster similar to the one I used.

    Pipeline completed in 01h:39m:22s
    

    Please let us know if you have more questions!

    Steve

  • lcalderon (Member)
    Hi Steve,
    Thanks for running that test for us.
    What do you think about the fact that our coverage is 23X instead of the recommended 30X? If you think the analysis will run anyway, we will give Google Dataproc a try.

    Luciano
  • shuangBroad (Member, Broadie, Dev)

    There are multiple aspects I can think of:

    • it should run given the lower coverage
    • but I'd go slowly at first, i.e. run only one sample or one trio as a pilot project
    • take a look at the output format first; we produce complex variants that not many SV algorithms output, so please let us know what you like and dislike
    • if there's any problem running the pipeline, please let us know; we are always happy to help promptly
  • shuangBroad (Member, Broadie, Dev)

    One follow-up that my fingers forgot to type: given that the coverage is lower and the pipeline is assembly-based, I'd expect the sensitivity to be somewhat lower (assembly algorithms typically require relatively high coverage).
