Cromwell is skipping a scatter call

alphahmed (Japan, Member)

I am trying to run the WDL workflow for the germline CNV caller in cohort mode.

After modifying the original file from the GitHub repository so that it runs locally on our servers, I ran into the following issues:

  1. If I provide the .json input file with the optional inputs left untouched, I get a "no coercion defined" message for every optional parameter, and the workflow stops.
  • I could work around this issue by entering the default parameters directly in the WDL code.
  2. After the "DetermineGermlineContigPloidyCohortMode" call completes, the pipeline goes directly to the "CNVTasks.PostprocessGermlineCNVCalls" call, skipping the "GermlineCNVCallerCohortMode" call and its preceding scatter step. I then get an error saying that 'calls-shard-path' is required; that argument is supposed to come from the output of the "GermlineCNVCallerCohortMode" step, which never ran.
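
One possible cause of the "no coercion defined" message in issue 1, stated here as an assumption rather than a confirmed diagnosis, is that the untouched optional entries in a generated inputs template are placeholder strings (for example `"Int? (optional)"`) that cannot be converted to the declared type. A minimal Python sketch that drops such entries before the file is passed to Cromwell is shown below. The workflow and input names are hypothetical, and the `"(optional"` marker is an assumption about what the placeholders look like in this particular file.

```python
import json


def drop_placeholder_inputs(inputs: dict) -> dict:
    """Return a copy of `inputs` without untouched optional placeholders.

    Assumption: an untouched optional entry is a string value that
    contains the text "(optional", as in "Int? (optional)".
    """
    cleaned = {}
    for key, value in inputs.items():
        if isinstance(value, str) and "(optional" in value:
            continue  # skip the placeholder; the WDL default then applies
        cleaned[key] = value
    return cleaned


# Hypothetical inputs for illustration only.
example = {
    "CNVGermlineCohortWorkflow.intervals": "/data/targets.interval_list",
    "CNVGermlineCohortWorkflow.mem_gb_for_collect_counts": "Int? (optional)",
}

cleaned = drop_placeholder_inputs(example)
print(json.dumps(cleaned, indent=2))
```

If the optional entries in the real file look different, the check inside the loop would need to be adjusted accordingly.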

Why is this step being skipped, even though it is called in order in the workflow?

Thanks!


Answers

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    Hi @alphahmed -

    To help us debug this, could you provide:

    • A link to the modified WDL and inputs file that are failing to work for you?
    • How long was the ScatterIntervals.scattered_interval_lists output? We have seen similar behavior before: if this array is empty, the scatter appears to be skipped when it actually just scattered over a zero-length array.
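
As a rough way to check the second point, the sketch below counts the interval list files under a task's execution directory and shows why an empty array yields no shards. It is only an illustration: the `*.interval_list` file pattern and the file names are assumptions, and a temporary directory stands in for the real ScatterIntervals execution folder.

```python
import tempfile
from pathlib import Path


def count_interval_lists(execution_dir, pattern="*.interval_list"):
    """Count files matching `pattern` anywhere under `execution_dir`."""
    return sum(1 for _ in Path(execution_dir).rglob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory stands in for a task that produced no lists.
    empty_count = count_interval_lists(tmp)

    # Hypothetical scattered outputs, created only for the demonstration.
    for i in range(2):
        (Path(tmp) / f"scattered_{i}.interval_list").write_text("")
    filled_count = count_interval_lists(tmp)

# A scatter runs one shard per array element, so zero elements
# means zero shards and the call looks as if it was skipped.
shards_when_empty = [f"shard-{i}" for i in range(empty_count)]
shards_when_filled = [f"shard-{i}" for i in range(filled_count)]

print(empty_count, shards_when_empty)
print(filled_count, shards_when_filled)
```

To use it on a real run, point `count_interval_lists` at the ScatterIntervals execution directory instead of the temporary one.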

    Thanks!

  • alphahmed (Japan, Member)

    Hello @ChrisL

    I've attached the modified common_tasks and main cohort_mode files, in addition to the input json file for your kind review.

    The ScatterIntervals.scattered_interval_lists execution folder actually contains a large number of lists, since I assigned only 30 intervals per scatter, which made me wonder whether that could be the reason the step is skipped. I've attached an example of one of these lists as well.

    Since having too many scattered interval lists may be the cause of the skipping, I am now rerunning with 100,000 intervals per scatter, as I am doing whole-genome analysis anyway. I'll keep you updated with the results...

    To me, the "no coercion defined" issue is still a mystery!!

    Thanks a bunch!

  • alphahmed (Japan, Member)

    I have been trying to run the pipeline on a larger server, but I kept getting "unable to create new native thread" errors.
    When I switched to two other large-memory servers, I got "resources temporarily unavailable" errors instead.
    I am not sure where the issue could be. I tried reducing the number of intervals to 10,000 per scatter, but the outcome was the same.
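
Errors such as "unable to create new native thread" are often related to per-user process and thread limits rather than to the amount of memory, although that is only an assumption about these servers. A minimal Python sketch for inspecting the relevant limits on a Linux host, using the standard `resource` module, could look like this:

```python
import resource


def describe_limit(name, limit_id):
    """Format the soft and hard values of one resource limit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


# RLIMIT_NPROC corresponds to `ulimit -u`, RLIMIT_NOFILE to `ulimit -n`.
report = [
    describe_limit("max user processes/threads", resource.RLIMIT_NPROC),
    describe_limit("max open files", resource.RLIMIT_NOFILE),
]

for line in report:
    print(line)
```

If the soft limit for user processes turns out to be low, raising it or capping how many jobs run at once (for example with the `concurrent-job-limit` setting in the Cromwell backend configuration) may be worth trying. Whether either helps here is untested.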
