read_lines() in scatter definition with large file input transiently fails

jsoto, Broad Institute (Member, Broadie)

Cromwell version: 29-675e865-SNAP

I have a read_lines() call inside a scatter definition, like this:

scatter (file in read_lines(shard_fofn)) {
    call SelectVariants {
        input:
            vcf=file,
            intervals=interval_list
    }
}

where shard_fofn has more than 20,000 lines. When I ran this workflow, different shards failed
with one of two errors, with 30-50 different shards failing on each retry.

{
  failures: [
    {
      causedBy: [
        {
          causedBy: [
            {
              causedBy: [
                {
                  causedBy: [
                    {
                      causedBy: [
                        {
                          causedBy: [
                            {
                              causedBy: [ ],
                              message: "Data received in non-data state: 6"
                            }
                          ],
                          message: "Connection has been shutdown: javax.net.ssl.SSLProtocolException: Data received in non-data state: 6"
                        }
                      ],
                      message: "Connection has been shutdown: javax.net.ssl.SSLProtocolException: Data received in non-data state: 6"
                    }
                  ],
                  message: "All reopens failed"
                }
              ],
              message: "vcf"
            }
          ],
          message: "Input evaluation for Call mergeAndGetSites.SelectVariants failed."
        }
      ],
      message: "Couldn't resolve all inputs for mergeAndGetSites.SelectVariants at index Some(778)."
    }
  ],
  attempt: 1,
  shardIndex: 778
},

or

{
  failures: [
    {
      causedBy: [
        {
          causedBy: [
            {
              causedBy: [
                {
                  causedBy: [
                    {
                      causedBy: [
                        {
                          causedBy: [ ],
                          message: "Connection closed prematurely: bytesRead = 34066, Content-Length = 2974047"
                        }
                      ],
                      message: "Connection closed prematurely: bytesRead = 34066, Content-Length = 2974047"
                    }
                  ],
                  message: "All reopens failed"
                }
              ],
              message: "vcf"
            }
          ],
          message: "Input evaluation for Call mergeAndGetSites.SelectVariants failed."
        }
      ],
      message: "Couldn't resolve all inputs for mergeAndGetSites.SelectVariants at index Some(19820)."
    }
  ],
  attempt: 1,
  shardIndex: 19820
}
]
},

Answers

  • jsoto, Broad Institute (Member, Broadie)

    I worked around this issue by moving the read_lines() call outside of the scatter, assigning its result to a variable, and scattering over that variable, like this:

        Array[File] fofn_files = read_lines(shard_fofn)
    
        scatter (file in fofn_files){
            call SelectVariants{
                input:
                    vcf=file,
                    intervals=interval_list
            }
        }
    
  • ChrisL, Cambridge, MA (Member, Broadie, Dev)
    edited December 2017

    @jsoto is this a test you can easily re-run? I'm hopeful that this is addressed in Cromwell 30, which should now evaluate the scatter array only once (rather than re-evaluating the entire array to access every element in the array... don't ask... :smile: )
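
To make the access pattern described above concrete, here is a toy Python model. It is not Cromwell's code, and every name in it (`CountingFofn`, `scatter_reevaluating`, `scatter_evaluate_once`, the `gs://example-bucket/...` paths) is hypothetical. It only counts how many times a file of file names would be read if the scatter collection were re-evaluated for each shard, compared with evaluating it once and indexing into the result, which is roughly what assigning `read_lines(shard_fofn)` to a variable first would achieve.

```python
# Toy model only: this is NOT Cromwell's implementation. All names are
# hypothetical and exist purely to illustrate the access pattern.


class CountingFofn:
    """Stands in for a remote file of file names; counts how often it is read."""

    def __init__(self, lines):
        self._lines = list(lines)
        self.reads = 0

    def read_lines(self):
        self.reads += 1  # each call models one full read of the FOFN
        return list(self._lines)


def scatter_reevaluating(fofn):
    """Re-evaluate the collection expression to fetch each shard's element."""
    n = len(fofn.read_lines())
    return [fofn.read_lines()[i] for i in range(n)]


def scatter_evaluate_once(fofn):
    """Evaluate the collection once, then index into the in-memory array."""
    files = fofn.read_lines()
    return [files[i] for i in range(len(files))]


if __name__ == "__main__":
    # Hypothetical shard paths; the real FOFN in the thread had 20k+ lines.
    paths = [f"gs://example-bucket/shard_{i}.vcf" for i in range(1000)]

    per_shard = CountingFofn(paths)
    scatter_reevaluating(per_shard)

    once = CountingFofn(paths)
    scatter_evaluate_once(once)

    print(per_shard.reads, once.reads)  # prints: 1001 1
```

Under this model the number of reads grows with the number of shards. If each read is a remote fetch, more reads would mean more opportunities for a transient connection error like the ones quoted in the question, though the thread does not confirm that this is the exact cause.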
