
Running FindCoveredIntervals in Queue script fails with "-L unmapped" parameter

I put FindCoveredIntervals into a Queue script and ran it with 25 scatters (one for each chromosome in hg19). The 25_of_25 job fails all attempts to run:
Failed functions:
Attempt 4 of 4.
'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '' '-cp' '/usr/local/bio_apps/Queue-3.2/Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'FindCoveredIntervals' '-I' '/path/bwa.bam' '-L' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/scatter.intervals' '-L' 'unmapped' '-R' '/path/hg19.fa' '-o' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/bwa.20xCov.Queue.list' '-cov' '20' '-minBQ' '17' '-minMQ' '20'
Logs:
/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/bwa.20xCov.Queue.list.out

Note that there are two -L arguments in the job submission command. In the log I see this error:
##### ERROR
##### ERROR MESSAGE: Interval list specifies unmapped region. Only read walkers may include the unmapped region.
##### ERROR ------------------------------------------------------------------------------------------

It appears that the "-L unmapped" parameter is the reason for the error. Is this a bug in how Queue creates the scatter for FindCoveredIntervals? The error says that only read walkers should be processing with "-L unmapped". Is there a way to force it to not include unmapped reads so I can avoid the error?
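
For anyone debugging a similar failure, here is a minimal Python sketch for pulling the -L values out of a quoted command line like the one Queue prints under "Failed functions". The helper name `interval_args` is made up for illustration, and the command string is an abbreviated copy of the one above.

```python
import shlex


def interval_args(command):
    """Return the value that follows each -L flag in a shell-quoted command."""
    tokens = shlex.split(command)
    # Stop one token early so tokens[i + 1] always exists.
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-L"]


# Abbreviated copy of the failing job's command line (paths as posted).
cmd = (
    "'java' '-Xmx4096m' '' '-cp' '/usr/local/bio_apps/Queue-3.2/Queue.jar' "
    "'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'FindCoveredIntervals' "
    "'-I' '/path/bwa.bam' "
    "'-L' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/scatter.intervals' "
    "'-L' 'unmapped' '-R' '/path/hg19.fa'"
)

print(interval_args(cmd))
```

Run against the other 24 jobs, this should make it easy to confirm that only the last one carries the extra "unmapped" interval.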




Best Answer


  • Hi Geraldine,

    Thanks. When I run FindCoveredIntervals outside of a Queue script, I don't supply an intervals file, so I thought that in the Queue script I wouldn't need to either. For some reason, when FindCoveredIntervals is run in a Queue script, the unmapped reads are also considered unless I provide an intervals file like you mentioned. It's still odd to me that there is that difference, but at least this is a solution! Now when I load an intervals file (with -L), it works fine.



  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    I'm also not sure why unmapped isn't just ignored, and will look into it, but I'm glad this solves your issue in the meantime!

  • FindCoveredIntervals is annotated as PartitionBy(CONTIG), so the input is scattered by ContigScatterFunction.scala. Line 37 of that file sets this.includeUnmapped = true, which ultimately adds the "unmapped" contig to the end of the list. The problem is that FindCoveredIntervals is an ActiveRegionWalker, not a read walker, so it rejects that unmapped contig. Geraldine's solution works because -XL takes precedence over -L.

    I think the long-term solution is either to partition FindCoveredIntervals in a different way or to completely rewrite the contig scattering code to allow more control over whether unmapped reads are included. I know which is easier; I just don't know enough about FindCoveredIntervals to know whether it could be partitioned differently and still give correct results.
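
To make the behaviour described above concrete, here is a toy Python model of contig scattering with an include-unmapped switch. It is only a sketch of the idea, not Queue's actual code: the function name, the use of bare contig names instead of per-job scatter.intervals files, and the hg19 contig list are all assumptions for illustration.

```python
# Assumed hg19 contig names: 22 autosomes plus X, Y and M (25 in total).
HG19_CONTIGS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def scatter_by_contig(contigs, include_unmapped=True):
    """Build one list of -L arguments per contig.

    When include_unmapped is true, the special 'unmapped' interval is
    appended to the last job only, mirroring the failing 25_of_25 command.
    """
    jobs = [["-L", contig] for contig in contigs]
    if include_unmapped and jobs:
        jobs[-1] += ["-L", "unmapped"]
    return jobs


jobs = scatter_by_contig(HG19_CONTIGS)
print(len(jobs), jobs[-1])
```

With `include_unmapped=False`, none of the jobs receives the extra interval, which is the kind of control the rewritten scattering code would need to expose for walkers that are not read walkers.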

  • Thanks @pdexheimer for the additional information. That makes sense now. I'm not sure how using -XL is a solution to get rid of the "unmapped" contig -- when I try to pass the argument "-XL unmapped.list" (where unmapped.list has the single line "unmapped") it gives this error:
    ERROR MESSAGE: Badly formed genome loc: Contig 'unmapped' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
    However, using -L to explicitly specify the contigs I want to scatter works fine.
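
One way to build such an explicit contig list is sketched below in Python. It assumes the reference has a samtools-style .fai index (tab-separated, contig name in the first column) and that GATK accepts an intervals file with one bare contig name per line, which is my understanding of the .list/.intervals format. The function name, the file names, and the `exclude` parameter are hypothetical.

```python
from pathlib import Path


def write_contig_intervals(fai_path, out_path, exclude=()):
    """Write one contig name per line, taken from column 1 of a .fai index.

    Returns the list of contig names that were written.
    """
    names = []
    for line in Path(fai_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t")[0]
        if name not in exclude:
            names.append(name)
    Path(out_path).write_text("\n".join(names) + "\n")
    return names
```

For example, `write_contig_intervals("/path/hg19.fa.fai", "contigs.intervals")` would produce a file that can be passed with -L. Because "unmapped" is not a contig in the index, it never ends up in the list.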

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @andrewo My bad, I forgot that 'unmapped' isn't actually listed in the sequence dictionary and is probably special-cased somewhere. I guess passing in a positive list with -L is the only solution for now.

    @pdexheimer I'll put in a bug report -- will be low priority but I think the right thing to do would be to modify the contig scattering code so the solution is more universal.
