Running FindCoveredIntervals in Queue script fails with "-L unmapped" parameter

I put FindCoveredIntervals into a Queue script and ran it with 25 scatters (one for each chromosome in hg19). The 25_of_25 job fails all attempts to run:
    Failed functions:
    Attempt 4 of 4.
    'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/temp' '-cp' '/usr/local/bio_apps/Queue-3.2/Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'FindCoveredIntervals' '-I' '/path/bwa.bam' '-L' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/scatter.intervals' '-L' 'unmapped' '-R' '/path/hg19.fa' '-o' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/bwa.20xCov.Queue.list' '-cov' '20' '-minBQ' '17' '-minMQ' '20'
    Logs:
    /path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/bwa.20xCov.Queue.list.out

Note that there are two -L arguments in the job submission command. In the log I see this error:
    ##### ERROR
    ##### ERROR MESSAGE: Interval list specifies unmapped region. Only read walkers may include the unmapped region.
    ##### ERROR ------------------------------------------------------------------------------------------

It appears that the "-L unmapped" parameter is the reason for the error. Is this a bug in how Queue creates the scatter for FindCoveredIntervals? The error says that only read walkers should be processing with "-L unmapped". Is there a way to force it to not include unmapped reads so I can avoid the error?

Thanks,

Andrew

Answers

  • Hi Geraldine,

    Thanks. When I run FindCoveredIntervals outside of a Queue script, I don't supply an intervals file, so I thought that in the Queue script I wouldn't need to either. For some reason, when FindCoveredIntervals is run in a Queue script, the unmapped reads are also considered unless I provide an intervals file like you mentioned. It's still odd to me that there is that difference, but at least this is a solution! Now when I load an intervals file (with -L), it works fine.

    Thanks,

    Andrew

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    I'm also not sure why unmapped isn't just ignored, and will look into it, but I'm glad this solves your issue in the meantime!

  • FindCoveredIntervals is annotated as PartitionBy(CONTIG), so the input is scattered by ContigScatterFunction.scala. Line 37 of that file sets this.includeUnmapped = true, which ultimately adds the "unmapped" contig to the end of the list. The problem is that FindCoveredIntervals is an ActiveRegionWalker, so doesn't like that unmapped contig. Geraldine's solution works because -XL takes precedence over -L.

    I think the long-term solution is either to partition FindCoveredIntervals in a different way or to completely rewrite the contig scattering code to allow more control over whether unmapped reads are included. I know which is easier; I just don't know enough about FindCoveredIntervals to know if it could correctly be partitioned differently.

  • Thanks @pdexheimer for the additional information. That makes sense now. I'm not sure how using -XL is a solution to get rid of the "unmapped" contig -- when I try to pass the argument "-XL unmapped.list" (where unmapped.list has the single line "unmapped") it gives this error:

        ERROR MESSAGE: Badly formed genome loc: Contig 'unmapped' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

    However, using -L to explicitly specify the contigs I want to scatter works fine.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @andrewo My bad, I forgot that 'unmapped' isn't actually listed in the sequence dictionary and is probably special-cased somewhere. I guess passing in a positive list with -L is the only solution for now.

    @pdexheimer I'll put in a bug report -- will be low priority but I think the right thing to do would be to modify the contig scattering code so the solution is more universal.
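
For anyone applying the workaround from this thread (an explicit positive list passed with -L), here is a minimal sketch of one way to build such a list. It is not part of GATK or Queue. It assumes the reference has a samtools-style FASTA index next to it (for example /path/hg19.fa.fai, which is an assumption, not something stated above), and the function names and the output file name contigs.list are hypothetical. It writes one contig name per line, so the special "unmapped" entry never appears in the list.

```python
from pathlib import Path


def contigs_from_fai(fai_path):
    """Return contig names from a samtools-style FASTA index (.fai).

    Each non-empty .fai line is tab-separated; the first column is the contig name.
    """
    names = []
    for line in Path(fai_path).read_text().splitlines():
        if line.strip():
            names.append(line.split("\t")[0])
    return names


def write_contig_intervals(fai_path, out_path):
    """Write one contig name per line so the file can be passed with -L.

    Only contigs present in the index are written, so "unmapped" is never included.
    """
    names = contigs_from_fai(fai_path)
    Path(out_path).write_text("\n".join(names) + "\n")
    return names
```

Calling write_contig_intervals("/path/hg19.fa.fai", "contigs.list") and then supplying contigs.list with -L should correspond to the explicit contig list described above, assuming the index matches the reference given with -R.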
