Running FindCoveredIntervals in Queue script fails with "-L unmapped" parameter

I put FindCoveredIntervals into a Queue script and ran it with 25 scatters (one for each chromosome in hg19). The 25_of_25 job fails all attempts to run:

Failed functions:
Attempt 4 of 4.
'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/temp' '-cp' '/usr/local/bio_apps/Queue-3.2/Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'FindCoveredIntervals' '-I' '/path/bwa.bam' '-L' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/scatter.intervals' '-L' 'unmapped' '-R' '/path/hg19.fa' '-o' '/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/bwa.20xCov.Queue.list' '-cov' '20' '-minBQ' '17' '-minMQ' '20'
Logs:
/path/.queue/scatterGather/Coverage-1-sg/temp_25_of_25/bwa.20xCov.Queue.list.out

Note that there are two -L arguments in the job submission command. In the log I see this error:

ERROR
ERROR MESSAGE: Interval list specifies unmapped region. Only read walkers may include the unmapped region.
ERROR ------------------------------------------------------------------------------------------

It appears that the "-L unmapped" parameter is the reason for the error. Is this a bug in how Queue creates the scatter for FindCoveredIntervals? The error says that only read walkers should be processing with "-L unmapped". Is there a way to force it to not include unmapped reads so I can avoid the error?

Thanks,

Andrew

Answers

  • andrewo Member

    Hi Geraldine,

    Thanks. When I run FindCoveredIntervals outside of a Queue script, I don't supply an intervals file, so I thought that in the Queue script I wouldn't need to either. For some reason, when FindCoveredIntervals is run in a Queue script, the unmapped reads are also considered unless I provide an intervals file like you mentioned. It's still odd to me that there is that difference, but at least this is a solution! Now when I load an intervals file (with -L), it works fine.

    Thanks,

    Andrew

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    I'm also not sure why unmapped isn't just ignored, and will look into it, but I'm glad this solves your issue in the meantime!

  • pdexheimer Member ✭✭✭✭

    FindCoveredIntervals is annotated as PartitionBy(CONTIG), so the input is scattered by ContigScatterFunction.scala. Line 37 of that file sets this.includeUnmapped = true, which ultimately adds the "unmapped" contig to the end of the list. The problem is that FindCoveredIntervals is an ActiveRegionWalker, not a read walker, so it rejects that unmapped contig. Geraldine's solution works because -XL takes precedence over -L.

    I think the long-term solution is either to partition FindCoveredIntervals in a different way or to completely rewrite the contig scattering code to allow more control over whether unmapped reads are included. I know which is easier; I just don't know enough about FindCoveredIntervals to know whether it could correctly be partitioned differently.
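
The behavior described above can be pictured with a toy model. This is only an illustrative sketch, not the actual Queue code: the function name `scatter_by_contig`, its `include_unmapped` flag, and the contig names are assumptions made for the example. It mirrors what the failing command shows, namely that the last scatter part receives both its own contig and the extra "unmapped" entry.

```python
from typing import List

# Special interval keyword that appears as '-L unmapped' in the failing job.
UNMAPPED = "unmapped"


def scatter_by_contig(contigs: List[str], include_unmapped: bool) -> List[List[str]]:
    """Toy model of contig scattering (hypothetical, not Queue's implementation).

    Produces one scatter part per contig. When include_unmapped is True,
    the "unmapped" keyword is appended to the final part, which is the
    pattern seen in the 25_of_25 job.
    """
    parts = [[contig] for contig in contigs]
    if include_unmapped and parts:
        parts[-1].append(UNMAPPED)
    return parts


def parts_with_unmapped(parts: List[List[str]]) -> List[int]:
    """Return 1-based indices of scatter parts that carry the unmapped entry."""
    return [i for i, part in enumerate(parts, start=1) if UNMAPPED in part]


if __name__ == "__main__":
    demo = scatter_by_contig(["chr1", "chr2", "chrM"], include_unmapped=True)
    for index, part in enumerate(demo, start=1):
        print(index, part)
```

In this model only the final part carries the "unmapped" entry, which is consistent with only the last scatter job failing for a walker that is not a read walker.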

  • andrewo Member

    Thanks @pdexheimer for the additional information. That makes sense now. However, I'm not sure how using -XL gets rid of the "unmapped" contig -- when I pass the argument "-XL unmapped.list" (where unmapped.list contains the single line "unmapped"), I get this error:

    ERROR MESSAGE: Badly formed genome loc: Contig 'unmapped' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

    However, using -L to explicitly specify the contigs I want to scatter works fine.
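
One way to build such an explicit contig list is to read the contig names from the reference's FASTA index. The following is a minimal sketch, assuming a samtools-style `.fai` index exists next to the reference (for example `/path/hg19.fa.fai`); the helper names and the output file name `mapped_contigs.list` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def contigs_from_fai(fai_text: str) -> List[str]:
    """Return contig names from the text of a .fai index.

    Each non-empty line of a .fai file is tab-separated and starts
    with the contig name.
    """
    names = []
    for line in fai_text.splitlines():
        if line.strip():
            names.append(line.split("\t")[0])
    return names


def write_interval_list(contigs: Iterable[str], out_path: str) -> None:
    """Write one contig name per line, suitable for passing with -L."""
    Path(out_path).write_text("".join(f"{name}\n" for name in contigs))


if __name__ == "__main__":
    # Assumed paths: adjust to the real reference index and output location.
    fai = Path("/path/hg19.fa.fai")
    if fai.exists():
        write_interval_list(contigs_from_fai(fai.read_text()), "mapped_contigs.list")
```

The resulting file can then be supplied as `-L mapped_contigs.list`. Because it names only contigs present in the reference, it contains no "unmapped" entry.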

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @andrewo My bad, I forgot that 'unmapped' isn't actually listed in the sequence dictionary and is probably special-cased somewhere. I guess passing in a positive list with -L is the only solution for now.

    @pdexheimer I'll put in a bug report -- will be low priority but I think the right thing to do would be to modify the contig scattering code so the solution is more universal.
