
CountBasesSpark doesn't work with the -L option

Polar_bear · Member
edited November 2019 in Ask the GATK team

I tested this with GATK 4.1.4.0 and 4.1.4.1:

gatk CountBasesSpark \
     -I input_reads.bam \
     -O base_count.txt

This command runs fine and produces the correct output in base_count.txt.
But I want to count only the bases that fall within an interval file, so I ran:

gatk CountBasesSpark \
     -I input_reads.bam \
     -O base_count.txt \
     -L interval.file
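As a possible workaround while the Spark version is broken, the non-Spark CountBases tool takes the same engine-level -I/-O/-L arguments; this is only a sketch on my side and I have not confirmed it against this exact input:

```shell
# Hypothetical workaround: count bases in intervals with the
# non-Spark CountBases tool instead of CountBasesSpark.
gatk CountBases \
     -I input_reads.bam \
     -O base_count.txt \
     -L interval.file
```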

This command does not run successfully; the errors look like this:

......
19/11/28 17:44:01 INFO NewHadoopRDD: Input split: file:/disks/disk1/data_sample/19NGS142/19NGS142.bam:1476395008+33554432
19/11/28 17:44:01 INFO NewHadoopRDD: Input split: file:/disks/disk1/data_sample/19NGS142/19NGS142.bam:1509949440+33554432
19/11/28 17:44:01 INFO NewHadoopRDD: Input split: file:/disks/disk1/data_sample/19NGS142/19NGS142.bam:704643072+33554432
19/11/28 17:44:02 ERROR Executor: Exception in task 6.0 in stage 1.0 (TID 7)
java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
        at scala.collection.Iterator$$anon$13.next(Iterator.scala:469)
......

The interval.file itself is fine, because I use it throughout the rest of my GATK pipeline.
CountReadsSpark fails with the same error.
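For context, the interval file is in one of the standard GATK interval formats. A minimal Picard-style .interval_list would look like this (the contig name and coordinates here are placeholders, not my actual file):

```
@HD	VN:1.6
@SQ	SN:chr1	LN:248956422
chr1	10000	20000	+	target_1
```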

Please check this.

Thanks.
Chris


Issue · GitHub
Filed by bhanuGandham
Issue Number: 6319
State: open

