[GATK4-BETA3] BwaAndMarkDuplicatesPipelineSpark command-line problems

I am using GATK4-BETA3 on an x86 system to run the BwaAndMarkDuplicatesPipelineSpark pipeline and have hit several problems. The other Spark pipelines work correctly, so I suspect there may be a bug in the BwaAndMarkDuplicatesPipelineSpark code itself.

1) When I run the following command:

./gatk-launch BwaAndMarkDuplicatesPipelineSpark --bwamemIndexImage hdfs://foam3:9000/test_block_4/hg19mini.fasta.img -I hdfs://foam3:9000/test_block_4/SRR015438_bam.bam -O hdfs://foam3:9000/test_block_4/mem_test_markdup.bam -R hdfs://foam3:9000/test_block_4/ucsc.hg19.fasta.2bit --disableSequenceDictionaryValidation true -- --sparkRunner SPARK --sparkMaster spark://foam3:7077

it fails with the following error:

Caused by: java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.bwa.BwaMemAligner.&lt;init&gt;(BwaMemAligner.java:25)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$ReadAligner.apply(BwaSparkEngine.java:93)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine.lambda$align$ed6f731$1(BwaSparkEngine.java:56)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$$Lambda$25/1061261870.call(Unknown Source)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2) When I run the same command without "-- --sparkRunner SPARK --sparkMaster spark://foam3:7077":

./gatk-launch BwaAndMarkDuplicatesPipelineSpark --bwamemIndexImage hdfs://foam3:9000/test_block_4/hg19mini.fasta.img -I hdfs://foam3:9000/test_block_4/SRR015438_bam.bam -O hdfs://foam3:9000/test_block_4/mem_test_markdup.bam -R hdfs://foam3:9000/test_block_4/ucsc.hg19.fasta.2bit --disableSequenceDictionaryValidation true

it fails with the following error. The file "hg19mini.fasta.img" is present in HDFS, yet the tool reports "Missing bwa index file":

Caused by: java.lang.IllegalArgumentException: Missing bwa index file: hdfs://foam3:9000/test_block_4/hg19mini.fasta.img
at org.broadinstitute.hellbender.utils.bwa.BwaMemIndex.assertNonEmptyReadable(BwaMemIndex.java:132)
at org.broadinstitute.hellbender.utils.bwa.BwaMemIndex.&lt;init&gt;(BwaMemIndex.java:56)
at org.broadinstitute.hellbender.utils.bwa.BwaMemIndexSingleton.getInstance(BwaMemIndexSingleton.java:23)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$ReadAligner.&lt;init&gt;(BwaSparkEngine.java:73)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine.lambda$align$ed6f731$1(BwaSparkEngine.java:56)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$$Lambda$139/1286235039.call(Unknown Source)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
3) When I put hg19mini.fasta.img in local storage instead, the pipeline runs to completion and produces the correct result.

This is the command I used:

./gatk-launch BwaAndMarkDuplicatesPipelineSpark --bwamemIndexImage hg19mini.fasta.img -I hdfs://foam3:9000/test_block_4/SRR015438_bam.bam -O hdfs://foam3:9000/test_block_4/mem_test_markdup.bam -R hdfs://foam3:9000/test_block_4/ucsc.hg19.fasta.2bit --disableSequenceDictionaryValidation true
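In case it helps anyone hitting the same errors, my current workaround (based on observation 3) is to stage the index image from HDFS onto local disk on every Spark worker before launching the pipeline. This is only a sketch, not an official fix: the worker host names (foam1, foam2, foam3) and the /tmp path are assumptions about my cluster, and with DRY_RUN set the script only prints the commands it would run.

```shell
# Workaround sketch (assumptions: worker host names, local /tmp path):
# copy the BWA index image from HDFS to local disk on every Spark worker,
# then point --bwamemIndexImage at the local copy.

INDEX_HDFS=hdfs://foam3:9000/test_block_4/hg19mini.fasta.img
INDEX_LOCAL=/tmp/hg19mini.fasta.img
WORKERS="foam1 foam2 foam3"   # assumed Spark worker hosts

stage_index() {
  # With DRY_RUN set, print each command instead of executing it.
  for node in $WORKERS; do
    cmd="ssh $node hdfs dfs -get $INDEX_HDFS $INDEX_LOCAL"
    if [ -n "$DRY_RUN" ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
}

DRY_RUN=1   # clear this to actually run the ssh/hdfs commands
stage_index
```

After staging, I launch the pipeline with --bwamemIndexImage /tmp/hg19mini.fasta.img while the input BAM, reference, and output stay on HDFS.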

Is HDFS support in BwaAndMarkDuplicatesPipelineSpark incomplete, i.e. does --bwamemIndexImage not yet accept an hdfs:// path?
