
[GATK4-BETA3] BwaAndMarkDuplicatesPipelineSpark command line problems

I am using GATK4-BETA3 on an x86 system to run the BwaAndMarkDuplicatesPipelineSpark pipeline, and I have hit several problems. The other Spark pipelines work correctly, so I suspect there may be a bug in BwaAndMarkDuplicatesPipelineSpark.

1) When I run the command:
./gatk-launch BwaAndMarkDuplicatesPipelineSpark --bwamemIndexImage hdfs://foam3:9000/test_block_4/hg19mini.fasta.img -I hdfs://foam3:9000/test_block_4/SRR015438_bam.bam -O hdfs://foam3:9000/test_block_4/mem_test_markdup.bam -R hdfs://foam3:9000/test_block_4/ucsc.hg19.fasta.2bit --disableSequenceDictionaryValidation true -- --sparkRunner SPARK --sparkMaster spark://foam3:7077
it returns the following error:

Caused by: java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.bwa.BwaMemAligner.(BwaMemAligner.java:25)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$ReadAligner.apply(BwaSparkEngine.java:93)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine.lambda$align$ed6f731$1(BwaSparkEngine.java:56)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$$Lambda$25/1061261870.call(Unknown Source)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2) When I run the same command without "-- --sparkRunner SPARK --sparkMaster spark://foam3:7077" (i.e., in local mode):

./gatk-launch BwaAndMarkDuplicatesPipelineSpark --bwamemIndexImage hdfs://foam3:9000/test_block_4/hg19mini.fasta.img -I hdfs://foam3:9000/test_block_4/SRR015438_bam.bam -O hdfs://foam3:9000/test_block_4/mem_test_markdup.bam -R hdfs://foam3:9000/test_block_4/ucsc.hg19.fasta.2bit --disableSequenceDictionaryValidation true

it returns the following error. The file "hg19mini.fasta.img" is present in HDFS, yet the tool reports "Missing bwa index file":

Caused by: java.lang.IllegalArgumentException: Missing bwa index file: hdfs://foam3:9000/test_block_4/hg19mini.fasta.img
at org.broadinstitute.hellbender.utils.bwa.BwaMemIndex.assertNonEmptyReadable(BwaMemIndex.java:132)
at org.broadinstitute.hellbender.utils.bwa.BwaMemIndex.(BwaMemIndex.java:56)
at org.broadinstitute.hellbender.utils.bwa.BwaMemIndexSingleton.getInstance(BwaMemIndexSingleton.java:23)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$ReadAligner.(BwaSparkEngine.java:73)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine.lambda$align$ed6f731$1(BwaSparkEngine.java:56)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine$$Lambda$139/1286235039.call(Unknown Source)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
3) When I put the file hg19mini.fasta.img in local storage instead, the pipeline produces the correct result.

This is the command I used:

./gatk-launch BwaAndMarkDuplicatesPipelineSpark --bwamemIndexImage hg19mini.fasta.img -I hdfs://foam3:9000/test_block_4/SRR015438_bam.bam -O hdfs://foam3:9000/test_block_4/mem_test_markdup.bam -R hdfs://foam3:9000/test_block_4/ucsc.hg19.fasta.2bit --disableSequenceDictionaryValidation true

Does this mean that HDFS support in BwaAndMarkDuplicatesPipelineSpark is incomplete?
