I can't run the GATK4 BwaSpark tool on a YARN cluster

hmushtaq (Netherlands)

I am trying to run the GATK BwaSpark tool on the Microsoft Azure cloud by issuing the following command. (I am a bit unsure how many slashes to put after hdfs:. If my file is in /data/gatk4input, is hdfs:///data/gatk4input the right way to refer to it?)

./gatk BwaSpark -I hdfs:///data/gatk4input/input.bam -O hdfs:///data/gatk4output/bwa.bam -R hdfs:///data/filesB37/human_g1k_v37_decoy.fasta --disable-sequence-dictionary-validation true -- --spark-master yarn
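On the slash question: the hdfs:/// form is a URI with an empty authority (no namenode host), followed by an absolute path. The sketch below uses java.net.URI purely as a stand-in to show how the three variants are split into scheme, authority and path; Hadoop does its own resolution through its Path/FileSystem classes. The assumptions are that, with an empty authority, Hadoop falls back to the namenode given in the cluster's fs.defaultFS setting (which only helps if that setting points at HDFS), and that the host "namenode:8020" is a hypothetical placeholder. Note the third case: with only two slashes, "data" is taken as the host name, not as part of the path.

```java
import java.net.URI;

public class HdfsUriDemo {
    // Returns "scheme|authority|path" so the split is easy to see.
    // A missing authority prints as "null".
    static String split(String s) {
        URI uri = URI.create(s);
        return uri.getScheme() + "|" + uri.getAuthority() + "|" + uri.getPath();
    }

    public static void main(String[] args) {
        String[] examples = {
            // Three slashes: empty authority, path is /data/gatk4input/input.bam
            "hdfs:///data/gatk4input/input.bam",
            // Explicit namenode (hypothetical host and port)
            "hdfs://namenode:8020/data/gatk4input/input.bam",
            // Two slashes: "data" is parsed as the authority (host)
            "hdfs://data/gatk4input/input.bam"
        };
        for (String s : examples) {
            System.out.println(s + " -> " + split(s));
        }
    }
}
```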

However, I get the following error. Any idea what the problem is? Every other Spark program I have written myself runs fine on my cluster.

18/03/12 17:30:01 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:417)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2303)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58
