
I can't run GATK4 BwaSpark tool on a yarn cluster

hmushtaq · Netherlands · Member

I am trying to run the GATK BwaSpark tool on the Microsoft Azure cloud by issuing the following command. (Note that I am a bit unsure about how many slashes I should put after hdfs:. If my file is in /data/gatk4input, is hdfs:///data/gatk4input the right way to refer to it?)

./gatk BwaSpark -I hdfs:///data/gatk4input/input.bam -O hdfs:///data/gatk4output/bwa.bam -R hdfs:///data/filesB37/human_g1k_v37_decoy.fasta --disable-sequence-dictionary-validation true -- --spark-master yarn
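For what it's worth, the triple-slash form can be sanity-checked by parsing the URI: in hdfs:///path the authority component is empty, so Hadoop resolves the path against fs.defaultFS from core-site.xml, whereas hdfs://host:port/path names a namenode explicitly. A minimal sketch (the namenode host and port below are hypothetical placeholders, not values from my cluster):

```python
from urllib.parse import urlparse

# Triple slash: empty authority, so Hadoop falls back to fs.defaultFS.
u = urlparse("hdfs:///data/gatk4input/input.bam")
print(u.scheme, repr(u.netloc), u.path)
# hdfs '' /data/gatk4input/input.bam

# Explicit (hypothetical) namenode: authority is set, default FS is ignored.
v = urlparse("hdfs://namenode:8020/data/gatk4input/input.bam")
print(v.netloc, v.path)
# namenode:8020 /data/gatk4input/input.bam
```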

But I get the following error. Any idea what the problem is here? Every other Spark program that I wrote myself runs absolutely fine on my cluster.

18/03/12 17:30:01 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:417)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2303)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)

