
GATK support for files stored in Azure Blob or Data Lake when processing with Spark

I was trying to use the GATK wrapper to run tasks on Apache Spark 2.2.1.

However, my files are stored in Azure Blob Storage and Azure Data Lake, and I noticed that GATK only has support for Google Cloud Storage. In my case I had to copy those large files to my local Linux VM first; some tasks then required them on the local filesystem and others on HDFS. Is there a way to use Azure storage directly?
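To make the question concrete, here is a sketch of what I was hoping would work. Spark can in principle read Azure Blob Storage through the Hadoop `hadoop-azure` connector (the `wasbs://` scheme), so something like the following might let a GATK Spark tool read input directly from Azure. The account name, container, paths, and key below are all placeholders, and I have not verified that GATK accepts these URIs:

```shell
# Sketch only, not verified: assumes the hadoop-azure connector jars are
# available to Spark, and that GATK Spark tools pass input URIs through to
# the Hadoop filesystem layer. All names below are placeholders.
gatk BaseRecalibratorSpark \
    -I wasbs://mycontainer@myaccount.blob.core.windows.net/sample.bam \
    -R wasbs://mycontainer@myaccount.blob.core.windows.net/ref.fasta \
    --known-sites wasbs://mycontainer@myaccount.blob.core.windows.net/known.vcf \
    -O wasbs://mycontainer@myaccount.blob.core.windows.net/recal.table \
    -- \
    --spark-runner SPARK \
    --spark-master spark://my-master:7077 \
    --conf spark.hadoop.fs.azure.account.key.myaccount.blob.core.windows.net=MY_STORAGE_KEY
```

If the tools resolve paths through the Hadoop filesystem API, this kind of configuration could avoid the copy-to-local-VM step entirely, but I don't know whether that is supported.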

Also, tools such as BaseRecalibratorSpark are still in beta and not production-ready. Is this any different when running with Spark on a Google Cloud cluster?

