GATK 4 support on AWS

pagarwal14 · Durham, NC · Member
edited March 2018 in Ask the GATK team

Hi,
I just wanted to check on the status of GATK 4 support on AWS. The most recent information I read said that only Google Cloud was supported. We are working with a vendor to run GATK 4, and their tool only works on AWS. If support is not yet available, is there an estimate of when it will be?
Thanks,

-Pankaj

GitHub issue #3000 · filed by Sheila · state: closed · closed by vdauwera

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @pagarwal14
    Hi Pankaj,

    I will check with the team and get back to you.

    -Sheila

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    Hi @pagarwal14,

    You should be able to run GATK4 on AWS. See

    We have clear instructions for using the NIO feature of GATK4 in Google Cloud. This feature lets GATK4 stream data directly from Google Cloud Storage buckets for analysis and write only the analysis results to disk. Using the same feature on AWS appears to require some additional setup.

    Can you let me know where you read that AWS is unsupported? Feel free to write comments in the issue ticket I shared. GATK4 is an open-source repo.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie, admin

    Hi @pagarwal14, to add to what @shlee has said: technically you can run GATK on any cloud that allows you to spin up VM instances with a Linux-type operating system. What is provided on Google Cloud is an extra layer of functionality that allows you to run GATK pipelines conveniently at scale. A group at AWS is currently working to develop similar functionality. We expect this to be available in the coming quarter; there will be a blog post on the WDL blog when it is ready, and we will also update the page that lists the platforms we are actively working to support in this way: https://software.broadinstitute.org/gatk/documentation/pipelines

  • pagarwal14 · Durham, NC · Member

    Thanks for the responses. @shlee, to answer "Can you let me know where you read that AWS is unsupported?": I had read on the GATK4 information page that "AWS support is forthcoming." I can't find the link now, so it may have been removed, but that is what led me to assume AWS was not supported. Thank you all, this is very helpful.

  • Does the Broad have any general configuration guidelines for setting up a working Spark cluster for GATK4 on AWS EMR? The GATK4 Spark tools 'run' for me on the master node; however, the work does not appear to be distributed to the worker (slave) nodes. Could this be related to an HDFS or EMRFS filesystem issue?

    GitHub issue #3060 · filed by Sheila · state: open
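A minimal sketch of how a GATK4 Spark tool can be pointed at a cluster rather than run locally, assuming the stock `gatk` wrapper script and YARN as the resource manager (the EMR default). Arguments before the bare `--` go to the GATK tool itself; arguments after it are Spark options handled by the wrapper. As far as we understand the wrapper, omitting `--spark-runner SPARK` leaves the tool in local mode on the node where it was launched, which would match the symptom described above. The helper name `build_gatk_spark_command` and the HDFS paths are hypothetical; this is not an official Broad recipe.

```python
import shlex
from typing import List, Optional


def build_gatk_spark_command(
    tool: str,
    tool_args: List[str],
    spark_master: str = "yarn",
    num_executors: Optional[int] = None,
    executor_cores: Optional[int] = None,
    executor_memory: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for a GATK4 Spark tool.

    Everything before the bare "--" is passed to the GATK tool; everything
    after it is a Spark option for the gatk wrapper script (assumption:
    stock wrapper, YARN-managed cluster such as EMR).
    """
    cmd = ["gatk", tool, *tool_args, "--"]
    # Ask the wrapper to submit to a real cluster instead of local mode.
    cmd += ["--spark-runner", "SPARK", "--spark-master", spark_master]
    # Optional executor sizing; values are placeholders to tune per cluster.
    if num_executors is not None:
        cmd += ["--num-executors", str(num_executors)]
    if executor_cores is not None:
        cmd += ["--executor-cores", str(executor_cores)]
    if executor_memory is not None:
        cmd += ["--executor-memory", executor_memory]
    return cmd


if __name__ == "__main__":
    # Hypothetical HDFS paths readable by every executor in the cluster.
    cmd = build_gatk_spark_command(
        "PrintReadsSpark",
        ["-I", "hdfs:///user/hadoop/input.bam", "-O", "hdfs:///user/hadoop/output.bam"],
        num_executors=4,
        executor_cores=2,
        executor_memory="4g",
    )
    # Print a shell-ready command line (shlex.join needs Python 3.8+).
    print(shlex.join(cmd))
```

Executor counts and memory are placeholders to tune for the cluster's instance types. Input and output paths generally need to live on a filesystem that every executor can read (for example HDFS), not on the master node's local disk.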
  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @tonyanselmo
    Hi,

    I believe this is still a work in progress. There should be an announcement about it soon, followed by more documentation. I will also ask someone from the team to follow up.

    -Sheila
