Questions about the very basics, please help

Hi there,

I've been trying my best to work through the tutorials, but it's been hard for me to follow.

I'd like to do a very simple task just to make sure I understand the parts correctly.

Let's say I'm a five-year-old who knows Docker but not much else. I want to sign into Terra and run an analysis. The analysis simply accesses a TCGA data set from FireCloud (say any FPKM), multiplies the read_counts by 2, and saves the result to my Google bucket.

How do I do this?

Can we start with the first step? How do I make a TCGA FPKM file that is hosted on FireCloud accessible to my method?

Please explain like I'm five. Pictures with code would be greatly appreciated.

Thank you!

Answers

  • AdelaideR Member admin

    Hi @etgrieco -

    Let's do this step by step.

    1.) Have you got a Terra account set up with billing or free credits?

  • etgrieco Member

    Thank you -- this is exactly what I was hoping for!

    1) Yes.

  • AdelaideR Member admin

    Okay.

    Have you created a workspace?

  • etgrieco Member

    This is great.

    Yes!

  • AdelaideR Member admin
    edited March 12

    Okay, what type of data do you want to load into your workspace?

    Do you know what type of method you are going to run using the data? Do you have a link to that code?

  • etgrieco Member

    I would like to load any TCGA FPKM data set from the data hosted on FireCloud.

    I'm simply going to load it as a DataFrame with pandas in Python and multiply it by 2.

    Let's say this open access file from GDC:
    https://portal.gdc.cancer.gov/files/2f8308c5-5ebd-49c2-953d-620176b66862

    First questions:
    Is this TCGA open access data hosted by FireCloud, or do I need to store a copy on my own Google Bucket?

    If it is hosted by FireCloud, how would I explore the data?

    When I try to click "Browse FireCloud Data Sets," I am directed to a Workspace that I can't figure out.

  • AdelaideR Member admin

    Okay, so the calculation itself is very easily done in a Jupyter notebook; there is no need for a WDL.

    I would need the link to the FireCloud bucket to determine whether it is open access. If your authorization credentials are attached to the email address you use for FireCloud/Terra, those authorizations should allow you to work with the data in your own workspace, where you have permissions.

    Here are the Terms of Service

    Here are the steps to create your Authorization Domains.

    Do you have permissions for TCGA already approved? You should work with your university to get those if you do not.

    What workspace are you directed to? Can you please provide a screenshot or link?

  • etgrieco Member

    Yes, the calculation is trivial. Basically, I'm doing this toy example so I can learn how to use WDL in the future.

    So before, I was able to click this:

    and it led me to a sign-in page, then to a workspace, where I could select "hg38 Open Access" or something like that.

    Now, however, it leads to this:

    OK, I think I was being too ambitious in trying to use the TCGA data. So let's start simpler:

    I have this workspace:

    I have this Google bucket with this file:

    The file is a simple TSV that says:
    A 1
    B 2
    C 3

    I would like to run an analysis that accesses this file, multiplies all of the values by 2, then saves the results back to the Google bucket. What do I do?
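A minimal Python sketch of the doubling step, assuming the file is tab-separated with no header row (a label, then a number) as in the example above. It uses the standard-library `csv` module rather than pandas; the function names `double_value` and `double_tsv` are hypothetical, and the in-memory example stands in for a local copy of the file.

```python
import csv
import io


def double_value(text: str) -> str:
    """Multiply one numeric cell by 2, writing whole numbers without a '.0'."""
    doubled = float(text) * 2
    return str(int(doubled)) if doubled.is_integer() else repr(doubled)


def double_tsv(src, dst) -> None:
    """Read label<TAB>number rows from text stream src, double each number,
    and write the rows to text stream dst in the same two-column format."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:  # skip blank lines
            continue
        label, value = row
        writer.writerow([label, double_value(value)])


# The three-row file from the question, held in memory for the example.
example_in = io.StringIO("A\t1\nB\t2\nC\t3\n")
example_out = io.StringIO()
double_tsv(example_in, example_out)
print(example_out.getvalue(), end="")  # A 2, B 4, C 6 (tab-separated)
```

With real files, open them with `newline=""` as the `csv` documentation recommends, e.g. `with open("input.tsv", newline="") as src, open("doubled.tsv", "w", newline="") as dst: double_tsv(src, dst)` (both file names are hypothetical). A pandas version would start from something like `pd.read_csv(path, sep="\t", header=None)` and double the numeric column.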

    Thank you!!

  • AdelaideR Member admin

    Okay, so if you do this in a notebook, the code is quite simple to write. You can use either an R or a Python kernel.

    Are you familiar with Jupyter notebooks, and which programming language would you use for this task?

    When you set up the notebook, you need to specify where your input files come from and where your output files go.

    I usually define a variable called "bucket" that points at the Google bucket in my workspace.

    You can find your bucket on the front page of your Terra workspace, on the right side.

    So, inside the notebook, open a code cell and write:

    bucket = "gs://myworkspacebucketIgotfromthedashboard"
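As an alternative to pasting the bucket name by hand, here is a Python sketch that reads it from the environment. It assumes the Terra notebook runtime sets a `WORKSPACE_BUCKET` environment variable (check with `!env` in your own notebook) and falls back to the placeholder above otherwise; `workspace_bucket` and `gs_path` are hypothetical helper names.

```python
import os
from collections.abc import Mapping

# Placeholder from the post; replace with the bucket shown on your workspace dashboard.
FALLBACK_BUCKET = "gs://myworkspacebucketIgotfromthedashboard"


def workspace_bucket(env: Mapping[str, str] = os.environ) -> str:
    """Return the workspace bucket as a gs:// URL without a trailing slash.

    Assumes the bucket is exposed as WORKSPACE_BUCKET; uses the placeholder if not."""
    bucket = env.get("WORKSPACE_BUCKET") or FALLBACK_BUCKET
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    return bucket.rstrip("/")


def gs_path(bucket: str, *parts: str) -> str:
    """Join a bucket URL and object path pieces with single slashes."""
    return "/".join([bucket.rstrip("/")] + [p.strip("/") for p in parts])


bucket = workspace_bucket()
print(gs_path(bucket, "results", "doubled.tsv"))
```

Keeping the bucket in one variable means every later copy command (`!gsutil ls $bucket` and friends) picks up the same value.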
    

    Then, if you run a bash command by prefixing it with "!", you can see what is in the bucket:

    !gsutil ls $bucket
    

    You have to use the gsutil versions of the usual terminal commands (for example, gsutil ls and gsutil cp) to make this work.

    To do the same from your desktop, install the Google Cloud SDK tools for your terminal. Here is the link to those tools.

    Once they are installed, you can copy any file from your local directory into the bucket:

    gsutil cp ~/Desktop/somefile gs://myworkspacebucketIgotfromthedashboard
    
    

    Just make sure you use the bucket variable inside the notebook when copying files into and out of your Google bucket, and you should be able to do what you described.
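The copy-in, analyze, copy-out pattern described above can also be driven from Python. This is a sketch under assumptions: it shells out to the same `gsutil cp` command shown earlier (so gsutil must be on the PATH, which the `!gsutil ls $bucket` example suggests it is in the notebook), `copy_in_process_copy_out` and its arguments are hypothetical names, and `dry_run=True` only returns the commands without running anything.

```python
import os
import subprocess
from typing import Callable, Optional


def gsutil_cp(src: str, dst: str, dry_run: bool = False) -> list[str]:
    """Run `gsutil cp src dst` (unless dry_run) and return the command list."""
    cmd = ["gsutil", "cp", src, dst]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


def copy_in_process_copy_out(
    bucket: str,
    remote_in: str,
    remote_out: str,
    process: Optional[Callable[[str, str], None]] = None,
    dry_run: bool = False,
) -> list[list[str]]:
    """Copy bucket/remote_in to a local file, call process(local_in, local_out),
    then copy local_out to bucket/remote_out. Local files go in the working directory."""
    if not dry_run and process is None:
        raise ValueError("process is required unless dry_run=True")
    bucket = bucket.rstrip("/")
    local_in = os.path.basename(remote_in)
    local_out = os.path.basename(remote_out)
    commands = [gsutil_cp(f"{bucket}/{remote_in}", local_in, dry_run)]
    if not dry_run:
        process(local_in, local_out)  # your analysis reads local_in, writes local_out
    commands.append(gsutil_cp(local_out, f"{bucket}/{remote_out}", dry_run))
    return commands


# Dry run with the placeholder bucket from the post: print the commands only.
for cmd in copy_in_process_copy_out(
    "gs://myworkspacebucketIgotfromthedashboard", "input.tsv", "results/doubled.tsv", dry_run=True
):
    print(" ".join(cmd))
```

In a notebook cell this amounts to `!gsutil cp $bucket/input.tsv .`, then your analysis, then `!gsutil cp doubled.tsv $bucket/results/`; wrapping it in a function just makes the three steps reusable.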

    Does that make sense?

  • AdelaideR Member admin

    Hi @etgrieco

    Are you all set?

    If so, I am going to close this ticket if I do not hear back from you by the end of the day.
