Data Model Magic

jasongallant1 · East Lansing, MI · Member

Sorry for the repost; this was somehow posted in "Zoo and Garden"!

Hi All,

Stuck again.

The first step in my pipeline is to convert fastq -> unaligned BAM, which was successful. This is done so that there are several read groups (corresponding to different libraries / lanes / runs) from the same individual. In the next step, I'd like to do analysis at the individual level.

I've populated my samples table with the individual "unaligned_BAM" attribute. My data model is set up such that individuals are "participants" and each individual has multiple samples reflecting each of the read groups.
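For concreteness, here's a hypothetical sketch of what that data model looks like as FireCloud load-file TSVs (entity names, attribute names, and paths are illustrative; the `entity:` header and `participant_id` membership column are from memory of the load-file format and may need checking against the docs):

    participant table:
    entity:participant_id
    fish_01

    sample table:
    entity:sample_id    participant_id    unaligned_BAM
    fish_01_lane1       fish_01           gs://my-bucket/fish_01_lane1.unaligned.bam
    fish_01_lane2       fish_01           gs://my-bucket/fish_01_lane2.unaligned.bam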

In the next step, I want to iterate over a participant set, and "collect" the corresponding unaligned BAMs for each participant. I'm having trouble with the mental gymnastics to do this.

What I'm thinking is that I want the method to run with the root entity type set to participant, and to launch the analysis on a participant set with the expression this.participants. How would I go about implementing something like this?

Best Answers

  • jasongallant1 · East Lansing, MI
    edited February 2018 · Accepted Answer

    Update: Here's a super hacky way to do it. There's gotta be a better way?

    1. Download the "Samples.TSV" from FireCloud

    2. Split the sample sheet into one BAM list per participant (here column 12 is the participant ID and column 13 is the unaligned BAM path):

       awk -F '\t' '{f = "./sample_metadata/unaligned_bams_" $12 ".txt"; print $13 >> f; close(f)}' ~/Downloads/sample.txt

    3. Upload the resulting files to the workspace bucket

    4. Construct a participant table with a column "unaligned_bam_list" pointing to the Google bucket location

    5. Run the WDL below to populate the participants' "set_of_unaligned_bams" attribute

    fof_usage_wf.wdl:

    workflow fof_usage_wf {
      File file_of_files

      # read_lines() turns the file-of-filenames into an array of file paths
      Array[File] array_of_files = read_lines(file_of_files)

      output {
        Array[File] array_output = array_of_files
      }
    }
    
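The awk one-liner in step 2 can also be sketched in Python. This is a minimal, hypothetical equivalent, not part of the original recipe; it assumes the same column positions (participant ID in column 12, unaligned BAM path in column 13, 1-based) and a single header row in the TSV:

```python
# Minimal Python sketch of step 2: split a samples TSV into one
# BAM-list file per participant. Column positions (12 = participant ID,
# 13 = unaligned BAM path, 1-based) match the awk command above and
# are an assumption about the exported TSV's layout.
import csv
import os

def split_bams_by_participant(samples_tsv, out_dir="sample_metadata"):
    os.makedirs(out_dir, exist_ok=True)
    bam_lists = {}  # participant ID -> list of BAM paths
    with open(samples_tsv, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            participant, bam = row[11], row[12]  # 0-based indices
            bam_lists.setdefault(participant, []).append(bam)
    # write one file-of-filenames per participant
    for participant, bams in bam_lists.items():
        out_path = os.path.join(out_dir, "unaligned_bams_%s.txt" % participant)
        with open(out_path, "w") as out:
            out.write("\n".join(bams) + "\n")
    return bam_lists
```

Each output file can then be uploaded and referenced from the participant table, as in steps 3 and 4.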

Answers

  • Sheila · Broad Institute · Member, Broadie admin

    @jasongallant1
    Hi,

    I just moved your question to FireCloud, where Kate @KateN can help you.

    -Sheila

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    FWIW I fleshed this out a bit here; it's pretty raw and will probably get improved/corrected as we iterate on the docs, but hopefully it will provide a jumping-off point for people with this problem. Feedback most welcome!

  • jasongallant1 · East Lansing, MI · Member

    Thanks @Geraldine_VdAuwera. I agree that your method makes more sense!

    I am actually treating my individuals (fish) in this case as 'participants' and my samples as the individual read groups. The way I've hacked the data model (at least, I think) is that a participant can have as many samples as needed. I've therefore been setting up my workflows to mostly iterate over participants, because indeed multiple fish have multiple read groups. Obviously, this would be bad if we had RNA-seq and genomics data from the same fish, let's say, but we haven't gotten that fancy yet ;)

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hah, fair enough :-D
