Data with newlines and tabs

yfarjoun (Broad Institute) Dev ✭✭✭

I have data that contains newlines and tabs, and when I "download sample data" from the GUI I get an unusable file. It would be good if there was an option to export to Excel perhaps, or to otherwise "protect" the contents of the data.

Answers

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    We should definitely be able to provide the contents unmodified. Can you describe how it's unusable? If possible, post snippets showing what was messed up in the process.

  • yfarjoun (Broad Institute) Dev ✭✭✭

    Well, the data is provided as a tab-separated file, with a newline marking the end of each row. So if any of the fields has a tab or a newline in it, it screws up the formatting.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Not trying to be dense here, but what's different in the downloaded file? Are the newlines or tabs getting stripped out, or are extra characters getting added?

  • yfarjoun (Broad Institute) Dev ✭✭✭

    I'll see how long it takes me to generate a project with this problem (I already fixed the one I had by removing the offending field).

  • yfarjoun (Broad Institute) Dev ✭✭✭

    Here's how my data gets exported before I run my script:

    entity:sample_id    numbers strings participant_id
    me  1   here    one
    them    4   strings one
    we  3   a   one
    you 2   is  one
    

    And here it is after running the script on one of the rows:

    entity:sample_id    numbers strings participant_id  tabs    newlines
    me  1   here    one     
    them    4   strings one     
    we  3   a   one tab haha    newline
    haha
    you 2   is  one     
    

    (can you guess which row it was?)

    In case you want the WDL that I ran, here it is:

    # Task whose only job is to emit string outputs containing a literal tab and a literal newline.
    task RuinDataWithTab {
        command {
        }

        output {
            String outtab = "tab\thaha"
            String outnl = "newline\nhaha"
        }

        runtime {
            docker: "python:2.7"
            memory: "1 GB"
        }
    }
    
    
    workflow RuinData {
    
        call RuinDataWithTab
    
        output {
            RuinDataWithTab.*
        }
    }
    
    

    In FireCloud I hooked up the outtab output to the field "tabs" and outnl to the field "newlines".

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Okay, so do I understand correctly that you're running a script on a tsv file that adds some columns to said tsv file, then trying to download the result and it looks bad? If that's not it, can you please describe what you're doing with human words? Because I am rather confused right now. Your example doesn't actually explain what you're trying to do / what's happening, it just shows that it's possible to intentionally screw up the formatting of a file.

    So I know this is going to sound like a dumb question, but I have to ask dumb questions due to having very little clue what you're actually doing (presumably not literally running scripts to ruin file formatting). Here goes. Assuming that your tsv is a file that can be retrieved from the workspace bucket, have you checked that its formatting is actually okay (which would confirm that the downloading/exporting process is ruining it) or whether the file itself is in fact messed up (which would suggest your script is doing something wrong, no offense)?

  • yfarjoun (Broad Institute) Dev ✭✭✭
    edited January 2017

    There is no TSV; this is the data in FireCloud. When I say "my data" I mean the FireCloud sample data (i.e. in the Data tab). I am simply adding two new fields to my sample data. I should be allowed to do this. And in fact, the data is fine in the browser (http://imgur.com/a/hEAZa). Then, when I ask to "export sample data" I get the second file, which is no longer parseable. This actually came up in an analysis; I was not "trying" to ruin my data.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Ah, that makes more sense, thanks for clarifying. One more question: how are you adding the two new fields to your sample data?

  • yfarjoun (Broad Institute) Dev ✭✭✭

    In FireCloud, I set up a method configuration pointing the outputs of the WDL script above to the fields of choice (using the edit button):

    http://imgur.com/a/dmnsG

    Then I launch an analysis (blue button in the same image) and choose the sample I want to destroy:

    http://imgur.com/a/JfHJR

    After running the script, the data is fine in FireCloud, but if I download the sample data I get extra tabs and newlines that destroy the formatting.
