(howto) Overwrite and delete data from the data model

Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin
edited April 2018 in Tutorials

This doc is in progress. Feel free to leave feedback on how it can be improved.

Overwrite

If an attribute already exists, you can import a load file to overwrite its values.

For example, you may have previously imported a participant load file that had several columns of metadata. If you import another participant load file, it will overwrite values in all columns that existed in the previous load file, and create new entries for any new columns.
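As a rough illustration of the overwrite behavior, here is a minimal Python sketch that writes two versions of a participant load file. The participant ID, the column names (`age`, `zipcode`) and the helper `build_load_file` are made up for the example, and the `entity:participant_id` header is assumed from the TSV files FireCloud exports, so compare it against a file downloaded from your own workspace first.

```python
import csv
import io


def build_load_file(entity_type, rows, columns):
    """Return the TSV text of a load file.

    entity_type: e.g. "participant" (assumed header form: entity:<type>_id)
    rows: dict mapping entity ID -> dict of column values
    columns: ordered list of attribute column names
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{entity_type}_id"] + list(columns))
    for entity_id, values in rows.items():
        writer.writerow([entity_id] + [values.get(col, "") for col in columns])
    return buffer.getvalue()


# Hypothetical data: the second file changes "age" and adds a "zipcode" column.
first = build_load_file("participant", {"P1": {"age": "34"}}, ["age"])
second = build_load_file(
    "participant", {"P1": {"age": "35", "zipcode": "90210"}}, ["age", "zipcode"]
)
print(second)
```

Importing `second` after `first` would, per the description above, overwrite the `age` value for P1 and create the new `zipcode` column.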


Delete

To delete data entities or attributes from your workspace, you'll need to use the FireCloud API, as the delete functionality is not available through the web interface.

Below is a step-by-step example of how to delete a sample from a workspace, or delete an attribute of a sample (or other entity), using the API Swagger page. If you're not familiar with it, Swagger is a popular web interface for making APIs user-friendly. You can run the API commands directly from that web page by filling in the various text fields, or you can run the corresponding commands shown on the page from your terminal.

Before you do anything else inside a selected endpoint, you need to authenticate yourself in the Swagger page. If you see a red circle with an exclamation mark in the top right corner of the colored area, click on it (if you don't see one, congratulations, skip this step).

In the dialog that opens up, check the box that says "openid" and click the "Authorize" button. This will make you authenticate through your Google account profile. Once that's done you'll be back on the API page and the red circle should be blue.


How to delete one or more Sample_Id(s), Participant_Id(s), Pair_Id(s) or Set_Id(s)

  1. Go to the API Swagger page. On that page you'll find a list of all the "endpoints", which correspond to the possible commands you can run. Use the direct link (https://api.firecloud.org/#!/Entities/deleteEntities) or find POST /api/workspaces/{workspaceNamespace}/{workspaceName}/entities/delete. When you click it, the line will expand to reveal input text fields along with example values and a description of what the endpoint does.

  2. Fill in the namespace (the first part of the workspace name ending at the forward slash) and name of the workspace from which you want to delete the entity in the corresponding text fields.

  3. Next, the body field is where you specify which entity you want to delete. If you click on the box with the example values on the right, it will automatically copy the contents over to the input field; then you just need to fill in the values. Replace each "string" placeholder with the relevant text for entityType and entityName.
  • For example, to delete a specific sample, the entityType is sample and the entityName is the ID of that sample from the sample_ID column. To delete multiple entities, repeat the JSON object for each one, separated by commas:

    [
      {
        "entityType": "sample",
        "entityName": "NA12878_24RG_small"
      },
      {
        "entityType": "sample",
        "entityName": "NA12878_24RG_med"
      }
    ]
    
  4. Then find the button labeled "Try it out!" and click it. If everything worked correctly, the box will unfold further and, among other things, show a line saying the response code is 204, which means the request was successful. A 409 means the entity you tried to delete has other entities that depend on it. For example, a sample that is contained in a sample set cannot be deleted unless you also delete the sample set. A 0 response code or “no server response” can often be resolved by logging out of your “openid” authorization and logging back in.
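If you would rather script this than click through Swagger, the steps above can be sketched in Python roughly as follows. This only builds the request without sending it. The workspace names and the helper `build_delete_request` are hypothetical, the base URL is taken from the API link given in this thread, and passing a Google OAuth access token as a bearer token is an assumption about how authentication works outside the Swagger page.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.firecloud.org"  # base URL taken from the API link in this thread


def build_delete_request(namespace, workspace, entity_type, entity_names, token):
    """Build (but do not send) the POST request for the entities/delete endpoint."""
    body = [{"entityType": entity_type, "entityName": name} for name in entity_names]
    url = "{}/api/workspaces/{}/{}/entities/delete".format(
        API_ROOT,
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(workspace, safe=""),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: a Google OAuth access token is accepted as a bearer token.
            "Authorization": "Bearer " + token,
        },
    )


# Hypothetical workspace; the sample names are the ones from the example above.
request = build_delete_request(
    "my-namespace",
    "my-workspace",
    "sample",
    ["NA12878_24RG_small", "NA12878_24RG_med"],
    token="PLACEHOLDER_TOKEN",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of calling `urllib.request.urlopen(request)`, which raises `urllib.error.HTTPError` for error responses such as the 409 described above.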

How to delete an attribute of a Sample_Id, Participant_Id, Pair_Id or Set_Id

  1. Go to the API Swagger page. On that page you'll find a list of all the "endpoints", which correspond to the possible commands you can run. Find PATCH /api/workspaces/{workspaceNamespace}/{workspaceName}/entities/{entityType}/{entityName}. When you click it, the line will expand to reveal input text fields along with example values and a description of what the endpoint does.

  2. Fill in the namespace (the first part of the workspace name ending at the forward slash) and name of the workspace from which you want to delete the attributes.

  3. Specify the entity type (participant, sample, pair, set) and the entity name (the specific ID).

  4. Next, in the ‘attributeupdateJson’ section, click the box on the right with the example values. This will automatically copy the contents over to the input field. You just need to wrap the request in square brackets [ ] and fill in the values. For “op” (the operation), type “RemoveAttribute”. Then put the name of the attribute (the column header) you’d like to remove in “attributeName”, replacing the "string" placeholder with the relevant text.

Example of removing the ZipCode attribute (attributeName) from a sample, participant, pair, or set:

[
  {
    "op": "RemoveAttribute",
    "attributeName": "ZipCode"
  }
]
  5. Then find the button labeled "Try it out!" and click it. If everything worked correctly, the box will unfold further and, among other things, show a line saying the response code is 204, which means the request was successful. A 409 means the entity you tried to delete has other entities that depend on it. For example, a sample that is contained in a sample set cannot be deleted unless you also delete the sample set. A 0 response code or “no server response” can often be resolved by logging out of your “openid” authorization and logging back in.
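The same attribute removal can be sketched in Python. As before, this only constructs the PATCH request. The workspace names and the helper `build_attribute_patch` are hypothetical, and the bearer-token header is an assumption about authentication outside Swagger.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.firecloud.org"  # base URL taken from the API link in this thread


def build_attribute_patch(namespace, workspace, entity_type, entity_name, operations, token):
    """Build (but do not send) a PATCH request that applies attribute operations to one entity."""
    parts = [namespace, workspace, "entities", entity_type, entity_name]
    path = "/".join(urllib.parse.quote(part, safe="") for part in parts)
    return urllib.request.Request(
        API_ROOT + "/api/workspaces/" + path,
        data=json.dumps(operations).encode("utf-8"),  # the body is a JSON list of operations
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumption: a Google OAuth access token is accepted as a bearer token.
            "Authorization": "Bearer " + token,
        },
    )


remove_zipcode = [{"op": "RemoveAttribute", "attributeName": "ZipCode"}]
patch = build_attribute_patch(
    "my-namespace", "my-workspace", "sample", "NA12878_24RG_small",
    remove_zipcode, token="PLACEHOLDER_TOKEN",
)
print(patch.get_method(), patch.full_url)
print(patch.data.decode("utf-8"))
```

Because the endpoint addresses a single entity, removing the attribute from several entities would mean building one such request per entity name.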

In addition, there are other “ops” with examples listed below:

AddUpdateAttribute: create a new attribute called attributeName on the entity with the value addUpdateAttribute (or, as the name suggests, update the attribute if it already exists).

For example, to add a ZipCode attribute to an entity:

[
  {
    "op": "AddUpdateAttribute",
    "attributeName": "ZipCode",
    "addUpdateAttribute": "90210"
  }
]

AddListMember: add a value newMember to the list attributeListName on the entity. This will create an array of values.

For example, to keep a list of all the Zip Codes that the person (sample) has ever lived in, add each entry as another member of the list. This produces [90210, 90211]:

[
  {
    "op": "AddListMember",
    "attributeListName": "ZipCodes",
    "newMember": "90210"
  },
  {
    "op": "AddListMember",
    "attributeListName": "ZipCodes",
    "newMember": "90211"
  }
]

RemoveListMember: remove a value removeMember from the list attributeListName on the sample, participant, pair, or set.

For example, to remove one of the Zip Codes from that list because you realized it was wrong:

[
  {
    "op": "RemoveListMember",
    "attributeListName": "ZipCodes",
    "removeMember": "90211"
  }
]

CreateAttributeEntityReferenceList: create an empty list of references called attributeListName.

For example, this creates a new column called Reference_List that displays 0 items in the list:

[
  {
    "op": "CreateAttributeEntityReferenceList",
    "attributeListName": "Reference_List"
  }
]

CreateAttributeValueList: create an empty list of values called attributeListName.

For example, this creates a new column called ZipCodes that displays 0 items in the list:

[
  {
    "op": "CreateAttributeValueList",
    "attributeListName": "ZipCodes"
  }
]
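
Since each op expects its own field names, it can help to check a request body before pasting it into Swagger. The sketch below is one way to do that in Python. The table of required fields is copied from the examples in this post only, so it may not cover every op the API supports, and `check_operations` is a hypothetical helper rather than part of any FireCloud library.

```python
# Required fields per op, as shown in the examples in this post.
REQUIRED_FIELDS = {
    "AddUpdateAttribute": {"attributeName", "addUpdateAttribute"},
    "RemoveAttribute": {"attributeName"},
    "AddListMember": {"attributeListName", "newMember"},
    "RemoveListMember": {"attributeListName", "removeMember"},
    "CreateAttributeEntityReferenceList": {"attributeListName"},
    "CreateAttributeValueList": {"attributeListName"},
}


def check_operations(operations):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for index, operation in enumerate(operations):
        op = operation.get("op")
        if op not in REQUIRED_FIELDS:
            problems.append(f"item {index}: unknown op {op!r}")
            continue
        missing = REQUIRED_FIELDS[op] - set(operation)
        for field in sorted(missing):
            problems.append(f"item {index}: {op} is missing {field!r}")
    return problems


operations = [
    {"op": "CreateAttributeValueList", "attributeListName": "ZipCodes"},
    {"op": "AddListMember", "attributeListName": "ZipCodes", "newMember": "90210"},
    # "removeMember" is left out on purpose to show what a problem looks like.
    {"op": "RemoveListMember", "attributeListName": "ZipCodes"},
]
print(check_operations(operations))
```

Only the third item is reported, because it lacks the removeMember field.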

Comments

  • ChipChip 415M 4053Member, Broadie

    Should FISSFC be mentioned as an option for users that don't want to edit json files ?

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Hi Chip! Feel free to share more information about it with the community on the forum. I think technically CGA supports it, not this team.

  • bshifawbshifaw Member, Broadie, Moderator admin
    edited February 2018

    I want to delete an attribute. I followed the steps up to "Specify the entity type (participant, sample, pair, set) and the entity name (specific id)" . I don't see any boxes to specify the entity type, there are only box values for workspaceNamespace, workspaceName, and workspaceUpdateJson. Where do i specify the entity type?

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    @beri - thanks for the questions. It appears the doc listed the wrong API. It has been corrected now. Also, "attributeName" should be the analysis_ready_bam (the name of the column). No need to add this.sample.
    Thanks

  • bshifawbshifaw Member, Broadie, Moderator admin
    edited October 2018

    Since I needed to delete the whole data model and the tutorial required a specific format, I wrote a Python script that builds the request body for every entity in a TSV downloaded from FC.

    # usage:
    #   python delete_entity.py <entity.tsv>

    import json
    import sys

    file_of_entities = sys.argv[1]

    with open(file_of_entities, "r") as tsv:
        # The first column of the header looks like "entity:sample_id".
        # Drop the prefix before ":" and the "_id" suffix to get the entity type.
        first_column = tsv.readline().rstrip("\n").split("\t")[0]
        entity_type = first_column.split(":")[1]
        if entity_type.endswith("_id"):
            entity_type = entity_type[: -len("_id")]

        # The first column of every remaining non-blank line is an entity name.
        entity_names = [line.split("\t")[0].strip() for line in tsv if line.strip()]

    # One {"entityType", "entityName"} object per entity, as the delete API expects.
    entities = [
        {"entityType": entity_type, "entityName": name} for name in entity_names
    ]

    # json.dump takes care of the commas, brackets, and escaping.
    with open("entities2delete.json", "w") as out:
        json.dump(entities, out, indent=2)
        out.write("\n")

    # Report that the run finished.
    print("Created entities2delete.json")

    
  • dannykwellsdannykwells San FranciscoMember ✭✭

    Hi @Tiffany_at_Broad one thing we've just realized is that, for sample_set, if we re-upload the file we download from Firecloud, it will duplicate all of the entries (so this operation is additive, rather than replicative). Given that this screwed up all of our sample sets, we would like to just wipe all of them - can you advise how we could (easily?) do that?

    Thanks,
    -d

  • dannykwellsdannykwells San FranciscoMember ✭✭

    Also, just realized, this breaks Firecloud, since we now get errors saying "cannot have duplicate IDs in the same sample set" (!!). So this is likely a bug.

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Thanks for letting us know @dannykwells. I will try to replicate.
    In the meantime, you could try Beri's script listed above. Once you’ve downloaded the sample_set from the data model, use the sample_set_entity.tsv as input for the python script (not the membership.tsv). The output should be a json file that can be used by the Firecloud API.

    I am in meetings and haven't had time to test this, but when I am out I will give it a try and let you know.

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    I was able to replicate and raised a bug ticket for this. Thanks for reporting.

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    @dannykwells and anyone else who wants to use the python script instead of typing out lots of entities. This is an example for deleting sample_sets, but can be edited and applied to other entities.

    Steps for executing the python script to bulk delete
    1. Save the script above (from Beri) with the ending .py
    2. In your terminal/shell execute: python /path/to/name_of_script.py /path/to/sample_set_entity.tsv Note: you can download the sample_set and receive this input file right from FireCloud.
    3. Once the script has run, a new file called entities2delete.json will be created in the directory you are in
    4. Use the delete entities API (instructions above) to delete the entities by copying and pasting the contents of the entities2delete.json file into the body field. Link to API: https://api.firecloud.org/#!/Entities/deleteEntities

  • VickyGuoVickyGuo Member

    Is it true that deleting an entry in the data model does not delete the files it references? If the files are not deleted, what's the best practice for deleting those files?

  • SChaluvadiSChaluvadi Member, Broadie, Moderator admin

    @VickyGuo Correct, deleting an entry in the data model does not delete the file it is referencing. The best way to do this would be to delete the relevant files from the google bucket.

  • KellyKKellyK Member
    Hello. I am consistently getting a "0" response code (no server response). Logging out my openid and logging back in does not fix the problem. Do you have any other suggestions?
    Thanks.
  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Hi @KellyK the best way to ensure you get an answer to your question is to post under "Ask a FireCloud question" This is actively monitored. Thanks!

  • arielyharielyh Member

    How to delete one attribute for all samples? For example, just delete the column of that attribute. Thanks.
