(howto) Overwrite and delete data from the data model
This doc is in progress. Feel free to leave feedback on how it can be improved.
Overwrite
If an attribute already exists, you can import a load file to overwrite its values.
For example, you may have previously imported a participant load file that had several columns of metadata. If you import another participant load file, it will overwrite the values in every column that also appeared in the previous load file, and add new columns for any that did not.
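To make the overwrite rule concrete, here is a minimal Python model of it. It assumes the table is held in memory as a dict mapping entity ID to {column: value}. This illustrates the behavior described above and is not FireCloud's implementation. The function name apply_load_file is hypothetical. Two further assumptions are ours, not stated in this doc: columns missing from the new load file keep their old values, and unseen IDs become new rows. The ID column uses the entity:participant_id header that participant load files start with.

```python
from typing import Dict, List

# Hypothetical in-memory model: entity ID -> {column header: value}.
Table = Dict[str, Dict[str, str]]


def apply_load_file(table: Table, load_rows: List[Dict[str, str]],
                    id_column: str = "entity:participant_id") -> Table:
    """Return a new table with the load-file rows applied.

    Columns present in the load file overwrite existing values or are added as
    new columns. Assumptions (not stated in the doc): columns missing from the
    new load file keep their previous values, and unseen IDs become new rows.
    """
    merged = {entity_id: dict(attrs) for entity_id, attrs in table.items()}
    for row in load_rows:
        attrs = merged.setdefault(row[id_column], {})
        for column, value in row.items():
            if column != id_column:
                attrs[column] = value
    return merged


existing = {"P1": {"age": "40", "sex": "F"}}
reloaded = apply_load_file(existing, [{"entity:participant_id": "P1", "age": "41", "ZipCode": "90210"}])
print(reloaded)  # {'P1': {'age': '41', 'sex': 'F', 'ZipCode': '90210'}}
```

The original table is copied rather than mutated, so you can compare the before and after states of a reload.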
Delete
To delete data entities or attributes from your workspace, you'll need to use the FireCloud API, as the delete functionality is not available through the web interface.
Below is a step-by-step example of how to delete a sample from a workspace, or delete an attribute of a sample (or other entity), using the API Swagger page. If you're not familiar with it, Swagger is a popular web interface that makes APIs user-friendly. You can run the API commands directly from that web page by filling in the various text fields, or you can run the corresponding commands shown on the page from your terminal.
Before you do anything else inside a selected endpoint, you need to authenticate yourself in the Swagger page. If you see a red circle with an exclamation mark in the top right corner of the colored area, click on it (if you don't see one, congratulations, skip this step).
In the dialog that opens, check the box that says "openid" and click the "Authorize" button. This will prompt you to authenticate with your Google account. Once that's done, you'll be back on the API page and the red circle should have turned blue.
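If you prefer to run the commands from a terminal instead of the Swagger page, each request has to carry your Google credentials as a bearer token. The sketch below makes two assumptions. First, the Google Cloud SDK is installed. Second, it is logged in (gcloud auth login) with the same Google account you use for FireCloud. The helper names are hypothetical.

```python
import subprocess
from typing import Dict


def get_access_token() -> str:
    """Ask the gcloud CLI for an OAuth access token for the active account.
    Assumes the Google Cloud SDK is installed and you have run `gcloud auth login`."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def auth_headers(token: str) -> Dict[str, str]:
    """Headers for an authenticated JSON request to the FireCloud API."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Usage: `headers = auth_headers(get_access_token())`. Access tokens expire, so fetch a fresh one if a later request is rejected.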
How to delete one or more Sample_Id(s), Participant_Id(s), Pair_Id(s) or Set_Id(s)
- Go to the API Swagger page (direct link: https://api.firecloud.org/#!/Entities/deleteEntities). On that page you'll find a list of all the "endpoints", which correspond to the possible commands you can run. Find
POST /api/workspaces/{workspaceNamespace}/{workspaceName}/entities/delete
and click on it. The line will expand to reveal input text fields along with example values and a description of what the endpoint does.
- Fill in the namespace (the part of the workspace name before the forward slash) and the name of the workspace from which you want to delete the entity.
- Next, the body field is where you specify which entity you want to delete. If you click on the box with the example values on the right, it will automatically copy the contents over to the input field; then you just need to fill in the values. Replace "string" with the relevant text for entityType and entityName.
For example, to delete a specific sample ID, entityType will be sample and entityName will be the ID of that sample from the sample_ID column. To delete multiple entities, put one JSON object per entity inside the square brackets, separated by commas:
[ { "entityType": "sample", "entityName": "NA12878_24RG_small" }, { "entityType": "sample", "entityName": "NA12878_24RG_med" } ]
- Then find the button labeled "Try it out!" and click it. If everything worked correctly, the box will unfold further and, among other things, show the response code 204 (Successful request). A 409 means the entity you tried to delete has other entities that depend on it. For example, a sample that is contained in a sample set cannot be deleted unless you also delete the sample set. A 0 response code or "no server response" can be resolved by logging out of your "openid" authorization and logging back in.
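The same delete call can be sketched in Python using only the standard library. The sketch assumes the API is served from https://api.firecloud.org, the host of the Swagger page. It also assumes you already have an access token. The helper names are hypothetical, and the request is only prepared until you pass it to send().

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: the API root that serves the Swagger page.
API_ROOT = "https://api.firecloud.org"


def delete_body(entity_type: str, entity_names: List[str]) -> List[Dict[str, str]]:
    """One {"entityType", "entityName"} object per entity, as in the body field."""
    return [{"entityType": entity_type, "entityName": name} for name in entity_names]


def build_delete_request(namespace: str, workspace: str,
                         entities: List[Dict[str, str]],
                         token: str) -> urllib.request.Request:
    """Prepare (but do not send) POST /api/workspaces/{ns}/{name}/entities/delete."""
    path = "/api/workspaces/{}/{}/entities/delete".format(
        urllib.parse.quote(namespace, safe=""), urllib.parse.quote(workspace, safe=""))
    return urllib.request.Request(
        API_ROOT + path,
        data=json.dumps(entities).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    204 means success; 409 means other entities depend on one you tried to
    delete. If there is no server response at all (Swagger shows code 0),
    urlopen raises urllib.error.URLError instead of returning a code.
    """
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, delete_body("sample", ["NA12878_24RG_small", "NA12878_24RG_med"]) yields the same two objects as the JSON example above. build_delete_request then serializes them into the request body. To clear a 409, include the dependent set in the same list of entities.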
How to delete an attribute of a Sample_Id(s), Participant_Id(s), Pair_Id(s) or Set_Id(s)
- Go to the API Swagger page. On that page you'll find a list of all the "endpoints", which correspond to the possible commands you can run. Find
PATCH /api/workspaces/{workspaceNamespace}/{workspaceName}/entities/{entityType}/{entityName}
and click on it. The line will expand to reveal input text fields along with example values and a description of what the endpoint does.
- Fill in the namespace (the part of the workspace name before the forward slash) and the name of the workspace containing the entity whose attributes you want to delete.
- Specify the entity type (participant, sample, pair, or a set type such as sample_set) and the entity name (the specific ID).
- Next, in the ‘attributeupdateJson’ section, click the box on the right with the example values. This will automatically copy the contents over to the input field. You just need to wrap the request in square brackets [ ] and fill in the values. For "op" (the operation), type "RemoveAttribute". For "attributeName", give the name of the attribute (the column header) you'd like to remove. Replace "string" with the relevant text.
Example of removing the ZipCode attribute from a sample, participant, pair, or set:
[ { "op": "RemoveAttribute", "attributeName": "ZipCode" } ]
- Then find the button labeled "Try it out!" and click it. If everything worked correctly, the box will unfold further and, among other things, show the response code 204 (Successful request). A 409 means the entity you tried to delete has other entities that depend on it. For example, a sample that is contained in a sample set cannot be deleted unless you also delete the sample set. A 0 response code or "no server response" can be resolved by logging out of your "openid" authorization and logging back in.
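The attribute-removal call can be prepared the same way from a script. The sketch makes the same assumptions as before: the API root is https://api.firecloud.org, and you already have an access token. The helper names are hypothetical. The op and field names come from the example above.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: the API root that serves the Swagger page.
API_ROOT = "https://api.firecloud.org"


def remove_attribute_ops(attribute_names: List[str]) -> List[Dict[str, str]]:
    """One RemoveAttribute op per column header, wrapped in a list (the square brackets)."""
    return [{"op": "RemoveAttribute", "attributeName": name} for name in attribute_names]


def build_update_request(namespace: str, workspace: str, entity_type: str,
                         entity_name: str, ops: List[Dict[str, str]],
                         token: str) -> urllib.request.Request:
    """Prepare PATCH /api/workspaces/{ns}/{name}/entities/{entityType}/{entityName}.
    Send it with urllib.request.urlopen when you are ready."""
    segments = (namespace, workspace, entity_type, entity_name)
    path = "/api/workspaces/{}/{}/entities/{}/{}".format(
        *(urllib.parse.quote(s, safe="") for s in segments))
    return urllib.request.Request(
        API_ROOT + path,
        data=json.dumps(ops).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )
```

Passing several names to remove_attribute_ops removes several columns from one entity in a single request.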
In addition, the following other "ops" are available, each shown with an example:
AddUpdateAttribute: create the attribute attributeName on the entity with the value addUpdateAttribute (if the attribute already exists, its value is overwritten).
For example, to add a ZipCode attribute to an entity:
[ { "op": "AddUpdateAttribute", "attributeName": "ZipCode", "addUpdateAttribute": "90210" } ]
AddListMember: add the value newMember to the list attributeListName on the entity. This creates an array of values.
For example, to keep an attribute list of the Zip Codes the person (sample) has ever lived in, add each entry as another member of the list [90210, 90211]:
[ { "op": "AddListMember", "attributeListName": "ZipCodes", "newMember": "90210" }, { "op": "AddListMember", "attributeListName": "ZipCodes", "newMember": "90211" } ]
RemoveListMember: remove the value removeMember from the list attributeListName on the sample, participant, pair, or set.
For example, to remove one of the Zip Codes from that list because you realized it was wrong:
[ { "op": "RemoveListMember", "attributeListName": "ZipCodes", "removeMember": "90211" } ]
CreateAttributeEntityReferenceList: create an empty list of entity references named attributeListName.
For example, this creates a new column called Reference_List that displays 0 items in the list:
[ { "op": "CreateAttributeEntityReferenceList", "attributeListName": "Reference_List" } ]
CreateAttributeValueList: create an empty value list named attributeListName.
For example, this creates a new column called ZipCodes that displays 0 items in the list:
[ { "op": "CreateAttributeValueList", "attributeListName": "ZipCodes" } ]
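If you build these request bodies from a script, one small helper per op keeps the field names straight. In this sketch the function names are hypothetical, while the op and field names are copied from the examples above. Several ops can go in one array, as the AddListMember example shows.

```python
import json
from typing import Dict, List

Op = Dict[str, str]


def add_update_attribute(name: str, value: str) -> Op:
    return {"op": "AddUpdateAttribute", "attributeName": name, "addUpdateAttribute": value}


def remove_attribute(name: str) -> Op:
    return {"op": "RemoveAttribute", "attributeName": name}


def add_list_member(list_name: str, member: str) -> Op:
    return {"op": "AddListMember", "attributeListName": list_name, "newMember": member}


def remove_list_member(list_name: str, member: str) -> Op:
    return {"op": "RemoveListMember", "attributeListName": list_name, "removeMember": member}


def create_reference_list(list_name: str) -> Op:
    return {"op": "CreateAttributeEntityReferenceList", "attributeListName": list_name}


def create_value_list(list_name: str) -> Op:
    return {"op": "CreateAttributeValueList", "attributeListName": list_name}


def ops_body(ops: List[Op]) -> str:
    """Serialize one or more ops into the JSON array pasted into the request body."""
    return json.dumps(ops)


# Reproduces the AddListMember example: two members added in a single request.
print(ops_body([add_list_member("ZipCodes", z) for z in ("90210", "90211")]))
```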
Comments
Should FISSFC be mentioned as an option for users who don't want to edit JSON files?
Hi Chip! Feel free to share more information about it with the community on the forum. I think technically CGA supports it, not this team.
I want to delete an attribute. I followed the steps up to "Specify the entity type (participant, sample, pair, set) and the entity name (specific id)". I don't see any boxes to specify the entity type; there are only boxes for workspaceNamespace, workspaceName, and workspaceUpdateJson. Where do I specify the entity type?
@beri - thanks for the questions. It appears the doc listed the wrong API. It has been corrected now. Also, "attributeName" should be the analysis_ready_bam (the name of the column). No need to add this.sample.
Thanks
Since I needed to delete the whole data model, and the tutorial required a specific format, I wrote a Python script to do it for each entity, given a TSV downloaded from FC.
Hi @Tiffany_at_Broad one thing we've just realized is that, for sample_set, if we re-upload the file we download from Firecloud, it will duplicate all of the entries (so this operation is additive, rather than replicative). Given that this screwed up all of our sample sets, we would like to just wipe all of them - can you advise how we could (easily?) do that?
Thanks,
-d
Also, just realized, this breaks Firecloud, since we now get errors saying "cannot have duplicate IDs in the same sample set" (!!). So this is likely a bug.
Thanks for letting us know @dannykwells. I will try to replicate.
In the meantime, you could try Beri's script listed above. Once you’ve downloaded the sample_set from the data model, use the sample_set_entity.tsv as input for the python script (not the membership.tsv). The output should be a json file that can be used by the Firecloud API.
I am in meetings and haven't had time to test this, but when I am out I will give it a try and let you know.
I was able to replicate and raised a bug ticket for this. Thanks for reporting.
@dannykwells and anyone else who wants to use the python script instead of typing out lots of entities. This is an example for deleting sample_sets, but can be edited and applied to other entities.
Steps for executing the python script to bulk delete
1. Save the script above (from Beri) with a .py extension.
2. In your terminal/shell, run: python /path/to/name_of_script.py /path/to/sample_set_entity.tsv (Note: you can get this input file by downloading the sample_set table from FireCloud.)
3. Once the script has run, a new file called entities2delete will be created in your current directory.
4. Use the delete entities API (instructions above) to delete by copying and pasting the body of the entities2delete file into the body cell. Link to API: https://api.firecloud.org/#!/Entities/deleteEntities
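Beri's script is not reproduced in this thread. What follows is a hypothetical stand-in that does the same job: it turns an entity TSV into the delete body and writes it to entities2delete. It assumes the first header cell of the downloaded TSV looks like entity:sample_set_id and the first column holds the entity names. It therefore rejects the membership TSV, which should not be used here. The script and function names are ours.

```python
import csv
import json
import sys
from typing import Dict, Iterable, List


def entities_from_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn the rows of an entity TSV into delete-body objects.

    Assumes the first header cell is 'entity:<type>_id' (e.g. entity:sample_set_id),
    so the same code works for samples, participants, pairs, or sets.
    """
    reader = csv.reader(lines, delimiter="\t")
    first = next(reader)[0]
    if not (first.startswith("entity:") and first.endswith("_id")):
        raise ValueError(f"expected an entity TSV, got header {first!r}")
    entity_type = first[len("entity:"):-len("_id")]
    return [{"entityType": entity_type, "entityName": row[0]} for row in reader if row]


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print("usage: python tsv_to_delete_body.py /path/to/sample_set_entity.tsv",
              file=sys.stderr)
        return
    with open(argv[1], newline="") as handle:
        entities = entities_from_lines(handle)
    # Written to the current directory; paste its contents into the body field.
    with open("entities2delete", "w") as out:
        json.dump(entities, out)


if __name__ == "__main__":
    main(sys.argv)
```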
Is it true that deleting an entry in the data model does not delete the files it references? If the files are not deleted, what's the best practice for deleting them?
@VickyGuo Correct, deleting an entry in the data model does not delete the files it references. The best way to delete them is to remove the relevant files from the Google bucket.
Thanks.
Hi @KellyK the best way to ensure you get an answer to your question is to post under "Ask a FireCloud question" This is actively monitored. Thanks!
How to delete one attribute for all samples? For example, just delete the column of that attribute. Thanks.
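One possible approach uses only the PATCH endpoint documented above. It assumes no bulk column-delete call is available, since this doc doesn't describe one. The idea is to send one RemoveAttribute op per entity. In this sketch the helper name is hypothetical and the API root is assumed to be https://api.firecloud.org. The entity names could come from the first column of the entity TSV you download from FireCloud.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List

# Assumption: the API root that serves the Swagger page.
API_ROOT = "https://api.firecloud.org"


def remove_column_requests(namespace: str, workspace: str, entity_type: str,
                           entity_names: Iterable[str], attribute: str,
                           token: str) -> List[urllib.request.Request]:
    """Prepare one PATCH request per entity, each removing `attribute` from it."""
    body = json.dumps([{"op": "RemoveAttribute", "attributeName": attribute}]).encode("utf-8")
    prepared = []
    for name in entity_names:
        segments = (namespace, workspace, entity_type, name)
        path = "/api/workspaces/{}/{}/entities/{}/{}".format(
            *(urllib.parse.quote(s, safe="") for s in segments))
        prepared.append(urllib.request.Request(
            API_ROOT + path,
            data=body,
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
            method="PATCH",
        ))
    return prepared
```

Each prepared request can then be sent in a loop with urllib.request.urlopen(req), checking for a 204 on each.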