Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

How to output AD ref and alt in separate columns with VariantsToTable

I am using VariantsToTable (GATK 3.8) to convert my MuTect2 .vcf to .tsv. The command looks like this:

gatk.sh -T VariantsToTable \
    -R "${ref_fasta}" \
    -V "${output_vcf}" \
    -F CHROM -F POS -F ID -F REF -F ALT -F FILTER -F QUAL -F AC -F AN -F NLOD -F TLOD \
    -GF AD -GF DP -GF AF \
    -o "${output_tsv}"

The relevant output columns in the output look like this:

CHROM   POS ID  REF ALT NORMAL.AD   TUMOR.AD
chr1    3102936 .   C   G   496,2   392,0

Here, column NORMAL.AD has values that look like 496,2, along with TUMOR.AD that has values in the same format, 392,0. Its my understanding that these values are the ref and alt allelic depths, respectively, for each sample.

I would like to instead have the fields output as separate columns. This would make the table output much easier to use downstream, and especially to parse e.g. in Excel. For example if there is a value such as 23,390 in this column, then Excel thinks it is the number 23,390 and it is extremely difficult to get it to perform string operations on it on order to filter in the ref and alt AD values. I end up having to write a custom script just to parse this single column and output a whole new .tsv table with the correctly split values.

It would be much easier if the values could just be output from VariantsToTable as separate fields in the first place. Is this possible?

Best Answer

  • steve1steve1
    Accepted Answer

    Thanks, unfortunately I do not ever have Excel open before opening a file, its always the reverse where I need to open a file and Excel automatically modifies the data upon opening. I ended up writing a script that parses this and splits the columns apart in Python.

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @steve1

    That is not possible in gatk. What you could do is,if this is for excel purposes, use the "text to collumns" function in excel to split the column by a "," delimiter.
    I hope this helps.

    Regards
    Bhanu

  • steve1steve1 Member

    Yes, the "text to columns" feature is exactly what I was referring to. It does not work if the value is something like "23,390", because Excel no longer treats it like text but as a number

  • steve1steve1 Member
    edited January 15

    Here is a screenshot to show what I mean:

    When you open the table in Excel, the values that match "[xx]x,xxx" are immediately converted to integers and are no longer split-able by Excel like this. And even if you highlight the cells and select "Format > Text", it still does not work because Excel has already converted it to an integer and simply gives you the integer without the comma

  • bshifawbshifaw Member, Broadie, Moderator admin
    edited January 16

    Try setting the cells to text format before adding the data to the sheet. Then try delimiting the column
    with comma.

  • steve1steve1 Member
    Accepted Answer

    Thanks, unfortunately I do not ever have Excel open before opening a file, its always the reverse where I need to open a file and Excel automatically modifies the data upon opening. I ended up writing a script that parses this and splits the columns apart in Python.

Sign In or Register to comment.