Filtering based on annotations under ANN within the INFO field
I am brand new to using GATK and was assigned the task of filtering a VCF file provided to me. I already had success with filtering based on the AF value and other numerical fields, however, the rest of the tags I should filter are included (or more like, buried) within an extremely long "ANN=..." field inside the INFO column that I'm having trouble extracting the information from.
Basically, what I need is something like
-filter "ANN=has_a_string_somewhere_inside_this_extremely_long_field" -filter-name "HAS_THIS_ANNOTATION"
An example line from my VCF:
where you can see, there is a lot of information inside "ANN=" and what I would like to use as filter expressions are tags like "synonymous_variant".
I was trying to simply include it as a string, because that's only how I have seen it before, like:
./gatk VariantFiltration -V my.snps.vcf -R ref.fasta -filter "ANN=='non_coding_transcript_exon_variant'" --filter-name "EXONIC" -filter "AF < 0.01" --filter-name "AF-FAIL" -O filtered_for_EXON_AF.vcf
or use a regular expression like
".*non_coding_transcript_exon_variant.*", but it brings no results. When I use -invfilter, it adds the filter name to every single line, so I guess I really just need to find a way to describe this string I'm searching for properly...
I assume the biggest issue could be my lack of experience with JEXL, but I have yet to find a simple tutorial on how to describe such a value with these expressions, ie. ANN=(any_context)STRING_I_WANT(any_context)...
I will also welcome any additional tips on how to deal with monstrous ANN fields like this in general, as I would like to also use SelectVariants on it and separately add them into my future .table when I manage to filter it.
I have gatk-188.8.131.52 and openjdk version "1.8.0_171".