Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

JEXL expressions in GenotypeConcordance

We have VCFs that we've applied genotype-level filters to using FilterVariants, and we are interested in using your –gfc/gfe flags in order to allow all genotypes with an non-"PASS" FT annotation to be set to no call for a concordance check using GenotypeConcordance.

I have tried using the follow JEXL expression 'FT!="PASS"' for –gfc and –gfe but this has not been successful in setting the genotypes that FAIL to no call. Is there way to achieve this genotype-level filtration within GATK GenotypeConcordace? Is there a different JEXL expression that I should be using?

(Asking on behalf of @azo121 who isn't able to post yet)

Thanks!

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @johnwallace123 and @azo121 (who should be able to post now)

    You'll need to use VariantFiltration with JEXL to access the variant context as described here: https://www.broadinstitute.org/gatk/guide/article?id=1255

    GenotypeConcordance does not really filter anything, it is only meant for evaluation and reporting.

  • johnwallace123johnwallace123 Member ✭✭

    @Geraldine_VdAuwera, @azo121,

    We do use the VariantFiltration with the JEXL epressions such as 'GQ<20' to produce our VCFs that we want to then check for concordance. Rather than re-using the 'GQ<20' expression, we want to evaluate the VCF as it was filtered, because we want to make our concordance script as generic as possible, and the exact choice of filters is apparently a contentious issue.

    So, if we have a VCF that looks like:

    #CHROM POS REF ALT FILTER FORMAT    SAMP1           SAMP2
    1      20  A   T   PASS   GT:GQ:FT  0/1:10:BadCall  1/1:50:PASS
    

    We would want to use a JEXL expression that would allow GenotypeConcordance to interpret the VCF as:

    #CHROM POS REF ALT FILTER FORMAT SAMP1 SAMP2
    1      20  A   T   PASS   GT     ./.   1/1
    

    We've tried using -gfe 'FT!="PASS"', but that didn't seem to work. Accessing the VariantContext object would give us the variant level information, but we want to access the Genotype object directly. I've seen 'vc.getGenotype("Sample").<...>', but we don't necessarily know the sample names a priori. Also, we will likely have a large number of overlapping samples (potentially in the thousands), so building a long "or" JEXL expression may not be feasible.

    Thanks for the help,

    John

  • SheilaSheila Broad InstituteMember, Broadie admin

    @johnwallace123
    Hi John,

    I am not sure if this will work, but have you tried -gfe 'FT==BadCall'?

    -Sheila

  • azo121azo121 PAMember

    @Sheila, @johnwallace123 @Geraldine_VdAuwera

    Hi Sheila,

    Thank you for your reply. Unfortunately, that expression does not work either. John, was giving an example above as to how we would like to evaluate the VCF as it was filtered in a very generic way. The actual FT annotation is either 'PASS' or 'FAIL' .
    We have tried using -gfe 'FT!="PASS"' as well as -gfe 'FT=="PASS"', and neither of these seem to work. Please let me know if there are any other expression we can try that would allow us to perform a Genotype Concordance with this type of genotype-level filtration.

    Thank you for your help

    Anna

  • SheilaSheila Broad InstituteMember, Broadie admin

    @azo121
    Hi Anna,

    What do you mean -gfe 'FT==PASS' does not work? Do you get an error? Or, do the results not look the way you want them to?

    I am guessing you have tried -gfe 'FT==FAIL'? That is really what you want. -gfe sets all the sites in the eval file to no-call if they have the specified expression. https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeConcordance.php#--genotypeFilterExpressionEval

    -Sheila

  • azo121azo121 PAMember

    @Sheila Hi Sheila,

    The expression gfe 'FT==PASS' works in the sense that it does not cause an error. However, it is not successful in setting genotypes with a non-"PASS" FT annotation to be set to no-call.

    Yes, we have tried 'FT==FAIL' as well, and it also fails to set non-PASS FT genotypes to no-call.

    Thank you for your help,
    Anna

  • SheilaSheila Broad InstituteMember, Broadie admin

    @azo121
    Hi Anna,

    Can you post the exact commands you used? If you try -gfe 'GQ<20' or -gfc 'GQ<20 does it work?

    Thanks,
    Sheila

  • johnwallace123johnwallace123 Member ✭✭

    @Sheila @azo121,

    Using the previous VCF as "test.vcf" (with the missing fields fleshed out), our base command line (omitting reference) is as follows:

    GATK -T GenotypeConcordance -comp test.vcf -eval test.vcf -moltenize | grep 'SAMP' | grep '1$'
    

    This gives the expected unfiltered result (all other rows 0):

    SAMP1   HET                  HET                      1
    SAMP2   HOM_VAR              HOM_VAR                  1
    

    Adding in -gfe 'GQ<20' -gfc 'GQ<20' to the base command line gives us what we want for the "filtered" case:

    SAMP1   NO_CALL              NO_CALL                  1
    SAMP2   HOM_VAR              HOM_VAR                  1
    

    However, when we attempt any of the following, we end up with the same result as the "unfiltered" case:

    -gfe 'FT!=null' -gfc 'FT!=null'
    -gfe 'FT!=""' -gfc 'FT!=""'
    -gfe 'FT!=PASS' -gfc 'FT!=PASS'
    -gfe 'FT==BadCall' -gfc 'FT==BadCall'
    -gfe 'FT!=BadCall' -gfc 'FT!=BadCall'
    

    Unfortunately, we can't rely on solely the "GQ<20" filter, because the incoming VCFs may have had a variety of genotype-level filter annotaions applied. Also, some of the genotype filtration may not have come from JEXL expressions at all, instead coming from external sources and/or manual review.

    Thanks,

    John Wallace

  • pdexheimerpdexheimer Member ✭✭✭✭

    I see the problem.

    The JEXL engine figures out these identifier -> attribute mappings (i.e., DP -> vc.getAttribute("DP")) by means of a map specified at initialization. That map is built for Genotypes by hard coding GT, GQ, DP, and a couple of helper keys like "isHom", and then iterating over everything accessible via Genotype.getExtendedAttributes(). Unfortunately, FT (and AD, I think) is not accessible through that method - so without using the vc.getGenotype("sample").isFiltered() syntax, I don't think this is possible right now

  • SheilaSheila Broad InstituteMember, Broadie admin

    @johnwallace123
    Hi John,

    I would like to have a look at the files to see what is going on. Can you submit a bug report? Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • SheilaSheila Broad InstituteMember, Broadie admin

    @johnwallace123 @pdexheimer
    Hi again John,

    Ignore my last post and see Phillip's answer. Sorry for the confusion. Thanks for clarifying Phillip!

    -Sheila

  • johnwallace123johnwallace123 Member ✭✭

    @pdexheimer,

    That's somewhat disappointing. Not having this available means that we can't use GATK for our concordance checks. Would it be possible to add a helper key "isFiltered" for this use case? If you point me to the construction of the map, I'd be happy to take a crack at it. I just got lost in all of the abstraction and I wasn't sure if it was happening in GATK or htsjdk.

    Thanks,

    John

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi all, sorry I'm coming to this so late. FYI earlier today I put in a feature request to have some convenience methods put in to be able to include/exclude sites with SelectVariants based on proportion of samples with a filtered genotype; and also to be able to set filtered genotypes to no-call with SelectVariants and VariantFiltration. I could add a request to make FT and AD available to Genotypes while we're at it -- but that might take longer than if you go ahead and do it yourself.

  • johnwallace123johnwallace123 Member ✭✭

    @pdexheimer,

    Thanks so much for the pointer in the right direction! I did have one final question (for now). I made the changes that I thought would be useful in the trunk of htsjdk, but I noticed that the trunk gatk-protected failed to build because of a number of missing keys in VCFConstants (MLE_ALLELE_COUNT_KEY is one example). Is the upcoming version of GATK targeting a specific version of htsjdk, or will it track with the trunk?

    I've personally backported our fix to our copy of the 1.120 version of htsjdk, but it would be nice to not have to go through that pain in the future... or at least not for too many releases :).

  • pdexheimerpdexheimer Member ✭✭✭✭

    GATK tracks with htsjdk. The issue you encountered is due to the differences in release schedule - htsjdk releases biweekly, GATK releases as appropriate. So there's definitely time for them to get out of synch. The VCFConstants changes will be part of GATK 3.4, though, so at least this particular issue will no longer be a problem

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @johnwallace123 Thanks for your patch to htsjdk! I believe some tests have been requested, but pending that we'll be able to push it through -- and get all of this included in 3.4 so that everyone can benefit! Your contribution is much appreciated.

Sign In or Register to comment.