Holiday Notice:
The Frontline Support team will be slow to respond December 17-18 due to an institute-wide retreat and offline December 22- January 1, while the institute is closed. Thank you for your patience during these next few weeks. Happy Holidays!

JEXL error arising from SNPs with zero coverage for either REF or ALT alleles

Gavin_SherlockGavin_Sherlock StanfordMember
edited November 16 in Ask the GATK team

I ran:

gatk-4.0.11.0/gatk SelectVariants -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -V variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_filtered.vcf -select '(1.0*vc.getGenotype("P2-E8-ACTGAGCG-CTTAATAG_S152").getAD().1)/(1.0*vc.getGenotype("P2-E8-ACTGAGCG-CTTAATAG_S152").getDP()) > 0.9' -output variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_selected.vcf --exclude-filtered

and get:

A USER ERROR has occurred: Invalid JEXL expression detected for select-0

The exact same filter worked for 90% of my files, but failed on about 10% of them. I then found that if I replace the numerator with a '1', it still fails:

1/(1.0*vc.getGenotype("P2-E8-ACTGAGCG-CTTAATAG_S152").getDP()) > 0.9

and looking at the vcf file, it turns out there are some SNPs from the initial SNP calling that have zero coverage:

I   27036   .   G   A   15.14   PASS    AC=1;AF=1.00;AN=1;FS=0.000;MLEAC=1;MLEAF=1.00;SOR=0.693 GT:AD:DP:GQ:PL  1:0,0:0:45:45,0
I   27063   .   G   A   60  PASS    AC=1;AF=1.00;AN=1;FS=0.000;MLEAC=1;MLEAF=1.00;SOR=0.693 GT:AD:DP:GQ:PL  1:0,0:0:90:90,0
XII 585947  .   T   A   16.11   PASS    AC=1;AF=1.00;AN=1;FS=0.000;MLEAC=1;MLEAF=1.00;SOR=0.693 GT:AD:DP:GQ:PL  1:0,0:0:46:46,0

Thus, I now understand why my filter failed, because of a divide by zero error, but I don't understand how I got these SNPs in the first place. They were called with:

gatk-4.0.11.0/gatk HaplotypeCaller -ploidy 1 -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -I bam/P2-E8-ACTGAGCG-CTTAATAG_S152.dedup.realigned.bam --output variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_raw.vcf
gatk-4.0.11.0/gatk VariantFiltration -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -V variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_raw.vcf -filter 'QD < 2.0 || FS > 60.0 || SOR > 3.0 || MQ < 40.0 || MQRankSum < -10.5 || ReadPosRankSum < -8.0' -output variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_filtered.vcf -filter-name "hard_filter"

Any ideas how a SNP can be called with zero coverage in either the REF or the ALT alleles?

Post edited by shlee on
Tagged:

Answers

  • shleeshlee CambridgeMember, Broadie, Moderator admin
    edited November 16

    Hi @Gavin_Sherlock,

    If you search the forum, I believe there is some fair amount of discussion on calling on haploid samples with HaplotypeCaller, e.g. here, here and here. I'm not familiar with these discussions but I list them because perhaps they will be of interest to you.

    As an important aside, please see the recent discussion in https://github.com/broadinstitute/gatk/issues/5362 describing the unintuitive behavior of VariantFiltration.

    To diagnose what may be going on with your AD and DP 0 calls, you should see if the ALT calls for your three sites make sense in terms of the reads. You can start by viewing your preprocessed reads in IGV and then continue with viewing the HaplotypeCaller BAMOUT. If they don't make sense, please let us know. Although having AD=0 sites is known to happen, variant calls at DP=0 sites is odd.

    Another approach I can think of for calling on your yeast sample is to use Mutect2. Mutect2 does not have an explicit ploidy assumption and so will call even low allele fraction variants, which may be helpful to your research. You can use the tumor-only mode of the tool. Tutorial#11136 outlines a Mutect2 workflow in the context of somatic calling but the tutorial should be sufficient for understanding key features of Mutect2.

    Post edited by shlee on
  • Gavin_SherlockGavin_Sherlock StanfordMember

    Thanks - I'll look at the sites in question to see what the coverage looks like to see if I can work out how they ended up having a variant, but no coverage. I read your linked article on VariantFiltration - definitely not how I would expect it to work! Will have to think about whether I should go back and re-examine my filters.

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @Gavin_Sherlock How did this solution work out for you? This ticket will now be closed but as always, feel free to reach out again with any follow-up questions! You can respond here with follow-up questions.

Sign In or Register to comment.