Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Using GATK SelectVariants to filter based on calculated allele frequency

Many of the variant callers I use, such as Pindel, do not include the AF or allele frequency value in the vcf output. However I still need to filter the vcf based on the allele frequencies of the Tumor and Normal samples.

My .vcf file looks like this:

##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=2012_03_15
##source=pindel
##reference=hg19
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=PF,Number=1,Type=Integer,Description="The number of samples carry the variant">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=NTLEN,Number=.,Type=Integer,Description="Number of bases inserted in place of deleted code">
##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Reference depth, how many reads support the reference">
##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Allele depth, how many reads support this allele">
##bcftools_normVersion=1.3+htslib-1.3
##bcftools_normCommand=norm --multiallelics -both --output-type v -
##bcftools_normCommand=norm --rm-dup both --output-type v -
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr9    87325582    .   CTG C   .   PASS    END=87325584;HOMLEN=2;HOMSEQ=TG;SVLEN=-2;SVTYPE=DEL GT:AD   0/0:601,0   0/0:356,1
chr9    87325589    .   AT  A   .   PASS    END=87325590;HOMLEN=1;HOMSEQ=T;SVLEN=-1;SVTYPE=DEL  GT:AD   0/0:595,1   0/0:352,1
chr9    87325608    .   TC  T   .   PASS    END=87325609;HOMLEN=1;HOMSEQ=C;SVLEN=-1;SVTYPE=DEL  GT:AD   0/0:596,0   0/0:337,1
chr9    87325675    .   T   TC  .   PASS    END=87325675;HOMLEN=4;HOMSEQ=CCCC;SVLEN=1;SVTYPE=INS    GT:AD   0/0:620,0   0/0:408,1

I can use a command like this in order to filter for structural variants of length less than 500bp, and for variants that have at least 5 variant reads in the tumor:

gatk.sh -T SelectVariants \
-R "genome.fa" \
-V "norm.vcf" \
-select 'SVLEN < 500' \
-select 'vc.getGenotype("TUMOR").getAD().1 > 5' \
> "filtered.vcf"

However, I cannot figure out a select expression that will filter based on the allele frequency; AD Alt / ( AD Alt + AD Ref )

This is the type of command I am trying to use:

gatk.sh -T SelectVariants \
-R "genome.fa" \
-V "norm.vcf" \
-select 'SVLEN < 500' \
-select 'vc.getGenotype("TUMOR").getAD().1 / (vc.getGenotype("TUMOR").getAD().0 + vc.getGenotype("TUMOR").getAD().1) > 0.01' \
> "filtered.vcf"

But this command does not work, it seems to fail silently and instead print out no variants at all, even if there are variants present that match the criteria

How do you get this to work?

for reference, the docs are here for SelectVariants:

https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

and here for the JEXL expression guidelines:

www.broadinstitute.org/gatk/guide/article?id=1255

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @steve1

    We are sorry, the forum volume is high right now and we are only focusing on tool error questions right now. Will are unable to answer experimental design question at the moment. Please feel free to look through GATK user guide for more information on functionality of the tools.

  • steve1steve1 Member

    Hi @bhanuGandham , this is not an experimental design question, it is a question about tool functionality & errors. The user guide does not explain how to implement this functionality, and trying to implement it myself based on the user guide's instructions results in an error; no variants are output. I listed in my original post the relevant user guides I used to develop my implementation shown. Thanks.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited August 5

    @steve1

    I was thrown by the mention of Pindel which is not supported by GATK. Let me look into this and see what I can find.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @steve1

    1) We do not support GATK3 anymore. Please try this with the latest version of GATK4.1.3.0 and see if the issue persists.
    2) If you are still getting the same issue then please share your input vcf and reference files using the following instructions: https://software.broadinstitute.org/gatk/guide/article?id=1894

Sign In or Register to comment.