gatk FilterVariantTranches

jodybk, London (Member)

I am a bit unsure about the usage of FilterVariantTranches. I have applied CNNScoreVariants to a VCF and am now trying to filter it. I am using another VCF as the resource (also after applying CNNScoreVariants), like this:

gatk FilterVariantTranches -V ERR1213935.CNN.vcf.gz --resource por7A1.CNN.vcf.gz --info-key CNN_1D --snp-tranche 99.95 --indel-tranche 99.4 -O out.vcf

However, it throws the following error: A USER ERROR has occurred: VCF must contain SNPs and indels with scores and resources must contain matching SNPs and indels.

So my question is: how much overlap does there have to be between the sample VCF and the resource? At the moment they contain some of the same variants but also different ones. From the error message, it seems like they should contain exactly the same variants?

Thanks in advance,
Jody
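
A rough way to see what the error message is referring to is to count, separately for SNPs and indels, how many records in the sample VCF carry the score annotation and how many of those also appear in the resource. The sketch below is only a diagnostic. The function names `read_vcf` and `count_scored_overlap` are made up for illustration, and matching on CHROM, POS, REF and ALT is an assumption that does not necessarily mirror how FilterVariantTranches matches sites internally. It also holds the whole resource in memory, so a large resource may need to be restricted to a region first.

```shell
# Hypothetical helpers for illustration; not part of GATK.

read_vcf() {
  # Stream a VCF to stdout, decompressing when the name ends in .gz.
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac
}

count_scored_overlap() {
  # usage: count_scored_overlap SAMPLE_VCF RESOURCE_VCF [INFO_KEY]
  # INFO_KEY defaults to CNN_1D, the key used in the command above.
  {
    # Tag each line with its source so one awk process can read both files.
    read_vcf "$2" | awk '{ print "R\t" $0 }'
    read_vcf "$1" | awk '{ print "S\t" $0 }'
  } | awk -F'\t' -v key="${3:-CNN_1D}" '
    {
      src = substr($0, 1, 1)   # "R" = resource, "S" = sample
      $0 = substr($0, 3)       # drop the tag; fields are re-split
    }
    /^#/ { next }              # skip header lines
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        alt = alts[i]
        # Skip missing, spanning-deletion and symbolic alleles.
        if (alt == "." || alt == "*" || alt ~ /^</) continue
        id = $1 SUBSEP $2 SUBSEP $4 SUBSEP alt
        if (src == "R") { known[id] = 1; continue }
        # Sample record: only count it when the INFO field has the score key.
        if ($8 !~ ("(^|;)" key "=")) continue
        # Single-base REF and ALT is a SNP; everything else is lumped as INDEL.
        type = (length($4) == 1 && length(alt) == 1) ? "SNP" : "INDEL"
        scored[type]++
        if (id in known) matched[type]++
      }
    }
    END {
      printf "SNP scored=%d matched=%d\n", scored["SNP"], matched["SNP"]
      printf "INDEL scored=%d matched=%d\n", scored["INDEL"], matched["INDEL"]
    }
  '
}

# Example, using the file names from the command above:
#   count_scored_overlap ERR1213935.CNN.vcf.gz por7A1.CNN.vcf.gz CNN_1D
```

A zero in either the scored or the matched column for SNPs or indels would be consistent with the error above, although it does not by itself say why the scores or the matching sites are missing.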

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @jodybk

    What is the resource VCF you are passing with --resource por7A1.CNN.vcf.gz? The resource should be a set of validated VCFs with known sites of common variation; see https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_vqsr_FilterVariantTranches.php

    Can you please post the version of GATK you are using and the CNNScoreVariants command you used?

  • medgenru, Russia (Member)
    Hello, I have the same error and I can't figure out what the problem is.
    I've built a pipeline from the GATK Best Practices for germline SNPs and indels on single-sample data; all steps work fine up until filtering tranches.

    The command that I use:

    gatk FilterVariantTranches \
    -V ${R1%%_*}-annotated.vcf \
    --create-output-variant-index true \
    --resource $WES/db/00-All.vcf.gz \
    --resource $WES/reference/1000G_phase1.indels.hg19.sites.vcf \
    --resource $WES/reference/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
    --snp-tranche 99.9 --snp-tranche 99.95 \
    --indel-tranche 99.0 --indel-tranche 99.4 \
    -O ${R1%%_*}-filtered.vcf \
    --tmp-dir $WES/output/tmp \
    --java-options "-Xmx30g"

    What should I do?
    I have a feeling it's a result of the previous steps, but they run without errors, so I'm clueless.
    Thank you.
  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @medgenru

    Take a look at my previous post.
    What is the resource VCF you are passing with --resource 00-All.vcf.gz? The resource should be a set of validated VCFs with known sites of common variation; see https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_vqsr_FilterVariantTranches.php

  • medgenru, Russia (Member)
    > @bhanuGandham said:
    > Hi @medgenru
    >
    > Take a look at my previous post.
    > What is the resource VCF you are passing with --resource 00-All.vcf.gz? The resource should be a set of validated VCFs with known sites of common variation; see https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_vqsr_FilterVariantTranches.php

    00-All.vcf.gz is the dbSNP VCF downloaded directly from the NCBI FTP site:
    ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/GATK/

    It was used in the base recalibration step of the pipeline as well. But even if I remove it from the command and leave only Mills and 1000G, the error is still there.
  • medgenru, Russia (Member)
    Okay, so I reread the Best Practices and experimented a bit, and now I have a question regarding HaplotypeCaller and CNN.
    Do I have to run HaplotypeCaller in AS GVCF mode or in non-AS standard VCF mode for the annotation process in CNN and FilterVariantTranches to run smoothly?
    I now suspect this might have been the problem, but I'm not sure.
  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @medgenru

    Can you please post:
    1. the version of GATK used
    2. the exact commands for CNNScoreVariants and FilterVariantTranches
    3. entire error log

  • medgenru, Russia (Member)
    > @bhanuGandham said:
    > Hi @medgenru
    >
    > Can you please post:
    > 1. the version of GATK used
    > 2. the exact commands for CNNScoreVariants and FilterVariantTranches
    > 3. entire error log

    Sorry for the late response. The problem resolved itself once I checked what type of VCF HaplotypeCaller was creating. I was creating GVCF files; after rerunning in standard (non-GVCF) mode, everything worked fine.
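
For anyone landing here with the same error, a quick check along the lines of the resolution above is to see whether the input is a GVCF before scoring and filtering it. The sketch below assumes that a GVCF written by HaplotypeCaller lists the symbolic `<NON_REF>` allele in the ALT column. The function name `is_gvcf` and the file name in the example are hypothetical.

```shell
# Hypothetical helper for illustration; not part of GATK.
is_gvcf() {
  # usage: is_gvcf FILE
  # Exit status 0 if any record lists the <NON_REF> symbolic allele in ALT.
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F'\t' '
    /^#/ { next }                         # skip header lines
    $5 ~ /<NON_REF>/ { found = 1; exit }  # stop at the first GVCF-style record
    END { exit (found ? 0 : 1) }
  '
}

# Example (file name is a placeholder):
#   if is_gvcf sample-annotated.vcf; then
#     echo "Looks like a GVCF; rerun HaplotypeCaller without GVCF mode first."
#   fi
```

In the case reported above, the fix was to rerun HaplotypeCaller in standard mode, which as far as I know means leaving out `-ERC GVCF` (`--emit-ref-confidence GVCF`) so that a regular VCF is written.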