VariantFiltration

Hello:

In trying to use VariantFiltration to hard-filter a VCF, I keep getting the same runtime warning, e.g.

WARN 15:28:05,017 Interpreter - ![0,2]: 'DP;' undefined variable DP

and all sites have 'FILTER' set to 'PASS'.

Here is a typical command:

java -jar /software/gatk/3.4-46/static/GenomeAnalysisTK.jar \
-T VariantFiltration \
-L 3R:12000000-12200000 \
-R /home/chuck/shrd/reference_genomes/D_mel_RELEASE6_Sue/norm_dmel_R6_SL.fasta \
-V /home/chuck/chl_working/testgatk/DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf \
--filterName test_DPGT38 \
--filterExpression "DP>38.0" \
-o test_filtered_CHL.vcf

I suspect I am missing an obvious requirement or constraint.

Any help appreciated.

Cheers,
Chuck

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @chlangley
    Hi Chuck,

    Are you trying to filter a GVCF? We don't recommend filtering GVCFs, as they are an intermediate file not to be used in final analyses.

    If you are indeed filtering a final VCF, please post some records from it.

    Thanks,
    Sheila

  • chlangley, UCD (Member)

    Sheila:

    Thanks for getting back so quickly.
    Yes, for reasons discussed earlier I want to stick with the GVCFs.

    So I want to use VariantFiltration, if possible, to explore and create hard (minimally) filtered data sets.

    I was hoping to use VariantFiltration to filter the called reference sequence sites as well.

    But as I mentioned above, I could not get it to work.

    I am attaching a file that starts with the headers of a typical GVCF, followed by 100 records containing various calls.

    Thanks for the help.

    Cheers,
    Chuck

  • chlangley, UCD (Member)

    Sheila:

    I don't see the file I attached to my last post.
    Trying again. But I get a message, "(hdr_DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf) Uploaded file types not allowed."

    So I tried "hdr_DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf.txt".
    That seems to have worked.

    Cheers,
    Chuck

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    Try --filterExpression "DP > 38"

    But be aware that as far as we're concerned, what you're doing is The Wrong Thing. We will not be able to provide any help with interpretation of results you get with this methodology.

  • chlangley, UCD (Member)

    Hello:

    I tried --filterExpression "DP > 38"

    java -jar /software/gatk/3.4-46/static/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -L 3R:12000000-12200000 \
    -R /home/chuck/shrd/reference_genomes/D_mel_RELEASE6_Sue/norm_dmel_R6_SL.fasta \
    -V /home/chuck/chl_working/testgatk/DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf \
    --filterExpression "DP > 38" \
    --filterName test_DPGT38 \
    -o test_filtered_CHL.vcf
    

    but alas, the same result: lots of "WARN 14:21:42,527 Interpreter - ![0,2]: 'DP > 38;' undefined variable DP"
    and no FILTERed records in the output VCF.

    Cheers,
    Chuck

    PS: I fully understand that I am on my own in my proposed analysis of a collection of separately filtered, called, and (say) snpEff-annotated GVCFs. But I would appreciate help getting VariantFiltration working with them (if you think it is possible).

  • chlangley, UCD (Member)

    Thanks, Geraldine, for your patience.

    Inverting the logic did the trick.
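The inverted expression itself is not shown in the thread. As a rough illustration of why the direction of the comparison matters, here is a minimal Python sketch of the filtering semantics. It assumes that a record is tagged with the filter name when the expression evaluates to true (the expression describes what to flag, not what to keep), and that a record where `DP` is undefined is left as PASS. The function `filter_status` and the sample records are hypothetical and are not part of GATK.

```python
from typing import Optional


def filter_status(info: dict, threshold: float = 38.0,
                  filter_name: str = "test_DPGT38") -> str:
    """Return the FILTER value for one record under the expression DP > threshold."""
    dp: Optional[float] = info.get("DP")
    if dp is None:
        # Assumption: an undefined variable counts as "no match",
        # so the record is left as PASS (and a warning would be logged).
        return "PASS"
    # The record is flagged when the expression is TRUE.
    return filter_name if dp > threshold else "PASS"


# Made-up INFO fields for three records; the last one has no DP at all.
records = [{"DP": 50}, {"DP": 20}, {"END": 12000100}]
statuses = [filter_status(r) for r in records]
print(statuses)  # ['test_DPGT38', 'PASS', 'PASS']
```

Under these assumptions, "DP > 38" flags the high-depth sites. If the intent is to keep only those sites, the comparison has to be reversed so that the low-depth sites are the ones flagged.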

    I guess the warnings come from the sites with empty records. I am capturing every site in the GVCF, so that's OK.
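One way to check that guess is to count how many records actually carry `DP` in the INFO column, since that is where the filter expression looks for it. The sketch below is a hedged, standalone Python helper, not a GATK feature. The function names and the two example records are invented for illustration, and it assumes a plain tab-separated VCF with INFO in the eighth column.

```python
def info_keys(vcf_line: str) -> set:
    """Return the set of INFO keys for one tab-separated VCF data line."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    if info == ".":
        return set()
    return {field.split("=", 1)[0] for field in info.split(";")}


def count_missing_info_key(lines, key="DP"):
    """Count data records with and without `key` in the INFO column."""
    present = missing = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        if key in info_keys(line):
            present += 1
        else:
            missing += 1
    return present, missing


# Made-up records for illustration only: a reference block whose INFO
# holds just END, and a variant site whose INFO includes DP.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "3R\t12000001\t.\tA\t<NON_REF>\t.\t.\tEND=12000050\tGT:DP:GQ:MIN_DP:PL\t0/0:40:99:38:0,99,1200",
    "3R\t12000051\t.\tC\tT,<NON_REF>\t500.0\t.\tDP=45;MQ=60.00\tGT:AD:DP:GQ:PL\t1/1:0,45,0:45:99:530,135,0,530,135,530",
]
print(count_missing_info_key(example))  # (1, 1)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `with open(path) as fh: print(count_missing_info_key(fh))`. If the "missing" count roughly matches the number of warnings, the warnings are likely coming from records that have no INFO-level `DP`.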

    But I can't believe I did not try inverting the logic of the filter! I tried so many other less obvious paths.

    Thanks again.

    Cheers,
    Chuck

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    Glad to hear that worked -- it's an easy mistake :)
