VariantFiltration

Hello:
In trying to use VariantFiltration to hard-filter a VCF, I keep getting the same runtime warning, e.g.
WARN 15:28:05,017 Interpreter - ![0,2]: 'DP;' undefined variable DP
and all sites have 'FILTER' set to 'PASS'.
Here is a typical command:
java -jar /software/gatk/3.4-46/static/GenomeAnalysisTK.jar \
-T VariantFiltration \
-L 3R:12000000-12200000 \
-R /home/chuck/shrd/reference_genomes/D_mel_RELEASE6_Sue/norm_dmel_R6_SL.fasta \
-V /home/chuck/chl_working/testgatk/DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf \
--filterName test_DPGT38 \
--filterExpression "DP>38.0" \
-o test_filtered_CHL.vcf
I suspect I am missing an obvious requirement or constraint.
Any help appreciated.
Cheers,
Chuck
Best Answer
Geraldine_VdAuwera Cambridge, MA admin
Oh sure, the variant filtration should absolutely work; GVCFs are fully valid VCF files.
I thought it might be the syntax but then I had some coffee and my brain turned on.
- The warnings simply come from records that inexplicably have no INFO-level annotations (including no DP) in your file, such as
2Cen_mapped_Scaffold_10 1 . C . . . . GT:AD:DP:RGQ .:0:0:0
2Cen_mapped_Scaffold_10 2 . T . . . . GT:AD:DP:RGQ .:0:0:0
2Cen_mapped_Scaffold_10 3 . T . . . . GT:AD:DP:RGQ .:0:0:0
2Cen_mapped_Scaffold_10 4 . T . . . . GT:AD:DP:RGQ .:0:0:0
2Cen_mapped_Scaffold_10 5 . T . . . . GT:AD:DP:RGQ .:0:0:0
Did you do any processing on the VCFs that would explain this? They certainly don't look like they're fresh out of HaplotypeCaller.
- You have no FILTERed records in your output because your command is filtering out anything where DP is greater than 38, while the depth at all the sites in your example is below that number. The trick is that the filtering logic is the inverse of the selection logic: the expression describes the records to flag as failing, not the records to keep. Flip the operator and you'll be good to go (for the records that are correctly annotated anyway).
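To make the second point concrete, here is a toy model of the flagging logic. It is not GATK code, just a minimal sketch: `apply_filter`, the record layout, the depths, and the filter name `test_DPLT38` are all made up for illustration, and the handling of a missing DP (no match, record stays PASS) is an assumption based on what was reported in this thread.

```python
import operator

# Toy model only: this is NOT GATK code. `apply_filter`, the dict layout and
# the example depths are hypothetical, chosen to illustrate the semantics.
def apply_filter(records, filter_name, key, op, threshold):
    """Return (chrom, pos, FILTER) tuples; a record is flagged when the expression is TRUE."""
    out = []
    for rec in records:
        value = rec["info"].get(key)
        if value is None:
            # Assumption: an undefined variable counts as "no match", as seen
            # in this thread (a warning is logged and the record stays PASS).
            status = "PASS"
        elif op(value, threshold):
            # The expression matched, so the record is marked as FAILING.
            status = filter_name
        else:
            status = "PASS"
        out.append((rec["chrom"], rec["pos"], status))
    return out

# Hypothetical records: every annotated depth is below 38, as in the example file.
records = [
    {"chrom": "3R", "pos": 100, "info": {"DP": 12}},
    {"chrom": "3R", "pos": 101, "info": {"DP": 20}},
    {"chrom": "3R", "pos": 102, "info": {}},  # no INFO-level DP
]

# "DP > 38" matches nothing here, so every record stays PASS.
flag_high = apply_filter(records, "test_DPGT38", "DP", operator.gt, 38)

# "DP < 38" matches the shallow sites, so they get flagged.
flag_low = apply_filter(records, "test_DPLT38", "DP", operator.lt, 38)
```

With these made-up depths, the original operator flags nothing, while the flipped operator flags the two annotated sites and leaves the unannotated one as PASS.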
Answers
@chlangley
Hi Chuck,
Are you trying to filter a GVCF? We don't recommend filtering GVCFs, as they are an intermediate file not to be used in final analyses.
If you are indeed filtering a final VCF, please post some records from it.
Thanks,
Sheila
Sheila:
Thanks for getting back so quickly.
Yes, for reasons discussed earlier I want to stick with the GVCFs.
So I want to use VariantFiltration, if possible, to explore and create hard (minimally) filtered data sets.
I was hoping to use VariantFiltration to filter the called reference sequence sites as well.
But as I mentioned above, I could not get it to work.
I am attaching a file that starts with the headers of a typical GVCF, followed by 100 records containing various calls.
Thanks for the help.
Cheers,
Chuck
Sheila:
I don't see the attached file to my last post.
Trying again. But I get a message, "(hdr_DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf) Uploaded file types not allowed."
So I tried "hdr_DPGP3_ZI118N.raw.snps.indels.gGVCF.g.vcf.txt".
That seems to have worked.
Cheers,
Chuck
Try
--filterExpression "DP > 38"
But be aware that as far as we're concerned, what you're doing is The Wrong Thing. We will not be able to provide any help with interpretation of results you get with this methodology.
Hello:
I tried --filterExpression "DP > 38"
but alas the same result, lots of "WARN 14:21:42,527 Interpreter - ![0,2]: 'DP > 38;' undefined variable DP"
and no FILTERed records in the output VCF.
Cheers,
Chuck
PS: I fully understand that I am on my own in my proposed analysis of a collection of separately filtered and called and (say) snpEff-annotated GVCFs. But I do appreciate help getting VariantFiltration working with them (if you think it is possible).
Thanks, Geraldine, for your patience.
Inverting the logic did the trick.
I guessed the warnings were from the sites with empty records. I am capturing every site in the GVCF, so that's OK.
But I can't believe I did not try inverting the logic of the filter! I tried so many other less obvious paths.
Thanks again.
Cheers,
Chuck
Glad to hear that worked -- it's an easy mistake
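For anyone who wants to check how many records would trigger the "undefined variable DP" warning before running the filter, here is a minimal sketch that counts data lines whose INFO column has no DP entry. The helper names `info_has_key` and `count_missing` are made up, the header line and the last record are hypothetical, and the code assumes standard tab-separated VCF lines with INFO in the 8th column.

```python
def info_has_key(vcf_line, key="DP"):
    """True if the INFO column (8th field) of a VCF data line defines `key`."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # malformed or truncated line: treat as unannotated
    info = fields[7]
    if info == ".":
        return False  # "." means the INFO column is empty
    # INFO entries are separated by ";" and look like KEY=VALUE or a bare flag.
    return any(entry.split("=", 1)[0] == key for entry in info.split(";"))

def count_missing(lines, key="DP"):
    """Count non-header, non-blank lines whose INFO column lacks `key`."""
    return sum(
        1
        for ln in lines
        if ln.strip() and not ln.startswith("#") and not info_has_key(ln, key)
    )

lines = [
    "##fileformat=VCFv4.1",  # hypothetical header line
    # Two records of the kind quoted in the answer: FORMAT has DP, INFO is ".".
    "\t".join(["2Cen_mapped_Scaffold_10", "1", ".", "C", ".", ".", ".", ".",
               "GT:AD:DP:RGQ", ".:0:0:0"]),
    "\t".join(["2Cen_mapped_Scaffold_10", "2", ".", "T", ".", ".", ".", ".",
               "GT:AD:DP:RGQ", ".:0:0:0"]),
    # Hypothetical record that does carry an INFO-level DP.
    "\t".join(["3R", "12000001", ".", "A", ".", ".", ".", "DP=20",
               "GT:AD:DP:RGQ", "0/0:20:20:60"]),
]

missing = count_missing(lines)
```

Note that a DP in the FORMAT column does not count here: the records quoted in the answer have a sample-level DP but nothing at the INFO level, which is what the filter expression was looking for.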