Issue reading VCF files with Perl library Vcf.pm after complex filtering
I want to report a problem with the VCF output that GATK writes after the VariantFiltration task.
I was trying to use vcftools to read the output VCF file from that tool, but it failed with an error while parsing the header. I traced the error to the header lines that describe the filters used. These are the filters as reported in the VCF:
##FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
##FILTER=<ID=LowQual,Description=""QUAL > 30.0 && QUAL < 100.0"">
The double quotes around each description are doubled, as you can see in all three filters, and this causes the Vcf.pm Perl library to fail with:
Could not parse header line: FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
Stopped at [QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0""].
When I removed the doubled double quotes, the problem went away. In fact, with ##FILTER=<ID=LowQual,Description="QUAL > 30.0 && QUAL < 100.0"> the error does not occur.
There is also a second error caused by VariantFiltration. A few lines later, the header records the command line with all the parameters given to VariantFiltration, including these filter expressions again:
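For anyone who wants to apply the same workaround to the doubled quotes without editing the header by hand, here is a minimal Python sketch. It is not part of GATK or vcftools. The function name fix_filter_header is made up for illustration, and it assumes the doubled quotes only wrap the Description value at the end of a ##FILTER line, as in the lines quoted above.

```python
import re

# Matches a Description value wrapped in doubled double quotes at the end of
# a header line, e.g. Description=""QUAL > 30.0"">
# (assumption: Description is the last field of the ##FILTER line)
_DOUBLED_DESCRIPTION = re.compile(r'Description=""(.+?)"">$')


def fix_filter_header(line: str) -> str:
    """Collapse doubled quotes around the Description of a ##FILTER line.

    Lines that are not ##FILTER header lines are returned unchanged.
    """
    if not line.startswith("##FILTER="):
        return line
    # Keep the original line ending so the file layout is preserved.
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return _DOUBLED_DESCRIPTION.sub(r'Description="\1">', body) + ending
```

Passing every line of the VCF through this function and writing the result to a new file should leave the records untouched and only rewrite the affected ##FILTER lines. An empty description (Description="") is deliberately left alone, because the pattern requires at least one character between the doubled quotes.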
##GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.4-46-gbc02625,Date="Sat Apr 16 12:01:58 CEST 2016",Epoch=1460800918456,CommandLineOptions=.. .. .. ..... filterExpression=["QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0", "QUAL < 30.0", "QUAL > 30.0 && QUAL < 100.0"] filterName=[HARD_TO_VALIDATE, VeryLowQual, LowQual] genotypeFilterExpression= genotypeFilterName= . .. .... >
And Vcf.pm stops with:
Could not parse header line: GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.4-46-gbc02625 .....
I solved it in a similar way, by removing the quotes from the filter elements:
filterExpression=[QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0, QUAL < 30.0, QUAL > 30.0 && QUAL < 100.0]
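The same edit can be scripted. The following is only a sketch of the workaround described above, not a statement about what the VCF format requires. The function name unquote_filter_expressions is hypothetical, and it assumes the expressions themselves never contain a closing square bracket.

```python
import re

# Captures the contents of filterExpression=[...]. Because this is a plain
# suffix match, it would also match genotypeFilterExpression=[...] if that
# option were given a bracketed value.
_FILTER_EXPRESSION = re.compile(r'filterExpression=\[([^\]]*)\]')


def unquote_filter_expressions(line: str) -> str:
    """Remove double quotes inside filterExpression=[...] on a
    ##GATKCommandLine header line; other lines are returned unchanged."""
    if not line.startswith("##GATKCommandLine"):
        return line

    def strip_quotes(match: re.Match) -> str:
        return "filterExpression=[" + match.group(1).replace('"', "") + "]"

    return _FILTER_EXPRESSION.sub(strip_quotes, line)
```

Only the quotes inside the square brackets are removed. Quoted values elsewhere on the line, such as the Date field, are left as they are.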
I do not know whether this is a problem in the Vcf.pm library or whether VariantFiltration does not respect the VCF format.
I hope this is useful.