
Issue reading VCF files with PERL library Vcf.pm after complex filtering

Vergilius (Italy, Member)
edited April 2016 in Ask the GATK team

Hi there,

I want to report a problem with the VCF output that GATK produces after a VariantFiltration run.

I was trying to read the output VCF file with vcftools, but an error came up while parsing the header. I found that it is caused by the strings describing the filters that were applied. These are the filters as reported in the VCF:

##FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
##FILTER=<ID=LowQual,Description=""QUAL > 30.0 && QUAL < 100.0"">

As you can see, the double quotes are repeated in all three filters, and this causes the Vcf.pm Perl library to fail with:

Could not parse header line: FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
Stopped at [QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0""].

When I removed the repeated double quotes, the problem went away. In fact, with ##FILTER=<ID=LowQual,Description="QUAL > 30.0 && QUAL < 100.0"> the error does not occur.
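For anyone who wants to apply the same workaround without editing the header by hand, here is a minimal Python sketch. It assumes the doubled quotes only wrap the Description value of ##FILTER lines, as in the lines quoted above. The function name `fix_filter_line` is just an illustrative choice; it is not part of GATK, vcftools, or Vcf.pm.

```python
import re

# Matches a Description value wrapped in doubled quotes, e.g. Description=""...""> .
# Assumption: only ##FILTER lines are affected, as in the header quoted in this post.
_DOUBLED_DESCRIPTION = re.compile(r'Description=""(.+)"">')


def fix_filter_line(line: str) -> str:
    """Collapse the doubled quotes around Description in a ##FILTER header line.

    Lines that are not ##FILTER lines are returned unchanged, and an empty
    Description="" is left alone because it contains no wrapped text.
    """
    if not line.startswith("##FILTER="):
        return line
    return _DOUBLED_DESCRIPTION.sub(r'Description="\1">', line)


if __name__ == "__main__":
    broken = '##FILTER=<ID=LowQual,Description=""QUAL > 30.0 && QUAL < 100.0"">'
    print(fix_filter_line(broken))
```

To clean a whole file, one could read it line by line, strip the trailing newline, pass each line through `fix_filter_line`, and write the result to a new file, so the original VCF stays untouched.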

There is also a second error caused by VariantFiltration. A few lines later, the header prints the command line with all the parameters passed to VariantFiltration, including these filters again:

##GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.4-46-gbc02625,Date="Sat Apr 16 12:01:58 CEST 2016",Epoch=1460800918456,CommandLineOptions=.. .. .. ..... filterExpression=["QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0", "QUAL < 30.0", "QUAL > 30.0 && QUAL < 100.0"] filterName=[HARD_TO_VALIDATE, VeryLowQual, LowQual] genotypeFilterExpression=[] genotypeFilterName=[] . .. .... >
And Vcf.pm stops with:

Could not parse header line: GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.4-46-gbc02625 .....

I solved this in a similar way, by removing the quotes from the filter expressions:

filterExpression=[QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0, QUAL < 30.0, QUAL > 30.0 && QUAL < 100.0]
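The same manual edit could be scripted along these lines. This is only a sketch, under the assumption that the quoted expressions sit inside a single `filterExpression=[...]` list and contain no `]` character. The helper name `strip_expression_quotes` is made up for illustration. Note that this changes the command line recorded in the header, so it is a workaround for the parser and not a fix to the file format.

```python
import re

# Captures the contents of filterExpression=[ ... ] up to the first closing bracket.
# Assumption: the filter expressions themselves never contain ']'.
_FILTER_EXPRESSION = re.compile(r'(filterExpression=\[)([^\]]*)(\])')


def strip_expression_quotes(line: str) -> str:
    """Remove double quotes inside filterExpression=[...] on ##GATKCommandLine lines.

    Any other header or data line is returned unchanged.
    """
    if not line.startswith("##GATKCommandLine"):
        return line
    return _FILTER_EXPRESSION.sub(
        lambda m: m.group(1) + m.group(2).replace('"', "") + m.group(3),
        line,
    )


if __name__ == "__main__":
    # Shortened, hypothetical header line used only to demonstrate the substitution.
    sample = (
        '##GATKCommandLine.VariantFiltration=<ID=VariantFiltration,'
        'filterExpression=["QUAL < 30.0", "QUAL > 30.0 && QUAL < 100.0"] '
        'filterName=[VeryLowQual, LowQual]>'
    )
    print(strip_expression_quotes(sample))
```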

I do not know whether this is a problem in the Vcf.pm library or whether VariantFiltration does not respect the VCF format.

I hope this is useful.

Regards,

Francesco Musacchia
