Suggestions on how to correct this malformed vcf

sp580 Germany Member
edited December 2018 in Ask the GATK team

Hello!

Trying to filter a variant call set using SelectVariants, I get the following error:

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 141: 0 is an invalid filter name in vcf4, for input source:

Line 141 is here, and it corresponds to the first record in the vcf:

1 3000019 . G GA 999 MinAB;MinABF;MinABR;MinGQ;MinDP INDEL;IDV=9;IMF=1;DP=152;ADF=1,151;ADR=0,0;AD=1,151;VDB=0;SGB=-6.33551;MQ0F=0.0986842;AC=80;AN=80;DP4=1,0,151,0;MQ=28;CSQ=A||||intergenic_variant||||||||||| GT:PL:DP:SP:ADF:ADR:AD:GQ:FT 1/1:48,5,0:4:0:1,3:0,0:1,3:30:0 1/1:120,15,0:5:0:0,5:0,0:0,5:40:0 1/1:72,9,0:3:0:0,3:0,0:0,3:34:0 1/1:42,6,0:2:0:0,2:0,0:0,2:31:0 1/1:43,9,0:3:0:0,3:0,0:0,3:34:0 1/1:40,3,0:1:0:0,1:0,0:0,1:28:0 1/1:160,24,0:8:0:0,8:0,0:0,8:49:0 1/1:97,18,0:6:0:0,6:0,0:0,6:43:0 1/1:39,3,0:1:0:0,1:0,0:0,1:28:0 1/1:125,15,0:5:0:0,5:0,0:0,5:40:0 1/1:95,12,0:4:0:0,4:0,0:0,4:37:0 1/1:73,9,0:3:0:0,3:0,0:0,3:34:0 1/1:173,24,0:8:0:0,8:0,0:0,8:49:0 1/1:48,6,0:2:0:0,2:0,0:0,2:31:0 1/1:41,9,0:3:0:0,3:0,0:0,3:34:0 1/1:45,9,0:3:0:0,3:0,0:0,3:34:0 1/1:128,18,0:6:0:0,6:0,0:0,6:43:0 1/1:73,9,0:3:0:0,3:0,0:0,3:34:0 1/1:51,9,0:3:0:0,3:0,0:0,3:34:0 1/1:31,3,0:1:0:0,1:0,0:0,1:28:0 1/1:78,9,0:3:0:0,3:0,0:0,3:34:0 1/1:56,9,0:3:0:0,3:0,0:0,3:34:0 1/1:42,6,0:2:0:0,2:0,0:0,2:31:0 1/1:7,6,0:2:0:0,2:0,0:0,2:31:0 ./.:0,0,0:0:0:0,0:0,0:0,0:0:0 1/1:71,6,0:2:0:0,2:0,0:0,2:31:0 1/1:97,15,0:5:0:0,5:0,0:0,5:40:0 1/1:54,9,0:3:0:0,3:0,0:0,3:34:0 1/1:141,21,0:7:0:0,7:0,0:0,7:46:0 1/1:97,9,0:3:0:0,3:0,0:0,3:34:0 1/1:152,21,0:7:0:0,7:0,0:0,7:46:0 ./.:0,0,0:0:0:0,0:0,0:0,0:0:0 1/1:42,6,0:2:0:0,2:0,0:0,2:31:0 1/1:72,9,0:3:0:0,3:0,0:0,3:34:0 1/1:73,12,0:4:0:0,4:0,0:0,4:37:0 1/1:100,9,0:3:0:0,3:0,0:0,3:34:0 1/1:74,9,0:3:0:0,3:0,0:0,3:34:0 1/1:40,3,0:1:0:0,1:0,0:0,1:28:0 ./.:0,0,0:0:0:0,0:0,0:0,0:0:0 1/1:112,24,0:8:0:0,8:0,0:0,8:49:0 1/1:156,27,0:9:0:0,9:0,0:0,9:52:0 1/1:115,18,0:6:0:0,6:0,0:0,6:43:0 1/1:42,6,0:2:0:0,2:0,0:0,2:31:0

I ran the tool as follows:
gatk SelectVariants -R ref.fa -V resource.vcf.gz -O resource_PASS.vcf --exclude-filtered

This is a file I downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1807-SNPs_Indels/

I am not sure what "0 is an invalid filter name" means.

Answers

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Hi @sp580, I am curious whether this might be a download error. I downloaded the VCF and am currently running it through vcf-validator, but it will take a few more hours to finish. So far, this error has not popped up. Because it is a very large file, one thought I had was that it might not have downloaded completely to your machine.

    Usually, that error refers to the presence of alternate alleles such as C 0 instead of C T, for example. I do see instances of the "./." notation in your example, which may be an indication that this symbol needs to be replaced with something that GATK can read, but I am not sure. I could not find documentation from the Sanger mouse group on how they notate alternate versus reference alleles. I would encourage you to send them a note to find out what ./. means (versus 1/1) in their version of VCF.
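One more thing worth checking in the record posted in the question: its FORMAT column ends in FT, the per-sample genotype filter key, and the last value of every sample is 0. The VCF specification reserves "0" as a filter string, so this may be what the parser is objecting to, whereas ./. is the standard notation for a missing genotype. The following is a minimal sketch for listing the FT values of a data line, assuming the line is tab-separated as in a real VCF file (the forum display shows spaces). The function name `format_field_values` is hypothetical, and the record is a shortened copy of the one above with only the first two samples.

```python
def format_field_values(record: str, key: str) -> list[str]:
    """Return the value of one FORMAT key for every sample in a VCF data line."""
    columns = record.rstrip("\n").split("\t")
    format_keys = columns[8].split(":")  # column 9 of a VCF line is FORMAT
    if key not in format_keys:
        return []
    index = format_keys.index(key)
    values = []
    for sample in columns[9:]:
        fields = sample.split(":")
        # VCF allows trailing fields to be dropped, so treat a short sample as missing.
        values.append(fields[index] if index < len(fields) else ".")
    return values


# Shortened copy of the record from the question (first two samples, trimmed INFO).
record = "\t".join([
    "1", "3000019", ".", "G", "GA", "999",
    "MinAB;MinABF;MinABR;MinGQ;MinDP",
    "INDEL;IDV=9;IMF=1;DP=152",
    "GT:PL:DP:SP:ADF:ADR:AD:GQ:FT",
    "1/1:48,5,0:4:0:1,3:0,0:1,3:30:0",
    "1/1:120,15,0:5:0:0,5:0,0:0,5:40:0",
])

ft_values = format_field_values(record, "FT")
print(ft_values)
```

If every FT value comes back as "0", that would be consistent with the "0 is an invalid filter name" message, although only the file's producers can confirm what they intended the value to mean.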

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @sp580 I finished running the vcf-validator tool, which is available here: https://hub.docker.com/r/ken01nn/vcf-validator/

    It did generate output indicating that there is an error. The line numbers it reports may be offset from the one in the GATK message, because this tool may not count the header lines the same way. Contacting the group that generated this VCF may be more expeditious than trying to fix it yourself.

    Here is the error message generated by the tool I mentioned above:

    According to the VCF specification, the input file is not valid
    Error: ALT metadata ID is not prefixed by DEL/INS/DUP/INV/CNV/BND and suffixed by ':' and a text sequence. This occurs 1 time(s), first time in line 72.
    Error: Error in meta-data section. This occurs 1 time(s), first time in line 128.
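If the FT values of 0 in the sample columns turn out to be what GATK rejects, one possible workaround is to drop the FT key before running SelectVariants. The sketch below is only an illustration under that assumption: it discards the per-genotype filter information entirely, the helper name `strip_format_key` is hypothetical, and the header lines in the example are made up for the demonstration rather than copied from the Sanger file.

```python
from typing import Iterable, Iterator


def strip_format_key(lines: Iterable[str], key: str = "FT") -> Iterator[str]:
    """Yield VCF lines with one FORMAT key removed from the header and all samples."""
    header_prefix = f"##FORMAT=<ID={key},"
    for line in lines:
        if line.startswith("#"):
            # Drop only the header definition of the removed key.
            if not line.startswith(header_prefix):
                yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 10:
            yield line  # no sample columns, nothing to strip
            continue
        keys = columns[8].split(":")
        if key not in keys:
            yield line
            continue
        index = keys.index(key)
        columns[8] = ":".join(k for i, k in enumerate(keys) if i != index)
        for col in range(9, len(columns)):
            fields = columns[col].split(":")
            if index < len(fields):  # trailing fields may be absent
                del fields[index]
            columns[col] = ":".join(fields)
        yield "\t".join(columns) + "\n"


# Illustrative input only: invented header lines and a two-sample record.
example = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n',
    '##FORMAT=<ID=FT,Number=1,Type=String,Description="Genotype filter">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
    "1\t3000019\t.\tG\tGA\t999\tMinAB\tDP=152\tGT:GQ:FT\t1/1:30:0\t./.:0:0\n",
]
cleaned = list(strip_format_key(example))
print("".join(cleaned), end="")
```

For a compressed file such as resource.vcf.gz, the lines could be read with `gzip.open(path, "rt")` and the result written to a new file, which would then need to be compressed and indexed again before use with GATK. Checking with the file's producers what 0 means in FT is still advisable before discarding the field.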
