Error with validating VCF file or known variants

Hi,

I got the following error when validating a VCF file. I downloaded the reference sequence from ftp://ftp.broadinstitute.org/pub/seq/references/Homo_sapiens_assembly19.fasta, and the VCF file from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz. Could anyone help? Thank you!

java -jar GenomeAnalysisTK.jar -T ValidateVariants -R ~/data/genome/genome19.fasta --variant:VCF ~/data/genome/All.2ab.vcf
INFO 08:11:55,241 HelpFormatter - --------------------------------------------------------------------------------
INFO 08:11:55,255 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO 08:11:55,255 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 08:11:55,255 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 08:11:55,261 HelpFormatter - Program Args: -T ValidateVariants -R /home/data/genome/genome19.fasta --variant:VCF /home/shenyang/data/genome/All.2ab.vcf
INFO 08:11:55,276 HelpFormatter - Executing as [email protected] on Linux 2.6.32-250.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 08:11:55,276 HelpFormatter - Date/Time: 2014/06/24 08:11:55
INFO 08:11:55,276 HelpFormatter - --------------------------------------------------------------------------------
INFO 08:11:55,276 HelpFormatter - --------------------------------------------------------------------------------
INFO 08:11:56,447 GenomeAnalysisEngine - Strictness is SILENT
INFO 08:11:56,587 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 08:23:50,815 RMDTrackBuilder - Writing Tribble index to disk for file /home/data/genome/All.2ab.vcf.idx
INFO 08:25:12,069 GenomeAnalysisEngine - Preparing for traversal
INFO 08:25:12,082 GenomeAnalysisEngine - Done preparing for traversal
INFO 08:25:12,083 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 08:25:12,083 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 08:25:13,766 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.1-1-g07a4bf8):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: File /home/data/genome/All.2ab.vcf fails strict validation: the REF allele is incorrect for the record at position 1:257683, fasta says A vs. VCF says C
ERROR ------------------------------------------------------------------------------------------

Answers

  • shyang (US, Member)

    I used the VCF file downloaded from the link above; however, I still got errors.

    Could you please help again? Thanks!

    INFO 13:34:51,088 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 13:34:51,088 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 13:34:51,844 ValidateVariants - Reference allele is too long (218) at position 1:1043802; skipping that record.
    INFO 13:34:51,845 ValidateVariants - Reference allele is too long (180) at position 1:1043815; skipping that record.
    INFO 13:34:51,920 ValidateVariants - Reference allele is too long (130) at position 1:1317886; skipping that record.
    INFO 13:34:52,036 ValidateVariants - Reference allele is too long (213) at position 1:2073334; skipping that record.
    INFO 13:34:52,122 ValidateVariants - Reference allele is too long (237) at position 1:2628604; skipping that record.
    INFO 13:34:52,147 ValidateVariants - Reference allele is too long (148) at position 1:2866225; skipping that record.
    INFO 13:34:52,149 ValidateVariants - Reference allele is too long (121) at position 1:2876601; skipping that record.
    INFO 13:34:52,182 ValidateVariants - Reference allele is too long (237) at position 1:3084534; skipping that record.
    INFO 13:34:52,200 ValidateVariants - Reference allele is too long (125) at position 1:3215790; skipping that record.
    INFO 13:34:52,270 ValidateVariants - Reference allele is too long (151) at position 1:3716941; skipping that record.
    INFO 13:34:52,314 ValidateVariants - Reference allele is too long (165) at position 1:3791050; skipping that record.
    INFO 13:34:52,337 ValidateVariants - Reference allele is too long (119) at position 1:4059835; skipping that record.
    INFO 13:34:52,415 ValidateVariants - Reference allele is too long (148) at position 1:4676497; skipping that record.
    INFO 13:34:52,770 ValidateVariants - Reference allele is too long (116) at position 1:7706252; skipping that record.
    INFO 13:34:53,378 ValidateVariants - Reference allele is too long (191) at position 1:12371679; skipping that record.
    INFO 13:34:53,733 ValidateVariants - Reference allele is too long (185) at position 1:16079261; skipping that record.
    INFO 13:34:54,278 ValidateVariants - Reference allele is too long (177) at position 1:20960209; skipping that record.
    INFO 13:34:54,278 ValidateVariants - Reference allele is too long (218) at position 1:20960209; skipping that record.
    INFO 13:34:54,412 ValidateVariants - Reference allele is too long (173) at position 1:22176683; skipping that record.
    INFO 13:34:54,567 ValidateVariants - Reference allele is too long (166) at position 1:23760863; skipping that record.
    INFO 13:34:54,684 ValidateVariants - Reference allele is too long (141) at position 1:24958355; skipping that record.
    INFO 13:34:54,727 ValidateVariants - Reference allele is too long (185) at position 1:25417119; skipping that record.
    INFO 13:34:55,076 ValidateVariants - Reference allele is too long (131) at position 1:29540165; skipping that record.
    INFO 13:34:55,992 ValidateVariants - Reference allele is too long (244) at position 1:37816472; skipping that record.
    INFO 13:34:57,076 ValidateVariants - Reference allele is too long (140) at position 1:48841771; skipping that record.
    INFO 13:34:58,300 ValidateVariants - Reference allele is too long (132) at position 1:62198968; skipping that record.
    INFO 13:34:58,581 ValidateVariants - Reference allele is too long (132) at position 1:65200434; skipping that record.
    INFO 13:34:58,937 ValidateVariants - Reference allele is too long (139) at position 1:69230622; skipping that record.
    INFO 13:34:59,389 ValidateVariants - Reference allele is too long (123) at position 1:74755167; skipping that record.
    INFO 13:34:59,390 ValidateVariants - Reference allele is too long (122) at position 1:74755170; skipping that record.
    INFO 13:34:59,599 ValidateVariants - Reference allele is too long (123) at position 1:77112140; skipping that record.
    INFO 13:34:59,728 ValidateVariants - Reference allele is too long (117) at position 1:78527312; skipping that record.
    INFO 13:35:00,291 ValidateVariants - Reference allele is too long (224) at position 1:83484782; skipping that record.
    INFO 13:35:00,823 ValidateVariants - Reference allele is too long (162) at position 1:88285840; skipping that record.
    INFO 13:35:01,343 ValidateVariants - Reference allele is too long (106) at position 1:93262273; skipping that record.
    INFO 13:35:02,425 ValidateVariants - Reference allele is too long (155) at position 1:104451504; skipping that record.
    INFO 13:35:03,608 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: File /home/shenyang/data/genome/dbsnp.vcf fails strict validation: the REF allele is incorrect for the record at position 1:105827682, fasta says TTTTC vs. VCF says TTTTG

  • shyang (US, Member)

    I deleted the row with position 1:105827682, and it still does not work. Any suggestions? Thank you!
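
Deleting one record at a time only exposes the next mismatch, so it helps to list every disagreement in one pass. Below is a minimal sketch, assuming plain Python and a reference small enough to hold in memory (for example a single chromosome extracted from the FASTA). The function names `load_fasta` and `ref_mismatches` are illustrative and are not part of GATK; the check itself mirrors what the error message describes, comparing each record's REF allele with the reference bases at the same position.

```python
import gzip


def load_fasta(path):
    """Read a FASTA file into {contig_name: sequence}.

    Assumption: the file is small enough to fit in memory. For a whole
    human genome an indexed FASTA reader would be a better choice.
    """
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                # Contig name is the first word after '>'
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def ref_mismatches(vcf_path, seqs):
    """Yield (chrom, pos, vcf_ref, fasta_ref) for each disagreeing record.

    fasta_ref is None when the contig is absent from the reference.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
            pos = int(pos)
            seq = seqs.get(chrom)
            if seq is None:
                yield chrom, pos, ref, None
                continue
            # VCF positions are 1-based; Python slices are 0-based
            observed = seq[pos - 1:pos - 1 + len(ref)].upper()
            if observed != ref.upper():
                yield chrom, pos, ref, observed
```

Calling `list(ref_mismatches(vcf_path, load_fasta(fasta_path)))` returns all mismatching records at once. A handful of mismatches may point to individual bad records, while mismatches spread across a contig would suggest that the VCF and the FASTA come from different genome builds.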

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @shyang

    Hi,

    You need to make sure you are using the right reference and the vcf file that matches the reference. It looks like you are not doing this.

    Unfortunately, I cannot help you with this.

    -Sheila

  • shyang (US, Member)

    Thanks Sheila!

    Do you have links to a matched genome and VCF file? I know where to download the reference genome from the Broad Institute FTP site, but I could not find the matching SNP VCF file. Could you please give me a hint about which folder to look in? Thanks a lot!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @shyang, we are not responsible for managing external data, and Sheila made it clear that she cannot help you. You will need to figure this out for yourself.
