GATK v4.1.0.0 ValidateVariants, gVCF mode, error; none in v4.0.11.0

manolis Member ✭✭
edited February 13 in Ask the GATK team

GATK v4.0.11.0 & v4.1.0.0, linux server, bash

Hi,

I ran the following commands:

${GATK4} --java-options '-Xmx10g -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:ConcGCThreads=1 -XX:ParallelGCThreads=2' HaplotypeCaller -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -I /home/manolis/GATK4/2.BQSR/bqsr_PROVA/WES_16-1239_bqsr.bam -O "PROVA_${version}.g.vcf.gz" -L /home/manolis/GATK4/DB/hg38_SureSelectV6noUTR_S07604514_HC_1-22_XY.intervals -ip 100 -ERC GVCF --max-alternate-alleles 3 -ploidy 2 -A StrandBiasBySample --tmp-dir /home/manolis/GATK4/tmp/

${GATK4} --java-options '-Xmx10g -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:ConcGCThreads=1 -XX:ParallelGCThreads=2' ValidateVariants -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -V "PROVA_${version}.g.vcf.gz" -L /home/manolis/GATK4/DB/hg38_SureSelectV6noUTR_S07604514_HC_1-22_XY.intervals -ip 100 -gvcf -Xtype ALLELES --tmp-dir /home/manolis/GATK4/tmp/

and I created the following files:

HaplotypeCaller v4.0.11.0 -> output "PROVA_v40110.g.vcf.gz"
HaplotypeCaller v4.1.0.0 -> output "PROVA_v4100.g.vcf.gz"

When I validate them, I get the following results:

1) ValidateVariants v4.0.11.0 -> input "PROVA_v40110.g.vcf.gz" ........ Everything OK !!!

2) ValidateVariants v4.0.11.0 -> input "PROVA_v4100.g.vcf.gz" ........ Everything OK !!!

3) ValidateVariants v4.1.0.0 -> input "PROVA_v4100.g.vcf.gz" ........ ERROR !!!

***********************************************************************
A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:41350-41765 Q. of type=SYMBOLIC alleles=[A*, <NON_REF>] attr={END=41765} filters= covers a position previously traversed.
***********************************************************************

4) ValidateVariants v4.1.0.0 -> input "PROVA_v40110.g.vcf.gz" ........ ERROR !!!

***********************************************************************
A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:41350-41765 Q. of type=SYMBOLIC alleles=[A*, <NON_REF>] attr={END=41765} filters= covers a position previously traversed.
***********************************************************************

If I create a vcf.gz file with HaplotypeCaller v4.1.0.0 (standard mode, no gVCF) and validate it with ValidateVariants v4.1.0.0, I do not get any error!

For now, can I validate my g.vcf.gz files generated by HaplotypeCaller v4.1.0.0 with ValidateVariants from v4.0.11.0?

Thanks

Answers

  • kensd Taiwan Member
    Hi @manolis ,

    I had the same issue with the ValidateVariants error message.

    HaplotypeCaller v4.1.0.0 => output: WGS_output.g.vcf.gz

    1. Using latest GATK (v4.1.0.0)
    ValidateVariants v4.1.0.0 => input: WGS_output.g.vcf.gz, ERROR!!

    ***********************************************************************

    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:10001-28560 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=28560} filters= covers a position previously traversed.

    ***********************************************************************

    2. Using GATK v4.0.12.0
    ValidateVariants v4.0.12.0 => input: WGS_output.g.vcf.gz, PASS!!

    Why does this error message occur? Did you find a way to solve this issue, or do you just use the previous GATK version to validate variants?

    Thanks.
  • manolis Member ✭✭
    edited March 4

    Hi @kensd

    I have not had time to solve it. I'm still running ValidateVariants from GATK v4.0.11.0 (this version only because it was the last one installed before v4.1).

    I do not know whether Picard "SortVcf" can also sort a g.vcf file. I still have to try it and then validate the newly sorted g.vcf with ValidateVariants. Check it if you want.

    I saw that your problems start with chr2, as in my case.

    Best!

  • kensd Taiwan Member

    Hi @manolis ,

    I tried Picard SortVcf to sort the HaplotypeCaller GVCF and then validated the sorted GVCF with ValidateVariants (GATK 4.1.0.0). The same error message occurred again.

    I added "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" to print the stack trace.
    Here is my command:
    #!/bin/bash -euo pipefail
    gatk --java-options "-Xms3000m -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
    ValidateVariants \
    -V NGS1_20170601C.ordered.g.vcf.gz \
    -R Homo_sapiens_assembly38.fasta \
    -L wgs_calling_regions.hg38.interval_list \
    -gvcf \
    --validation-type-to-exclude ALLELES \
    --dbsnp Homo_sapiens_assembly38.dbsnp138.vcf

    ------------Error message----------------------
    00:37:17.696 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/envs/gatkbp-1.0.1/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    00:37:19.874 INFO ValidateVariants - ------------------------------------------------------------
    00:37:19.875 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.0.0
    00:37:19.875 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    00:37:19.876 INFO ValidateVariants - Executing as [email protected] on Linux v3.10.0-957.5.1.el7.x86_64 amd64
    00:37:19.876 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    00:37:19.876 INFO ValidateVariants - Start Date/Time: March 5, 2019 12:37:17 AM UTC
    00:37:19.877 INFO ValidateVariants - ------------------------------------------------------------
    00:37:19.877 INFO ValidateVariants - ------------------------------------------------------------
    00:37:19.877 INFO ValidateVariants - HTSJDK Version: 2.18.2
    00:37:19.878 INFO ValidateVariants - Picard Version: 2.18.25
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    00:37:19.878 INFO ValidateVariants - Deflater: IntelDeflater
    00:37:19.879 INFO ValidateVariants - Inflater: IntelInflater
    00:37:19.879 INFO ValidateVariants - GCS max retries/reopens: 20
    00:37:19.879 INFO ValidateVariants - Requester pays: disabled
    00:37:19.879 INFO ValidateVariants - Initializing engine
    00:37:21.344 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/NFS/EC2480U-P/backup/EC2480U/rhome/ken/nf/GATKBP-TEST/GATKBP/work/ca/2ae550f2963d178994761ecb8d5441/Homo_sapiens_assembly38.dbsnp138.vcf
    00:37:21.863 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/NFS/EC2480U-P/backup/EC2480U/rhome/ken/nf/GATKBP-TEST/GATKBP/work/ca/2ae550f2963d178994761ecb8d5441/NGS1_20170601C.ordered.g.vcf.gz
    00:37:22.691 INFO IntervalArgumentCollection - Processing 2923745463 bp from intervals
    00:37:22.888 INFO ValidateVariants - Done initializing engine
    00:37:22.930 INFO ProgressMeter - Starting traversal
    00:37:22.931 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    00:37:33.920 INFO ProgressMeter - chr1:26437835 0.2 6000 32766.0
    00:37:43.951 INFO ProgressMeter - chr1:54810708 0.4 12000 34253.1
    00:37:55.522 INFO ProgressMeter - chr1:89395464 0.5 18000 33138.0
    00:38:06.515 INFO ProgressMeter - chr1:121878963 0.7 24000 33040.4
    00:38:17.590 INFO ProgressMeter - chr1:175481979 0.9 32000 35126.9
    00:38:29.219 INFO ProgressMeter - chr1:208333335 1.1 38000 34397.4
    00:38:40.852 INFO ProgressMeter - chr1:242060617 1.3 45000 34653.6
    00:38:42.760 INFO ValidateVariants - Shutting down engine
    [March 5, 2019 12:38:42 AM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 1.42 minutes.
    Runtime.totalMemory()=8540127232


    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:10001-58024 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=58024} filters= covers a position previously traversed.


    org.broadinstitute.hellbender.exceptions.UserException: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:10001-58024 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=58024} filters= covers a position previously traversed.
    at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.apply(ValidateVariants.java:234)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:101)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:99)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Using GATK jar /opt/conda/envs/gatkbp-1.0.1/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms3000m -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /opt/conda/envs/gatkbp-1.0.1/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar ValidateVariants -V NGS1_20170601C.ordered.g.vcf.gz -R Homo_sapiens_assembly38.fasta -L wgs_calling_regions.hg38.interval_list -gvcf --validation-type-to-exclude ALLELES --dbsnp Homo_sapiens_assembly38.dbsnp138.vcf

  • manolis Member ✭✭
    edited March 5

    I think this is the solution (from this post), using bcftools:

    bcftools norm -m +any --do-not-normalize "HaplotypeCaller.g.vcf.gz" -Oz -o "HaplotypeCaller_norm.g.vcf.gz"

    tabix -p vcf HaplotypeCaller_norm.g.vcf.gz

    I have no error in the validation step.

    11:03:41.718 INFO  ProgressMeter -       chr1:161157651              0.9                317000         372036.6
    11:03:51.800 INFO  ProgressMeter -       chr1:229305089              1.0                434000         425441.5
    11:04:01.966 INFO  ProgressMeter -        chr2:74523522              1.2                585000         491782.6
    11:04:11.987 INFO  ProgressMeter -       chr2:167250049              1.4                704000         518963.5
    

    It was an old issue...

    @AdelaideR can you confirm that?

    Best

  • ABours Member

    Hi all,

    I just ran into a similar problem running ValidateVariants (GATK v4.1.2.0) on g.vcf files. All (~150) of my g.vcf files throw this error as soon as they reach Super-Scaffold_2. When I run the same program without "-gvcf", I do not encounter these errors.

    To summarize, @AdelaideR, is this the expected behaviour of GATK 4.1.2.0 or not?

    Best

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ABours

    Could you please post the exact commands you are using, both the one that causes the error and the one that does not? Please also post the entire error log.

  • ABours Member

    Hi @bhanuGandham,

    Thanks for replying. I'll post the commands and the error below. As a heads-up, I'm running the program in a for-loop, so that one script performs the same task 150 times.

    This causes the error:

    java -Xms32G -Xmx32G -jar /data/biosoftware/GATK/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar ValidateVariants -R ~/reference/reference.fasta -V $i -gvcf
    

    And it causes the following error for all my files:

    ***********************************************************************
    
    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ Super-Scaffold_2:1-4 Q. of type=SYMBOLIC alleles=[G*, <NON_REF>] attr={END=4} filters= covers a position previously traversed.
    
    ***********************************************************************
    

    This doesn't cause the error:

    java -Xms32G -Xmx32G -jar /data/biosoftware/GATK/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar ValidateVariants -R ~/reference/reference.fasta -V $i
    

    Best,
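The for-loop mentioned above is not shown in the post, so the following is only a minimal sketch of what such a loop might look like. `VALIDATE_CMD` and `validate_all` are hypothetical names introduced here; the default command prefix simply mirrors the invocation quoted above, with `$HOME` in place of `~` so that it expands inside a variable.

```shell
# Hypothetical command prefix; the default mirrors the command quoted above.
# Point it at another GATK jar to compare validator versions.
: "${VALIDATE_CMD:=java -Xms32G -Xmx32G -jar /data/biosoftware/GATK/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar ValidateVariants -R $HOME/reference/reference.fasta}"

# Run the validator on every GVCF given as an argument and tally failures.
validate_all() {
  failed=0
  for i in "$@"; do
    log="${i}.validate.log"
    # Unquoted on purpose so the prefix splits into words
    # (this assumes none of the paths contain spaces).
    if $VALIDATE_CMD -V "$i" -gvcf >"$log" 2>&1; then
      echo "PASS $i"
    else
      echo "FAIL $i (see $log)"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $# failed"
  [ "$failed" -eq 0 ]
}
```

A call such as `validate_all *.g.vcf` would then write one log per input file, which makes it easy to see whether every file fails at the same contig boundary.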

    GitHub issue #6023, opened by bhanuGandham. State: closed. Closed by: michaelgatzen.
  • ABours Member

    Hi @bhanuGandham,

    Update:
    I just realized that the g.vcf files I'm validating were made with GATK v4.0.11.0. I therefore started validating them with version 4.0.11.0, and the first file passed without any issue.

    However, I do wonder why ValidateVariants in version 4.1.2.0 isn't able to properly handle g.vcf files made with 4.0.11.0.

    Best,

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ABours

    That is weird. Let me look into why that is. Can you please post the command you used to create the GVCF?

  • ABours Member

    Hi @bhanuGandham,

    This is the HaplotypeCaller command that I used, I do now realize that it was version 4.0.12.0. Of course, still the same issue :/

    java -Xms32G -Xmx32G -jar /data/biosoftware/GATK/gatk-4.0.12.0/gatk-package-4.0.12.0-local.jar HaplotypeCaller -R ${ref} -I input.bam -ERC GVCF -O output.g.vcf
    

    Thanks again,

  • ABours Member

    P.S. I should say that I'm running the validate variants of 4.0.11.0 without issues

    (I've been using too many different versions, haha)

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ABours

    I will check with the dev team and get back to you.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ABours

    Could you please post the [G*, <NON_REF>] variant record and the record just before it? Also, which tool created this GVCF?

  • ABours Member

    Hi,

    The GVCF is created with HaplotypeCaller of GATK version 4.0.12.0 (see command from above)

    The variant is (the start of the Super-Scaffold_2):

    Super-Scaffold_2        1       .       G       <NON_REF>       .       .       END=4   GT:DP:GQ:MIN_DP:PL      0/0:31:93:31:0,93,1141
    

    And just before the variant is (the end of Super-Scaffold_1):

    Super-Scaffold_1        9238114 .       T       <NON_REF>       .       .       END=9238123     GT:DP:GQ:MIN_DP:PL      0/0:12:0:11:0,0,0
    

    Thanks again,
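The two records above sit on different contigs, so they cannot overlap. The thread does not state the cause of the error, but every report here fails on the first record of the second contig, which would be consistent with an ordering check that carries the previous block's END across a contig change. The sketch below is not GATK's implementation; `check_gvcf_order` is a hypothetical helper that shows a contig-aware version of such a check, and it treats a record without END= as covering only its POS, which is a simplification.

```shell
# Sketch of a contig-aware GVCF ordering check (not GATK's code).
# Reads tab-separated VCF lines on stdin; header lines (#...) are skipped.
check_gvcf_order() {
  awk -F'\t' '
    /^#/ { next }
    {
      contig = $1; start = $2 + 0; end = start
      # Use END=<n> from the INFO column when present (reference blocks).
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^END=/) end = substr(info[i], 5) + 0
      # Reset the high-water mark whenever a new contig begins.
      if (contig != prev_contig) { prev_contig = contig; prev_end = 0 }
      if (start <= prev_end) {
        printf "overlap at %s:%d-%d (previous block ended at %d)\n", contig, start, end, prev_end
        bad = 1
      }
      if (end > prev_end) prev_end = end
    }
    END { if (!bad) print "ordered"; exit bad + 0 }
  '
}

# The two records quoted above: last block of Super-Scaffold_1,
# first block of Super-Scaffold_2.
printf '%s\t%s\t.\t%s\t<NON_REF>\t.\t.\t%s\tGT:DP:GQ:MIN_DP:PL\t%s\n' \
  Super-Scaffold_1 9238114 T END=9238123 0/0:12:0:11:0,0,0 \
  Super-Scaffold_2 1 G END=4 0/0:31:93:31:0,93,1141 | check_gvcf_order
```

With the reset in place the two records are reported as ordered. Without it, position 1 of Super-Scaffold_2 would be compared against 9238123 and flagged, which matches the symptom described in this thread.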

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited June 24

    Hi @ABours

    It looks like the sequence dictionary is out of order, i.e., Super-Scaffold_2 probably comes before Super-Scaffold_1 in the sequence dictionary. To confirm, please post the entire VCF header.

    PS: Check out Terra for end-to-end GATK pipelining solutions, and let us know which other pipelines we could add to make using GATK easier for you! For more details on whether this is the right fit for you, check out our blog page.

  • ABours Member

    Hi,

    I have attached the header as a text file. However, I don't see any mix-up with the Super-Scaffolds.
    Best,
    Andrea
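One way to check the sequence-dictionary hypothesis without reading the header by eye is to compare the order of the `##contig` header lines with the order in which contigs first appear in the records. This is a minimal sketch; `contig_order_ok` is a hypothetical helper, and the sample header below is made up for illustration (the real header was attached to the post and is not reproduced here).

```shell
# Report contigs whose records appear out of ##contig header order.
# Reads an uncompressed VCF on stdin.
contig_order_ok() {
  awk -F'\t' '
    /^##contig=<ID=/ {
      id = $0; sub(/^##contig=<ID=/, "", id); sub(/[,>].*/, "", id)
      rank[id] = ++n; next
    }
    /^#/ { next }
    $1 != prev {
      if (!($1 in rank)) { print "contig not in header: " $1; bad = 1 }
      else if (rank[$1] < last) { print "out of header order: " $1; bad = 1 }
      else last = rank[$1]
      prev = $1
    }
    END { if (!bad) print "contig order matches header"; exit bad + 0 }
  '
}

# Illustrative input: a made-up two-contig header plus the two records
# quoted earlier in the thread.
{
  printf '%s\n' '##fileformat=VCFv4.2' \
    '##contig=<ID=Super-Scaffold_1>' \
    '##contig=<ID=Super-Scaffold_2>'
  printf '%s\t%s\t.\t%s\t<NON_REF>\t.\t.\t%s\n' \
    Super-Scaffold_1 9238114 T END=9238123 \
    Super-Scaffold_2 1 G END=4
} | contig_order_ok
```

For a compressed file, the same function could be fed with something like `zcat output.g.vcf.gz | contig_order_ok`.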

  • z007 Member
    Hi,

    I have exactly the same problem.

    I was running the gatk4-exome-analysis-pipeline on a local linux (Fedora 26 x86_64) server and encountered the same problem at the call-ValidateGVCF step (also at the beginning of chr2). The stderr showed:

    <code>
    ... ...
    20:23:15.881 INFO ProgressMeter - chr1:211365359 0.7 568000 841523.0
    20:23:20.668 INFO ValidateVariants - Shutting down engine
    [June 30, 2019 8:23:20 PM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.83 minutes.
    Runtime.totalMemory()=8436318208
    ***********************************************************************

    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:38664-41335 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=41335} filters= covers a position previously traversed.

    ***********************************************************************
    </code>

    The error can be avoided by disabling the -gvcf argument.

    I am wondering whether the problem has been fixed or not.

    Thanks,

    Ge
  • z007 Member
    I reproduced the error in a standalone run; the full stdout is posted below in case it helps to solve the problem.

    ```
    Using GATK jar /usr/local/Biosoft/gatk/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xms3000m -jar /usr/local/Biosoft/gatk/gatk-package-4.1.2.0-local.jar ValidateVariants -V /home/z007/test/test.g.vcf.gz -R /data1/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -L /data1/broad-references/hg38/v0/exome_calling_regions.v1.interval_list -gvcf --validation-type-to-exclude ALLELES --dbsnp /data1/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
    23:26:36.748 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/Biosoft/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 30, 2019 11:26:38 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    23:26:38.544 INFO ValidateVariants - ------------------------------------------------------------
    23:26:38.544 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
    23:26:38.544 INFO ValidateVariants - For support and documentation go to
    23:26:38.544 INFO ValidateVariants - Executing as [email protected] on Linux v4.16.11-100.fc26.x86_64 amd64
    23:26:38.545 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_171-b10
    23:26:38.545 INFO ValidateVariants - Start Date/Time: June 30, 2019 11:26:36 PM EDT
    23:26:38.545 INFO ValidateVariants - ------------------------------------------------------------
    23:26:38.545 INFO ValidateVariants - ------------------------------------------------------------
    23:26:38.546 INFO ValidateVariants - HTSJDK Version: 2.19.0
    23:26:38.546 INFO ValidateVariants - Picard Version: 2.19.0
    23:26:38.546 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:26:38.546 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:26:38.546 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:26:38.546 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:26:38.546 INFO ValidateVariants - Deflater: IntelDeflater
    23:26:38.546 INFO ValidateVariants - Inflater: IntelInflater
    23:26:38.546 INFO ValidateVariants - GCS max retries/reopens: 20
    23:26:38.546 INFO ValidateVariants - Requester pays: disabled
    23:26:38.546 INFO ValidateVariants - Initializing engine
    23:26:39.008 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
    23:26:39.158 INFO FeatureManager - Using codec VCFCodec to read file file:///home/z007/test/test.g.vcf.gz
    23:26:39.995 INFO IntervalArgumentCollection - Processing 220960836 bp from intervals
    23:26:40.081 INFO ValidateVariants - Done initializing engine
    23:26:40.098 INFO ProgressMeter - Starting traversal
    23:26:40.099 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    23:26:50.192 INFO ProgressMeter - chr1:77417515 0.2 257000 1528245.8
    23:27:00.307 INFO ProgressMeter - chr1:192160270 0.3 513000 1523159.1
    23:27:05.667 INFO ValidateVariants - Shutting down engine
    [June 30, 2019 11:27:05 PM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.48 minutes.
    Runtime.totalMemory()=8401715200
    ***********************************************************************

    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:38664-41335 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=41335} filters= covers a position previously traversed.

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:38664-41335 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=41335} filters= covers a position previously traversed.
    at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.apply(ValidateVariants.java:234)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:106)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:104)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    ```

    Thanks,

    Ge
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ABours and @z007

    Looks like this is a bug in ValidateVariants. I have created an issue ticket for the dev team and we are looking into it. You can follow progress on this issue here: https://github.com/broadinstitute/gatk/issues/6023

  • Polar_bear Freedom Member

    @bhanuGandham said:
    Hi @ABours and @z007

    Looks like this is a bug in ValidateVariants. I have created an issue ticket for the dev team and we are looking into it. You can follow progress on this issue here: https://github.com/broadinstitute/gatk/issues/6023

    I am using the five-dollar analysis pipeline, which fails with an error in ValidateVariants. I think the error below is most likely a bug in version 4.1.2.0:

    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ 2:25457139-25457364 Q. of type=SYMBOLIC alleles=[C*, <NON_REF>] attr={END=25457364} filters= covers a position previously traversed

    (PS: this site is where chr1 ends and chr2 starts in the GVCF.)

  • bshifaw Member, Broadie, Moderator admin

    Thanks Polar_Bear,

    Looks like the fix for this bug has been merged to the master branch. Hopefully you won't see this error in the next release. https://github.com/broadinstitute/gatk/pull/6028

  • richtege Germany Member
    edited November 27
    Dear GATK support team,

    I am experiencing the same error as ABours and Polar_bear, using GATK 4.1.2.0 in a conda environment. My input gVCFs were generated with HaplotypeCaller following the germline short variant discovery workflow for whole exome sequencing data.

    Here are the details (I’m sorry, somehow my markdown formatting didn’t work out here):

    Command:
    gatk ValidateVariants -R GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -V myinput.g.vcf.gz --dbsnp hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -gvcf

    Error:
    ***********************************************************************
    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:41508-41510 Q. of type=SYMBOLIC alleles=[T*, <NON_REF>] attr={END=41510} filters= covers a position previously traversed.
    ***********************************************************************

    Not sure if this is relevant, but I used the -L option throughout all steps recommended here: https://gatkforums.broadinstitute.org/gatk/discussion/4133/when-should-i-use-l-to-pass-in-a-list-of-intervals, using an intersected and hg38-lifted interval list specific to my own data and some 1000Genomes data that I added to my pipeline, including a padding interval of 100bp.

    The error occurs for all the samples. I had a look at the respective position in a gVCF and everything seemed fine there. Is it possible that the bug still exists or could I be doing something wrong?

    Best,
    Gesa
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @richtege

    I believe that bug fix went in after the 4.1.2.0 version you are using. Try the latest GATK v4.1.4.0 and let us know if the issue persists.
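To decide whether a given installation falls in the range discussed here, the version can be read from the log banner and compared. This is a sketch under stated assumptions: the helper names are hypothetical, `sort -V` requires GNU coreutils, and the suspect range (4.1.0.0 up to but not including 4.1.4.0) is inferred only from the reports in this thread, not from release notes.

```shell
# Extract the GATK version from a log line such as
# "... INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.0.0".
gatk_version_from_log() {
  sed -n 's/.*The Genome Analysis Toolkit (GATK) v\([0-9.]*\).*/\1/p' | head -n 1
}

# True if dotted version $1 >= $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed range, based only on this thread: 4.1.0.0 and 4.1.2.0 were reported
# to fail, 4.0.11.0 and 4.0.12.0 to pass, and 4.1.4.0 was suggested as fixed.
gvcf_check_suspect() {
  version_ge "$1" 4.1.0.0 && ! version_ge "$1" 4.1.4.0
}

v=$(printf '%s\n' '00:37:19.875 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.0.0' | gatk_version_from_log)
if gvcf_check_suspect "$v"; then
  echo "GATK $v: -gvcf ordering errors at contig boundaries may be spurious"
else
  echo "GATK $v: outside the range reported in this thread"
fi
```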
