
GATK v4.1.0.0 ValidateVariants error in GVCF mode; not in v4.0.11.0

manolis, Member ✭✭
edited February 13 in Ask the GATK team

GATK v4.0.11.0 & v4.1.0.0, linux server, bash

Hi,

I ran the following commands:

${GATK4} --java-options '-Xmx10g -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:ConcGCThreads=1 -XX:ParallelGCThreads=2' HaplotypeCaller -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -I /home/manolis/GATK4/2.BQSR/bqsr_PROVA/WES_16-1239_bqsr.bam -O "PROVA_${version}.g.vcf.gz" -L /home/manolis/GATK4/DB/hg38_SureSelectV6noUTR_S07604514_HC_1-22_XY.intervals -ip 100 -ERC GVCF --max-alternate-alleles 3 -ploidy 2 -A StrandBiasBySample --tmp-dir /home/manolis/GATK4/tmp/

${GATK4} --java-options '-Xmx10g -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:ConcGCThreads=1 -XX:ParallelGCThreads=2' ValidateVariants -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -V "PROVA_${version}.g.vcf.gz" -L /home/manolis/GATK4/DB/hg38_SureSelectV6noUTR_S07604514_HC_1-22_XY.intervals -ip 100 -gvcf -Xtype ALLELES --tmp-dir /home/manolis/GATK4/tmp/

and they produced the following files:

HaplotypeCaller v4.0.11.0 -> output "PROVA_v40110.g.vcf.gz"
HaplotypeCaller v4.1.0.0 -> output "PROVA_v4100.g.vcf.gz"

When I validate them, I get the following results:

1) ValidateVariants v4.0.11.0 -> input "PROVA_v40110.g.vcf.gz" ........ Everything OK !!!

2) ValidateVariants v4.0.11.0 -> input "PROVA_v4100.g.vcf.gz" ........ Everything OK !!!

3) ValidateVariants v4.1.0.0 -> input "PROVA_v4100.g.vcf.gz" ........ ERROR !!!

***********************************************************************
A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:41350-41765 Q. of type=SYMBOLIC alleles=[A*, <NON_REF>] attr={END=41765} filters= covers a position previously traversed.
***********************************************************************

4) ValidateVariants v4.1.0.0 -> input "PROVA_v40110.g.vcf.gz" ........ ERROR !!!

***********************************************************************
A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:41350-41765 Q. of type=SYMBOLIC alleles=[A*, <NON_REF>] attr={END=41765} filters= covers a position previously traversed.
***********************************************************************
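The error says a record "covers a position previously traversed". One way to read that rule (an assumption about its intent, not GATK's actual code): in a GVCF each record spans from POS to its INFO END (or to the end of its REF allele when END is absent), and no record may start at or before the last position already covered on the same contig. The hypothetical `check_gvcf_order` function below is a minimal awk sketch of that rule, meant only to help locate suspicious records (for example around chr2:41350) before re-running ValidateVariants. It reads uncompressed VCF text on stdin.

```bash
# Hypothetical helper (not part of GATK): approximates the GVCF ordering rule
# "a record must not start at or before a position already covered".
# Input: uncompressed VCF text on stdin.
check_gvcf_order() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      chrom = $1
      start = $2 + 0
      rec_end = start + length($4) - 1     # default span: the REF allele
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)             # reference blocks carry END=
        if (info[i] ~ /^END=/) rec_end = substr(info[i], 5) + 0
      if (chrom == prev_chrom && start <= prev_end)
        printf "%s:%d-%d overlaps previous end %d\n", chrom, start, rec_end, prev_end
      # remember the furthest position covered so far on this contig
      if (chrom != prev_chrom || rec_end > prev_end) prev_end = rec_end
      prev_chrom = chrom
    }'
}
# Example (file name taken from the commands above):
#   zcat PROVA_v4100.g.vcf.gz | check_gvcf_order | head
```

Because the sketch keeps the furthest END seen so far on each contig, a record nested entirely inside an earlier long reference block is reported as well.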

If I create a vcf.gz file with HaplotypeCaller v4.1.0.0 in standard mode (no GVCF) and validate it with ValidateVariants v4.1.0.0, I do not get any error!

For now, can I validate the g.vcf.gz files generated by HC v4.1.0.0 with ValidateVariants from v4.0.11.0?

Thanks


Answers

  • kensd, Taiwan, Member
    Hi @manolis ,

    I have the same issue with the ValidateVariants error message.

    HaplotypeCaller v4.1.0.0 => output: WGS_output.g.vcf.gz

    1. Using latest GATK (v4.1.0.0)
    ValidateVariants v4.1.0.0 => input: WGS_output.g.vcf.gz, ERROR!!

    ***********************************************************************

    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:10001-28560 Q. of type=SYMBOLIC alleles=[C*, ] attr={END=28560} filters= covers a position previously traversed.

    ***********************************************************************

    2. Using GATK v4.0.12.0
    ValidateVariants v4.0.12.0 => input: WGS_output.g.vcf.gz, PASS!!

    What causes this error message? Did you find a way to solve it, or are you just using the previous GATK version to validate variants?

    Thanks.
  • manolis, Member ✭✭
    edited March 4

    Hi @kensd

    I did not have time to solve it... I'm still using GATK v4.0.11.0 for ValidateVariants (that version only because it was the last one I installed before v4.1).

    I don't know whether Picard "SortVcf" can also sort a g.vcf file; I still have to try it, and then validate the newly sorted g.vcf with ValidateVariants... Feel free to check it.

    I noticed that your errors also start on chr2, as in my case.

    Best!

  • kensd, Taiwan, Member

    Hi @manolis ,

    I tried Picard SortVcf to sort the HaplotypeCaller GVCF and then validated the sorted GVCF with ValidateVariants (GATK 4.1.0.0). The same error message occurred again.
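A possible reason sorting did not help, stated as an assumption rather than a diagnosis: a sort orders records by contig and start position, whereas the GVCF check also depends on where each reference block ends (its INFO END). The hypothetical `check_sorted_by_start` sketch below tests only start-position order within each contig, which is what a sort can guarantee. A file with a block at chr2:41300-41400 followed by one starting at chr2:41350 passes it, even though the two blocks overlap.

```bash
# Hypothetical helper (not part of Picard or GATK): checks only that POS never
# decreases within a contig. INFO END= is deliberately ignored, so overlapping
# reference blocks can still pass this check.
check_sorted_by_start() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      if ($1 == prev_chrom && $2 + 0 < prev_pos)
        printf "unsorted: %s:%d after %d\n", $1, $2, prev_pos
      prev_chrom = $1
      prev_pos = $2 + 0
    }'
}
```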

    I added "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" to print the stack trace.
    Here is my command:
    #!/bin/bash
    set -euo pipefail
    gatk --java-options "-Xms3000m -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
    ValidateVariants \
    -V NGS1_20170601C.ordered.g.vcf.gz \
    -R Homo_sapiens_assembly38.fasta \
    -L wgs_calling_regions.hg38.interval_list \
    -gvcf \
    --validation-type-to-exclude ALLELES \
    --dbsnp Homo_sapiens_assembly38.dbsnp138.vcf

    ------------Error message----------------------
    00:37:17.696 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/envs/gatkbp-1.0.1/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    00:37:19.874 INFO ValidateVariants - ------------------------------------------------------------
    00:37:19.875 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.0.0
    00:37:19.875 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    00:37:19.876 INFO ValidateVariants - Executing as [email protected] on Linux v3.10.0-957.5.1.el7.x86_64 amd64
    00:37:19.876 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    00:37:19.876 INFO ValidateVariants - Start Date/Time: March 5, 2019 12:37:17 AM UTC
    00:37:19.877 INFO ValidateVariants - ------------------------------------------------------------
    00:37:19.877 INFO ValidateVariants - ------------------------------------------------------------
    00:37:19.877 INFO ValidateVariants - HTSJDK Version: 2.18.2
    00:37:19.878 INFO ValidateVariants - Picard Version: 2.18.25
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    00:37:19.878 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    00:37:19.878 INFO ValidateVariants - Deflater: IntelDeflater
    00:37:19.879 INFO ValidateVariants - Inflater: IntelInflater
    00:37:19.879 INFO ValidateVariants - GCS max retries/reopens: 20
    00:37:19.879 INFO ValidateVariants - Requester pays: disabled
    00:37:19.879 INFO ValidateVariants - Initializing engine
    00:37:21.344 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/NFS/EC2480U-P/backup/EC2480U/rhome/ken/nf/GATKBP-TEST/GATKBP/work/ca/2ae550f2963d178994761ecb8d5441/Homo_sapiens_assembly38.dbsnp138.vcf
    00:37:21.863 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/NFS/EC2480U-P/backup/EC2480U/rhome/ken/nf/GATKBP-TEST/GATKBP/work/ca/2ae550f2963d178994761ecb8d5441/NGS1_20170601C.ordered.g.vcf.gz
    00:37:22.691 INFO IntervalArgumentCollection - Processing 2923745463 bp from intervals
    00:37:22.888 INFO ValidateVariants - Done initializing engine
    00:37:22.930 INFO ProgressMeter - Starting traversal
    00:37:22.931 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    00:37:33.920 INFO ProgressMeter - chr1:26437835 0.2 6000 32766.0
    00:37:43.951 INFO ProgressMeter - chr1:54810708 0.4 12000 34253.1
    00:37:55.522 INFO ProgressMeter - chr1:89395464 0.5 18000 33138.0
    00:38:06.515 INFO ProgressMeter - chr1:121878963 0.7 24000 33040.4
    00:38:17.590 INFO ProgressMeter - chr1:175481979 0.9 32000 35126.9
    00:38:29.219 INFO ProgressMeter - chr1:208333335 1.1 38000 34397.4
    00:38:40.852 INFO ProgressMeter - chr1:242060617 1.3 45000 34653.6
    00:38:42.760 INFO ValidateVariants - Shutting down engine
    [March 5, 2019 12:38:42 AM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 1.42 minutes.
    Runtime.totalMemory()=8540127232


    A USER ERROR has occurred: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:10001-58024 Q. of type=SYMBOLIC alleles=[C*, ] attr={END=58024} filters= covers a position previously traversed.


    org.broadinstitute.hellbender.exceptions.UserException: In a GVCF all records must ordered. Record: [VC Unknown @ chr2:10001-58024 Q. of type=SYMBOLIC alleles=[C*, ] attr={END=58024} filters= covers a position previously traversed.
    at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.apply(ValidateVariants.java:234)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:101)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:99)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Using GATK jar /opt/conda/envs/gatkbp-1.0.1/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms3000m -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /opt/conda/envs/gatkbp-1.0.1/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar ValidateVariants -V NGS1_20170601C.ordered.g.vcf.gz -R Homo_sapiens_assembly38.fasta -L wgs_calling_regions.hg38.interval_list -gvcf --validation-type-to-exclude ALLELES --dbsnp Homo_sapiens_assembly38.dbsnp138.vcf

  • manolis, Member ✭✭
    edited March 5

    I think this is the solution (from a forum post), using bcftools:

    bcftools norm -m +any --do-not-normalize "HaplotypeCaller.g.vcf.gz" -Oz -o "HaplotypeCaller_norm.g.vcf.gz"

    tabix -p vcf HaplotypeCaller_norm.g.vcf.gz

    With the normalized file, I get no error in the validation step.
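`bcftools norm -m +any` joins biallelic records that share a CHROM and POS into a single multiallelic record. One plausible, unconfirmed reading of why it makes the error disappear is that the offending reference blocks start at the same position as a preceding record. The hypothetical `find_shared_positions` awk sketch below lists CHROM:POS values used by more than one record, assuming the input is sorted so that such records are adjacent. It reads uncompressed VCF text on stdin.

```bash
# Hypothetical helper (not part of bcftools or GATK): prints "CHROM:POS count"
# for every position that appears in more than one consecutive record.
# Assumes sorted input, so records sharing a position are adjacent.
find_shared_positions() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      key = $1 ":" $2
      if (key == prev_key) count++
      else { if (count > 1) print prev_key, count; count = 1 }
      prev_key = key
    }
    END { if (count > 1) print prev_key, count }'
}
# Example (file name taken from the bcftools command above):
#   zcat HaplotypeCaller.g.vcf.gz | find_shared_positions | head
```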

    11:03:41.718 INFO  ProgressMeter -       chr1:161157651              0.9                317000         372036.6
    11:03:51.800 INFO  ProgressMeter -       chr1:229305089              1.0                434000         425441.5
    11:04:01.966 INFO  ProgressMeter -        chr2:74523522              1.2                585000         491782.6
    11:04:11.987 INFO  ProgressMeter -       chr2:167250049              1.4                704000         518963.5
    

    It was an old issue...

    @AdelaideR can you confirm that?

    Best
