GATK 4.1.3.0 GenotypeGVCFs -> ValidateVariants Error

jojo, Germany (Member)
Hello everybody,
I'm following the GATK Best Practices pipeline and have run into a problem that I couldn't find anywhere in the forum (yet).
When I run HaplotypeCaller (in -ERC GVCF mode) -> CombineGVCFs -> GenotypeGVCFs on my WGS/WES data and then run ValidateVariants on the genotype.g.vcf from the GenotypeGVCFs step, I get the following error message:

```
A USER ERROR has occurred: In a GVCF all records must contain a <NON_REF> allele. Offending record: [VC Unknown @ 1:10150 Q51.70 of type=SNP alleles=[C*, T] attr={AC=1, AF=0.050, AN=20, BaseQRankSum=-2.499e+00, DP=201, ExcessHet=3.0103, FS=2.515, InbreedingCoeff=-0.3456, MLEAC=2, MLEAF=0.100, MQ=38.92, MQRankSum=2.31, QD=3.45, ReadPosRankSum=-4.580e-01, SOR=0.180} filters=
```

Checking the corresponding genotype.g.vcf file confirms the error message: <NON_REF> occurs only once, and all other records list ordinary alternative bases. However, the .g.vcf files from the HaplotypeCaller and CombineGVCFs steps do not show this problem.


My general code looks like this

```
# -V takes the .g.vcf from the CombineGVCFs step
java -Xmx8G -jar gatk-package-4.1.3.0-local.jar GenotypeGVCFs \
  -R "$PWD/RefGenom/human_g1k_v37.fasta" \
  -V "$arg_dir" \
  -O "$PWD/GenotypeGVCFs/GenotypeGVCFs_${name}/joint_${SampleID}.g.vcf.gz"
```

So any idea what's wrong? Many thanks in advance :smile:

Cheers
Jojo

Answers

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    Hi @jojo ,
    The output of GenotypeGVCFs should be a VCF, not a GVCF, in which case we would expect the <NON_REF> blocks to be absent. Try re-running GenotypeGVCFs with the output as a VCF and then run ValidateVariants again. I hope this helps!

  • jojo, Germany (Member)
    Hi @Tiffany_at_Broad ,
    thanks for your quick response :)

    I just did as you suggested but the problem remains the same; no <NON_REF>'s given, only the same alternative bases as previously.

    My HaplotypeCaller and CombineGVCFs commands look like this:

    ```
    java -Xmx8G -jar gatk-package-4.1.3.0-local.jar HaplotypeCaller \
      -R "$PWD/RefGenom/human_g1k_v37.fasta" \
      -I "$arg_dir" \
      -O "$PWD/HaplotypeCaller/HaplotypeCaller_$2/raw_variants_${SampleID}.g.vcf.gz" \
      -ERC GVCF \
      --dbsnp "$PWD/GoldData/dbsnp_138.b37.vcf"

    # $samples fills in the -V flag(s); it is left unquoted on purpose
    # so that the shell splits it into separate arguments
    java -Xmx8G -jar gatk-package-4.1.3.0-local.jar CombineGVCFs \
      -R "$PWD/RefGenom/human_g1k_v37.fasta" \
      -O "$PWD/CombineGVCFs/CombineGVCFs_${name}/raw_variants_${f}.g.vcf.gz" \
      $samples
    ```

    Might there be problem in one of these steps? Many thanks in advance!

    Cheers
    Jojo
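
    The thread does not show how `$samples` is built for the CombineGVCFs call above. As a rough sketch only, one way to produce the repeated `-V` arguments could look like the following. The helper name `build_variant_args` and the sample file names are hypothetical, and the approach assumes the paths contain no whitespace.

    ```shell
    # build_variant_args FILE...: print "-V FILE" for each argument on one line.
    # Hypothetical helper, not part of GATK; assumes paths contain no whitespace.
    build_variant_args() {
      args=""
      for f in "$@"; do
        args="$args -V $f"
      done
      printf '%s\n' "${args# }"   # strip the leading space
    }

    # Placeholder file names for illustration only
    samples=$(build_variant_args sample1.g.vcf.gz sample2.g.vcf.gz)
    echo "$samples"
    ```

    The resulting string would then be expanded unquoted on the CombineGVCFs command line so that each `-V` and each path becomes its own argument.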
  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)
    edited October 18

    Can you share the full stack trace? When you say the "problem remains the same, no <NON_REF>'s given, only the same alternative bases", are you talking about the input g.vcfs used in the CombineGVCFs step or the output vcf from the GenotypeGVCFs step? Also, you may want to use the gatk launch script, as this article explains.

  • jojo, Germany (Member)
    Hi Tiffany,

    "When you say the "problem remains the same, no <NON_REF>'s given, only the same alternative bases", are you talking about the input g.vcfs used in the CombineGVCFs step or the output vcf from the GenotypeGVCFs step?"

    Sorry for the ambiguity. I'm talking about the output of the GenotypeGVCFs step; the CombineGVCFs step is fine. Please see the full stack trace:

    ```
    08:24:02.838 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/miifs03/scratch/m2_jgu-isibela/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Oct 09, 2019 8:24:04 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:24:04.852 INFO ValidateVariants - ------------------------------------------------------------
    08:24:04.853 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.3.0
    08:24:04.853 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:24:04.857 INFO ValidateVariants - Executing as [email protected] on Linux v3.10.0-957.5.1.el7.x86_64 amd64
    08:24:04.858 INFO ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_162-b12
    08:24:04.858 INFO ValidateVariants - Start Date/Time: October 9, 2019 8:24:02 AM CEST
    08:24:04.858 INFO ValidateVariants - ------------------------------------------------------------
    08:24:04.858 INFO ValidateVariants - ------------------------------------------------------------
    08:24:04.859 INFO ValidateVariants - HTSJDK Version: 2.20.1
    08:24:04.859 INFO ValidateVariants - Picard Version: 2.20.5
    08:24:04.859 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:24:04.859 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:24:04.860 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:24:04.860 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:24:04.860 INFO ValidateVariants - Deflater: IntelDeflater
    08:24:04.860 INFO ValidateVariants - Inflater: IntelInflater
    08:24:04.860 INFO ValidateVariants - GCS max retries/reopens: 20
    08:24:04.861 INFO ValidateVariants - Requester pays: disabled
    08:24:04.861 INFO ValidateVariants - Initializing engine
    08:24:05.268 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/miifs03/scratch/m2_jgu-isibela/GoldData/dbsnp_138.b37.vcf
    08:24:05.519 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/miifs03/scratch/m2_jgu-isibela/GenotypeGVCFs/GenotypeGVCFs_111-120/joint_raw_variants_111_112_113_114_115_116_117_118_119_120.vcf.gz
    08:24:05.929 INFO ValidateVariants - Done initializing engine
    08:24:05.931 WARN ValidateVariants - GVCF format is currently incompatible with allele validation. Not validating Alleles.
    08:24:05.931 INFO ProgressMeter - Starting traversal
    08:24:05.932 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    08:24:06.137 INFO ValidateVariants - Shutting down engine
    [October 9, 2019 8:24:06 AM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=2304245760
    ***********************************************************************

    A USER ERROR has occurred: In a GVCF all records must contain a <NON_REF> allele. Offending record: [VC Unknown @ 1:10150 Q51.70 of type=SNP alleles=[C*, T] attr={AC=1, AF=0.050, AN=20, BaseQRankSum=-2.499e+00, DP=201, ExcessHet=3.0103, FS=2.515, InbreedingCoeff=-0.3456, MLEAC=2, MLEAF=0.100, MQ=38.92, MQRankSum=2.31, QD=3.45, ReadPosRankSum=-4.580e-01, SOR=0.180} filters=

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException: In a GVCF all records must contain a <NON_REF> allele. Offending record: [VC Unknown @ 1:10150 Q51.70 of type=SNP alleles=[C*, T] attr={AC=1, AF=0.050, AN=20, BaseQRankSum=-2.499e+00, DP=201, ExcessHet=3.0103, FS=2.515, InbreedingCoeff=-0.3456, MLEAC=2, MLEAF=0.100, MQ=38.92, MQRankSum=2.31, QD=3.45, ReadPosRankSum=-4.580e-01, SOR=0.180} filters=
    at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.validateGVCFVariant(ValidateVariants.java:364)
    at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.apply(ValidateVariants.java:258)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

    ```

    About the gatk launch script: I know about it, I'm just too lazy to switch to it :)

    Cheers
    Jojo
  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    Can you show me your commands for running ValidateVariants? I assume you checked your code with the tool docs. It looks like you are running ValidateVariants on a vcf, so I am confused by the GVCF error. Do you have the -gvcf flag on by accident?

  • jojo, Germany (Member)
    edited October 24
    Hi Tiffany,
    you're right, the -gvcf flag was still set! I removed it and now the error no longer shows up! Man, my stupidity really gnaws at my ego. Anyway, many thanks for your help and your deduction skills :)

    Cheers
    Jojo
  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    @jojo LOL. no worries at all! Good luck with your work!
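
To summarize the fix in this thread: the `-gvcf` flag of ValidateVariants should only be passed for real GVCFs, not for the joint-genotyped VCF from GenotypeGVCFs. The sketch below shows one possible guard against leaving the flag on by accident. It peeks at the first data record and adds `-gvcf` only if the ALT column lists `<NON_REF>`, relying on the rule quoted in the error message that every GVCF record must contain that allele. The function names `is_gvcf` and `print_validate_cmd` are hypothetical, and the command is only printed, not run.

```shell
# is_gvcf FILE: succeed if the first data record lists <NON_REF> in ALT (column 5).
# Hypothetical helper; assumes one record is representative of the whole file.
is_gvcf() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # bgzip output is gzip-compatible
    *)    cat "$1" ;;
  esac | awk -F '\t' '
    /^#/ { next }
    { found = (index($5, "<NON_REF>") > 0); exit }
    END { exit !found }'
}

# print_validate_cmd REF VCF: print (not run) a ValidateVariants command,
# adding -gvcf only when the file looks like a GVCF.
print_validate_cmd() {
  if is_gvcf "$2"; then
    gvcf_flag=" -gvcf"
  else
    gvcf_flag=""
  fi
  printf 'gatk ValidateVariants -R %s -V %s%s\n' "$1" "$2" "$gvcf_flag"
}
```

Checking the record content rather than the file name matters here, because the GenotypeGVCFs output in this thread was originally named `.g.vcf.gz` even though it was a regular VCF.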
