SelectVariants - java.lang.IllegalStateException: Allele in genotype not in the variant context

mpmachado (Lisboa, Member)

Hi everyone,

I'm trying to select variants with SelectVariants, but it stops with the error "Allele in genotype CT* not in the variant context [CT*, C]".
I tried to find a CT* in the VCF file, but I couldn't, and ValidateVariants does not report anything wrong.
Can you help me solve this problem?
I'm using GATK v4.1.2.0 (HTSJDK v2.18.2 and Picard v2.18.25) through the GATK Docker image broadinstitute/gatk:4.1.2.0.
Below you can find the commands used; the input VCF file is attached.

Thank you in advance.

Best regards,

Miguel


ValidateVariants
gatk --java-options "-XX:ParallelGCThreads=1" ValidateVariants --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz

Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:ParallelGCThreads=1 -jar /gatk/gatk-package-4.1.0.0-local.jar ValidateVariants --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz
09:07:11.936 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:07:19.201 INFO  ValidateVariants - ------------------------------------------------------------
09:07:19.202 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.0.0
09:07:19.202 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
09:07:19.203 INFO  ValidateVariants - Executing as [email protected] on Linux v2.6.32-696.23.1.el6.x86_64 amd64
09:07:19.203 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
09:07:19.204 INFO  ValidateVariants - Start Date/Time: June 4, 2019 9:07:11 AM UTC
09:07:19.204 INFO  ValidateVariants - ------------------------------------------------------------
09:07:19.204 INFO  ValidateVariants - ------------------------------------------------------------
09:07:19.205 INFO  ValidateVariants - HTSJDK Version: 2.18.2
09:07:19.205 INFO  ValidateVariants - Picard Version: 2.18.25
09:07:19.205 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:07:19.206 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:07:19.206 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:07:19.206 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:07:19.206 INFO  ValidateVariants - Deflater: IntelDeflater
09:07:19.206 INFO  ValidateVariants - Inflater: IntelInflater
09:07:19.206 INFO  ValidateVariants - GCS max retries/reopens: 20
09:07:19.206 INFO  ValidateVariants - Requester pays: disabled
09:07:19.206 INFO  ValidateVariants - Initializing engine
09:07:19.864 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz
09:07:20.018 INFO  ValidateVariants - Done initializing engine
09:07:20.019 INFO  ProgressMeter - Starting traversal
09:07:20.020 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
09:07:20.505 INFO  ProgressMeter -             unmapped              0.0                   144          17851.2
09:07:20.506 INFO  ProgressMeter - Traversal complete. Processed 144 total variants in 0.0 minutes.
09:07:20.506 INFO  ValidateVariants - Shutting down engine
[June 4, 2019 9:07:20 AM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.14 minutes.
Runtime.totalMemory()=646447104

SelectVariants
gatk --java-options "-XX:ParallelGCThreads=1" SelectVariants --output /mnt/gatk_select_variants-20190603_160115-5296_outdir/final.vcf --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz --keep-original-ac true --keep-original-dp true --set-filtered-gt-to-nocall true --exclude-filtered true --exclude-non-variants true --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta --remove-unused-alternates true

Using GATK jar /gatk/gatk-package-4.1.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:ParallelGCThreads=1 -jar /gatk/gatk-package-4.1.2.0-local.jar SelectVariants --output /mnt/gatk_select_variants-20190603_160115-5296_outdir/final.vcf --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz --keep-original-ac true --keep-original-dp true --set-filtered-gt-to-nocall true --exclude-filtered true --exclude-non-variants true --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta --remove-unused-alternates true
09:25:38.165 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 04, 2019 9:25:40 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:25:40.307 INFO  SelectVariants - ------------------------------------------------------------
09:25:40.309 INFO  SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
09:25:40.309 INFO  SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
09:25:40.310 INFO  SelectVariants - Executing as [email protected] on Linux v2.6.32-696.23.1.el6.x86_64 amd64
09:25:40.311 INFO  SelectVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
09:25:40.312 INFO  SelectVariants - Start Date/Time: June 4, 2019 9:25:38 AM UTC
09:25:40.312 INFO  SelectVariants - ------------------------------------------------------------
09:25:40.312 INFO  SelectVariants - ------------------------------------------------------------
09:25:40.314 INFO  SelectVariants - HTSJDK Version: 2.19.0
09:25:40.314 INFO  SelectVariants - Picard Version: 2.19.0
09:25:40.315 INFO  SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:25:40.315 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:25:40.315 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:25:40.315 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:25:40.316 INFO  SelectVariants - Deflater: IntelDeflater
09:25:40.316 INFO  SelectVariants - Inflater: IntelInflater
09:25:40.316 INFO  SelectVariants - GCS max retries/reopens: 20
09:25:40.316 INFO  SelectVariants - Requester pays: disabled
09:25:40.317 INFO  SelectVariants - Initializing engine
09:25:41.061 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz
09:25:41.172 INFO  SelectVariants - Done initializing engine
09:25:41.300 INFO  ProgressMeter - Starting traversal
09:25:41.301 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
09:25:41.553 INFO  SelectVariants - Shutting down engine
[June 4, 2019 9:25:41 AM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=652214272
java.lang.IllegalStateException: Allele in genotype CT* not in the variant context [CT*, C]
        at htsjdk.variant.variantcontext.VariantContext.validateGenotypes(VariantContext.java:1363)
        at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1298)
        at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:400)
        at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:579)
        at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:573)
        at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.apply(SelectVariants.java:587)
        at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:106)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:104)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
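
A note on reading the error message above: htsjdk prints a reference allele with a trailing asterisk, so "[CT*, C]" describes a record with REF CT and ALT C, and the literal text "CT*" will not appear in the VCF file. A minimal Python sketch of how one might look for such a record is given here; the helper names parse_allele_list and records_with_alleles and the toy lines are invented for illustration and are not part of GATK or htsjdk.

```python
def parse_allele_list(text):
    """Split an htsjdk-style allele list such as '[CT*, C]' into (ref, alts)."""
    alleles = [a.strip() for a in text.strip().strip("[]").split(",")]
    ref = None
    alts = []
    for allele in alleles:
        # A trailing '*' marks the reference allele in htsjdk's display form.
        # A lone '*' is left alone, since VCF uses it for a spanning deletion.
        if len(allele) > 1 and allele.endswith("*"):
            ref = allele[:-1]
        else:
            alts.append(allele)
    return ref, alts


def records_with_alleles(vcf_lines, ref, alts):
    """Yield (chrom, pos) for data lines whose REF equals ref and whose
    ALT column contains every allele in alts."""
    wanted = set(alts)
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        if fields[3] == ref and wanted <= set(fields[4].split(",")):
            yield fields[0], int(fields[1])


# Toy input, invented for illustration (not taken from the attached VCF).
toy_lines = [
    "##fileformat=VCFv4.2",
    "1\t100\t.\tCT\tC,CTT\t50\tPASS\t.",
    "1\t200\t.\tC\tT\t50\tPASS\t.",
]
ref, alts = parse_allele_list("[CT*, C]")  # ('CT', ['C'])
print(list(records_with_alleles(toy_lines, ref, alts)))
```

For a compressed file, the lines could be read with gzip.open(path, "rt"). If nothing matches, one possibility (a guess, not something this thread confirms) is that the tool changed the alleles in memory before the check ran, so the record on disk looks different.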

Answers

  • cnorman (United States; Member, Broadie, Dev)
    edited June 4

    @mpmachado Unfortunately, ValidateVariants is somewhat non-intuitive: in your example it silently performs no actual validation (we're working on a fix - see this). In the meantime, I'd suggest re-running it with --validation-type-to-exclude IDS added to the command line to see if that reveals the underlying problem.

  • mpmachado (Lisboa, Member)

    Hi @cnorman,

    I reran the ValidateVariants command with the extra option, and it does in fact return an error (command and output below):

    one or more of the ALT allele(s) for the record at position 1:146019442 are not observed at all in the sample genotypes

    This happens because I ran VariantFiltration to filter some loci using both INFO and FORMAT fields.
    I thought that I could use SelectVariants to clean up the resulting VCF, for example the locus below:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RCS1 RCS17 RCS8 RCS9
    1 146019442 . G GCCTCCCTGCCCCGGACCCTTGTGACTATGAA 128.30 QUAL_min AC=0;AF=0.00;AN=2;AS_BaseQRankSum=-0.900;AS_FS=0.000;AS_InbreedingCoeff=-0.1091;AS_MQ=43.02;AS_MQRankSum=-2.100;AS_QD=4.10;AS_ReadPosRankSum=-1.100;AS_SOR=0.818;BaseQRankSum=0.052;DP=652;ExcessHet=3.1037;FS=4.555;InbreedingCoeff=-0.1091;MLEAC=2;MLEAF=0.042;MQ=43.87;MQRankSum=-9.990e-01;NDA=4;QD=4.14;ReadPosRankSum=1.15;SOR=1.659 GT:AD:DP:FT:GQ:PGT:PID:PL:PS ./.:61,0:61:GQ_min_FORMAT:24:.:.:0,24,360 0/0:36,0:39:PASS:99:.:.:0,113,1531 ./.:27,4:31:GQ_min_FORMAT:86:.:.:86,0,988 ./.:38,0:38:GQ_min_FORMAT:15:.:.:0,15,225

    That said, I remembered that I could try running SelectVariants without the --remove-unused-alternates option (command and output below). It turns out that it worked. This is not ideal, but it's something that I can live with.

    Thank you for the tip.

    Best regards,

    Miguel


    ValidateVariants
    gatk --java-options "-XX:ParallelGCThreads=1" ValidateVariants --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz --validation-type-to-exclude IDS

    Using GATK jar /gatk/gatk-package-4.1.2.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:ParallelGCThreads=1 -jar /gatk/gatk-package-4.1.2.0-local.jar ValidateVariants --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz --validation-type-to-exclude IDS
    08:02:55.714 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 05, 2019 8:02:57 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:02:57.930 INFO  ValidateVariants - ------------------------------------------------------------
    08:02:57.932 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
    08:02:57.932 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:02:57.933 INFO  ValidateVariants - Executing as [email protected] on Linux v2.6.32-696.23.1.el6.x86_64 amd64
    08:02:57.934 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    08:02:57.935 INFO  ValidateVariants - Start Date/Time: June 5, 2019 8:02:55 AM UTC
    08:02:57.935 INFO  ValidateVariants - ------------------------------------------------------------
    08:02:57.935 INFO  ValidateVariants - ------------------------------------------------------------
    08:02:57.937 INFO  ValidateVariants - HTSJDK Version: 2.19.0
    08:02:57.937 INFO  ValidateVariants - Picard Version: 2.19.0
    08:02:57.938 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:02:57.938 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:02:57.938 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:02:57.938 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:02:57.939 INFO  ValidateVariants - Deflater: IntelDeflater
    08:02:57.939 INFO  ValidateVariants - Inflater: IntelInflater
    08:02:57.939 INFO  ValidateVariants - GCS max retries/reopens: 20
    08:02:57.940 INFO  ValidateVariants - Requester pays: disabled
    08:02:57.940 INFO  ValidateVariants - Initializing engine
    08:02:58.634 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz
    08:02:58.737 INFO  ValidateVariants - Done initializing engine
    08:02:58.738 INFO  ProgressMeter - Starting traversal
    08:02:58.738 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    08:02:58.809 INFO  ValidateVariants - Shutting down engine
    [June 5, 2019 8:02:58 AM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.05 minutes.
    Runtime.totalMemory()=667418624
    ***********************************************************************
    
    A USER ERROR has occurred: Input /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz fails strict validation: one or more of the ALT allele(s) for the record at position 1:146019442 are not observed at all in the sample genotypes of type:
    
    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    

    SelectVariants
    gatk --java-options "-XX:ParallelGCThreads=1" SelectVariants --output /mnt/gatk_select_variants-20190603_160115-5296_outdir/final.vcf --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz --keep-original-ac true --keep-original-dp true --set-filtered-gt-to-nocall true --exclude-filtered true --exclude-non-variants true --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta

    Using GATK jar /gatk/gatk-package-4.1.2.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:ParallelGCThreads=1 -jar /gatk/gatk-package-4.1.2.0-local.jar SelectVariants --output /mnt/gatk_select_variants-20190603_160115-5296_outdir/final.vcf --variant /mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz --keep-original-ac true --keep-original-dp true --set-filtered-gt-to-nocall true --exclude-filtered true --exclude-non-variants true --reference /mnt/gatk_select_variants-20190603_160115-5296_reference_0/reference.fasta
    08:27:44.173 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 05, 2019 8:27:46 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:27:46.214 INFO  SelectVariants - ------------------------------------------------------------
    08:27:46.215 INFO  SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
    08:27:46.216 INFO  SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:27:46.217 INFO  SelectVariants - Executing as [email protected] on Linux v2.6.32-696.23.1.el6.x86_64 amd64
    08:27:46.217 INFO  SelectVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    08:27:46.218 INFO  SelectVariants - Start Date/Time: June 5, 2019 8:27:44 AM UTC
    08:27:46.218 INFO  SelectVariants - ------------------------------------------------------------
    08:27:46.219 INFO  SelectVariants - ------------------------------------------------------------
    08:27:46.220 INFO  SelectVariants - HTSJDK Version: 2.19.0
    08:27:46.221 INFO  SelectVariants - Picard Version: 2.19.0
    08:27:46.221 INFO  SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:27:46.221 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:27:46.221 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:27:46.222 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:27:46.222 INFO  SelectVariants - Deflater: IntelDeflater
    08:27:46.222 INFO  SelectVariants - Inflater: IntelInflater
    08:27:46.223 INFO  SelectVariants - GCS max retries/reopens: 20
    08:27:46.223 INFO  SelectVariants - Requester pays: disabled
    08:27:46.223 INFO  SelectVariants - Initializing engine
    08:27:46.903 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/gatk_select_variants-20190603_160115-5296_outdir/hard_filtered_merged.vcf.gz
    08:27:47.008 INFO  SelectVariants - Done initializing engine
    08:27:47.130 INFO  ProgressMeter - Starting traversal
    08:27:47.131 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    08:27:47.499 INFO  ProgressMeter -             unmapped              0.0                   144          23542.2
    08:27:47.499 INFO  ProgressMeter - Traversal complete. Processed 144 total variants in 0.0 minutes.
    08:27:47.635 INFO  SelectVariants - Shutting down engine
    [June 5, 2019 8:27:47 AM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=656932864
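
The strict-validation failure reported above for 1:146019442 can be illustrated outside GATK. In that record every genotype carrying the ALT allele has a non-PASS FT value and is already a no-call, so the only called genotype left is 0/0 and the ALT allele has no supporting genotype. A minimal Python sketch of that check follows; the function name unused_alt_alleles is hypothetical, the record is abbreviated from the one quoted in the thread (INFO and FORMAT shortened), and this is not how ValidateVariants itself is implemented.

```python
def unused_alt_alleles(vcf_line, treat_filtered_as_nocall=True):
    """Return the ALT alleles of one VCF data line that no called genotype uses.

    If treat_filtered_as_nocall is True, samples whose FT field is neither
    PASS nor '.' are ignored, mimicking genotypes that were set to no-call.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = [a for a in fields[4].split(",") if a != "."]
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    ft_index = format_keys.index("FT") if "FT" in format_keys else None

    observed = set()
    for sample in fields[9:]:
        values = sample.split(":")
        if (treat_filtered_as_nocall and ft_index is not None
                and ft_index < len(values)
                and values[ft_index] not in ("PASS", ".")):
            continue
        # Allele indices are separated by '/' (unphased) or '|' (phased).
        for index in values[gt_index].replace("|", "/").split("/"):
            if index.isdigit():
                observed.add(int(index))

    # ALT alleles are numbered from 1; index 0 is the reference allele.
    return [alt for i, alt in enumerate(alts, start=1) if i not in observed]


# Abbreviated copy of the record from the thread (shortened for illustration).
record = "\t".join([
    "1", "146019442", ".", "G", "GCCTCCCTGCCCCGGACCCTTGTGACTATGAA",
    "128.30", "QUAL_min", "AC=0;AN=2", "GT:AD:DP:FT:GQ",
    "./.:61,0:61:GQ_min_FORMAT:24", "0/0:36,0:39:PASS:99",
    "./.:27,4:31:GQ_min_FORMAT:86", "./.:38,0:38:GQ_min_FORMAT:15",
])
print(unused_alt_alleles(record))
```

Running this over every data line of a VCF would list the sites where removing unused alternates leaves a record with no ALT allele at all, which may help decide whether to drop those sites before running SelectVariants.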
    