VariantRecalibrator fails after traversal

dbecker, Munich, Member ✭✭
edited October 2018 in Ask the GATK team

Hi,

VariantRecalibrator fails during indel recalibration at the moment when it should finish. I use GATK 4.0.4.0, and this is the first run to fail at this point.
The output recal file has no entries after the header, and the tranches file is completely empty.

Error:

13:54:08.684 INFO  ProgressMeter -        chrY:56879627             24.2               5075692         210002.4
13:54:08.684 INFO  ProgressMeter - Traversal complete. Processed 5075692 total variants in 24.2 minutes.
13:54:08.831 INFO  VariantDataManager - QD:      mean = 25.11    standard deviation = 6.61
13:54:08.925 INFO  VariantDataManager - FS:      mean = 0.38     standard deviation = 2.35
13:54:09.020 INFO  VariantDataManager - SOR:     mean = 0.99     standard deviation = 0.64
13:54:09.110 INFO  VariantDataManager - MQRankSum:       mean = -0.01    standard deviation = 0.33
13:54:09.257 INFO  VariantDataManager - ReadPosRankSum:          mean = 0.39     standard deviation = 1.03
13:54:10.024 INFO  VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, ReadPosRankSum, FS, MQRankSum]
13:54:10.078 INFO  VariantDataManager - Training with 301439 variants after standard deviation thresholding.
13:54:10.117 INFO  GaussianMixtureModel - Initializing model with 100 k-means iterations...
13:54:22.733 INFO  VariantRecalibratorEngine - Finished iteration 0.
13:54:26.379 INFO  VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 1.01851
13:54:30.054 INFO  VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.32170
13:54:33.267 INFO  VariantRecalibratorEngine - Convergence after 14 iterations!
13:54:33.742 INFO  VariantRecalibratorEngine - Evaluating full set of 696457 variants...
13:54:33.742 WARN  VariantRecalibratorEngine - Evaluate datum returned a NaN.
13:54:33.793 INFO  VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
13:54:33.824 INFO  VariantRecalibrator - Shutting down engine
[8. Oktober 2018 13:54:33 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 24.70 minutes.
Runtime.totalMemory()=7774666752
java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:894)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Command:

/opt/gatk/4.0.4.0/gatk VariantRecalibrator \
    -R GRCh38_latest_genomic_final.fa \
    -V recalibrated_snps_raw_indels.vcf \
    -O recalibrate_INDEL.recal \
    --tranches-file recalibrate_INDEL.tranches \
    --rscript-file vrecalibrate_INDEL_plots.R \
    -an QD \
    -an FS \
    -an SOR \
    -an MQRankSum \
    -an ReadPosRankSum \
    --resource mills,known=false,training=true,truth=true,prior=12.0:Mills.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:dbsnp.vcf \
    --mode INDEL \
    --truth-sensitivity-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --max-gaussians 4

Best,
Daniel
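
The symptom described above (a recal file with a header but no records) can be checked from the shell before any downstream step runs. This is a minimal sketch, assuming the recal file is VCF-formatted so that header lines start with `#`; `count_records` is a hypothetical helper name, not part of GATK.

```shell
# Hypothetical helper: count the data records (non-header lines) in a
# VCF-style file such as the .recal output.
count_records() {
    # grep -c still prints 0 when nothing matches but exits with status 1,
    # so the status is masked to keep the function usable under `set -e`.
    grep -vc '^#' "$1" || true
}
```

For example, `count_records recalibrate_INDEL.recal` printing `0` would match the failure reported here.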

Answers

  • dbecker, Munich, Member ✭✭
    edited October 2018

    Hi,

    I got the same thing again in another run. This time it happened after SNP recalibration ...

    05:23:40.168 INFO  ProgressMeter -        chrY:56885951             25.6               5260371         205559.1
    05:23:40.169 INFO  ProgressMeter - Traversal complete. Processed 5260371 total variants in 25.6 minutes.
    05:23:40.821 INFO  VariantDataManager - QD:      mean = 20.00    standard deviation = 6.41
    05:23:41.627 INFO  VariantDataManager - FS:      mean = 0.36     standard deviation = 1.57
    05:23:42.415 INFO  VariantDataManager - SOR:     mean = 0.95     standard deviation = 0.56
    05:23:43.187 INFO  VariantDataManager - MQ:      mean = 52.06    standard deviation = 7.15
    05:23:43.975 INFO  VariantDataManager - MQRankSum:   mean = 0.01     standard deviation = 0.27
    05:23:45.045 INFO  VariantDataManager - ReadPosRankSum:      mean = 0.36     standard deviation = 0.88
    05:23:48.863 INFO  VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, MQRankSum, SOR, ReadPosRankSum, FS]
    05:23:49.134 INFO  VariantDataManager - Training with 3477989 variants after standard deviation thresholding.
    05:23:49.134 WARN  VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
    05:23:49.304 INFO  GaussianMixtureModel - Initializing model with 100 k-means iterations...
    05:26:33.084 INFO  VariantRecalibratorEngine - Finished iteration 0.
    05:27:21.176 INFO  VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 2.09969
    05:28:08.483 INFO  VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 8.76455
    05:28:59.522 INFO  VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02448
    05:29:30.434 INFO  VariantRecalibratorEngine - Convergence after 18 iterations!
    05:29:36.064 INFO  VariantRecalibratorEngine - Evaluating full set of 4525622 variants...
    05:29:36.064 WARN  VariantRecalibratorEngine - Evaluate datum returned a NaN.
    05:29:36.281 INFO  VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
    05:29:36.292 INFO  VariantRecalibrator - Shutting down engine
    [11. Oktober 2018 05:29:36 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 31.64 minutes.
    Runtime.totalMemory()=28639756288
    java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:894)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /opt/gatk/4.0.4.0/gatk-package-4.0.4.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50G -jar /opt/gatk/4.0.4.0/gatk-package-4.0.4.0-local.jar ApplyVQSR -R /srv/nfs/ngsdata/genome_builds/GRCh38_latest_genomic_final.fa -V /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/all_contigs.vcf -O /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrated_snps_raw_indels.vcf --mode SNP --truth-sensitivity-filter-level 99.0 --recal-file /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrate_SNP.recal --tranches-file /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrate_SNP.tranches
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/srv/nfs/ngsdata/toolbox/gatk/cromwell-executions/run_callGermlineSNP/0a3949b4-9db6-47f4-8ae4-d735a2e91117/call-vqsr/execution/tmp.SaAPpN
    05:29:41.163 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk/4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    05:29:41.327 INFO  ApplyVQSR - ------------------------------------------------------------
    05:29:41.328 INFO  ApplyVQSR - The Genome Analysis Toolkit (GATK) v4.0.4.0
    05:29:41.328 INFO  ApplyVQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    05:29:41.331 INFO  ApplyVQSR - Executing as [email protected] on Linux v3.10.0-862.3.2.el7.x86_64 amd64
    05:29:41.331 INFO  ApplyVQSR - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
    05:29:41.331 INFO  ApplyVQSR - Start Date/Time: 11. Oktober 2018 05:29:41 MESZ
    05:29:41.331 INFO  ApplyVQSR - ------------------------------------------------------------
    05:29:41.331 INFO  ApplyVQSR - ------------------------------------------------------------
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Version: 2.14.3
    05:29:41.332 INFO  ApplyVQSR - Picard Version: 2.18.2
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    05:29:41.332 INFO  ApplyVQSR - Deflater: IntelDeflater
    05:29:41.332 INFO  ApplyVQSR - Inflater: IntelInflater
    05:29:41.332 INFO  ApplyVQSR - GCS max retries/reopens: 20
    05:29:41.332 INFO  ApplyVQSR - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    05:29:41.332 INFO  ApplyVQSR - Initializing engine
    05:29:42.003 INFO  FeatureManager - Using codec VCFCodec to read file file:///srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrate_SNP.recal
    05:29:42.033 INFO  FeatureManager - Using codec VCFCodec to read file file:///srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/all_contigs.vcf
    05:29:42.330 INFO  ApplyVQSR - Done initializing engine
    05:29:42.351 INFO  ApplyVQSR - Shutting down engine
    [11. Oktober 2018 05:29:42 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.ApplyVQSR done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=2084044800
    ***********************************************************************
    
    A USER ERROR has occurred: No tranches were found in the file or were above the truth sensitivity filter level 99.0
    
    ***********************************************************************
    

    Has anyone seen this before, or does anyone have an idea how to solve it? I use 35 whole exomes and ~20 panels in this step, so there should be enough data.

    Best,
    Daniel
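
The log above shows ApplyVQSR starting a few seconds after VariantRecalibrator threw its exception, then failing with a second, less direct error about missing tranches. The following is a minimal sketch of a guard that skips the apply step in that situation, assuming both steps are launched from the same shell script. `apply_if_model_ok`, `recalibrate_cmd`, and `apply_cmd` are hypothetical names; the last two are functions the caller defines to wrap the actual GATK invocations.

```shell
# Hypothetical guard: run the model-building step, then run the apply step
# only if the first step exited cleanly and left a non-empty tranches file.
# $1 = path to the tranches file written by the recalibration step.
apply_if_model_ok() {
    local tranches="$1"
    if ! recalibrate_cmd; then
        echo "Recalibration step failed; skipping apply step" >&2
        return 1
    fi
    if [ ! -s "$tranches" ]; then
        echo "Tranches file '$tranches' is empty; skipping apply step" >&2
        return 1
    fi
    apply_cmd
}
```

With this in place, the run stops at the original "No data found" failure instead of reporting the tranches error from ApplyVQSR.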

  • shlee, Cambridge, Member, Broadie, Moderator admin

    Hi @dbecker,

    Thanks for all your contributions on the GATK forum. Try lowering --max-gaussians from 4 to 3, then to 2. It may be that for indels in your particular dataset (which you say is derived from exomes), there is insufficient data to generate four clusters. Another user, @ekofman, came up with this solution in this thread.
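
The suggestion to step --max-gaussians down from 4 to 3 to 2 can be sketched as a retry loop. This is only an illustration under stated assumptions: `fit_with_fewer_gaussians` is a hypothetical helper, the `GATK` variable is an assumed way to point at the launcher (for example `/opt/gatk/4.0.4.0/gatk`), and the remaining VariantRecalibrator arguments (`-R`, `-V`, `-O`, `-an`, `--resource`, `--mode`, and so on) are passed through unchanged, without their own `--tranches-file` or `--max-gaussians`.

```shell
# Hypothetical helper: try VariantRecalibrator with 4, then 3, then 2
# Gaussians and stop at the first attempt that exits cleanly and writes a
# non-empty tranches file.
# $1 = tranches file path; remaining arguments are passed to the tool as-is.
fit_with_fewer_gaussians() {
    local tranches="$1"
    shift
    local g
    for g in 4 3 2; do
        if "${GATK:-gatk}" VariantRecalibrator "$@" \
                --tranches-file "$tranches" \
                --max-gaussians "$g" \
            && [ -s "$tranches" ]; then
            # Print the value that worked so the caller can record it.
            echo "$g"
            return 0
        fi
        echo "No usable model with --max-gaussians $g" >&2
    done
    return 1
}
```

A call might look like `fit_with_fewer_gaussians recalibrate_INDEL.tranches -R ref.fa -V in.vcf -O recalibrate_INDEL.recal --mode INDEL ...`, with the elided arguments taken from the original command.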

  • dbecker, Munich, Member ✭✭

    Hi @shlee,

    I'll try this when I get back to the office after ASHG.

    Yet, I don't really understand why it should help. Since we do mostly panels, I have 35 exomes that I merge into every run. This way I can use VQSR. Not only did this work for the last ~30 runs, but VQSR also worked on those 35 exomes alone.
    Shouldn't it work better, or at least with the same number of Gaussians, if I add panels (even small ones) to a cohort on which I could already run VQSR by itself?

    Best,
    Daniel

  • dbecker, Munich, Member ✭✭

    Thank you!

    I'll try to build a buffer set that matches my samples better. The CNN workflow sounds great, but since we are in diagnostics, it's a difficult process to switch to this workflow.

    Best,
    Daniel
