VariantRecalibrator fails after traversal

dbecker (Munich, Member)
edited October 2018 in Ask the GATK team

Hi,

VariantRecalibrator fails during indel recalibration at the point where it should finish. I use GATK 4.0.4.0, and this is the first run to fail at this point.
The output recal file has no entries after the header, and the tranches file is completely empty.

Error:

13:54:08.684 INFO  ProgressMeter -        chrY:56879627             24.2               5075692         210002.4
13:54:08.684 INFO  ProgressMeter - Traversal complete. Processed 5075692 total variants in 24.2 minutes.
13:54:08.831 INFO  VariantDataManager - QD:      mean = 25.11    standard deviation = 6.61
13:54:08.925 INFO  VariantDataManager - FS:      mean = 0.38     standard deviation = 2.35
13:54:09.020 INFO  VariantDataManager - SOR:     mean = 0.99     standard deviation = 0.64
13:54:09.110 INFO  VariantDataManager - MQRankSum:       mean = -0.01    standard deviation = 0.33
13:54:09.257 INFO  VariantDataManager - ReadPosRankSum:          mean = 0.39     standard deviation = 1.03
13:54:10.024 INFO  VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, ReadPosRankSum, FS, MQRankSum]
13:54:10.078 INFO  VariantDataManager - Training with 301439 variants after standard deviation thresholding.
13:54:10.117 INFO  GaussianMixtureModel - Initializing model with 100 k-means iterations...
13:54:22.733 INFO  VariantRecalibratorEngine - Finished iteration 0.
13:54:26.379 INFO  VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 1.01851
13:54:30.054 INFO  VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.32170
13:54:33.267 INFO  VariantRecalibratorEngine - Convergence after 14 iterations!
13:54:33.742 INFO  VariantRecalibratorEngine - Evaluating full set of 696457 variants...
13:54:33.742 WARN  VariantRecalibratorEngine - Evaluate datum returned a NaN.
13:54:33.793 INFO  VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
13:54:33.824 INFO  VariantRecalibrator - Shutting down engine
[8. Oktober 2018 13:54:33 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 24.70 minutes.
Runtime.totalMemory()=7774666752
java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:894)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Command:

/opt/gatk/4.0.4.0/gatk VariantRecalibrator \
    -R GRCh38_latest_genomic_final.fa \
    -V recalibrated_snps_raw_indels.vcf \
    -O recalibrate_INDEL.recal \
    --tranches-file recalibrate_INDEL.tranches \
    --rscript-file vrecalibrate_INDEL_plots.R \
    -an QD \
    -an FS \
    -an SOR \
    -an MQRankSum \
    -an ReadPosRankSum \
    --resource mills,known=false,training=true,truth=true,prior=12.0:Mills.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:dbsnp.vcf \
    --mode INDEL \
    --truth-sensitivity-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --max-gaussians 4

Best,
Daniel
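
The symptoms in the log above can be looked for in a saved log file. This is only a minimal sketch: the function name `find_vqsr_failure_markers` is hypothetical, and the three search strings are copied from the log lines quoted in this post, not from any official list of error messages.

```shell
# Print (with line numbers) the failure markers seen in this thread,
# if they occur in a saved VariantRecalibrator log.
# Exit status: 0 if at least one marker is present, 1 if none is found.
find_vqsr_failure_markers() {
    grep -n -e 'Evaluate datum returned a NaN' \
            -e 'IllegalArgumentException: No data found' \
            -e 'Selected worst 0 scoring variants' "$1"
}
```

It could be called as `find_vqsr_failure_markers vqsr.log`, where `vqsr.log` is a placeholder for wherever the tool's output was redirected.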

Answers

  • dbecker (Munich, Member)
    edited October 2018

    Hi,

    I got the same thing again in another run. This time it happened after SNP recalibration ...

    05:23:40.168 INFO  ProgressMeter -        chrY:56885951             25.6               5260371         205559.1
    05:23:40.169 INFO  ProgressMeter - Traversal complete. Processed 5260371 total variants in 25.6 minutes.
    05:23:40.821 INFO  VariantDataManager - QD:      mean = 20.00    standard deviation = 6.41
    05:23:41.627 INFO  VariantDataManager - FS:      mean = 0.36     standard deviation = 1.57
    05:23:42.415 INFO  VariantDataManager - SOR:     mean = 0.95     standard deviation = 0.56
    05:23:43.187 INFO  VariantDataManager - MQ:      mean = 52.06    standard deviation = 7.15
    05:23:43.975 INFO  VariantDataManager - MQRankSum:   mean = 0.01     standard deviation = 0.27
    05:23:45.045 INFO  VariantDataManager - ReadPosRankSum:      mean = 0.36     standard deviation = 0.88
    05:23:48.863 INFO  VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, MQRankSum, SOR, ReadPosRankSum, FS]
    05:23:49.134 INFO  VariantDataManager - Training with 3477989 variants after standard deviation thresholding.
    05:23:49.134 WARN  VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
    05:23:49.304 INFO  GaussianMixtureModel - Initializing model with 100 k-means iterations...
    05:26:33.084 INFO  VariantRecalibratorEngine - Finished iteration 0.
    05:27:21.176 INFO  VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 2.09969
    05:28:08.483 INFO  VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 8.76455
    05:28:59.522 INFO  VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02448
    05:29:30.434 INFO  VariantRecalibratorEngine - Convergence after 18 iterations!
    05:29:36.064 INFO  VariantRecalibratorEngine - Evaluating full set of 4525622 variants...
    05:29:36.064 WARN  VariantRecalibratorEngine - Evaluate datum returned a NaN.
    05:29:36.281 INFO  VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
    05:29:36.292 INFO  VariantRecalibrator - Shutting down engine
    [11. Oktober 2018 05:29:36 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 31.64 minutes.
    Runtime.totalMemory()=28639756288
    java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:894)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /opt/gatk/4.0.4.0/gatk-package-4.0.4.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50G -jar /opt/gatk/4.0.4.0/gatk-package-4.0.4.0-local.jar ApplyVQSR -R /srv/nfs/ngsdata/genome_builds/GRCh38_latest_genomic_final.fa -V /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/all_contigs.vcf -O /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrated_snps_raw_indels.vcf --mode SNP --truth-sensitivity-filter-level 99.0 --recal-file /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrate_SNP.recal --tranches-file /srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrate_SNP.tranches
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/srv/nfs/ngsdata/toolbox/gatk/cromwell-executions/run_callGermlineSNP/0a3949b4-9db6-47f4-8ae4-d735a2e91117/call-vqsr/execution/tmp.SaAPpN
    05:29:41.163 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk/4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    05:29:41.327 INFO  ApplyVQSR - ------------------------------------------------------------
    05:29:41.328 INFO  ApplyVQSR - The Genome Analysis Toolkit (GATK) v4.0.4.0
    05:29:41.328 INFO  ApplyVQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    05:29:41.331 INFO  ApplyVQSR - Executing as [email protected] on Linux v3.10.0-862.3.2.el7.x86_64 amd64
    05:29:41.331 INFO  ApplyVQSR - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
    05:29:41.331 INFO  ApplyVQSR - Start Date/Time: 11. Oktober 2018 05:29:41 MESZ
    05:29:41.331 INFO  ApplyVQSR - ------------------------------------------------------------
    05:29:41.331 INFO  ApplyVQSR - ------------------------------------------------------------
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Version: 2.14.3
    05:29:41.332 INFO  ApplyVQSR - Picard Version: 2.18.2
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    05:29:41.332 INFO  ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    05:29:41.332 INFO  ApplyVQSR - Deflater: IntelDeflater
    05:29:41.332 INFO  ApplyVQSR - Inflater: IntelInflater
    05:29:41.332 INFO  ApplyVQSR - GCS max retries/reopens: 20
    05:29:41.332 INFO  ApplyVQSR - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    05:29:41.332 INFO  ApplyVQSR - Initializing engine
    05:29:42.003 INFO  FeatureManager - Using codec VCFCodec to read file file:///srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/recalibrate_SNP.recal
    05:29:42.033 INFO  FeatureManager - Using codec VCFCodec to read file file:///srv/nfs/ngsdata/GATK/181009_NB551378_0039_AH5H35AFXY/_gatk/all_contigs.vcf
    05:29:42.330 INFO  ApplyVQSR - Done initializing engine
    05:29:42.351 INFO  ApplyVQSR - Shutting down engine
    [11. Oktober 2018 05:29:42 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.ApplyVQSR done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=2084044800
    ***********************************************************************
    
    A USER ERROR has occurred: No tranches were found in the file or were above the truth sensitivity filter level 99.0
    
    ***********************************************************************
    

    Has anyone seen this before and has an idea how to solve it? I use 35 whole exomes and ~20 panels in this step, so there should be enough data.

    Best,
    Daniel
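
In the run above, ApplyVQSR was started even though VariantRecalibrator had already failed, so it only reported the empty tranches file. A guard between the two steps might look like the sketch below. The function name `vqsr_outputs_usable` is hypothetical, and the two checks simply mirror the symptoms described in this thread (a recal file with nothing after the header and an empty tranches file); they are not a full validation of either file format.

```shell
# Return 0 only if the recal file has at least one non-header record
# and the tranches file exists and is non-empty.
vqsr_outputs_usable() {
    recal=$1
    tranches=$2
    # -s is true when the file exists and has a size greater than zero.
    [ -s "$tranches" ] || return 1
    # A non-empty line that does not start with '#' counts as a record.
    grep -q '^[^#]' "$recal"
}
```

A workflow could then run something like `vqsr_outputs_usable recalibrate_SNP.recal recalibrate_SNP.tranches || exit 1` before calling ApplyVQSR, so the failure is reported at the step that caused it.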

  • shlee (Cambridge; Member, Broadie)

    Hi @dbecker,

    Thanks for all your contributions on the GATK forum. Try lowering --max-gaussians from 4 to 3, then to 2. It may be that for indels in your particular dataset (which you say is derived from exomes), there is insufficient data to generate four clusters. Another user, @ekofman, came up with this solution in this thread.
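
The suggestion above (4, then 3, then 2) could be scripted roughly as follows. This is a sketch under assumptions: the helper name `retry_max_gaussians` is made up for illustration, and it relies on the wrapped command returning a non-zero exit status when it fails. It appends `--max-gaussians N` to whatever command it is given, so the command passed in should not already contain that option.

```shell
# Run the given command with --max-gaussians 4, then 3, then 2,
# stopping at the first attempt that exits successfully.
# Usage: retry_max_gaussians <command> [args...]
retry_max_gaussians() {
    for n in 4 3 2; do
        # Append the option to the caller's command line and run it.
        if "$@" --max-gaussians "$n"; then
            echo "succeeded with --max-gaussians $n"
            return 0
        fi
        echo "failed with --max-gaussians $n" >&2
    done
    # Every attempt failed.
    return 1
}

# Example call (left commented out; it uses the paths from the original post
# and omits the original '--max-gaussians 4' line):
# retry_max_gaussians /opt/gatk/4.0.4.0/gatk VariantRecalibrator \
#     -R GRCh38_latest_genomic_final.fa \
#     -V recalibrated_snps_raw_indels.vcf \
#     -O recalibrate_INDEL.recal \
#     --tranches-file recalibrate_INDEL.tranches \
#     --mode INDEL
```

The example call is shortened; the `-an` and `--resource` arguments from the original command would be passed along unchanged.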

  • dbecker (Munich, Member)

    Hi @shlee,

    I'll try this when I get back to the office after ASHG.

    Still, I don't really understand why it should help. Since we mostly run panels, I merge the same 35 exomes into every run so that I can use VQSR. This worked for the last ~30 runs, and VQSR also worked on those 35 exomes alone.
    If VQSR already works on that cohort by itself, shouldn't it work at least as well, and with the same number of Gaussians, once I add panels to it (however small they are)?

    Best,
    Daniel

  • dbecker (Munich, Member)

    Thank you!

    I'll try to build a buffer set that matches my samples better. The CNN workflow sounds great, but since we are in diagnostics, it's a difficult process to switch to this workflow.

    Best,
    Daniel
