GATK 3.8 No data Found

jay_hu Member
edited January 2018 in Ask the GATK team

Hi, I ran VariantRecalibrator to do VQSR. First I got the error message 'Unable to retrieve the result', so I removed the nt parameter. Then I got the error message shown below. I also saw that Geraldine_VdAuwera said VQSR is not suitable for very small datasets. Is it suitable for panel data of around 600Mb?

 java -Xmx4g -jar ../GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R ../resource_bundle/ucsc.hg19.fasta \
-input annotated.vcf \
-recalFile out.recal \
-tranchesFile out.tranches  \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 ../resource_bundle/hapmap_3.3.hg19.sites.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 ../resource_bundle/1000G_omni2.5.hg19.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 ../resource_bundle/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ../resource_bundle/dbsnp_138.hg19.vcf \
-an QD -an MQ -an FS -an SOR -an MQRankSum -an ReadPosRankSum \
-mode SNP \
-L ../56gene171230/56gene-20170328.bed 
INFO  13:21:42,163 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  13:21:42,165 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50 
INFO  13:21:42,165 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  13:21:42,166 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  13:21:42,166 HelpFormatter - [Tue Jan 02 13:21:42 CST 2018] Executing on Mac OS X 10.12.6 x86_64 
INFO  13:21:42,166 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01 
INFO  13:21:42,170 HelpFormatter - Program Args: -T VariantRecalibrator -R ../resource_bundle/ucsc.hg19.fasta -input annotated.vcf -recalFile out.recal -tranchesFile out.tranches -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ../resource_bundle/hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 ../resource_bundle/1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 ../resource_bundle/1000G_phase1.snps.high_confidence.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ../resource_bundle/dbsnp_138.hg19.vcf -an QD -an MQ -an FS -an SOR -an MQRankSum -an ReadPosRankSum -mode SNP -L ../56gene171230/56gene-20170328.bed 
INFO  13:21:42,175 HelpFormatter - Executing as [email protected] on Mac OS X 10.12.6 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01. 
INFO  13:21:42,175 HelpFormatter - Date/Time: 2018/01/02 13:21:42 
INFO  13:21:42,176 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  13:21:42,176 HelpFormatter - ---------------------------------------------------------------------------------- 
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/Users/bioinformatician/Programs/gatk/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO  13:21:42,303 GenomeAnalysisEngine - Deflater: IntelDeflater 
INFO  13:21:42,303 GenomeAnalysisEngine - Inflater: IntelInflater 
INFO  13:21:42,304 GenomeAnalysisEngine - Strictness is SILENT 
INFO  13:21:42,376 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  13:21:43,036 IntervalUtils - Processing 228705 bp from intervals 
INFO  13:21:43,095 GenomeAnalysisEngine - Preparing for traversal 
INFO  13:21:43,096 GenomeAnalysisEngine - Done preparing for traversal 
INFO  13:21:43,096 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  13:21:43,097 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  13:21:43,097 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  13:21:43,099 TrainingSet - Found hapmap track:    Known = false   Training = true     Truth = true    Prior = Q15.0 
INFO  13:21:43,099 TrainingSet - Found omni track:  Known = false   Training = true     Truth = true    Prior = Q12.0 
INFO  13:21:43,099 TrainingSet - Found 1000G track:     Known = false   Training = true     Truth = false   Prior = Q10.0 
INFO  13:21:43,100 TrainingSet - Found dbsnp track:     Known = true    Training = false    Truth = false   Prior = Q2.0 
INFO  13:21:44,050 VariantDataManager - QD:      mean = 20.71    standard deviation = 9.05 
INFO  13:21:44,051 VariantDataManager - MQ:      mean = 60.13    standard deviation = 1.53 
INFO  13:21:44,051 VariantDataManager - FS:      mean = 2.31     standard deviation = 3.39 
INFO  13:21:44,052 VariantDataManager - SOR:     mean = 1.05     standard deviation = 0.59 
INFO  13:21:44,053 VariantDataManager - MQRankSum:   mean = -0.28    standard deviation = 1.09 
INFO  13:21:44,053 VariantDataManager - ReadPosRankSum:      mean = 0.03     standard deviation = 1.00 
INFO  13:21:44,057 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, FS, SOR, ReadPosRankSum, MQRankSum] 
INFO  13:21:44,058 VariantDataManager - Training with 158 variants after standard deviation thresholding. 
WARN  13:21:44,058 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable. 
INFO  13:21:44,061 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
INFO  13:21:44,117 VariantRecalibratorEngine - Finished iteration 0. 
INFO  13:21:44,137 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.11471 
INFO  13:21:44,145 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.01279 
INFO  13:21:44,157 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02997 
INFO  13:21:44,165 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.01192 
INFO  13:21:44,175 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.00462 
INFO  13:21:44,182 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.00288 
INFO  13:21:44,189 VariantRecalibratorEngine - Convergence after 34 iterations! 
WARN  13:21:44,193 VariantRecalibratorEngine - Model could not pre-compute denominators. 
INFO  13:21:44,197 VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000. 
##### ERROR --
##### ERROR stack trace 
java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:536)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:191)
    at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: No data found.
##### ERROR ------------------------------------------------------------------------------------------
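
For readers triaging the same failure: the log above reports "Training with 158 variants" and then "Selected worst 0 scoring variants" shortly before the exception. That suggests, though nothing in this thread confirms it, that the tool had no low-scoring variants left to model. The following is a minimal sketch for pulling those lines out of a saved log. The function names and the log file name are hypothetical; only grep, sed and head are used.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a saved VariantRecalibrator log
# (stdout and stderr captured to one file).

# Print the lines that show how many variants were available for training,
# how many were selected as "worst", and whether "No data found" was raised.
vqsr_training_summary() {
    # $1: path to the saved log
    grep -E 'Training with [0-9]+ variants|Selected worst [0-9]+ scoring variants|No data found' "$1"
}

# Print only the number from the first "Selected worst N scoring variants" line.
vqsr_negative_count() {
    # $1: path to the saved log
    sed -n 's/.*Selected worst \([0-9][0-9]*\) scoring variants.*/\1/p' "$1" | head -n 1
}

# Example (assumes the run was saved to vqsr.log):
#   vqsr_training_summary vqsr.log
#   vqsr_negative_count vqsr.log
```

A count of 0 from the second function matches what both logs in this thread show right before the error.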

Answers

  • Hi @shlee, thank you for your reply. I will try some exome data to check whether the problem is the data size.

  • splaisan Leuven (Belgium) Member
    edited June 2018

    I have a related issue under 4.0.5.
    I adapted my command to reflect the new (and welcome) syntax changes, but it seems that I am still missing something.

    This command worked recently under version 4.0.4 (with the old syntax).

    The former syntax was:

    java -jar $GATK/gatk.jar VariantRecalibrator \
        -R $BWA_INDEXES/NCBI_GRCh38.fa \
        -V ${samplename}-chr22.vcf.gz \
        --resource hapmap,known=false,training=true,truth=true,prior=15.0:${truetraining15} \
        --resource omni,known=false,training=true,truth=true,prior=12.0:${truetraining12} \
        --resource 1000g,known=false,training=true,truth=false,prior=10.0:${nontruetraining10} \
        --resource dbsnp,known=true,training=false,truth=false,prior=2.0:${knowntraining2} \
        --resource Mills_and_1000G_gold,known=false,training=true,truth=true,prior=12.0:${truetrainingindel12} \
        -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP \
        -mode BOTH \
        --output chr22-output.recal \
        --tranches-file chr22-output.tranches \
        --rscript-file chr22-output.plots.R
    

    My new syntax attempt:

    # https://gatkforums.broadinstitute.org/gatk/discussion/1259/which-training-sets-arguments-should-i-use-for-running-vqsr
    # True sites training resource: HapMap
    truetraining15=reference/hg38_v0_hapmap_3.3.hg38.vcf.gz
    # True sites training resource: Omni
    truetraining12=reference/hg38_v0_1000G_omni2.5.hg38.vcf.gz
    # Non-true sites training resource: 1000G
    nontruetraining10=reference/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    # Known sites resource, not used in training: dbSNP
    knowntraining2=reference/hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
    # indels True sites training resource: Mills
    truetrainingindel12=reference/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    
    java -jar $GATK/gatk.jar VariantRecalibrator \
        -R $BWA_INDEXES/NCBI_GRCh38.fa \
        -V bwa_mappings/${samplename}_${p}.vcf.gz \
        --resource hapmap,known=false,training=true,truth=true,prior=15.0:${truetraining15} \
        --resource omni,known=false,training=true,truth=true,prior=12.0:${truetraining12} \
        --resource 1000g,known=false,training=true,truth=false,prior=10.0:${nontruetraining10} \
        --resource dbsnp,known=true,training=false,truth=false,prior=2.0:${knowntraining2} \
        --resource Mills_and_1000G_gold,known=false,training=true,truth=true,prior=12.0:${truetrainingindel12} \
        -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP \
        --mode BOTH \
        --output bwa_mappings/output.recal_${p} \
        --tranches-file bwa_mappings/output.tranches_${p} \
        --rscript-file bwa_mappings/output.plots_${p}.R
    

    Any idea what the reason could be? (There are 67000+ variants in that VCF!)

    The job's stdout and stderr:

    java -jar $GATK/gatk.jar VariantRecalibrator \
    > -R $BWA_INDEXES/NCBI_GRCh38.fa \
    > -V bwa_mappings/${samplename}_${p}.vcf.gz \
    > --resource hapmap,known=false,training=true,truth=true,prior=15.0:${truetraining15} \
    > --resource omni,known=false,training=true,truth=true,prior=12.0:${truetraining12} \
    > --resource 1000g,known=false,training=true,truth=false,prior=10.0:${nontruetraining10} \
    > --resource dbsnp,known=true,training=false,truth=false,prior=2.0:${knowntraining2} \
    > --resource Mills_and_1000G_gold,known=false,training=true,truth=true,prior=12.0:${truetrainingindel12} \
    > -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP \
    > --mode BOTH \
    > --output bwa_mappings/output.recal_${p} \
    > --tranches-file bwa_mappings/output.tranches_${p} \
    > --rscript-file bwa_mappings/output.plots_${p}.R
    14:48:27.555 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/biotools/gatk-4.0.5.0/gatk-package-4.0.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    14:48:27.708 INFO  VariantRecalibrator - ------------------------------------------------------------
    14:48:27.708 INFO  VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.0.5.0
    14:48:27.708 INFO  VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    14:48:27.709 INFO  VariantRecalibrator - Executing as [email protected] on Linux v4.4.0-127-generic amd64
    14:48:27.709 INFO  VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11
    14:48:27.709 INFO  VariantRecalibrator - Start Date/Time: June 8, 2018 2:48:27 PM CEST
    14:48:27.709 INFO  VariantRecalibrator - ------------------------------------------------------------
    14:48:27.709 INFO  VariantRecalibrator - ------------------------------------------------------------
    14:48:27.710 INFO  VariantRecalibrator - HTSJDK Version: 2.15.1
    14:48:27.710 INFO  VariantRecalibrator - Picard Version: 2.18.2
    14:48:27.710 INFO  VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:48:27.710 INFO  VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:48:27.710 INFO  VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:48:27.710 INFO  VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:48:27.710 INFO  VariantRecalibrator - Deflater: IntelDeflater
    14:48:27.710 INFO  VariantRecalibrator - Inflater: IntelInflater
    14:48:27.710 INFO  VariantRecalibrator - GCS max retries/reopens: 20
    14:48:27.710 INFO  VariantRecalibrator - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    14:48:27.710 INFO  VariantRecalibrator - Initializing engine
    14:48:28.203 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/freenas/NGS_Variant-Analysis-training2018/reference/hg38_v0_hapmap_3.3.hg38.vcf.gz
    14:48:28.363 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/freenas/NGS_Variant-Analysis-training2018/reference/hg38_v0_1000G_omni2.5.hg38.vcf.gz
    14:48:28.501 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/freenas/NGS_Variant-Analysis-training2018/reference/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    14:48:28.624 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/freenas/NGS_Variant-Analysis-training2018/reference/hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
    14:48:28.756 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/freenas/NGS_Variant-Analysis-training2018/reference/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    14:48:28.931 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/freenas/NGS_Variant-Analysis-training2018/bwa_mappings/NA12878_0.033.vcf.gz
    14:48:29.028 INFO  VariantRecalibrator - Done initializing engine
    14:48:29.039 INFO  TrainingSet - Found hapmap track:    Known = false   Training = true         Truth = true    Prior = Q15.0
    14:48:29.040 INFO  TrainingSet - Found omni track:      Known = false   Training = true         Truth = true    Prior = Q12.0
    14:48:29.040 INFO  TrainingSet - Found 1000g track:     Known = false   Training = true         Truth = false   Prior = Q10.0
    14:48:29.040 INFO  TrainingSet - Found dbsnp track:     Known = true    Training = false        Truth = false   Prior = Q2.0
    14:48:29.040 INFO  TrainingSet - Found Mills_and_1000G_gold track:      Known = false   Training = true         Truth = true    Prior = Q12.0
    14:48:29.048 WARN  GATKVariantContextUtils - Can't determine output variant file format from output file extension "033". Defaulting to VCF.
    14:48:29.074 INFO  ProgressMeter - Starting traversal
    14:48:29.074 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    14:48:34.736 INFO  ProgressMeter -       chr22:50469426              0.1                 67609         716576.6
    14:48:34.737 INFO  ProgressMeter - Traversal complete. Processed 67609 total variants in 0.1 minutes.
    14:48:34.752 INFO  VariantDataManager - QD:      mean = 18.80    standard deviation = 10.09
    14:48:34.763 INFO  VariantDataManager - MQ:      mean = 59.85    standard deviation = 1.58
    14:48:34.772 INFO  VariantDataManager - MQRankSum:       mean = -0.01    standard deviation = 0.15
    14:48:34.779 INFO  VariantDataManager - ReadPosRankSum:          mean = 0.01     standard deviation = 0.84
    14:48:34.786 INFO  VariantDataManager - FS:      mean = 1.22     standard deviation = 2.54
    14:48:34.791 INFO  VariantDataManager - SOR:     mean = 1.19     standard deviation = 0.77
    14:48:34.794 INFO  VariantDataManager - DP:      mean = 8.74     standard deviation = 3.21
    14:48:34.853 INFO  VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, DP, MQRankSum, SOR, FS, ReadPosRankSum]
    14:48:34.860 INFO  VariantDataManager - Training with 24550 variants after standard deviation thresholding.
    14:48:34.863 INFO  GaussianMixtureModel - Initializing model with 100 k-means iterations...
    14:48:35.702 INFO  VariantRecalibratorEngine - Finished iteration 0.
    14:48:36.182 INFO  VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 1.72393
    14:48:36.729 INFO  VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.51378
    14:48:37.256 INFO  VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.01497
    14:48:37.733 INFO  VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00340
    14:48:37.921 INFO  VariantRecalibratorEngine - Convergence after 22 iterations!
    14:48:38.017 WARN  VariantRecalibratorEngine - Model could not pre-compute denominators.
    14:48:38.032 INFO  VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
    14:48:38.058 INFO  VariantRecalibrator - Shutting down engine
    [June 8, 2018 2:48:38 PM CEST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.18 minutes.
    Runtime.totalMemory()=3897032704
    java.lang.IllegalArgumentException: No data found.
            at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
            at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:630)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:982)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
    
  • splaisan Leuven (Belgium) Member
    edited June 2018

    OK, it seems that I do not have enough 'good' variants after all.
    This VCF was obtained from a low-coverage (10x) chr22 subset BAM; when I use the full-coverage (300x) VCF instead, the command works.

    Apparently it is not so much the number of variants that makes the larger job succeed (91203 variants, of which 27436 remain after standard deviation thresholding), but more likely that those calls have better coverage or scores, while the calls in the low-coverage data are probably not suitable for recalibration.

    Sorry for the post; maybe it will help others working with low-coverage data!
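
For anyone wanting to compare depth between a low-coverage and a full-coverage call set before running VQSR, here is a rough sketch that averages the INFO-field DP values in a VCF. The function name is hypothetical, and this is an independent check using gzip and awk, not a reproduction of how VariantRecalibrator computes the "DP: mean" line in its log. Records without a DP entry in INFO are skipped.

```shell
#!/bin/sh
# Hypothetical helper: mean of the INFO-field DP values in a VCF.
vcf_mean_info_dp() {
    # $1: path to a VCF, plain text or gzip/bgzip-compressed
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($8, kv, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^DP=/) {        # take the site-level DP entry
                    sum += substr(kv[i], 4)
                    count++
                }
        }
        END {
            if (count) printf "%.2f\n", sum / count
            else       print "NA"            # no DP annotations found
        }
    '
}

# Example (file names are placeholders):
#   vcf_mean_info_dp low_coverage.vcf.gz
#   vcf_mean_info_dp full_coverage.vcf.gz
```

A large gap between the two numbers would be consistent with the explanation above, but it does not by itself show which variants VQSR would accept for training.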
