VariantRecalibrator error only on tumor, only in INDEL mode? Possible workaround?

Hello,

I have been trying to follow the Best Practices outlined in the slide at https://www.broadinstitute.org/gatk/guide/best-practices (DNAseq), and everything has gone well so far. I tried one tumor sample and one matched normal sample, running each individually all the way through.

When I ran VariantRecalibrator on the tumor sample in INDEL mode, I received an error. I tried again without the '-nt 4' option and received the same error. Normal SNP mode, normal INDEL mode, and tumor SNP mode all ran without errors.

ERROR stack trace

java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:399)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:143)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No data found.
ERROR ------------------------------------------------------------------------------------------

Searching the forums, I found suggestions that more samples may need to be added (or the callset 'padded' with 1000 Genomes data), and a comment that 'we are not sure if VQSR is doing the right thing with tumor data.' I am trying out workflows on an Intel i5 computer with 4 GB of memory, so I am not sure how well I can run a large number of samples together.

Is there something I can do to possibly remedy the error, or continue with the analysis? Thank you so much.

Here is the command I used:

java -Xmx3g -jar /gatk_3.3/GenomeAnalysisTK.jar -T VariantRecalibrator -R /gatk_3.3/resources/hg19/ucsc.hg19.fasta -input tumor.realigned.recal.HaplotypeCaller.vcf -recalFile tumor.realigned.recal.HaplotypeCaller.vcf.INDELs.recal -tranchesFile tumor.realigned.recal.HaplotypeCaller.vcf.INDELs.tranches -nt 4 --maxGaussians 4 -resource:mills,known=false,training=true,truth=true,prior=12.0 /gatk_3.3/resources/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /gatk_3.3/resources/hg19/dbsnp_138.hg19.vcf -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum -mode INDEL
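Since the same INDEL-mode command may need to be run on more than one callset (tumor, normal, or a joint VCF), one option is to assemble it from a script rather than retyping it. The following is a minimal Python sketch, not part of GATK: the helper names `resource_arg` and `indel_vqsr_argv` are hypothetical, and the sketch only reproduces the GATK 3.3 arguments already used in the command above. It builds the argument list; actually running it is left to something like `subprocess.run(argv, check=True)`.

```python
import shlex


def resource_arg(name, known, training, truth, prior):
    """Hypothetical helper: format a -resource tag in the
    -resource:NAME,known=...,training=...,truth=...,prior=... style used above."""
    def b(flag):
        return str(flag).lower()  # GATK expects lowercase true/false
    return (f"-resource:{name},known={b(known)},training={b(training)},"
            f"truth={b(truth)},prior={prior:.1f}")


def indel_vqsr_argv(gatk_jar, ref, input_vcf, resources, annotations, threads=None):
    """Hypothetical helper: build the argv list for the INDEL-mode
    VariantRecalibrator command shown above (GATK 3.x syntax)."""
    argv = ["java", "-Xmx3g", "-jar", gatk_jar, "-T", "VariantRecalibrator",
            "-R", ref, "-input", input_vcf,
            "-recalFile", input_vcf + ".INDELs.recal",
            "-tranchesFile", input_vcf + ".INDELs.tranches"]
    if threads:  # -nt is optional; the error above occurs with or without it
        argv += ["-nt", str(threads)]
    argv += ["--maxGaussians", "4"]
    # Each resource is (name, known, training, truth, prior, path)
    for name, known, training, truth, prior, path in resources:
        argv += [resource_arg(name, known, training, truth, prior), path]
    for annotation in annotations:
        argv += ["-an", annotation]
    argv += ["-mode", "INDEL"]
    return argv


if __name__ == "__main__":
    res_dir = "/gatk_3.3/resources/hg19"
    argv = indel_vqsr_argv(
        "/gatk_3.3/GenomeAnalysisTK.jar",
        f"{res_dir}/ucsc.hg19.fasta",
        "tumor.realigned.recal.HaplotypeCaller.vcf",
        resources=[
            ("mills", False, True, True, 12.0,
             f"{res_dir}/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf"),
            ("dbsnp", True, False, False, 2.0, f"{res_dir}/dbsnp_138.hg19.vcf"),
        ],
        annotations=["QD", "FS", "SOR", "ReadPosRankSum", "MQRankSum"],
        threads=4,
    )
    # Print a shell-quoted version of the command for inspection
    print(shlex.join(argv))
```

Swapping the `input_vcf` argument is then enough to rerun identical settings on a different callset, which keeps the tumor and normal runs directly comparable.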

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    This is most probably caused by having too few indels in your tumor sample. One solution would be to run both tumor and normal through VQSR together (which I think is the right thing to do anyway), but that requires you to have called variants on them together. What are you using as a variant caller, by the way? MuTect, I hope? The GATK callers are currently not appropriate for calling somatic variants because of how they use allele frequency in the variant likelihood modeling.

  • ronton, USA (Member)

    I used HaplotypeCaller.

    I will try BWA-MEM, Picard sort, Picard MarkDuplicates, GATK indel realignment, base recalibration, and then MuTect using tumor.bam and normal.bam. Next, I will run VQSR on tumor and normal together using the MuTect output, and compare that with the HaplotypeCaller output for the normal.

  • ronton, USA (Member)

    Thank you so much for your help. I will read through the MuTect paper and documentation, and likely do exactly what you suggest. I am just trying to make sense of it.
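One quick way to check the "too few indels in the tumor sample" explanation is to count how many indel records each VCF actually contains before running VariantRecalibrator. This is a minimal Python sketch, not a GATK tool: the function name `count_variant_types` is hypothetical, and treating a record as an indel whenever an ALT allele's length differs from REF is a simplifying assumption. It also does not check overlap with the Mills resource, even though VQSR trains its model only on input variants that overlap the training sites, so the real training set is smaller than these counts.

```python
def count_variant_types(vcf_lines):
    """Hypothetical helper: count SNP, indel, and other records in VCF text.

    Header lines (starting with '#') and blank lines are skipped. Classification
    is a simplification: symbolic alleles ('<...>') and spanning deletions ('*')
    go to 'other', any ALT whose length differs from REF makes the record an
    indel, single-base equal-length records are SNPs, and the rest
    (MNP-like records) go to 'other'. Filtered records are counted too.
    """
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if any(alt.startswith("<") or alt == "*" for alt in alts):
            counts["other"] += 1
        elif any(len(alt) != len(ref) for alt in alts):
            counts["indel"] += 1
        elif len(ref) == 1:
            counts["snp"] += 1
        else:
            counts["other"] += 1  # equal-length multi-base (MNP-like) records
    return counts
```

For example, `count_variant_types(open("tumor.realigned.recal.HaplotypeCaller.vcf"))` can be compared with the same call on the normal VCF (use `gzip.open(path, "rt")` for a `.vcf.gz`). If the tumor callset has far fewer indels than the normal, that is consistent with the "No data found" failure, and pooling both samples in one joint callset, as suggested in the answer, gives VQSR more records to train on.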
