Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

VariantRecalibrator crashes

ondrej77ondrej77 Posts: 2Member

I have ran VariantRecalibrator on a smaller VCF file (made with UnifiedGenotyper -L chr1) and it finished with no errors. Then I ran VCF file (made with UnifiedGenotyper without -L parameter) and it crashed but also without any error. The log file of the first successful run looks like this

... INFO 13:26:21,377 VariantRecalibrator - Building FS x DP plot... INFO 13:26:21,379 VariantRecalibratorEngine - Evaluating full set of 18354 variants... INFO 13:26:23,556 VariantRecalibratorEngine - Evaluating full set of 18354 variants... INFO 13:26:25,378 VariantRecalibrator - Building QD x DP plot... INFO 13:26:25,379 VariantRecalibratorEngine - Evaluating full set of 6384 variants... INFO 13:26:26,140 VariantRecalibratorEngine - Evaluating full set of 6384 variants... INFO 13:26:26,832 VariantRecalibrator - Executing: Rscript /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_L_VariantRecalibrator/13_all_snp.plots.R INFO 13:26:28,400 ProgressMeter - chrY:59358202 5.66e+07 14.0 m 14.0 s 98.7% 14.2 m 11.0 s INFO 13:26:58,410 ProgressMeter - chrY:59358202 5.66e+07 14.5 m 15.0 s 98.7% 14.7 m 11.0 s INFO 13:27:28,420 ProgressMeter - chrY:59358202 5.66e+07 15.0 m 15.0 s 98.7% 15.2 m 12.0 s INFO 13:27:58,429 ProgressMeter - chrY:59358202 5.66e+07 15.5 m 16.0 s 98.7% 15.7 m 12.0 s INFO 13:28:28,439 ProgressMeter - chrY:59358202 5.66e+07 16.0 m 16.0 s 98.7% 16.2 m 12.0 s INFO 13:28:58,449 ProgressMeter - chrY:59358202 5.66e+07 16.5 m 17.0 s 98.7% 16.7 m 13.0 s INFO 13:29:28,460 ProgressMeter - chrY:59358202 5.66e+07 17.0 m 18.0 s 98.7% 17.2 m 13.0 s INFO 13:29:58,469 ProgressMeter - chrY:59358202 5.66e+07 17.5 m 18.0 s 98.7% 17.7 m 14.0 s INFO 13:30:28,479 ProgressMeter - chrY:59358202 5.66e+07 18.0 m 19.0 s 98.7% 18.2 m 14.0 s INFO 13:30:58,489 ProgressMeter - chrY:59358202 5.66e+07 18.5 m 19.0 s 98.7% 18.7 m 14.0 s INFO 13:31:28,155 VariantRecalibrator - Executing: Rscript (resource)org/broadinstitute/sting/gatk/walkers/variantrecalibration/plot_Tranches.R /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_L_VariantRecalibrator/13_all_snp.tranches 2.15 INFO 13:31:28,499 ProgressMeter - chrY:59358202 5.66e+07 19.0 m 20.0 s 98.7% 19.3 m 15.0 s INFO 13:31:28,847 ProgressMeter - done 5.66e+07 19.0 m 20.0 s 98.7% 19.3 m 15.0 s INFO 13:31:28,847 ProgressMeter - Total runtime 1140.77 secs, 19.01 min, 0.32 hours I ...

The log file of the crashed run ENDS like this ... INFO 19:09:28,987 VariantRecalibrator - Building FS x QD plot... INFO 19:09:28,988 VariantRecalibratorEngine - Evaluating full set of 7300 variants... INFO 19:09:30,191 VariantRecalibratorEngine - Evaluating full set of 7300 variants... INFO 19:09:31,214 VariantRecalibrator - Building FS x DP plot... INFO 19:09:31,217 VariantRecalibratorEngine - Evaluating full set of 21170 variants... INFO 19:09:34,703 VariantRecalibratorEngine - Evaluating full set of 21170 variants... INFO 19:09:37,357 VariantRecalibrator - Building QD x DP plot... INFO 19:09:37,358 VariantRecalibratorEngine - Evaluating full set of 7250 variants... INFO 19:09:38,552 VariantRecalibratorEngine - Evaluating full set of 7250 variants... INFO 19:09:39,550 VariantRecalibrator - Executing: Rscript /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.plots.R INFO 19:09:51,101 ProgressMeter - chrY:59358159 5.68e+07 16.5 m 17.0 s 98.7% 16.7 m 13.0 s INFO 19:10:21,111 ProgressMeter - chrY:59358159 5.68e+07 17.0 m 17.0 s 98.7% 17.2 m 13.0 s INFO 19:10:51,119 ProgressMeter - chrY:59358159 5.68e+07 17.5 m 18.0 s 98.7% 17.7 m 14.0 s INFO 19:11:21,129 ProgressMeter - chrY:59358159 5.68e+07 18.0 m 19.0 s 98.7% 18.2 m 14.0 s INFO 19:11:51,139 ProgressMeter - chrY:59358159 5.68e+07 18.5 m 19.0 s 98.7% 18.7 m 14.0 s INFO 19:12:21,148 ProgressMeter - chrY:59358159 5.68e+07 19.0 m 20.0 s 98.7% 19.3 m 15.0 s INFO 19:12:51,157 ProgressMeter - chrY:59358159 5.68e+07 19.5 m 20.0 s 98.7% 19.8 m 15.0 s INFO 19:13:21,167 ProgressMeter - chrY:59358159 5.68e+07 20.0 m 21.0 s 98.7% 20.3 m 16.0 s INFO 19:13:51,176 ProgressMeter - chrY:59358159 5.68e+07 20.5 m 21.0 s 98.7% 20.8 m 16.0 s

I am suspecting that the execution of the tranches Rscript crashed it? Like I said, no error showed up when it crashed. Any suggestions how to make it work?

My command line: INFO 18:53:18,124 HelpFormatter - Program Args: -T VariantRecalibrator -R /cluster8/podlaha/HumanGenome/ucsc.hg19.fasta -mode SNP -input /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/12_UnifiedGenotyper/12_UG_all.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /cluster11/podlaha/Software/GATK/Resources/hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 /cluster11/podlaha/Software/GATK/Resources/1000G_omni2.5.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /cluster11/podlaha/Software/GATK/Resources/dbsnp_137.hg19.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /cluster11/podlaha/Software/GATK/Resources/1000G_phase1.snps.high_confidence.hg19.vcf -an MQ -an MQ0 -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an DP -an BaseQRankSum -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -numBad 1000 --maxGaussians 4 -recalFile /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.recal -tranchesFile /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.tranches -rscriptFile /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.plots.R

Running GATK 2.7.4. and R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" and java version "1.7.0_40"

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,293Administrator, GSA Member admin

    Hi there,

    What makes you think the program crashed? Typically if the program crashes it spits out a crash report (stack trace). Also, if it's just the Rscript that crashes, it doesn't crash the java program; it just produces a warning line in the console output.

    Geraldine Van der Auwera, PhD

  • ondrej77ondrej77 Posts: 2Member

    I partially "resolved" the issue by running it on a server rather than a cluster. I think it crashed because my pipe stops at the VariantRecalibrator (while executing R?) and doesn't continue. The VariantRecalibrator is only one step in my pipe, from formating bam files for GATK to indel realignment, base recalibration, to variant annotation. It simply stops while recalibrating snps and doesn't continue to recalibrating indels and later to applying recalibration ... etc.. As I mentioned earlier, doing this with ~12 samples and -L chr1 only, works great. Once I removed the chr1 interval restriction, it quits without an error. My colleagues have suggested this morning to run it on a server, rather than a cluster, and the pipe finishes with all the plots and final annotated VCF files. I suppose our cluster environment could be the culprit. Up until this point, while building the pipe I was resolving many issues from the stack traces I was getting. But now there is not stack trace from this. What I posted above is stdout.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,293Administrator, GSA Member admin

    Ah, I see. This does sound like a system issue, more than anything specifically wrong with GATK. When GATK crashes because of a bug or internal problem, it is not shy about telling you: it will spit out a lot of information. So if you're just seeing it quit without any further output, that sounds like it was terminated externally by the cluster jobrunner.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.