VariantRecalibrator returning empty result files, no error, just "Killed"

LindsayLiang, Member
edited September 2017 in Ask the GATK team

Hi, I'm running the VariantRecalibrator step on a fairly small data set (50 samples in the cohort, restricted to Chr21 of a whole exome sequencing project). GATK terminates early without throwing any error and leaves the result files empty.

The output is as follows:

INFO  17:53:58,900 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  17:53:58,908 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
INFO  17:53:58,908 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  17:53:58,909 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  17:53:58,909 HelpFormatter - [Thu Sep 21 17:53:58 UTC 2017] Executing on Linux 4.9.41-moby amd64 
INFO  17:53:58,910 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_102-8u102-b14.1-1~bpo8+1-b14 
INFO  17:53:58,914 HelpFormatter - Program Args: -T VariantRecalibrator -R /vqsr_snp_model/localDir/human_g1k_v37.fasta -nt 8 -mode SNP -input /vqsr_snp_model/localDir/cohort.gt.vcf -recalFile /vqsr_snp_model/localDir/Output/cohort.gt.snp.recal.model -tranchesFile /vqsr_snp_model/localDir/Output/cohort.gt.snp.tranches -rscriptFile /vqsr_snp_model/localDir/Output/cohort.gt.snp.plots.R --use_annotation QD --use_annotation MQ --use_annotation MQRankSum --use_annotation FS --use_annotation SOR --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /vqsr_snp_model/localDir/hapmap_3.3.b37.vcf --resource:omni,known=false,training=true,truth=true,prior=12.0 /vqsr_snp_model/localDir/1000G_omni2.5.b37.vcf --resource:1000G,known=false,training=true,truth=false,prior=10.0 /vqsr_snp_model/localDir/1000G_phase1.snps.high_confidence.b37.vcf --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /vqsr_snp_model/localDir/dbsnp_138.b37.vcf 
INFO  17:53:58,929 HelpFormatter - Executing as [email protected] on Linux 4.9.41-moby amd64; OpenJDK 64-Bit Server VM 1.8.0_102-8u102-b14.1-1~bpo8+1-b14. 
INFO  17:53:58,929 HelpFormatter - Date/Time: 2017/09/21 17:53:58 
INFO  17:53:58,930 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  17:53:58,930 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  17:53:58,983 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:53:59,163 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  17:53:59,805 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 1 CPU thread(s) for each of 8 data thread(s), of 4 processors available on this machine 
WARN  17:53:59,805 MicroScheduler - Number of requested GATK threads 8 is more than the number of available processors on this machine 4 
INFO  17:54:00,034 GenomeAnalysisEngine - Preparing for traversal 
INFO  17:54:00,042 GenomeAnalysisEngine - Done preparing for traversal 
INFO  17:54:00,043 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  17:54:00,043 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  17:54:00,054 TrainingSet - Found hapmap track:    Known = false   Training = true     Truth = true    Prior = Q15.0 
INFO  17:54:00,054 TrainingSet - Found omni track:  Known = false   Training = true     Truth = true    Prior = Q12.0 
INFO  17:54:00,055 TrainingSet - Found 1000G track:     Known = false   Training = true     Truth = false   Prior = Q10.0 
INFO  17:54:00,055 TrainingSet - Found dbsnp track:     Known = true    Training = false    Truth = false   Prior = Q2.0 

The input was:

java -Xmx12g 
-jar /usr/GenomeAnalysisTK.jar 
-T VariantRecalibrator 
-R human_g1k_v37.fasta
-nt 8 
-mode SNP
-input cohort.gt.vcf 
-recalFile cohort.gt.snp.recal.model 
-tranchesFile cohort.gt.snp.tranches 
-rscriptFile cohort.gt.snp.plots.R 
--use_annotation QD 
--use_annotation MQ 
--use_annotation MQRankSum 
--use_annotation FS 
--use_annotation SOR 
--use_annotation ReadPosRankSum
--resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf 
--resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf
--resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf
--resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf


Edit: I ran the same command again on all chromosomes of the whole exome data, and the same problem occurred.
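
A note for anyone trying to narrow this down: a bare "Killed" with no Java stack trace usually means the process received SIGKILL from outside the JVM rather than failing inside GATK. One common cause, though only a guess here, is the container or host running out of memory. The sketch below is one possible way to confirm the signal part by looking at the exit status. The helper names `describe_exit` and `run_and_report` are hypothetical and not part of GATK, and the check says nothing about why the signal was sent.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a process return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    # Python's subprocess reports death-by-signal as a negative number,
    # while shells report it as 128 + signal number (e.g. 137 for SIGKILL).
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not known on this platform.
        name = f"signal {signum}"
    return f"terminated by {name}"


def run_and_report(argv):
    """Run a command (list of arguments) and describe how it ended."""
    result = subprocess.run(argv)
    return describe_exit(result.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the full java command line after the script name.
    print(run_and_report(sys.argv[1:]))
```

If the report comes back as "terminated by SIGKILL", the next place to look would be the memory available to the machine or container relative to the `-Xmx12g` heap requested in the command above.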
