We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Having trouble with variantrecalibrator

christyvjchristyvj MelbourneMember

Hi there. I am hoping someone might be able to help me. We are trying to test the performance of GATK’s variant calling and comparing it to our current method of using samtools.

Our first test is on a small region on chromosome 29 for ~200 animals. So far we have successfully run HaplotypeCaller to generate the .g.vcf files using the following command:

java -Xmx4G -jar /group/dairy/christy/gatk_test/GenomeAnalysisTK.jar -R ${TMPDIR}/umd_3_1_reference_1000_bull_genomes.fa -T HaplotypeCaller -I ${TMPDIR}/SIMUSAM000002144976.realigned.recalibrated.bam -o SIMUSAM000002144976.realigned.recalibrated.g.vcf -L Chr29:1000000-2000000 -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000

And then performed joint calling with GenotypeGVCFs with the following command:

java -Xmx4G -jar /group/dairy/christy/gatk_test/GenomeAnalysisTK.jar -R ${TMPDIR}/umd_3_1_reference_1000_bull_genomes.fa -T GenotypeGVCFs -V ${TMPDIR}/mygvcfs.list -o jointcalls.vcf

We now would like to filter our variants using VariantRecalibrator but are running into some difficulties and we haven't been able to find an answer in any forums. We have created our own vcf file as a “resource” which consists of all Chr29 SNP from a HD chip. I have pasted the first 10 lines of our vcf file below. VariantRecalibrator seems to accept the vcf file and tries to run, gives a warning “WARNING: Training with very few variant sites!” and then says a runtime error has occurred. We are very new at this and can’t figure out what we are doing wrong. Is it because the vcf file is missing data? Or because we are only looking at a small region? Or is it something to do with how we have specified the –resource variable (I have pasted the command below). Any advice/help would be greatly appreciated.


VariantRecalibrator command:

java -Xmx4G -jar /group/dairy/christy/gatk_test/GenomeAnalysisTK.jar -R ${TMPDIR}/umd_3_1_reference_1000_bull_genomes.fa -T VariantRecalibrator -input jointcalls.vcf -resource:800k,known=false,training=true,truth=true,prior=2.0 sorted_800k_truth_set.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff -mode SNP -recalFile output.AS.recal -tranchesFile output.AS.tranches -rscriptFile output.plots.AS.R

VCF file format:



Chr29 323185 BOVINEHD2900000018 A C . . .
Chr29 330231 BOVINEHD2900000019 A G . . .
Chr29 376652 BOVINEHD2900000025 A G . . .
Chr29 400533 ARS-BFGL-NGS-38901 A C . . .
Chr29 486153 BOVINEHD2900000037 A G . . .
Chr29 570024 BOVINEHD2900000049 A G . . .
Chr29 581858 BOVINEHD2900000052 A G . . .
Chr29 589629 HAPMAP40201-BTA-105062 A C . . .


Sign In or Register to comment.