VQSR: Bad input: Values for DP annotation not detected for ANY training variant in the input callset

Tesa (Netherlands Cancer Institute, Member)

Hi team,
I have a VCF callset generated using HaplotypeCaller in --emitRefConfidence GVCF mode, followed by GenotypeGVCFs.
I used the generated output.vcf file as input for VariantRecalibrator.
The command:
java -jar $GATK_HOME/GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R $REFERENCE \
    -input exome_set_output.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $GOLD_STANDARD_HAPMAP \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 $GOLD_STANDARD_OMNI \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 $GOLD_STANDARD_1000G \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 $GOLD_STANDARD_DBSNP \
    -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff \
    -mode SNP \
    -tranche 100.0 -tranche 99.9 \
    -recalFile exome_set_output_SNP.recal \
    -tranchesFile recal_SNP.tranches \
    -rscriptFile recal_SNP_plots.R

But I get the following error:
##### ERROR MESSAGE: Bad input: Values for DP annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

My input vcf file looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RTN005 RTN007 RTN009 RTN024 RTN028 RTN038 RTN039 RTN045 RTN051 RTN097 RTN102 RTN108 RTN122 RTN126 RTN127 RTN133
1 762273 . G A 23942.06 . AC=31;AF=0.969;AN=32;BaseQRankSum=-9.420e-01;ClippingRankSum=-6.500e-02;DP=743;FS=2.053;GQ_MEAN=139.56;GQ_STDDEV=44.68;InbreedingCoeff=-0.0323;MLEAC=31;MLEAF=0.969;MQ=42.48;MQ0=0;MQRankSum=-2.517e+00;NCC=0;QD=32.27;ReadPosRankSum=-1.485e+00 GT:AD:DP:GQ:PL 1/1:0,42:42:99:1426,126,0 1/1:0,28:28:84:945,84,0 1/1:0,69:69:99:2430,208,0 1/1:0,38:38:99:1295,114,0 1/1:0,54:54:99:1876,162,0 1/1:0,28:28:84:977,84,0 1/1:0,37:37:99:1282,111,0 1/1:0,65:65:99:2207,195,0 1/1:0,46:46:99:1572,138,0 1/1:0,45:45:99:1565,135,0 1/1:0,60:60:99:2019,180,0 1/1:0,52:52:99:1758,156,0 1/1:0,69:69:99:2404,208,0 1/1:0,19:19:57:657,57,0 1/1:0,41:41:99:1413,123,0 0/1:40,9:49:99:152,0,1320
1 762353 . G C 59.07 . AC=1;AF=0.031;AN=32;BaseQRankSum=0.111;ClippingRankSum=-5.560e-01;DP=321;FS=0.000;GQ_MEAN=51.81;GQ_STDDEV=19.63;InbreedingCoeff=-0.0328;MLEAC=1;MLEAF=0.031;MQ=42.47;MQ0=0;MQRankSum=0.779;NCC=0;QD=1.97;ReadPosRankSum=0.501 GT:AD:DP:GQ:PL 0/0:22,0:22:63:0,63,945 0/0:23,0:23:60:0,60,832 0/0:15,0:15:23:0,23,574 0/0:22,0:22:60:0,60,900 0/0:26,0:26:63:0,63,945 0/0:8,0:8:21:0,21,303 0/0:15,0:15:42:0,42,489 0/0:26,0:26:60:0,60,900 0/0:11,0:11:30:0,30,450 0/0:23,0:23:63:0,63,945 0/1:25,5:30:95:95,0,783 0/0:23,0:23:60:0,60,900 0/0:29,0:29:63:0,63,945 0/0:20,0:20:60:0,60,685 0/0:15,0:15:36:0,36,540 0/0:13,0:13:30:0,30,450
1 861630 . G A 1958.44 . AC=22;AF=0.688;AN=32;BaseQRankSum=-1.380e+00;ClippingRankSum=0.198;DP=88;FS=7.101;GQ_MEAN=24.38;GQ_STDDEV=23.34;InbreedingCoeff=0.0754;MLEAC=25;MLEAF=0.781;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=0;QD=25.43;ReadPosRankSum=0.720 GT:AD:DP:GQ:PL 0/0:3,0:3:0:0,0,51 0/1:4,2:6:46:46,0,146 1/1:0,4:4:12:133,12,0 1/1:0,4:4:12:133,12,0 1/1:0,6:6:18:197,18,0 1/1:0,2:2:6:64,6,0 1/1:0,7:7:21:209,21,0 1/1:0,8:8:24:264,24,0 1/1:0,7:7:21:232,21,0 0/1:4,3:7:76:76,0,137 0/0:4,0:4:0:0,0,93 0/1:2,4:6:64:113,0,64 0/1:2,5:7:51:135,0,51 1/1:0,10:10:30:321,30,0 1/1:0,3:3:9:100,9,0 0/0:4,0:4:0:0,0,88

Can someone point me to what I'm doing wrong? I can see that there are DP values in the vcf file so I don't understand why it complains there aren't any annotations.

Thanks very much,
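
A quick way to confirm that an annotation really is present in the INFO column of every record, rather than only in the FORMAT column, is to count it directly. The following is a minimal Python sketch, not part of GATK; the function name `count_info_key` is hypothetical, and the example records are abbreviated from the callset above with tabs restored and INFO shortened.

```python
def count_info_key(vcf_lines, key):
    """Return (records_with_key, total_records) for one INFO key."""
    with_key = 0
    total = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # INFO is the 8th column; ignore incomplete lines
        total += 1
        # INFO holds semicolon-separated KEY=VALUE pairs or bare flags.
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        if key in info_keys:
            with_key += 1
    return with_key, total


# Records abbreviated from the thread (tabs restored, INFO shortened).
EXAMPLE_RECORDS = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t762273\t.\tG\tA\t23942.06\t.\tAC=31;DP=743;QD=32.27",
    "1\t762353\t.\tG\tC\t59.07\t.\tAC=1;DP=321;QD=1.97",
    "1\t861630\t.\tG\tA\t1958.44\t.\tAC=22;DP=88;QD=25.43",
]

for annotation in ("DP", "QD", "SOR"):
    print(annotation, count_info_key(EXAMPLE_RECORDS, annotation))
```

For a real file, pass an open file handle instead of the list, for example `count_info_key(open("exome_set_output.vcf"), "DP")`. Note that in the three full records posted above, SOR does not appear in INFO even though the command requests `-an SOR`, so it may be worth counting every annotation named on the command line.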

Best Answer


  • Sheila (Broad Institute; Member, Broadie)

    Hi Tesa,

    Check your training files to make sure they have the DP annotation. It looks like this is an issue with your input resource files.


  • Tesa (Netherlands Cancer Institute, Member)

    Hi Sheila,
    I guess I don't understand the training files. I downloaded the training files from the resource bundle. And based on how they are generated, I don't expect them to have the DP annotation. There are no .bam files to annotate DP.

    A head of the dbsnp_138.b37.vcf.gz file:
    #CHROM POS ID REF ALT QUAL FILTER INFO
    1 10019 rs376643643 TA T . . OTHERKG;R5;RS=376643643;RSPOS=10020;SAO=0;SSR=0;VC=DIV;VP=0x050000020001000002000200;WGT=1;dbSNPBuildID=138
    1 10054 rs373328635 CAA C,CA . . NOC;OTHERKG;R5;RS=373328635;RSPOS=10055;SAO=0;SSR=0;VC=DIV;VP=0x050000020001000002000210;WGT=1;dbSNPBuildID=138
    1 10109 rs376007522 A T . . OTHERKG;R5;RS=376007522;RSPOS=10109;SAO=0;SSR=0;VC=SNV;VP=0x050000020001000002000100;WGT=1;dbSNPBuildID=138

    Do I just need to turn off the -an DP in VariantRecalibrator?
    I used it because that is how the documentation shows how to do it.


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @Tesa Actually the training files don't need to have any of the annotations specified for modeling, they are just used to select sites based on position out of your dataset. The model will be trained using the annotations of those sites in your data.

    I see you have DP annotated in your callset so the problem is probably one of the more obscure issues we've seen with VQSR over the years (there are a couple where the error message is not actually informative of the real problem).

    Specifically, it may be that the program is not finding enough sites that overlap between the resource and your callset. Can you check how many sites they have in common?
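
One way to count shared sites is a small script that compares positions only. This is a rough Python sketch under stated assumptions: sites are matched on (contig, position) without comparing alleles, which may differ from how VariantRecalibrator itself matches training sites, and the names `iter_sites` and `overlap_summary` are hypothetical. The callset sites are held in memory while the (much larger) resource is streamed.

```python
def iter_sites(vcf_lines):
    """Yield (contig, position) for every record line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a usable record
        yield fields[0], int(fields[1])


def overlap_summary(callset_lines, resource_lines):
    """Count sites shared between a callset and a resource file."""
    callset = set(iter_sites(callset_lines))
    common = set()
    resource_only = 0
    for site in iter_sites(resource_lines):  # streamed, never stored whole
        if site in callset:
            common.add(site)
        else:
            resource_only += 1
    return {
        "common": len(common),
        "callset_only": len(callset) - len(common),
        "resource_only": resource_only,
    }


# Positions taken from the records quoted in the thread.
CALLSET = ["1\t762273\t.\tG\tA", "1\t762353\t.\tG\tC", "1\t861630\t.\tG\tA"]
RESOURCE = ["1\t10019\trs376643643\tTA\tT", "1\t762273\t.\tG\tA"]
print(overlap_summary(CALLSET, RESOURCE))

# Contig names must match exactly: "chr1" and "1" never overlap.
RENAMED = ["chr1\t762273\t.\tG\tA"]
print(overlap_summary(CALLSET, RENAMED))
```

A count of zero common sites with a resource that should overlap is a hint that the contig naming differs between the two files, or that the resource path points to something other than the intended file.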

  • Tesa (Netherlands Cancer Institute, Member)

    Hi Geraldine, I only checked dbsnp_138.b37.vcf because it took quite some time to check.

    Command for vcftools:
    vcftools --vcf exome_set_output.vcf --diff dbsnp_138.b37.sorted.vcf > overlap_dbsnp.txt

    The output is:
    Found 249460 SNPs common to both files.
    Found 21917 SNPs only in main file.
    Found 58962115 SNPs only in second file.
    So there is a lot of overlap at least in this file with my vcf.

    Thanks for looking into it,

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @Tesa, I just realized I never got back to you on this, sorry! Next time please don't hesitate to ping us if you don't get a response within a few days.

    That should be enough for VQSR to work (even though 15 exomes is below our recommendation of 30 samples -- this may cause you further trouble when you try to recalibrate indels) so I'm not sure why you're getting this error. Any chance your $GOLD_STANDARD_DBSNP variable is not pointing to the right file? We have seen that happen. Btw, I wouldn't call a dbsnp resource a "gold standard" of anything... there's a lot of junk in there, which is why we don't use dbsnp for training the model, only for differentiating novel vs known variants.

  • Tesa (Netherlands Cancer Institute, Member)

    Hi Geraldine,
    Thanks for getting back to me. I know the number of exomes is below recommendations, but I thought I'd give it a try. The $GOLD_STANDARD file is downloaded from the resource bundle, but it wasn't sorted and annotated the same as my human reference.

    To fix this, I ran the perl script vcfsorter (https://code.google.com/p/vcfsorter/) to re-sort the file based on my reference. Is it possible that is the problem? If so, what do you recommend to re-sort/annotate ('chr1' to '1', 'chrMT' to 'M') a bundle file? And thanks for the reminder that dbSNP is full of junk. I mean, there are even COSMIC variants in there!

    Thanks for the help,

  • Sheila (Broad Institute; Member, Broadie)

    Hi Tesa,

    Ah, so you are using b37 as your Reference, but the hg19 resource files? You can download the b37 resource files from our bundle. They are in the b37 directory. I hope this fixed the issue.


  • Tesa (Netherlands Cancer Institute, Member)

    Hi Sheila, I am using b37 (Homo_sapiens.GRCh37) for both reference and downloaded resource files (dbsnp_138.b37.vcf.gz).

    But when I first began the re-alignment process and ran RealignerTargetCreator there was an error and the detailed message said my resource and reference file had incompatible contigs.

    `ERROR MESSAGE: Input files /home/NKI/t.severson/tools/resources/hapmap_3.3.b37.vcf and reference have incompatible contigs: Relative ordering of overlapping contigs differs, which is unsafe.

    ERROR /home/NKI/t.severson/tools/resources/hapmap_3.3.b37.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT]
    ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT]`

    So I re-sorted with a tool which uses the reference, vcfsorter.pl.
    I then re-aligned the files and did base recalibration and ran HaplotypeCaller in --emitRefConfidence GVCF mode. All of those processes seemed to work fine.

    Now, when I try to run VariantRecalibrator, it starts up fine and shows a progress meter, but then I get the warning that there are not enough sites that overlap between the callset and the training files, even though we've determined there should be enough sites.

    The only thing I can think is that something is going wrong with the re-sort of the files, but earlier GATK steps didn't throw an error. Is there an official way to re-sort files for when these errors are thrown? Any help would be greatly appreciated.
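
The "relative ordering of overlapping contigs differs" condition in the error above can be reproduced outside GATK. The sketch below is an approximation of that check, not GATK's own code; the function names are hypothetical, and the two contig lists are copied from the error message.

```python
import re


def contigs_from_header(vcf_lines):
    """Collect contig IDs, in order, from ##contig=<ID=...> header lines."""
    pattern = re.compile(r"^##contig=<ID=([^,>]+)")
    found = []
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # header lines end at #CHROM or the first record
        match = pattern.match(line)
        if match:
            found.append(match.group(1))
    return found


def shared_in_order(contigs, other):
    """Keep only the contigs that also occur in `other`, preserving order."""
    other_set = set(other)
    return [name for name in contigs if name in other_set]


def same_relative_order(vcf_contigs, ref_contigs):
    """True when the contigs common to both lists appear in the same order."""
    return shared_in_order(vcf_contigs, ref_contigs) == shared_in_order(
        ref_contigs, vcf_contigs
    )


# Contig lists as reported in the error message.
HAPMAP_CONTIGS = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]
REFERENCE_CONTIGS = (
    [str(n) for n in range(1, 10)]
    + ["X", "Y"]
    + [str(n) for n in range(10, 23)]
    + ["MT"]
)
print(same_relative_order(HAPMAP_CONTIGS, REFERENCE_CONTIGS))
```

Here the reference places X and Y between 9 and 10, so the shared contigs are ordered differently in the two files and the check fails. Re-sorting the resource changes its order, but it is worth confirming afterwards that the records themselves survived intact.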

  • Tesa (Netherlands Cancer Institute, Member)

    Hi Geraldine,
    I think that was the problem. One of the files I had was malformed; I'm not sure what happened. But the header was right, so I think that's why it passed through the other steps OK.
    Now when I run VariantRecalibrator I get a different error, which I suspect is related to the relatively low amount of overlapping data in the omni training sites (79008 sites), but that is not for this thread. And it's probably because I'm using too few samples.
    Thanks very much for your help, I really appreciate it,

  • Syed (India, Member)

    Hi All,
    How do you know that files are malformed? I am getting exactly the same problem too.
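
A file can have a valid header and still be damaged in the body. As a first look, a few structural checks can be scripted. This is a rough Python sketch with a hypothetical function name; it only tests column count, integer positions, and sort order, and is not a substitute for a proper validator such as the ValidateVariants tool mentioned later in this thread.

```python
def find_vcf_problems(vcf_lines, max_problems=20):
    """Return a list of (line_number, description) for basic structural errors."""
    problems = []
    finished_contigs = set()
    current_contig = None
    last_pos = 0
    for number, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header and blank lines are not checked here
        if len(problems) >= max_problems:
            break
        fields = line.split("\t")
        if len(fields) < 8:
            problems.append((number, "fewer than 8 tab-separated columns"))
            continue
        chrom = fields[0]
        try:
            pos = int(fields[1])
        except ValueError:
            problems.append((number, "POS is not an integer"))
            continue
        if chrom != current_contig:
            # A contig that was already completed should not reappear.
            if chrom in finished_contigs:
                problems.append((number, "contig appears in more than one block"))
            if current_contig is not None:
                finished_contigs.add(current_contig)
            current_contig = chrom
        elif pos < last_pos:
            problems.append((number, "position decreases within contig"))
        last_pos = pos
    return problems


GOOD = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t762273\t.\tG\tA\t23942.06\t.\tDP=743",
    "1\t762353\t.\tG\tC\t59.07\t.\tDP=321",
]
# A space-separated record and an out-of-order record, for illustration.
BAD = GOOD + [
    "1 861630 . G A 1958.44 . DP=88",
    "1\t10019\t.\tTA\tT\t.\t.\tVC=DIV",
]
print(find_vcf_problems(GOOD))
print(find_vcf_problems(BAD))
```

The checks are deliberately shallow; a file that passes them can still be invalid in ways only a full validator will catch.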

  • Sheila (Broad Institute; Member, Broadie)
  • jmcclure (UMassMed; Member, Broadie)

    I'm getting the same error at the same stage of my analysis (attempting VQSR).

    This error message is misleading - the DP annotation does indeed exist at the location of training variants in my input vcf. There are no annotations in my training sets, but I believe these are just used for the variant location and the only annotations needed are in the sample data, correct?

    Perhaps I do not have enough samples. I see above 30 exomes are recommended and 15 may not work. I have 21 WGS samples from dogs - is this an insufficient sample number for VQSR and could that be the cause of this error message? If not, what else can I try?

    I've tried to follow the comment above on ValidateVariants, but I'm not clear on how that'd be used. It clearly can't be used on the training/truth sets as those do not have subject/sample columns and according to the documentation one of the checks of ValidateVariants is to ensure every alternate allele is present in at least one sample. Perhaps ValidateVariants could be run on my input sample vcf - but this came directly from other GATK tools: HaplotypeCaller -> GenotypeGVCFs -> CatVariants.

  • Sheila (Broad Institute; Member, Broadie)


    Yes, VariantRecalibrator only uses the sites in the resource files.

    Can you please tell us the exact command you ran and what version of GATK you are using? Also, what kind of data are you working with? Whole genome or whole exome?


  • alongalor (Member)
    edited September 2017

    Unfortunately I am getting the same error using VariantRecalibrator for Indels. Please see attached stderr. Also attaching my pipeline and including the relevant command below:

    ${Paths["java"]} -jar ${Paths["gatk"]} \
          -T VariantRecalibrator \
          -R ${Paths["refFasta"]} \
          -input ${Raw_VCF} \
          -recalFile ${CohortName}_${Chromosome}_Indels.recal \
          -tranchesFile ${CohortName}_${Chromosome}_Indels.tranches \
          --maxGaussians 4 \
          -resource:mills,known=false,training=true,truth=true,prior=12.0 ${Paths["mills"]} \
          -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ${Paths["dbsnp"]} \
          -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum ${default="None" InbreedingCoeff} \
          -mode INDEL \


    Paths["dbsnp"] = "dbsnp_138.b37.vcf" 
    Paths["mills"] = "/home/sl279/BiO/Install/GATK-bundle/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf"

    My input vcf also has the DP annotation among others (which is allegedly the source of the error?).

    I should also mention I used only 2 .bam files in my pipeline, both at a size of ~1G.

    Also, the file that is generating the error is named Synapse_MT_rawVariants.vcf.idx - this is a .vcf split by chromosome, with information for MT.

    Any help here would be much appreciated.

    Thanks a lot,


  • I should mention I also edited the above command to include an -L ${Chromosome} option. Still get the same error.

  • Still unable to get past this error! Any help would be much appreciated! Thanks so much!

  • Sheila (Broad Institute; Member, Broadie)


    Are you using at least 30 exomes or 1 genome? This error usually occurs when the tool does not have enough data to produce a good model.


  • Strange. I thought I already posted that I resolved this by using 2 genomes. Thanks a lot for your help Sheila!

  • Hi, I got the same problem, and I'm working with whole genomes. My input vcf file does have the DP annotation. Has anyone already figured this out?
    Thanks!

  • Sheila (Broad Institute; Member, Broadie)
    edited November 2017


    Is it possible to add some more genomes to your dataset?

    If not, can you try adding -maxGaussians 4 to your command?


  • Hey ho,

    I guess it's not fun to open this discussion again, but I got the same error message and none of the solutions here and elsewhere have helped so far. This is the setup:
    48 whole human genomes processed following your guideline for germline calling. I now have one multi-sample file combined from cohort gvcfs, which is 353 GB in size.

    It all went fine until VQSR:

    java -jar gatk-package- VariantRecalibrator
    -variant combined_VC_1_48.g.vcf.gz
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:/hg38_p/Homo_sapiens_assembly38.dbsnp138.vcf
    --resource 1000G,known=false,training=true,truth=false,prior=10.0:ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf
    --resource omni,known=false,training=true,truth=false,prior=12.0:ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf
    -an DP
    -an MQRankSum
    -an ReadPosRankSum
    -mode SNP
    -O recalibrate_SNP.recal
    -tranches-file recalibrate_SNP.tranches
    -rscript-file recalibrate_SNP_plots.R
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0

    18:29:43.329 INFO ProgressMeter - chrX:151155680 256.8 2808817000 10939762.4
    18:29:53.330 INFO ProgressMeter - chrX:153240702 256.9 2810808000 10940415.2
    18:30:03.330 INFO ProgressMeter - chrX:155284029 257.1 2812644000 10940463.5
    18:30:13.330 INFO ProgressMeter - chrY:4664123 257.3 2814835000 10941892.4
    18:30:23.332 INFO ProgressMeter - chrY:7102470 257.4 2817067000 10943478.0
    18:30:33.331 INFO ProgressMeter - chrY:9541992 257.6 2819294000 10945042.8
    18:30:43.335 INFO ProgressMeter - chrY:12260445 257.8 2821308000 10945776.5
    18:30:53.336 INFO ProgressMeter - chrY:14567034 257.9 2823529000 10947313.9
    18:31:03.337 INFO ProgressMeter - chrY:17004945 258.1 2825757000 10948876.4
    18:31:13.337 INFO ProgressMeter - chrY:20632749 258.3 2827981000 10950422.1
    18:31:23.337 INFO ProgressMeter - chrY:56765403 258.4 2830217000 10952012.3
    18:31:23.707 INFO ProgressMeter - chrY_KI270740v1_random:35796 258.4 2830274953 10951975.9
    18:31:23.707 INFO ProgressMeter - Traversal complete. Processed 2830274953 total variants in 258.4 minutes.
    18:31:23.734 INFO VariantRecalibrator - Shutting down engine
    [5. Juni 2019 18:31:23 MESZ] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 258.45 minutes.

    A USER ERROR has occurred: Bad input: Values for DP annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.


    So it runs for more than 4 hours through all chromosomes and then stops with an error on DP annotation.

    I checked that my .vcf contains the DP annotation. I also tried the same command with all of the suggested -an fields, then reduced the list to the basic ones that I can be sure exist in my file.

    I checked the overlap with dbSNP (can't recall how much, but there was sufficient overlap) and intersect with hapmap has been running for three days and has produced about 1.3 million variants of overlap - so I guess that's enough (?).

    I tried to give paths to resources in different ways (global and local paths). I tried unzipped and gzipped resource files. I also downloaded the .gz.tbi files for all resources, but was not sure if they are needed/used at all.

    The error stays - so I would be really grateful for help and suggestions.

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
    edited June 2019

    Hi @millchmaedchen

    Here are some recommendations:
    1) please post the exact command used to create "one multi-sample file combined from cohort gvcfs"
    2) Validate your gvcf
    3) Can you try to rerun VariantRecalibrator with the latest v4.1.2.0 and see if the error persists?

  • Dear Bhanu,
    thank you for your reply! While preparing my answer I found a mistake in the Genotyping command and thus I guess I tried to recalibrate the variants that were not genotyped.
    Thanks a lot for the rubber duck debugging ;-)
    VariantRecalibrator has completed successfully now!
  • Sakhaa (Member)
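
If the input to VariantRecalibrator might be a combined GVCF rather than the output of GenotypeGVCFs, one hint is the `<NON_REF>` symbolic allele that HaplotypeCaller writes in the ALT column in GVCF mode. The following Python sketch counts such records; the function name is hypothetical, the GVCF-style lines are made up for illustration, and the exact mistake in the poster's genotyping command was not stated, so this only covers one plausible case.

```python
def count_non_ref_records(vcf_lines, limit=10000):
    """Return (records_with_NON_REF, records_checked) for the first `limit` records."""
    checked = 0
    non_ref = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # ALT is the 5th column
        checked += 1
        # ALT may list several alleles separated by commas.
        if "<NON_REF>" in fields[4].split(","):
            non_ref += 1
        if checked >= limit:
            break
    return non_ref, checked


# Made-up GVCF-style records (a reference block and a variant site).
GVCF_LIKE = [
    "chr1\t10001\t.\tT\t<NON_REF>\t.\t.\tEND=10050",
    "chr1\t10051\t.\tA\tG,<NON_REF>\t50.0\t.\tDP=20",
]
# A genotyped record from the first post of this thread (INFO shortened).
GENOTYPED = ["1\t762273\t.\tG\tA\t23942.06\t.\tDP=743"]

print(count_non_ref_records(GVCF_LIKE))
print(count_non_ref_records(GENOTYPED))
```

For a gzipped file, the lines can be read with `gzip.open(path, "rt")` from the standard library. If most records carry `<NON_REF>`, the file has probably not been through joint genotyping yet.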
    edited September 2019

    Hello everyone,
    Sorry to re-open this discussion again. I got the same error. I am using GATK version 4.1, and the data is a single WGS sample.

    As a first step I ran BaseRecalibrator, then VariantRecalibrator. Is that fine?

    My command is:

    gatk VariantRecalibrator
    -V SNP_INDLS_only_VCF_T.vcf 
    -O Test_snps_recal_data.recal 
    --rscript-file Test_recalibrate_SNP_plots.R 
    --tranches-file VR_Test_snps.tranches 
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 
    --max-gaussians 6
    --resource:hapmap,known=false,training=true,truth=true,prior=15.0 ${hapmap}
    --resource:omni,known=false,training=true,truth=true,prior=12.0 ${Omi2} 
    --resource:1000G,known=false,training=true,truth=false,prior=10.0 ${G1000} 
    --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ${dbSNP}

    I got a new error. This is my log:

    17:29:38.736 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sw/csi/gatk/!/com/intel/gkl/native/libgkl_compression.so
    Aug 30, 2019 5:29:40 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    17:29:40.762 INFO VariantRecalibrator - ------------------------------------------------------------
    17:29:40.763 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.2.0
    17:29:40.764 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    17:29:40.764 INFO VariantRecalibrator - Executing as [email protected] on Linux v3.10.0-957.12.1.el7.x86_64 amd64
    17:29:40.764 INFO VariantRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_162-b12
    17:29:40.765 INFO VariantRecalibrator - Start Date/Time: August 30, 2019 5:29:38 PM AST
    17:29:40.765 INFO VariantRecalibrator - ------------------------------------------------------------
    17:29:40.765 INFO VariantRecalibrator - ------------------------------------------------------------
    17:29:40.766 INFO VariantRecalibrator - HTSJDK Version: 2.19.0
    17:29:40.766 INFO VariantRecalibrator - Picard Version: 2.19.0
    17:29:40.767 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    17:29:40.767 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    17:29:40.767 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    17:29:40.767 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    17:29:40.768 INFO VariantRecalibrator - Deflater: IntelDeflater
    17:29:40.768 INFO VariantRecalibrator - Inflater: IntelInflater
    17:29:40.768 INFO VariantRecalibrator - GCS max retries/reopens: 20
    17:29:40.768 INFO VariantRecalibrator - Requester pays: disabled
    17:29:40.769 INFO VariantRecalibrator - Initializing engine
    17:29:40.996 INFO FeatureManager - Using codec VCFCodec to read file file:///ibex/scratch/projects/c2014/genomics/reference/hg37/hapmap_3.3.b37.vcf
    17:29:41.053 INFO FeatureManager - Using codec VCFCodec to read file file:///ibex/scratch/projects/c2014/genomics/reference/hg37/1000G_omni2.5.b37.vcf
    17:29:41.074 INFO FeatureManager - Using codec VCFCodec to read file file:///ibex/scratch/projects/c2014/genomics/reference/hg37/1000G_phase1.snps.high_confidence.b37.vcf
    17:29:41.138 INFO FeatureManager - Using codec VCFCodec to read file file:///ibex/scratch/projects/c2014/genomics/reference/hg37/dbsnp_138.b37.vcf
    17:29:41.206 INFO FeatureManager - Using codec VCFCodec to read file file:///ibex/scratch/projects/c2029/PUBLIC_NIST/VCF/RMNISTHS_30xdownsample_SNP_INDLS_only_VCF_T.vcf
    17:29:45.164 WARN IndexUtils - Feature file "/ibex/scratch/projects/c2014/genomics/reference/hg37/1000G_phase1.snps.high_confidence.b37.vcf" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
    17:29:45.212 INFO VariantRecalibrator - Done initializing engine
    17:29:45.216 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
    17:29:45.216 INFO TrainingSet - Found omni track: Known = false Training = true Truth = true Prior = Q12.0
    17:29:45.216 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
    17:29:45.216 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
    17:29:45.225 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.

    And this is the error:

    A USER ERROR has occurred: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

    May I know where the issue is, and how I can fix it?
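
Before a long run, it can help to check which of the requested `-an` annotations are even declared in the callset header. This Python sketch, with hypothetical function names and made-up header lines, lists the requested annotations that have no `##INFO=<ID=...>` declaration. A declaration does not guarantee that records carry values, so it complements a per-record count rather than replacing it.

```python
import re


def declared_info_ids(vcf_lines):
    """Return the set of INFO IDs declared in the VCF header."""
    pattern = re.compile(r"^##INFO=<ID=([^,>]+)")
    declared = set()
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = pattern.match(line)
        if match:
            declared.add(match.group(1))
    return declared


def undeclared_annotations(vcf_lines, requested):
    """Return requested annotations missing from the header, in request order."""
    declared = declared_info_ids(vcf_lines)
    return [name for name in requested if name not in declared]


# Annotations passed with -an in the command above.
REQUESTED = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR"]

# Made-up header for illustration; a real header has many more lines.
HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">',
    '##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS mapping quality">',
    '##INFO=<ID=FS,Number=1,Type=Float,Description="Strand bias test">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(undeclared_annotations(HEADER, REQUESTED))
```

With the illustrative header, the sketch reports QD, MQRankSum, ReadPosRankSum and SOR as undeclared. On a real callset, any annotation reported here would need to be added, or dropped from the `-an` list, before recalibration.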

  • bshifaw (Member, Broadie, Moderator)


    Try validating your VCF and try using resource files from the resource bundle to avoid compatibility issues.

  • Sakhaa (Member)

    Thank you @bshifaw I will give it a try