The current GATK version is 3.7-0

problem with reference genome in variant calling

siniskarp, University of Oulu, Finland, Member, Posts: 1

I’m trying to use GATK for variant calling, and I have had some problems preparing the hg19 reference genome. I didn’t do the alignment myself; I received the .bam files already aligned. However, I know that this should be the same reference that was used for the alignment.
I have downloaded hg19 from
and downloaded the new version of the mitochondrial genome (NC_012920.1),
and then tried:
cat chr*.fa > hg19.fa
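A note on the concatenation step: `cat chr*.fa` joins the files in whatever order the shell expands the glob, which is typically not numeric chromosome order. GATK is generally strict about contig names and order being consistent between the reference and the BAM, so it may be safer to spell the order out. The following is a minimal sketch on toy files; the directory, the `concat_in_order` helper, and the chromosome list are hypothetical stand-ins, and the order to use in practice is whatever the BAM header lists.

```shell
# Toy stand-ins for per-chromosome FASTA files (hypothetical content).
workdir=$(mktemp -d)
for c in 1 2 10 M; do
  printf '>chr%s\nACGT\n' "$c" > "$workdir/chr$c.fa"
done

# Concatenate in an explicit order rather than relying on glob expansion order.
# Usage: concat_in_order DIR NAME [NAME ...]
concat_in_order() {
  dir=$1
  shift
  for c in "$@"; do
    cat "$dir/chr$c.fa"
  done
}

concat_in_order "$workdir" 1 2 10 M > "$workdir/hg19_toy.fa"

# Show the resulting contig order.
grep '^>' "$workdir/hg19_toy.fa"
```

With a real download, the same loop would be given the full list of chromosome names in the order required by the alignment.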

After this I have followed the instructions on (how to) Prepare a reference for use with BWA and GATK.

When I try to call the variants using HaplotypeCaller (as instructed on: (howto) Call variants on a diploid genome with the HaplotypeCaller):

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I 8526RU.rmdup.bam -L 20 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants8526RU.vcf

I get the error message: “Badly formed genome loc: Contig '20' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?”
Can you tell me what the problem is and how to fix it?
I know there have been some similar questions about contigs, but I haven’t been able to solve the problem based on them.

Thank you,


  • Geraldine_VdAuwera, Administrator, Dev, Posts: 11,118

    Hi Sini, that's because you didn't adapt the command line that you copied (which includes an argument to limit the run to chromosome 20) to your reference. Our example commands use the b37 reference, which has numbers for contig names, so when they use an interval it looks like -L 20. But with hg19, which has 'chr' prepended to the contig numbers, it should look like -L chr20.

    Geraldine Van der Auwera, PhD
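
    The naming mismatch described above can be checked before running GATK by listing the contig names in the reference and testing whether the value passed to -L is among them. This is a minimal sketch on a toy FASTA; the file contents and the `list_contigs` and `has_contig` helpers are hypothetical and not part of GATK.

```shell
# Toy FASTA with hg19-style contig names (hypothetical stand-in for hg19.fa).
ref=$(mktemp)
printf '>chr20 toy sequence\nACGT\n>chrM\nACGT\n' > "$ref"

# Print contig names: header lines without '>' and without any description
# that follows the first space.
list_contigs() {
  grep '^>' "$1" | sed 's/^>//' | cut -d' ' -f1
}

# Succeed only if NAME exactly matches one of the contig names in FASTA.
# Usage: has_contig FASTA NAME
has_contig() {
  list_contigs "$1" | grep -qxF -- "$2"
}

has_contig "$ref" 20    || echo "no contig named 20"
has_contig "$ref" chr20 && echo "chr20 found"
```

    For a real reference indexed with samtools faidx, the same names appear in the first column of the .fai file. In the command from the question, the fix amounts to replacing -L 20 with -L chr20 and leaving the other arguments unchanged.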

  • marinjorian, Philadelphia, Member, Posts: 1
    edited August 2014

    It is somewhat confusing to include the -L 20 argument in the tutorial, because people like me who overlooked what -L does might think it is a standard parameter to use in all scenarios.

  • Sheila, Broad Institute, Member, Broadie, Moderator, Dev, Posts: 4,443
    edited August 2014

    Thanks. I added a note to the tutorial to clarify this.

