Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

problem with reference genome in variant calling

siniskarpsiniskarp University of Oulu, FinlandPosts: 1Member

Hi, I’m trying to use GATK for variant calling. I have had some problems preparing the hg19 reference genome. I haven’t done the alignment myself, but gotten the .bam files already done. I how ever know that this should be the same reference used for the alignment. I have downloaded the hg19 from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz and downloaded the new version of mitochondrial genome (NC_012920.1). and then tried: cat chr*.fa > hg19.fa

After this I have followed the instructions on (how to) Prepare a reference for use with BWA and GATK.

When I try to call the variants using HaplotypeCaller (as instructed on: (howto) Call variants on a diploid genome with the HaplotypeCaller):

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I 8526RU.rmdup.bam -L 20 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants8526RU.vcf

I get the error message: “Badly formed genome loc: Contig '20' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?” Can you tell me what the problem is? And how to fix this? I know there have been some similar questions considering the contigs, but I haven’t been able to solve the problem based on them.

Thank you, Sini

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,213Administrator, GSA Member admin

    Hi Sini, that's because you didn't adapt the command line that you copied (which includes an argument to limit the run to chromosome 20) to your reference. Our example commands use the b37 reference, which has numbers for contig names, so when they use an interval it looks like -L 20. But with hg19, which has 'chr' prepended to the contig numbers, it should look like -L chr20.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.