problem with reference genome in variant calling
I’m trying to use GATK for variant calling. I have had some problems preparing the hg19 reference genome. I haven’t done the alignment myself, but gotten the .bam files already done. I how ever know that this should be the same reference used for the alignment.
I have downloaded the hg19 from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
and downloaded the new version of mitochondrial genome (NC_012920.1).
and then tried:
cat chr*.fa > hg19.fa
After this I have followed the instructions on (how to) Prepare a reference for use with BWA and GATK.
When I try to call the variants using HaplotypeCaller (as instructed on: (howto) Call variants on a diploid genome with the HaplotypeCaller):
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I 8526RU.rmdup.bam -L 20 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants8526RU.vcf
I get the error message: “Badly formed genome loc: Contig '20' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?”
Can you tell me what the problem is? And how to fix this?
I know there have been some similar questions considering the contigs, but I haven’t been able to solve the problem based on them.