Problem with reference genome in variant calling

siniskarp (University of Oulu, Finland)

Hi,
I’m trying to use GATK for variant calling, and I have had some problems preparing the hg19 reference genome. I didn’t do the alignment myself; I received the .bam files already aligned. I do, however, know that this should be the same reference that was used for the alignment.
I downloaded hg19 from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
and the newer version of the mitochondrial genome (NC_012920.1),
and then ran:
cat chr*.fa > hg19.fa
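Because `cat` simply joins the files, the contig names in hg19.fa are whatever the FASTA header lines say (chr1, chr20, chrM and so on for the UCSC files), and their order is the shell's sorted glob order rather than karyotypic order. A minimal sketch that reproduces this on three hypothetical toy files and lists the resulting contig names:

```shell
#!/usr/bin/env bash
# Toy reproduction of `cat chr*.fa > hg19.fa`. The three files are hypothetical
# stand-ins for the UCSC downloads; the script lists the contig names produced.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Write three tiny single-sequence FASTA files named like the UCSC ones.
for c in chr2 chr10 chrM; do
  printf '>%s\nACGTACGTAC\n' "$c" > "$c.fa"
done

# The glob expands in sorted order: chr10.fa, chr2.fa, chrM.fa.
cat chr*.fa > hg19.fa

# Contig names are the header lines with the leading '>' stripped.
grep '^>' hg19.fa | sed 's/^>//'
```

On the real file, the same `grep '^>' hg19.fa` shows the names GATK will see, and the order is worth comparing against the contig order in the BAM header.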

After this, I followed the instructions in (howto) Prepare a reference for use with BWA and GATK.
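For context, that preparation amounts to building a BWA index, a samtools .fai index and a Picard sequence dictionary. GATK reads the last two from next to the FASTA (hg19.fa.fai and hg19.dict). The sketch below lists the usual commands as comments only and defines a check that those files exist. The helper name `check_reference_files` is hypothetical, and the exact Picard invocation depends on the Picard version installed.

```shell
#!/usr/bin/env bash
# Sketch: verify that the companion files GATK reads next to a FASTA exist.
# They are typically produced by (shown as comments, not run here):
#   bwa index -a bwtsw hg19.fa        # BWA index, needed for alignment
#   samtools faidx hg19.fa            # -> hg19.fa.fai
#   java -jar picard.jar CreateSequenceDictionary R=hg19.fa O=hg19.dict
#                                     # -> hg19.dict (older Picard releases
#                                     #    ship CreateSequenceDictionary.jar)
set -euo pipefail

# Hypothetical helper: report missing or empty files for a given FASTA.
# Returns non-zero if any of them is missing.
check_reference_files() {
  local fasta=$1
  local dict="${fasta%.*}.dict"   # hg19.fa -> hg19.dict
  local status=0
  for f in "$fasta" "$fasta.fai" "$dict"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_reference_files hg19.fa && echo "reference files present"`.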

When I try to call variants with HaplotypeCaller (as instructed in (howto) Call variants on a diploid genome with the HaplotypeCaller), using:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I 8526RU.rmdup.bam -L 20 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants8526RU.vcf

I get the error message: “Badly formed genome loc: Contig '20' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?”
Can you tell me what the problem is and how to fix it?
I know there have been some similar questions regarding contigs, but I haven’t been able to solve the problem based on them.
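One way to see the mismatch directly is to list the contig names GATK is comparing against. Both the sequence dictionary (hg19.dict) and the BAM header (as printed by `samtools view -H 8526RU.rmdup.bam`) store them as `SN:` fields on `@SQ` lines. A minimal sketch, with a hypothetical helper name and a toy dictionary whose lengths are made up for illustration, that prints those names from any SAM-style header on standard input:

```shell
#!/usr/bin/env bash
# Sketch: print contig names from SAM-style header lines read on stdin.
# Works on a Picard sequence dictionary (list_contigs < hg19.dict) or a BAM
# header (samtools view -H file.bam | list_contigs). Helper name is hypothetical.
set -euo pipefail

list_contigs() {
  # Keep @SQ lines, find the tab-separated SN: field and strip its prefix.
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if (substr($i, 1, 3) == "SN:") { print substr($i, 4); break }
  }'
}

# Toy hg19-style dictionary (made-up lengths) to show the output format.
printf '@HD\tVN:1.4\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr20\tLN:500\n' \
  | list_contigs
```

For an hg19 dictionary this prints chr1, chr20 and so on, never a bare 20, and the value given to -L has to be one of the printed names.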

Thank you,
Sini

Answers

  • Geraldine_VdAuwera (Administrator, GATK Dev)

    Hi Sini, that's because you didn't adapt the command line that you copied (which includes an argument to limit the run to chromosome 20) to your reference. Our example commands use the b37 reference, which has numbers for contig names, so when they use an interval it looks like -L 20. But with hg19, which has 'chr' prepended to the contig numbers, it should look like -L chr20.
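Applied to Sini's command, the fix is only the interval name: -L chr20 instead of -L 20, with every other argument unchanged. A small sketch that picks the name from the reference's own sequence dictionary, so the same line works for b37-style (20) and hg19-style (chr20) references. The helper `interval_for` and the toy dictionary are hypothetical, and the GATK command is printed (a dry run) rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: choose the -L value that matches the reference's contig naming.
# The helper name is hypothetical; the GATK command is only echoed.
set -euo pipefail

# Print "chr<N>" if the dictionary has an @SQ line with SN:chr<N>, else "<N>".
interval_for() {
  local dict=$1 chrom=$2
  if awk -F'\t' -v want="SN:chr$chrom" '
       $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
       END { exit (found ? 0 : 1) }' "$dict"; then
    echo "chr$chrom"
  else
    echo "$chrom"
  fi
}

# Toy hg19-style dictionary (made-up length) to demonstrate the choice.
dict=$(mktemp)
printf '@SQ\tSN:chr20\tLN:500\n' > "$dict"
interval=$(interval_for "$dict" 20)

echo java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa \
  -I 8526RU.rmdup.bam -L "$interval" --genotyping_mode DISCOVERY \
  --output_mode EMIT_VARIANTS_ONLY -stand_emit_conf 10 -stand_call_conf 30 \
  -o raw_variants8526RU.vcf
```

With a real reference, pass hg19.dict instead of the toy file. Leaving out -L altogether runs the caller over every contig in the reference.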

    Geraldine Van der Auwera, PhD

  • marinjorian (Philadelphia)
    edited August 2014

    It is somewhat confusing to include the -L 20 argument in the tutorial, because people like me who overlooked what -L does might think it is a standard parameter to use in all scenarios.

  • Sheila (Broad Institute; GATK Dev, Moderator)
    edited August 2014

    @marinjorian

    Hi,

    Thanks. I added a note to the tutorial to clarify this.

    -Sheila
