
Problem with reference genome in variant calling

siniskarp (University of Oulu, Finland; Member)

Hi,
I’m trying to use GATK for variant calling, and I have had some problems preparing the hg19 reference genome. I haven’t done the alignment myself; I received the .bam files already aligned. I do know, however, that this should be the same reference that was used for the alignment.
I have downloaded the hg19 from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
and downloaded the new version of the mitochondrial genome (NC_012920.1),
and then ran:
cat chr*.fa > hg19.fa

After this I followed the instructions in "(howto) Prepare a reference for use with BWA and GATK".

When I try to call variants using HaplotypeCaller (as instructed in "(howto) Call variants on a diploid genome with the HaplotypeCaller"):

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I 8526RU.rmdup.bam -L 20 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants8526RU.vcf

I get the error message: “Badly formed genome loc: Contig '20' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?”
Can you tell me what the problem is and how to fix it?
I know there have been some similar questions concerning contigs, but I haven’t been able to solve the problem based on them.

Thank you,
Sini

Answers

  • Geraldine_VdAuwera (admin; Cambridge, MA; Member, Administrator, Broadie)

    Hi Sini, that's because you didn't adapt the command line that you copied (which includes an argument to limit the run to chromosome 20) to your reference. Our example commands use the b37 reference, which has numbers for contig names, so when they use an interval it looks like -L 20. But with hg19, which has 'chr' prepended to the contig numbers, it should look like -L chr20.

  • marinjorian (Philadelphia; Member)
    edited August 2014

    It is somewhat confusing to include the -L 20 argument in the tutorial, because people like me who overlook the function of -L might think it is a standard parameter to use in all scenarios.

  • Sheila (admin; Broad Institute; Member, Broadie, Moderator)
    edited August 2014

    @marinjorian

    Hi,

    Thanks. I added a note to the tutorial to clarify this.

    -Sheila
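
For readers hitting the same "Badly formed genome loc" error, the contig-name mismatch described in the first answer can be checked before running GATK. The following is a minimal sketch, not part of the original thread: list_contigs and has_contig are hypothetical helper names, and the sketch assumes a plain (uncompressed) FASTA whose header lines start with '>' and carry the contig name as the first word.

```shell
# Hypothetical helpers (not GATK tools): inspect contig names in a FASTA file.

# Print one contig name per line, taken from the FASTA header lines.
# The leading '>' is stripped and anything after the first space is dropped.
list_contigs() {
    grep '^>' "$1" | sed 's/^>//' | cut -d' ' -f1
}

# Succeed (exit status 0) only if the given name is exactly one of the
# contig names in the FASTA; this is the name that -L has to match.
has_contig() {
    list_contigs "$1" | grep -qxF "$2"
}

# Example usage (assumes hg19.fa exists in the current directory):
#   list_contigs hg19.fa | head
#   has_contig hg19.fa chr20 && echo "use -L chr20"
#   has_contig hg19.fa 20    || echo "-L 20 will not match this reference"
```

With a reference that uses the 'chr' prefix, as the answer says hg19 does, the check for chr20 should pass and the check for 20 should fail, which corresponds to the error message in the question. The contig names recorded in the BAM file itself can be inspected with samtools view -H, which prints the header including the @SQ lines.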
