FASTA problem - RealignerTargetCreator

SJL Member
edited February 2013 in Ask the GATK team

Hi,
I am having a problem running RealignerTargetCreator.

java -Xmx4g -jar /mit/sjlabrie/software/GenomeAnalysisTK-2.3-6-gebbba25/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R $fasta \
    -I alnSortedNoDupRG1.bam \
    -o alnSortedNoDupRG1.intervals

Here is the error message:

ERROR MESSAGE: Invalid command line: Failed to load reference dictionary

When I create a dictionary with Picard CreateSequenceDictionary, it looks fine:

@HD VN:1.4 SO:unsorted
@SQ SN:m4-202 LN:37850 UR:file:/net/eaps-80-11/data/sjlabrie/m4-202_AAAA/m4_202_reaRev.gb.txt M5:c34f2bad5f5667604f34a26cd8baf86e

This file has a .txt extension. To make everything conform to my internal naming convention, I renamed it to .fasta:

mv fl.txt fl.fasta

Then I recreated the dictionary with CreateSequenceDictionary, and the new dictionary contains only the header line:

@HD VN:1.4 SO:unsorted

This is really puzzling me and driving me somewhere I don't really want to go.

Thank you for your help,

Simon
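
The symptom described above (a dictionary with an @HD line but no @SQ line) can be detected mechanically before handing the reference to GATK. The following is a minimal sketch, not part of the original post; the function names `count_sq_records` and `dict_has_contigs` are made up for illustration, and it assumes the .dict file is ordinary tab-separated SAM-header text.

```shell
# Count the @SQ (sequence) records in a sequence dictionary.
# A usable dictionary is expected to have one @SQ line per contig.
count_sq_records() {
    [ -f "$1" ] || { echo 0; return 0; }
    # grep -c prints the number of matching lines but exits non-zero
    # when that number is 0, so "|| true" keeps this safe under "set -e".
    grep -c '^@SQ' "$1" || true
}

# Succeed only if the dictionary lists at least one contig.
dict_has_contigs() {
    [ "$(count_sq_records "$1")" -gt 0 ]
}
```

For example, `dict_has_contigs fl.dict || echo "dictionary has no @SQ records"` would flag the second dictionary shown above, which contains only the @HD header.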

Answers

  • Kurt Member ✭✭✭

    Did you also create the .fai file with samtools faidx? In the end, you'll need three files: foo.fasta, foo.dict (this one comes from Picard), and foo.fasta.fai. See this previous discussion: http://gatkforums.broadinstitute.org/discussion/comment/343

  • SJL Member

    Yes, at the beginning I ran:
    bwa index foo.fasta

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    See this article for instructions on how to prepare your reference properly:

    http://www.broadinstitute.org/gatk/guide/article?id=1601

  • Farah Member

    Hi, I'm new to GATK. After following the online course for first-timers, I managed to run my test sample through BWA alignment and mapping, then through Picard sorting, duplicate marking, and read group addition. I now want to do the GATK local indel realignment, but I get the error below when I try to run it. For alignment I used the hg19 reference genome from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz, but to run GATK I am using the hg19 reference from the GATK bundle, ucsc.hg19.fasta.
    It looks like these are not compatible because their contigs are not in the same order (please see the error below). Is there an easy way to fix this, or do I have to restart from step 1?

    Appreciate any help, thanks!

    ERROR MESSAGE: Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe.

    ERROR reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
    ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    You just need to reorder the contigs in your BAM. I think we have a doc on that somewhere; try using the search box (upper right corner). (I'm on a phone so I can't give you the exact link right now, sorry.)
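
The mismatch in the error above (the reads list chrM last, the reference lists it first) can be spotted before running GATK by comparing the contig names in both files. This is a rough sketch only and is not GATK's own check: `sq_names` and `same_contig_order` are hypothetical helper names, and the comparison looks only at the relative order of the contigs the two lists share, ignoring lengths.

```shell
# Print the contig names (SN: fields) from tab-separated SAM-style header
# text on stdin, in the order they appear. Input can be a .dict file or
# the header of a BAM.
sq_names() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) { print substr($i, 4); break }
    }'
}

# Succeed if the contigs shared by two name lists (one name per line)
# appear in the same relative order in both files.
same_contig_order() {
    reads_list=$1
    ref_list=$2
    # Keep only names present in both lists, preserving each file's own order.
    shared_in_reads=$(grep -Fx -f "$ref_list" "$reads_list" || true)
    shared_in_ref=$(grep -Fx -f "$reads_list" "$ref_list" || true)
    [ "$shared_in_reads" = "$shared_in_ref" ]
}
```

Assuming samtools is installed, the two lists could be produced with something like `samtools view -H sample.bam | sq_names > reads.txt` and `sq_names < ucsc.hg19.dict > ref.txt` (the file names here are placeholders), followed by `same_contig_order reads.txt ref.txt || echo "contig order differs"`.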

  • Farah Member

    No worries, I did find something you had posted before. I used ReorderSam from Picard and it worked OK. I then retried the indel realignment using my new BAM file with correctly ordered contigs, but it returned the new error below, which says it won't work because the BAM file is not indexed. I already have all the requisite index files for the reference (.dict, .fasta.fai, etc., downloaded from GATK), so it can't be that the reference index files are missing. Do I need to index the input BAM file as well?

    ERROR MESSAGE: Invalid command line: Cannot process the provided BAM file(s) because they were not indexed. The GATK does offer limited processing of unindexed BAMs in --unsafe mode, but this GATK feature is currently unsupported.

    Thanks!

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Glad you found it. Yes, every BAM file should have an index. We're pretty picky about inputs -- there's a doc that explains why as well (basically, performance and safety).
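
The input requirements raised in this thread (a .fai and a .dict next to the reference FASTA, and an index next to every BAM) can be checked up front. The sketch below is illustrative only: the function names are invented, and it assumes the usual naming conventions, namely foo.fasta.fai and foo.dict for the reference, and either foo.bam.bai or foo.bai for the BAM index. Note that `bwa index` writes BWA's own index files and does not produce the .fai file, which comes from `samtools faidx`.

```shell
# Print each file that is missing for a reference FASTA.
# Expected alongside foo.fasta: foo.fasta.fai and foo.dict.
missing_reference_files() {
    fasta=$1
    dict="${fasta%.*}.dict"      # foo.fasta -> foo.dict
    for f in "$fasta" "$fasta.fai" "$dict"; do
        [ -e "$f" ] || echo "$f"
    done
}

# Succeed if a BAM has an index next to it (foo.bam.bai or foo.bai).
bam_has_index() {
    bam=$1
    [ -e "$bam.bai" ] || [ -e "${bam%.bam}.bai" ]
}
```

A missing BAM index can be created with `samtools index foo.bam`; a missing .fai or .dict can be created with `samtools faidx` and Picard CreateSequenceDictionary, as mentioned earlier in the thread.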
