GATK bundle fai files do not work with samtools

csaunders (Member)
edited August 2012 in Ask the GATK team

Hi --

Several Strelka users have recently reported issues that were traced back to non-standard fai files in the GATK bundle. The problem appears to be that some GATK bundle fai files contain spaces in the first column, which holds the contig name. For instance (with tabs written as \t for clarity):

1 dna:chromosome chromosome:GRCh37:1:1:249250621:1\t249250621\t52\t60\t61

In this case the first-column value "1 dna:chromosome chromosome:GRCh37:1:1:249250621:1" causes samtools (0.1.18) to crash when it reads this file. Note that the same text would be accepted as a header line in a fasta file, but not as the first column of a fasta index. It also causes tools such as Strelka, which rely heavily on libbam, to fail. Taking the NCBI v37 fasta files supplied in the GATK bundle and running "samtools faidx human_g1k_v37_decoy.fasta" produces:

$ head -1 human_g1k_v37_decoy.fasta.fai


The problematic fai files appear to be associated with all of the NCBI v37 fastas:





Can the fai files be corrected and the bundles updated? That would greatly reduce problems for folks using samtools-based software.
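
Until the bundle fai files are corrected, one option is simply to regenerate the index with "samtools faidx" as above. Another is to trim the contig names in an existing fai file. Here is a rough Python sketch of the second option. It assumes the intended contig name is everything before the first whitespace in column one, and that each record has the usual five tab-separated columns. The function names are made up for this illustration and are not part of GATK or samtools.

```python
import sys


def has_whitespace_in_name(line):
    """True if the first (contig name) column of an .fai line contains whitespace."""
    name = line.rstrip("\n").split("\t", 1)[0]
    return any(ch.isspace() for ch in name)


def fix_fai_line(line):
    """Return the .fai line with the contig name cut at its first whitespace."""
    fields = line.rstrip("\n").split("\t")
    name_parts = fields[0].split()
    # Assumption: name, length, offset, bases per line, bytes per line.
    if len(fields) != 5 or not name_parts:
        raise ValueError("unexpected .fai record: %r" % line)
    fields[0] = name_parts[0]
    return "\t".join(fields)


if __name__ == "__main__":
    # Reads an .fai file on stdin and writes the trimmed version to stdout.
    for raw in sys.stdin:
        if raw.strip():
            print(fix_fai_line(raw))
```

Applied to the example line above, fix_fai_line gives "1\t249250621\t52\t60\t61". The script could be run as, say, "python fix_fai.py < old.fai > new.fai", where fix_fai.py is just a placeholder file name.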



Best Answer

  • ebanks (Broad Institute)
    Accepted Answer

    Okay, I've pushed a patch for the .fai generation code. We're hoping to update the bundle in the next week or two, so you'll get updated indexes when that happens. Thanks again for the report.

