GATK uses wrong bam index when two plausible indices exist

wkretzsch Posts: 6
edited September 2013 in Ask the GATK team

I get the following error when running UnifiedGenotyper on the attached bam file:

ERROR MESSAGE: SAM/BAM file ./MD_CHW_AAC_2699.recalibrated.bam is malformed: Premature EOF; BinaryCodec in readmode; file: /data/itch/winni/proj/marchini/converge/variantCalling/./MD_CHW_AAC_2699.recalibrated.bam

I was using this command:

/usr/local/bin/java -Xmx128G -jar ~/src/GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar -T UnifiedGenotyper -I ./MD_CHW_AAC_2699.recalibrated.bam -R /data/1kg/reference_v37d5/hs37d5.fa -o ./test.vcf

There are two bam index files located in the same directory:


MD_CHW_AAC_2699.recalibrated.bam.bai
MD_CHW_AAC_2699.recalibrated.bai

The second index (.bai) is outdated: it is older than the bam file and is not valid. The first index (.bam.bai) is newer than the bam file and is valid. I would expect GATK to do one of the following:

  1. Use the newer of the two index files and run through
  2. Use the .bai file and die with an error saying that the index file is older than the bam file

Instead, what I think GATK is doing is checking that one of the two index files is newer than the bam file, and then using the .bai file regardless of which index was in fact the newer of the two.

I see this error in GATK 2.6-5 and 2.7-2. Is this a bug?

Regards,
Warren Kretzschmar

Attachment: minimalExample.tgz (3M)

Best Answer

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin
    Accepted Answer

    Hi Warren,

    Actually, the GATK first looks for the index that fits the naming convention where the base name is identical, but the extension is 'bai' instead of 'bam'. The file date doesn't matter at that point. So GATK is using the correct index file in your directory according to its name-based convention. Now, if the problem is that the .bai file is older and out of date, you should see a warning in the console output to that effect. That being said, the error suggests that it is your bam file that is damaged.

    In any case, the simplest way to diagnose and/or fix this issue is to delete both index files and regenerate a healthy index using Picard or samtools; if the bam is at fault, they will error out too.

    Geraldine Van der Auwera, PhD
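
To see which index files sit next to a BAM and whether any of them predates it, a small shell check along these lines may help. It is only a sketch of the name-based lookup described in the answer, not GATK's actual code. The function name `check_bam_indices` is made up for this example, and listing `<base>.bam.bai` as the second candidate is an assumption based on the two file names in the question.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the name-based lookup described in the answer above.
# This is NOT GATK's implementation; check_bam_indices is a hypothetical helper.

# Print one line per candidate index: MISSING, STALE (older than the BAM), or OK.
check_bam_indices() {
    local bam="$1"
    local base="${bam%.bam}"   # strip the trailing .bam extension
    local idx
    # <base>.bai is listed first because the answer says it is looked up first;
    # <base>.bam.bai as the second candidate is an assumption for this example.
    for idx in "${base}.bai" "${bam}.bai"; do
        if [ ! -e "$idx" ]; then
            echo "MISSING $idx"
        elif [ "$idx" -ot "$bam" ]; then
            # -ot compares modification times: the index predates the BAM.
            echo "STALE $idx"
        else
            echo "OK $idx"
        fi
    done
}
```

For the directory described in the question, `check_bam_indices ./MD_CHW_AAC_2699.recalibrated.bam` would report the `.bai` file as STALE and the `.bam.bai` file as OK, and the stale one is the file the name-based convention picks first.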

Answers

  • wkretzsch Posts: 6

    Thanks. I can see the warning about the index being older than the bam file now that I look for it. When I delete the outdated index everything works fine.
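
For anyone who lands here with the same problem, the fix discussed in this thread (remove the old index files, then rebuild the index) can be sketched as a short shell function. This is a hedged example, not an official recipe. The function name `reindex_bam` and the `INDEX_CMD` override variable are hypothetical and exist only for this illustration. By default the sketch calls `samtools index`, which writes `<bam>.bai` next to the BAM.

```shell
#!/usr/bin/env bash
# Sketch only: reindex_bam and INDEX_CMD are hypothetical names for this example.
# Default indexer is `samtools index`, which writes <bam>.bai.

reindex_bam() {
    local bam="$1"
    local base="${bam%.bam}"   # strip the trailing .bam extension
    # Remove both naming variants so no outdated index can be picked up.
    rm -f -- "${base}.bai" "${bam}.bai"
    # Rebuild the index. Per the answer above, a damaged BAM should make
    # the indexer fail at this step, which also helps diagnose the file.
    ${INDEX_CMD:-samtools index} "$bam"
}
```

Usage would be `reindex_bam ./MD_CHW_AAC_2699.recalibrated.bam`. Deleting both variants first is a deliberate choice here: it avoids leaving a stale index under the name that gets looked up first.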