GATK uses wrong bam index when two plausible indices exist
I get the following error when running UnifiedGenotyper on the attached BAM file:
ERROR MESSAGE: SAM/BAM file ./MD_CHW_AAC_2699.recalibrated.bam is malformed: Premature EOF; BinaryCodec in readmode; file: /data/itch/winni/proj/marchini/converge/variantCalling/./MD_CHW_AAC_2699.recalibrated.bam
I was using this command:
/usr/local/bin/java -Xmx128G -jar ~/src/GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar -T UnifiedGenotyper -I ./MD_CHW_AAC_2699.recalibrated.bam -R /data/1kg/reference_v37d5/hs37d5.fa -o ./test.vcf
There are two BAM index files in the same directory as the BAM file, one ending in .bam.bai and one ending in .bai.
The .bai index is outdated: it is older than the BAM file and is not valid. The .bam.bai index is newer than the BAM file and is valid. I would expect GATK to do one of the following:
- Use the newer of the two index files and run through
- Use the .bai file and die with an error saying that the index file is older than the bam file
Instead, I think GATK checks that at least one of the two index files is newer than the BAM file, and then uses the .bai file regardless of which index is actually the newer of the two.
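To make the behaviour I expected concrete, here is a minimal Python sketch of the selection logic. It is not GATK's code, and the function names (candidate_indices, pick_bam_index) are hypothetical. It assumes the only two candidate locations are x.bam.bai and x.bai next to the BAM file, and that file modification time is a reasonable proxy for whether an index is current.

```python
from pathlib import Path
from typing import List


def candidate_indices(bam: Path) -> List[Path]:
    """Return the two conventional index locations: x.bam.bai and x.bai."""
    return [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]


def pick_bam_index(bam_path: str) -> Path:
    """Pick the newest existing index; fail if even that one is stale.

    Hypothetical helper illustrating the expected behaviour,
    not what GATK actually does.
    """
    bam = Path(bam_path)
    existing = [p for p in candidate_indices(bam) if p.is_file()]
    if not existing:
        raise FileNotFoundError(f"no index found for {bam}")

    # Prefer whichever index was modified most recently.
    newest = max(existing, key=lambda p: p.stat().st_mtime)

    # An index older than the BAM file cannot be trusted.
    if newest.stat().st_mtime < bam.stat().st_mtime:
        raise RuntimeError(f"index {newest} is older than {bam}")
    return newest
```

With the files described above, pick_bam_index("./MD_CHW_AAC_2699.recalibrated.bam") would return the .bam.bai path. If both indices were older than the BAM file, it would stop with an explicit error instead of a Premature EOF.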
I see this error in GATK 2.6-5 and 2.7-2. Is this a bug?