IndelRealigner input file

I am a first-time user of GATK and have spent some time trying to get my input BAM files into the appropriate format. To run IndelRealigner, I have added read groups, reordered, and indexed my BAM file with the respective Picard tools.

My command-line is the following:

java'pwd'/tmp -jar GenomeAnalysisTK.jar -I ./add_read_groups_reorder_index.bam -R ./genome.fa -T IndelRealigner -targetIntervals ./gatk.intervals -o ./*.bam -known ./Mills-1000G-indels.vcf --consensusDeterminationModel KNOWNS_ONLY -LOD 0.4

I get the following message:

SAM/BAM file /home/gp53/tophat2-merge-ctl-1st-2nd-readgroups-reorder-index.bam is malformed: SAM file doesn't have any read groups defined in the header.

My reads are paired-end and were aligned with TopHat2.
I would appreciate your help on this.
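
The message above means that no `@RG` line was found in the BAM header. A minimal sketch of that check, assuming the header has first been dumped to text (for example with `samtools view -H`); the function name `read_group_ids` and the sample header strings are hypothetical and only illustrate the idea:

```python
def read_group_ids(header_text):
    """Return the ID of every @RG line in a SAM header given as text."""
    ids = []
    for line in header_text.splitlines():
        # SAM header fields are tab-separated; the first field is the record type
        fields = line.strip().split("\t")
        if fields[0] != "@RG":
            continue
        for field in fields[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
    return ids


# Hypothetical headers: one without any @RG line, one with a single read group
header_without_rg = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrM\tLN:16571\n"
header_with_rg = (
    header_without_rg
    + "@RG\tID:1\tPL:illumina\tPU:single_lane\tLB:unstranded\tSM:sample1\n"
)

print(read_group_ids(header_without_rg))  # no read groups declared
print(read_group_ids(header_with_rg))     # one read group declared
```

An empty result corresponds to the situation GATK complains about here.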


Best Answers


  edited February 2013

    Hello Geraldine,
    Thanks for your help.

    I have checked my read groups and headers to make sure they look like the ones specified on the GATK website.
    I am now trying to run RealignerTargetCreator and I get the following error:

    ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bam} is malformed: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK

    My header looks like this:

    @HD     VN:1.0  SO:coordinate
    @SQ     SN:chrM LN:16571        UR:file:/home/gp53/bwa/genome.fa        M5:d2ed829b8a1628d16cbeee88e88e39eb
    @SQ     SN:chr1 LN:249250621    UR:file:/home/gp53/bwa/genome.fa        M5:1b22b98cdeb4a9304cb5d48026a85128
    @SQ     SN:chr2 LN:243199373    UR:file:/home/gp53/bwa/genome.fa        M5:a0d9851da00400dec1098a9255ac712e
    @SQ     SN:chr3 LN:198022430    UR:file:/home/gp53/bwa/genome.fa        M5:641e4338fa8d52a5b781bd2a2c08d3c3
    @SQ     SN:chr4 LN:191154276    UR:file:/home/gp53/bwa/genome.fa        M5:23dccd106897542ad87d2765d28a19a1
    @SQ     SN:chr5 LN:180915260    UR:file:/home/gp53/bwa/genome.fa        M5:0740173db9ffd264d728f32784845cd7
    @SQ     SN:chr6 LN:171115067    UR:file:/home/gp53/bwa/genome.fa        M5:1d3a93a248d92a729ee764823acbbc6b
    @SQ     SN:chr7 LN:159138663    UR:file:/home/gp53/bwa/genome.fa        M5:618366e953d6aaad97dbe4777c29375e
    @SQ     SN:chr8 LN:146364022    UR:file:/home/gp53/bwa/genome.fa        M5:96f514a9929e410c6651697bded59aec
    @SQ     SN:chr9 LN:141213431    UR:file:/home/gp53/bwa/genome.fa        M5:3e273117f15e0a400f01055d9f393768
    @SQ     SN:chr10        LN:135534747    UR:file:/home/gp53/bwa/genome.fa        M5:988c28e000e84c26d552359af1ea2e1d
    @SQ     SN:chr11        LN:135006516    UR:file:/home/gp53/bwa/genome.fa        M5:98c59049a2df285c76ffb1c6db8f8b96
    @SQ     SN:chr12        LN:133851895    UR:file:/home/gp53/bwa/genome.fa        M5:51851ac0e1a115847ad36449b0015864
    @SQ     SN:chr13        LN:115169878    UR:file:/home/gp53/bwa/genome.fa        M5:283f8d7892baa81b510a015719ca7b0b
    @SQ     SN:chr14        LN:107349540    UR:file:/home/gp53/bwa/genome.fa        M5:98f3cae32b2a2e9524bc19813927542e
    @SQ     SN:chr15        LN:102531392    UR:file:/home/gp53/bwa/genome.fa        M5:e5645a794a8238215b2cd77acb95a078
    @SQ     SN:chr16        LN:90354753     UR:file:/home/gp53/bwa/genome.fa        M5:fc9b1a7b42b97a864f56b348b06095e6
    @SQ     SN:chr17        LN:81195210     UR:file:/home/gp53/bwa/genome.fa        M5:351f64d4f4f9ddd45b35336ad97aa6de
    @SQ     SN:chr18        LN:78077248     UR:file:/home/gp53/bwa/genome.fa        M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
    @SQ     SN:chr19        LN:59128983     UR:file:/home/gp53/bwa/genome.fa        M5:1aacd71f30db8e561810913e0b72636d
    @SQ     SN:chr20        LN:63025520     UR:file:/home/gp53/bwa/genome.fa        M5:0dec9660ec1efaaf33281c0d5ea2560f
    @SQ     SN:chr21        LN:48129895     UR:file:/home/gp53/bwa/genome.fa        M5:2979a6085bfe28e3ad6f552f361ed74d
    @SQ     SN:chr22        LN:51304566     UR:file:/home/gp53/bwa/genome.fa        M5:a718acaa6135fdca8357d5bfe94211dd
    @SQ     SN:chrX LN:155270560    UR:file:/home/gp53/bwa/genome.fa        M5:7e0e2e580297b7764e31dbc80c2540dd
    @SQ     SN:chrY LN:59373566     UR:file:/home/gp53/bwa/genome.fa        M5:1e86411d73e6f00a10590f976be01623
    @RG     ID:null PL:illumina     PU:single_lane  LB:unstranded   SM:tophat-eber-2nd-R1
    @PG     ID:TopHat       VN:2.0.5        CL:/usr/local/bin/tophat2 -p 16 -g 1 -z pigz -G /home/gp53/tophat/genes.gtf --no-novel-juncs -o tophat-eber-2nd-R1 /home/administrator/Bowtie2Index/genome /media/Elements/Genaro/input/eber-2nd-R1.fastq

    Also, the GATK guide indicates that I need an indexed file, but GATK-2.3-9 won't accept indexed BAM files.
    I would appreciate your help on this.
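
The RealignerTargetCreator error concerns individual reads: each read needs an `RG` tag whose value matches the `ID` of an `@RG` line in the header. A rough sketch of that consistency check, assuming the BAM has been converted to SAM text first (for example with `samtools view -h`); `find_read_group_problems`, `sam_line`, and the sample reads are hypothetical names and data used only for illustration:

```python
def find_read_group_problems(sam_text):
    """Return (read_name, rg_value) pairs for reads whose RG tag is missing
    (rg_value is None) or is not declared by an @RG header line."""
    declared = set()
    problems = []
    for line in sam_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: remember the ID of every declared read group
            if fields[0] == "@RG":
                declared.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
            continue
        # Alignment line: optional tags follow the 11 mandatory columns
        rg = None
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                rg = tag[5:]
        if rg is None or rg not in declared:
            problems.append((fields[0], rg))
    return problems


def sam_line(name, *tags):
    """Build a minimal alignment line (hypothetical values) with optional tags."""
    core = [name, "0", "chr1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII"]
    return "\t".join(core + list(tags))


sample = "\n".join([
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:249250621",
    "@RG\tID:1\tPL:illumina\tSM:sample1",
    sam_line("read_ok", "RG:Z:1"),       # tag matches the header
    sam_line("read_untagged"),           # no RG tag at all
    sam_line("read_other", "RG:Z:null"), # tag not declared in the header
])

for name, rg in find_read_group_problems(sample):
    print(name, rg)
```

Reads reported by such a check are the ones that would trigger the "missing the read group or its read group is not defined in the BAM header" message.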

  • Hi Geraldine,
    I got my issue fixed, I think.
    I have RealignerTargetCreator running now on both the BWA and TopHat2 alignments. The one thing I changed was to leave the ID=String option of AddOrReplaceReadGroups.jar at its default value of 1.
    That pretty much eliminated the recurring error: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.
    The other problem was that I was unaware that GATK looks for the corresponding bam.bai index file when given a BAM file as input.
    I am not a bioinformatician by training, so this was not obvious to me.
    Thanks again for your help.
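
Since the index is looked up next to the BAM file rather than passed as an argument, it can help to confirm it is there before launching a run. A small sketch, assuming the two naming conventions commonly produced by samtools and Picard (`sample.bam.bai` and `sample.bai`); the helper name `find_bam_index` is hypothetical and the demo uses empty placeholder files:

```python
import os
import tempfile


def find_bam_index(bam_path):
    """Return the path of an index file sitting next to bam_path, or None."""
    candidates = [bam_path + ".bai"]          # e.g. sample.bam.bai
    root, ext = os.path.splitext(bam_path)
    if ext == ".bam":
        candidates.append(root + ".bai")      # e.g. sample.bai
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    bam = os.path.join(tmp, "sample.bam")
    open(bam, "w").close()
    before = find_bam_index(bam)              # no index written yet
    open(bam + ".bai", "w").close()
    after = find_bam_index(bam)               # index now present

print(before)
print(os.path.basename(after))
```

The check only looks at file names; it does not verify that the index is valid or newer than the BAM file.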

