IndelRealigner input file

genaro_pimienta, February 2013, in Ask the GATK team

Hello, I am a first-time user of GATK and have spent some time trying to get my input bam files into the appropriate format. To run IndelRealigner, I have added read groups to my bam file, then reordered and indexed it with the respective Picard tools.

My command-line is the following:

java -Djava.io.tmpdir=`pwd`/tmp -jar GenomeAnalysisTK.jar -I ./add_read_groups_reorder_index.bam -R ./genome.fa -T IndelRealigner -targetIntervals ./gatk.intervals -o ./*.bam -known ./Mills-1000G-indels.vcf --consensusDeterminationModel KNOWNS_ONLY -LOD 0.4

I get the following message:

SAM/BAM file /home/gp53/tophat2-merge-ctl-1st-2nd-readgroups-reorder-index.bam is malformed: SAM file doesn't have any read groups defined in the header.

My reads are paired-end and were aligned with TopHat2. I would appreciate your help on this. Thanks, G.

Answers

  • genaro_pimienta, February 2013

    Hello Geraldine, Thanks for your help.

    I have checked my read groups and headers to make sure they look like the ones specified on the GATK website (http://www.broadinstitute.org/gatk/guide/article?id=1204). I am now trying to run RealignerTargetCreator, and I get the following error:

    ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bam} is malformed: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK
    

    My header looks like this:

    @HD     VN:1.0  SO:coordinate
    @SQ     SN:chrM LN:16571        UR:file:/home/gp53/bwa/genome.fa        M5:d2ed829b8a1628d16cbeee88e88e39eb
    @SQ     SN:chr1 LN:249250621    UR:file:/home/gp53/bwa/genome.fa        M5:1b22b98cdeb4a9304cb5d48026a85128
    @SQ     SN:chr2 LN:243199373    UR:file:/home/gp53/bwa/genome.fa        M5:a0d9851da00400dec1098a9255ac712e
    @SQ     SN:chr3 LN:198022430    UR:file:/home/gp53/bwa/genome.fa        M5:641e4338fa8d52a5b781bd2a2c08d3c3
    @SQ     SN:chr4 LN:191154276    UR:file:/home/gp53/bwa/genome.fa        M5:23dccd106897542ad87d2765d28a19a1
    @SQ     SN:chr5 LN:180915260    UR:file:/home/gp53/bwa/genome.fa        M5:0740173db9ffd264d728f32784845cd7
    @SQ     SN:chr6 LN:171115067    UR:file:/home/gp53/bwa/genome.fa        M5:1d3a93a248d92a729ee764823acbbc6b
    @SQ     SN:chr7 LN:159138663    UR:file:/home/gp53/bwa/genome.fa        M5:618366e953d6aaad97dbe4777c29375e
    @SQ     SN:chr8 LN:146364022    UR:file:/home/gp53/bwa/genome.fa        M5:96f514a9929e410c6651697bded59aec
    @SQ     SN:chr9 LN:141213431    UR:file:/home/gp53/bwa/genome.fa        M5:3e273117f15e0a400f01055d9f393768
    @SQ     SN:chr10        LN:135534747    UR:file:/home/gp53/bwa/genome.fa        M5:988c28e000e84c26d552359af1ea2e1d
    @SQ     SN:chr11        LN:135006516    UR:file:/home/gp53/bwa/genome.fa        M5:98c59049a2df285c76ffb1c6db8f8b96
    @SQ     SN:chr12        LN:133851895    UR:file:/home/gp53/bwa/genome.fa        M5:51851ac0e1a115847ad36449b0015864
    @SQ     SN:chr13        LN:115169878    UR:file:/home/gp53/bwa/genome.fa        M5:283f8d7892baa81b510a015719ca7b0b
    @SQ     SN:chr14        LN:107349540    UR:file:/home/gp53/bwa/genome.fa        M5:98f3cae32b2a2e9524bc19813927542e
    @SQ     SN:chr15        LN:102531392    UR:file:/home/gp53/bwa/genome.fa        M5:e5645a794a8238215b2cd77acb95a078
    @SQ     SN:chr16        LN:90354753     UR:file:/home/gp53/bwa/genome.fa        M5:fc9b1a7b42b97a864f56b348b06095e6
    @SQ     SN:chr17        LN:81195210     UR:file:/home/gp53/bwa/genome.fa        M5:351f64d4f4f9ddd45b35336ad97aa6de
    @SQ     SN:chr18        LN:78077248     UR:file:/home/gp53/bwa/genome.fa        M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
    @SQ     SN:chr19        LN:59128983     UR:file:/home/gp53/bwa/genome.fa        M5:1aacd71f30db8e561810913e0b72636d
    @SQ     SN:chr20        LN:63025520     UR:file:/home/gp53/bwa/genome.fa        M5:0dec9660ec1efaaf33281c0d5ea2560f
    @SQ     SN:chr21        LN:48129895     UR:file:/home/gp53/bwa/genome.fa        M5:2979a6085bfe28e3ad6f552f361ed74d
    @SQ     SN:chr22        LN:51304566     UR:file:/home/gp53/bwa/genome.fa        M5:a718acaa6135fdca8357d5bfe94211dd
    @SQ     SN:chrX LN:155270560    UR:file:/home/gp53/bwa/genome.fa        M5:7e0e2e580297b7764e31dbc80c2540dd
    @SQ     SN:chrY LN:59373566     UR:file:/home/gp53/bwa/genome.fa        M5:1e86411d73e6f00a10590f976be01623
    @RG     ID:null PL:illumina     PU:single_lane  LB:unstranded   SM:tophat-eber-2nd-R1
    @PG     ID:TopHat       VN:2.0.5        CL:/usr/local/bin/tophat2 -p 16 -g 1 -z pigz -G /home/gp53/tophat/genes.gtf --no-novel-juncs -o tophat-eber-2nd-R1 /home/administrator/Bowtie2Index/genome /media/Elements/Genaro/input/eber-2nd-R1.fastq
    

    Also, the GATK guide indicates that I need an indexed file, but GATK-2.3-9 won't accept my indexed bam files. I would appreciate your help on this. Genaro

  • genaro_pimienta

    Hi Geraldine, I think I got my issue fixed. RealignerTargetCreator is now running on both the BWA and TopHat2 alignments. The one thing I changed was to leave the ID option of AddOrReplaceReadGroups.jar at its default value of 1. That pretty much eliminated the recurring error: "Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK". The other problem was that I was unaware that GATK looks for the matching bam.bai index file when it is given a bam file as input. I am not a bioinformatician by training, so this was not obvious to me. Thanks again for your help. G.
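
For anyone hitting the same two errors, here is a minimal Python sketch of the checks this thread ended up revolving around: does the header define a read group ID, does each read carry an RG tag, and is there an index file next to the bam. The function names are hypothetical helpers, not part of GATK or Picard, and the code only inspects text and file names. It assumes the header and a read have already been dumped to text (for example with samtools view), and that the index is named either x.bam.bai or x.bai.

```python
import os


def read_group_ids(header_text):
    """Return the ID of every @RG line in a SAM header given as text.

    Splits on any whitespace so that headers pasted into a forum (where
    tabs often become spaces) still parse. Assumption: the @RG values
    themselves contain no spaces.
    """
    ids = []
    for line in header_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "@RG":
            continue
        for field in fields[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
    return ids


def read_group_of(sam_record):
    """Return the value of the RG:Z: tag of one SAM alignment line, or None.

    A SAM record has 11 mandatory tab-separated fields; optional tags
    such as RG:Z:<id> come after them.
    """
    for tag in sam_record.rstrip("\n").split("\t")[11:]:
        if tag.startswith("RG:Z:"):
            return tag[5:]
    return None


def find_bam_index(bam_path):
    """Return the path of an index file sitting next to bam_path, or None.

    Assumption: the index is called <name>.bam.bai or <name>.bai.
    """
    candidates = [bam_path + ".bai"]
    root, ext = os.path.splitext(bam_path)
    if ext == ".bam":
        candidates.append(root + ".bai")
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def read_is_consistent(sam_record, header_text):
    """True if the read has an RG tag whose ID is defined in the header."""
    rg = read_group_of(sam_record)
    return rg is not None and rg in read_group_ids(header_text)
```

Run against the header posted above, read_group_ids would return the single ID "null", so a read is only consistent if its RG tag is also literally "null". The thread does not confirm why that ID caused trouble, only that leaving the ID at its default of 1 made the error go away.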
