Problem with Indel Target Realigner, extra contig added?

I ran into a problem after pre-processing: it seems that extra contigs were added to my BAM file compared to the reference I used, which makes the indel realigner step impossible to run. I have checked the header of my file and its contigs match the reference, but my BAM file apparently has hundreds of additional contigs. I am not sure what happened.
The steps to get the BAM were:

  • Aligned with bwa mem
  • Convert to BAM and sort (samtools)
  • Dedup (picard)
  • Add read group (picard)
  • Index bam (samtools)
  • Run RealignerTargetCreator (GATK)
    When I check the header of my BAM file it still shows the right contigs, but when I run the tool it complains about different (additional) contigs compared to my reference. I am currently re-testing the whole pipeline on a single sample, but if you have any pointers as to what could cause this, please let me know. Could it be a problem with the BAM formatting?
    I am running GATK 3.3.0-g37228af
    with Java 1.7.
    I have attached the output log from the command.
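
For anyone hitting the same error, one way to narrow it down is to compare contig names directly rather than reading the headers by eye. The following is only a minimal sketch: the function names are made up for illustration, and it assumes you have the BAM header as text (for example from `samtools view -H`) and the reference `.fai` index produced by `samtools faidx`.

```python
def sam_header_contigs(header_text):
    """Return contig names from the @SQ lines of a SAM/BAM header, in order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs; SN holds the contig name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fai_contigs(fai_text):
    """Return contig names from a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def compare_contigs(file_contigs, reference_contigs):
    """Return (extra, missing) contig names relative to the reference."""
    ref = set(reference_contigs)
    have = set(file_contigs)
    extra = [c for c in file_contigs if c not in ref]
    missing = [c for c in reference_contigs if c not in have]
    return extra, missing


if __name__ == "__main__":
    # Toy inputs for illustration only; real text would come from your own files.
    header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrUn_1\tLN:200\n"
    fai = "chr1\t1000\t6\t60\t61\n"
    extra, missing = compare_contigs(sam_header_contigs(header), fai_contigs(fai))
    print("extra:", extra)
    print("missing:", missing)
```

If the BAM and the reference agree, the same comparison can be repeated against any other input given to the tool, since the mismatch may come from a file other than the BAM.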


PS: I attended your workshop in Cambridge!

Best Answer


    Thanks Geraldine.
    It worked after I re-added the header to the VCF file, as your script removed it. Obviously this resulted in the column spacing being mixed up, but it is running now!
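
    A rough sketch of the fix described above, in case it helps someone else. It assumes the original header lines are still available (for example from the unprocessed VCF) and that the records are tab-separated; the function names are hypothetical and this is not the script mentioned in the thread.

```python
def restore_vcf_header(header_lines, record_lines):
    """Return header lines (those starting with '#') followed by the data records."""
    header = [l.rstrip("\n") for l in header_lines if l.startswith("#")]
    records = [
        l.rstrip("\n")
        for l in record_lines
        if l.strip() and not l.startswith("#")
    ]
    return header + records


def vcf_record_contigs(record_lines):
    """Return the distinct CHROM values of the records, in first-seen order."""
    seen = []
    for line in record_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen


if __name__ == "__main__":
    # Toy data for illustration only.
    header = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ]
    records = ["chr1\t100\t.\tA\tAT\t.\t.\t.\n", "chr2\t50\t.\tGC\tG\t.\t.\t.\n"]
    for line in restore_vcf_header(header, records):
        print(line)
    print("contigs in records:", vcf_record_contigs(records))
```

    Writing the records back unchanged, with their original tabs, avoids the mixed-up column spacing.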

