Problem with Indel Target Realigner, extra contig added?

Hello,
I ran into a problem after pre-processing: it seems that extra contigs were added to my BAM file compared to the reference I used, which makes the indel realigner step impossible to run. I have checked the header of my file and the reference is the same, but my BAM file has hundreds of additional contigs. I am not sure what happened.
The steps to get the BAM were:
- Aligned with bwa mem
- Convert to BAM and sort (samtools)
- Dedup (picard)
- Add read group (picard)
- Index bam (samtools)
- Run RealignerTargetCreator
When I check the header of my BAM file it still shows the right contigs, but when I run the tool it complains of different (additional) contigs compared to my reference. I am currently re-testing the whole pipeline on a single sample, but do you have any pointers as to what could cause this? Maybe a problem with the BAM formatting?
I am running GATK 3.3.0-g37228af
Java 1.7
I have attached the output log from the command.
Thanks,
Julien
PS: I attended your workshop in Cambridge!
Best Answer
Geraldine_VdAuwera Cambridge, MA admin
Hi Julien,
I think your BAM file is fine. The error message states:
Input files known and reference have incompatible contigs
This suggests that the file you're using as known sites was either not derived directly from the same reference or is sorted differently. We have a script that can re-sort a VCF based on a reference, which you can find here. Have a go at that and let me know if that doesn't work out.
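Before re-sorting, it can help to confirm that the known-sites file really is the culprit. The following is only a rough sketch, not a GATK tool: it assumes you have the reference's `.fai` index (as produced by `samtools faidx`) and an uncompressed VCF, and the function names are made up for illustration. It reports contigs that appear in the VCF but not in the reference, and whether the shared contigs appear in the same order.

```python
def reference_contigs(fai_lines):
    """Contig names in reference order, taken from the lines of a .fai index."""
    return [line.split("\t")[0].strip() for line in fai_lines if line.strip()]


def vcf_contigs(vcf_lines):
    """Contig names in order of first appearance among the VCF records."""
    order, seen = [], set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        contig = line.split("\t", 1)[0]
        if contig not in seen:
            seen.add(contig)
            order.append(contig)
    return order


def compare_contigs(fai_lines, vcf_lines):
    """Return (contigs missing from the reference, whether shared contigs are in reference order)."""
    ref = reference_contigs(fai_lines)
    vcf = vcf_contigs(vcf_lines)
    rank = {name: i for i, name in enumerate(ref)}
    missing = [c for c in vcf if c not in rank]
    shared = [c for c in vcf if c in rank]
    in_order = shared == sorted(shared, key=rank.__getitem__)
    return missing, in_order
```

With real files you would pass open file handles, for example `compare_contigs(open("ref.fa.fai"), open("known.vcf"))`, where both paths are placeholders. A non-empty `missing` list points to a different reference build; `in_order` being false points to a sorting problem.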
Answers
Thanks Geraldine.
It worked after I re-added the header lines to the VCF file, as your script had removed them. Obviously this resulted in the column spacing being mixed up, but it is running now!
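For anyone hitting the same header issue, here is a minimal sketch of a re-sort that keeps the header in place. It is not the script linked above, just an illustration under a few assumptions: the VCF is uncompressed and tab-delimited, the contig order comes from the reference (for example from its `.fai` index), and `sort_vcf_lines` is a hypothetical name.

```python
def sort_vcf_lines(vcf_lines, contig_order):
    """Return the header lines unchanged, followed by records sorted by
    reference contig order and then by position."""
    rank = {name: i for i, name in enumerate(contig_order)}
    lines = list(vcf_lines)
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def sort_key(line):
        # VCF fields are tab-separated: CHROM is column 1, POS is column 2.
        chrom, pos = line.split("\t", 2)[:2]
        if chrom not in rank:
            raise ValueError("contig %r is not in the reference" % chrom)
        return rank[chrom], int(pos)

    return header + sorted(records, key=sort_key)
```

Because each record line is written back untouched, the tab separators survive, so the columns should stay aligned with the `#CHROM` header line. Contigs absent from the reference raise an error here rather than being silently dropped; that is a design choice of this sketch.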