Input files reads and reference have incompatible contigs while Running GATK:RealignerTargetCreator

sarkar (Switzerland), Posts: 11, Member
edited April 6 in Ask the GATK team

Hello All,

I am running RealignerTargetCreator using GATK version GenomeAnalysisTK-1.2-4-gd9ea764 and I am getting the following error: `

ERROR MESSAGE: Input files reads and reference have incompatible contigs: Found contigs with the same name but different lengths:
ERROR contig reads = scaffold69676_size1796 / 3149
ERROR contig reference = scaffold69676_size1796 / 1758.
ERROR reads contigs = [scaffold1_size320545, scaffold2_size291774, scaffold3_size284740..........`

I already checked that I am using the correct reference FASTA file and the correct .bam file, the same ones I used for alignment before, so I do not understand why I am getting this error. I would appreciate your help with this problem. Any suggestion is welcome.

Thanks, Namrata

Answers

  • Sheila (Broad Institute), Posts: 280, Member, GATK Developer, Broadie, Moderator admin

    @sarkar Hi Namrata,

    First, you should try using our newest GATK version 3.1. Version 1.2 is very old!

    Secondly, assuming the number for the reference is correct (based on contig name), you should re-index the bam file. Sometimes the index file gets corrupted, so re-creating a fresh index file fixes the issue.

    Let me know if this helps.

  • sarkar (Switzerland), Posts: 11, Member

    Hi Sheila,

    Thanks for your feedback. I created a new index file and tried again, but it still gives me the same error.

    Do you have any other suggestions?

    Many Thanks, Namrata

  • Sheila (Broad Institute), Posts: 280, Member, GATK Developer, Broadie, Moderator admin

    Hi Namrata @sarkar

    You can try to check the length of the contig in the reference genome and the length of the contig in the bam file to see if they match.

    You can do this by running samtools view -h yourfile.bam. You will need to use something like grep to pull out the line of the sequence dictionary (the @SQ line) that corresponds to the contig. If you do not know how to do this, please ask your IT department for help.

    Which version are you using now? 3.1?

    -Sheila
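For readers who would rather script this check than grep by hand, here is a minimal Python sketch of the same idea. It assumes the BAM header has already been saved as text (for example from samtools view -H, which prints only the header). The function name sq_lengths and the @HD line in the sample are illustrative assumptions; the @SQ values are the ones reported in this thread.

```python
def sq_lengths(header_text):
    """Map contig name -> length using the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG and any other header records
        # Each field after the record type looks like TAG:value, tab-separated.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


# Sample header text. The @HD line is a made-up placeholder; the @SQ line
# carries the contig name and length quoted in this thread.
header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:scaffold69676_size1796\tLN:3149\n"
)

bam_lengths = sq_lengths(header)
print(bam_lengths["scaffold69676_size1796"])
```

The printed number is the contig length recorded in the BAM header, which can then be compared with the length of the same contig in the reference.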

  • sarkar (Switzerland), Posts: 11, Member

    Hi Sheila,

    The length of the contig in the reference genome is 1758. When I ran samtools view -h myfile.bam | grep scaffold69676_size1796, I got @SQ SN:scaffold69676_size1796 LN:3149. These lengths are consistent with the error message I got previously.

    I tried a later version, GATK 2.6, but it still gave the same error. I also tried ReorderSam, but it did not help. Thanks, Namrata

  • Sheila (Broad Institute), Posts: 280, Member, GATK Developer, Broadie, Moderator admin
    edited April 9

    @sarkar

    I see that the length of the reference contig is different from the length of the input file contig. Unfortunately, this means you probably used the wrong version of the reference for aligning your reads. The best advice I can give you is to redo everything starting at the alignment step.

    Version 2.6 is better than 1.2, but while you are at it, you should upgrade to version 3.1.

    Good luck!
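Before redoing the alignment, it may help to see how many contigs disagree, not just the one named in the error. The sketch below is one possible way to do that, assuming the reference has a FASTA index (.fai) whose first two tab-separated columns are contig name and length. The helper names, the extra contigA entry, and the offset columns in the sample are hypothetical; only the scaffold69676_size1796 lengths come from this thread.

```python
def fai_lengths(fai_text):
    """Map contig name -> length from the text of a FASTA index (.fai)."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]  # remaining columns are offsets
            lengths[name] = int(length)
    return lengths


def mismatched_contigs(bam_lengths, ref_lengths):
    """Return {contig: (bam_length, ref_length)} for same-name contigs that differ."""
    return {
        name: (bam_len, ref_lengths[name])
        for name, bam_len in bam_lengths.items()
        if name in ref_lengths and ref_lengths[name] != bam_len
    }


# Lengths from the BAM header (@SQ lines). contigA is a made-up matching contig.
bam_lengths = {"scaffold69676_size1796": 3149, "contigA": 100}

# Sample .fai text. The offset and line-width columns are placeholder values.
fai_text = (
    "scaffold69676_size1796\t1758\t24\t60\t61\n"
    "contigA\t100\t1850\t60\t61\n"
)

for name, (bam_len, ref_len) in mismatched_contigs(bam_lengths, fai_lengths(fai_text)).items():
    print(f"{name}: BAM header says {bam_len}, reference says {ref_len}")
```

Any contig reported here has the same name but a different length in the two files, which is the condition the GATK error describes.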

  • sarkar (Switzerland), Posts: 11, Member

    Hi Sheila,

    Thanks very much.

    Cheers, Namrata
