The current GATK version is 3.6-0

Input files reads and reference have incompatible contigs while running GATK RealignerTargetCreator

sarkar (Switzerland), Member, Posts: 11
edited April 2014 in Ask the GATK team

Hello All,

I am running RealignerTargetCreator using GATK version GenomeAnalysisTK-1.2-4-gd9ea764 and I am getting the following error:
```
ERROR MESSAGE: Input files reads and reference have incompatible contigs: Found contigs with the same name but different lengths:
ERROR contig reads = scaffold69676_size1796 / 3149
ERROR contig reference = scaffold69676_size1796 / 1758.
ERROR reads contigs = [scaffold1_size320545, scaffold2_size291774, scaffold3_size284740..........
```

I have already checked that I am using the right reference FASTA file and the correct BAM file, the same ones I used for the alignment, so I don't understand why I am getting this error.
I would appreciate your help with this problem. Any suggestions are welcome.

Thanks,
Namrata

Answers

  • Sheila (Broad Institute), Member, Broadie, Moderator, Dev, admin, Posts: 4,284

    @sarkar Hi Namrata,

    First, you should try using our newest GATK version 3.1. Version 1.2 is very old!

    Second, assuming the length reported for the reference is the correct one (judging by the contig name), you should re-index the BAM file. Sometimes the index file gets corrupted, and creating a fresh index fixes the issue.

    Let me know if this helps.

  • sarkar (Switzerland), Member, Posts: 11

    Hi Sheila,

    Thanks for your feedback. I created a new index file and tried again, but it still gives me the same error.

    Do you have any other suggestions?

    Many Thanks,
    Namrata

  • Sheila (Broad Institute), Member, Broadie, Moderator, Dev, admin, Posts: 4,284

    Hi Namrata @sarkar

    Try checking the length of the contig in the reference genome and the length of the same contig in the BAM file to see whether they match.

    You can do this with `samtools view -h yourfile.bam`. You will need to use something like grep to pull out the line of the sequence dictionary that corresponds to the contig. If you do not know how to do this, please ask your IT department for help.

    Which version are you using now? 3.1?

    -Sheila
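    The check Sheila describes (finding a contig's `@SQ` line in the BAM header) can also be scripted. Below is a minimal Python sketch, assuming `samtools` is on the PATH; the function names are hypothetical and not part of GATK or samtools. It uses `samtools view -H`, which prints only the header, so the reads are not streamed through grep, and it looks contigs up by exact name, which avoids grep's substring matches (e.g. `scaffold1_` also matching `scaffold10_...`).

```python
import subprocess


def read_bam_header(path):
    """Return the SAM header text of a BAM file (requires samtools on PATH)."""
    # -H prints only the header lines, not the alignments
    result = subprocess.run(
        ["samtools", "view", "-H", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_sq_lines(header_text):
    """Map contig name -> length using the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:name and LN:123;
        # split on the first ':' only, since values such as UR:file:/x contain ':'
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths
```

    Usage would look like `parse_sq_lines(read_bam_header("myfile.bam")).get("scaffold69676_size1796")`, where `myfile.bam` stands in for your own file.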

  • sarkar (Switzerland), Member, Posts: 11

    Hi Sheila,

    The length of the contig in the reference genome is 1758. When I run `samtools view -h myfile.bam | grep scaffold69676_size1796`, I get `@SQ SN:scaffold69676_size1796 LN:3149`. These lengths are consistent with the error message I got previously.

    I tried a later version, GATK 2.6, but it still gave the same error.
    I also tried ReorderSam, but it did not help.
    Thanks,
    Namrata
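    The reference side of this comparison can be checked the same way. Below is a minimal Python sketch, assuming a plain (uncompressed) FASTA file and assuming the contig name is the first whitespace-delimited word of each `>` header line; the function names are hypothetical. If the reference already has a `samtools faidx` index (`.fai`), the second column of that file holds each contig's length, which is a quicker alternative to rescanning the FASTA.

```python
def fasta_contig_lengths(lines):
    """Return {contig name: sequence length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the contig name is the first word after '>'
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence may be wrapped over several lines; sum their lengths
            lengths[name] += len(line)
    return lengths


def report_mismatch(name, reference_length, bam_length):
    """Describe whether one contig's length agrees between reference and BAM."""
    if reference_length == bam_length:
        return f"{name}: lengths match ({reference_length})"
    return f"{name}: reference={reference_length}, BAM={bam_length} (mismatch)"
```

    A file handle can be passed directly, e.g. `with open("reference.fasta") as fh: fasta_contig_lengths(fh)`, where `reference.fasta` is a placeholder path. For the contig in this thread, `report_mismatch("scaffold69676_size1796", 1758, 3149)` flags the mismatch.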

  • Sheila (Broad Institute), Member, Broadie, Moderator, Dev, admin, Posts: 4,284
    edited April 2014

    @sarkar

    I see that the length of the reference contig is different from the length of the input file contig. Unfortunately, this means you probably used the wrong version of the reference for aligning your reads. The best advice I can give you is to redo everything starting at the alignment step.

    Version 2.6 is better than 1.2, but while you are at it, you should upgrade to version 3.1.

    Good luck!
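    Sheila's diagnosis rests on a distinction between kinds of sequence-dictionary mismatch. If the reads and the reference list the same contigs in a different order, reordering the BAM (as Picard's ReorderSam does) can reconcile them. Reordering cannot change a contig's length, though, so a contig with the same name but a different length points at a different reference sequence, which is consistent with ReorderSam not helping here. The hypothetical Python helper below (not part of GATK or Picard) separates these cases, taking each dictionary as an ordered list of `(name, length)` pairs.

```python
def compare_dictionaries(reads, reference):
    """Classify differences between two ordered lists of (contig, length)."""
    ref_len = dict(reference)
    read_len = dict(reads)
    # Same name, different length: reordering cannot fix these
    length_mismatches = [
        (name, length, ref_len[name])
        for name, length in reads
        if name in ref_len and ref_len[name] != length
    ]
    missing_from_reference = [name for name, _ in reads if name not in ref_len]
    missing_from_reads = [name for name, _ in reference if name not in read_len]
    # Compare the relative order of the contigs both dictionaries share
    shared_in_reads = [name for name, _ in reads if name in ref_len]
    shared_in_reference = [name for name, _ in reference if name in read_len]
    return {
        "length_mismatches": length_mismatches,
        "missing_from_reference": missing_from_reference,
        "missing_from_reads": missing_from_reads,
        "order_differs": shared_in_reads != shared_in_reference,
    }
```

    With the lengths from this thread, `scaffold69676_size1796` would appear under `length_mismatches` as `(name, 3149, 1758)`, the case that calls for re-aligning against the matching reference rather than reordering.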

  • sarkar (Switzerland), Member, Posts: 11

    Hi Sheila,

    Thanks very much.

    Cheers,
    Namrata
