
Input files reads and reference have incompatible contigs

Dear GATK team,
I'm re-running an analysis pipeline with the new GATK version 3.5 on human samples. I have already analyzed my data with the older 3.1 version without any problems.
I am now having trouble with the IndelRealigner step.

ERROR
ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigs for more information. Error details: Found contigs with the same name but different lengths or MD5s:

ERROR contig reads is named chr20 with length 63025520 and MD5 0dec9660ec1efaaf33281c0d5ea2560f
ERROR contig reference is named chr20 with length 63025520 and MD5 1ef908e47ac040f0e94ede396c59f074.
ERROR reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]

This happens for all my samples.

I am re-running all these steps because I want to use the HC in GVCF mode, which was not possible with my older GATK version. I didn't want to mix versions, so I started the whole pipeline again. Is this really necessary? Or can I just run the HC in GVCF mode on the BAM files generated with 3.1?
Thanks a lot!!
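
For anyone who wants to see exactly which contigs disagree before changing any files, here is a minimal Python sketch that compares two sequence dictionaries. It assumes you have the BAM header and the reference .dict file as plain text (for example, the output of `samtools view -H`) and that both use the standard tab-separated @SQ lines with SN, LN and M5 tags. The function names `parse_sq_lines` and `find_mismatches` are made up for illustration and are not part of GATK or Picard; the chr20 values are the ones from the error message above.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: (length, md5_or_None)} from the @SQ lines of a header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG and any other header lines
        # Each field looks like TAG:value; split only on the first colon
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = (int(fields["LN"]), fields.get("M5"))
    return contigs


def find_mismatches(reads_header, reference_dict):
    """List contigs present in both inputs whose length or MD5 differs."""
    reads = parse_sq_lines(reads_header)
    ref = parse_sq_lines(reference_dict)
    mismatches = []
    for name, (length, md5) in reads.items():
        if name not in ref:
            continue
        ref_length, ref_md5 = ref[name]
        if length != ref_length:
            mismatches.append((name, "length", length, ref_length))
        elif md5 and ref_md5 and md5 != ref_md5:
            # Only compare MD5s when both sides actually carry one
            mismatches.append((name, "md5", md5, ref_md5))
    return mismatches


# Illustrative input: the chr20 entries reported in the error message
reads_header = "@SQ\tSN:chr20\tLN:63025520\tM5:0dec9660ec1efaaf33281c0d5ea2560f\n"
reference_dict = "@SQ\tSN:chr20\tLN:63025520\tM5:1ef908e47ac040f0e94ede396c59f074\n"

for name, kind, in_reads, in_ref in find_mismatches(reads_header, reference_dict):
    print(f"{name}: {kind} differs (reads={in_reads}, reference={in_ref})")
```

A length difference usually points to different genome builds, whereas identical lengths with different MD5s, as in this thread, means only the checksums disagree.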

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    This may be caused by a new check we added to validate that sequences match exactly. It sounds like it may be overreacting in your case. One way to bypass this is to remove the md5 string in the file headers, if you're sure that sequences are the same and haven't been modified.

  • chonchofronto · Washington D.C. · Member

    Hi Geraldine. I also had this same issue simply trying to execute "java -jar GenomeAnalysisTK.jar -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam" from the tutorials on the example data. Once I removed the md5 from the second line of exampleFASTA.dict I received the correct output from the command.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Thanks for reporting this, @chonchofronto

  • Ankit123 · Berlin · Member

    Hi Geraldine,

    I am getting the same error when I try to execute "java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R /projects/aarora1/human-genome/hg19/Bowtie2Index/hg19.fa -I /scratch/ITN_aDDress/output.bam --known /home/aarora1/RNA-seq_variant-calling/1000G_phase1.indels.hg19.sites.vcf -o /scratch/ITN_aDDress/forIndelRealigner.intervals"

    I used TopHat for alignment, took the GTF and reference FASTA files from UCSC, and took the VCF file from the GATK bundle.

    Any help would be appreciated.

    Thanks

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Hi there, as stated above you can try removing the md5 string in the file headers, or you can try running with -U ALLOW_SEQ_DICT_INCOMPATIBILITY.

  • Ankit123 · Berlin · Member

    Hi Geraldine,

    Thanks a lot for replying.

    I tried your suggestion, but now I am getting a different error. ERROR stack trace:
    java.lang.ArrayIndexOutOfBoundsException: 1

    For that I also tried your suggestion from another question: -rf BadCigar

    But I still get the same error.

    Could this problem be due to the mapping from TopHat?

    Thanks

  • Sheila · Broad Institute · Member, Broadie, Moderator
    edited March 2016

    @Ankit123
    Hi,

    Which versions of GATK and Java are you using? Also, can you try validating your input BAM file with ValidateSamFile?

    Thanks,
    Sheila

  • Ankit123 · Berlin · Member

    Hi Sheila,

    I found the solution: to avoid this error, it is better to run ReorderSam on the BAM file first.

    Thanks
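
The "remove the md5 string in the file headers" workaround mentioned in this thread can be sketched in Python as below. This is only an illustration under a few assumptions: the header is plain text (such as a .dict file), the MD5 is stored in the standard M5 tag of the @SQ lines, and you are sure the sequences really are identical. The function name `strip_md5_tags` and the contig values in the example are hypothetical. For a BAM file the edited header would still have to be written back with a tool such as `samtools reheader`.

```python
def strip_md5_tags(header_text):
    """Drop the M5:<md5> field from every @SQ line; leave other lines unchanged."""
    cleaned = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Keep every tab-separated field except the MD5 checksum
            fields = [f for f in line.split("\t") if not f.startswith("M5:")]
            line = "\t".join(fields)
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


# Illustrative dictionary content (made-up contig and checksum values)
example_dict = (
    "@HD\tVN:1.4\tSO:unsorted\n"
    "@SQ\tSN:chr1\tLN:100000\tM5:0123456789abcdef0123456789abcdef\tUR:file:exampleFASTA.fasta\n"
)
print(strip_md5_tags(example_dict), end="")
```

To apply this to a real file, read it with `open(path).read()`, pass the text through `strip_md5_tags`, and write the result to a new file so that the original dictionary is kept as a backup.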
