Problem due to: "MESSAGE: Input files reads and reference have incompatible contigs"

NicolaC (Trento, IT; Member)
edited September 2014 in Ask the GATK team

I am trying to compute mean coverage (using GATK DepthOfCoverage) for a BAM file (targeted sequencing) aligned against the hg19 reference.

java -Xmx2g -jar GenomeAnalysisTK.jar \
        -R ucsc.hg19.fasta \
        -T DepthOfCoverage \
        -I my_bam.list \
        -L my_targets.bed \
        -o coverage

The problem reported is:

##### ERROR MESSAGE: Input files reads and reference have incompatible contigs: Found contigs with the same name but different lengths:
##### ERROR   contig reads = chrM / 16569
##### ERROR   contig reference = chrM / 16571.
##### ERROR   reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
##### ERROR   reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
##### ERROR ------------------------------------------------------------------------------------------

Could you please help me to find a solution?
Many thanks in advance.

  • tommycarstensen (United Kingdom; Member)

    @NicolaC Have you checked whether your BAMs were aligned to the same build as your reference?

  • NicolaC (Trento, IT; Member)

    I checked the contigs in the header of the BAM file:

    @HD     VN:1.4  SO:coordinate
    @SQ     SN:chr1 LN:249250621
    @SQ     SN:chr2 LN:243199373
    @SQ     SN:chr3 LN:198022430
    @SQ     SN:chr4 LN:191154276
    @SQ     SN:chr5 LN:180915260
    @SQ     SN:chr6 LN:171115067
    @SQ     SN:chr7 LN:159138663
    @SQ     SN:chr8 LN:146364022
    @SQ     SN:chr9 LN:141213431
    @SQ     SN:chr10        LN:135534747
    @SQ     SN:chr11        LN:135006516
    @SQ     SN:chr12        LN:133851895
    @SQ     SN:chr13        LN:115169878
    @SQ     SN:chr14        LN:107349540
    @SQ     SN:chr15        LN:102531392
    @SQ     SN:chr16        LN:90354753
    @SQ     SN:chr17        LN:81195210
    @SQ     SN:chr18        LN:78077248
    @SQ     SN:chr19        LN:59128983
    @SQ     SN:chr20        LN:63025520
    @SQ     SN:chr21        LN:48129895
    @SQ     SN:chr22        LN:51304566
    @SQ     SN:chrX LN:155270560
    @SQ     SN:chrY LN:59373566
    @SQ     SN:chrM LN:16569

    And the LB tag is

    I assumed that hg19 was used as the reference. Am I wrong? Is there any way to verify it?
    Thank you.
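
One way to check this is to compare the contig lengths in the BAM header with those in the reference index. The sketch below is a minimal illustration, not a GATK feature: the function names are hypothetical, and the sample data contains only the chr1 and chrM lengths quoted in this thread (the reference chr1 length is assumed to match, since the error reports only chrM).

```python
# Sketch: compare contig lengths in a BAM header against a reference .fai index.
# Function names and the inline sample data are illustrative assumptions.

def parse_sam_header(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "@SQ":
            continue
        # Each remaining field is a TAG:VALUE pair such as SN:chr1 or LN:249250621.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def parse_fai(fai_text):
    """Return {contig_name: length} from the first two columns of a .fai index."""
    contigs = {}
    for line in fai_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            contigs[fields[0]] = int(fields[1])
    return contigs


def length_mismatches(bam_contigs, ref_contigs):
    """Contigs present in both inputs whose lengths differ: {name: (bam_len, ref_len)}."""
    return {
        name: (length, ref_contigs[name])
        for name, length in bam_contigs.items()
        if name in ref_contigs and ref_contigs[name] != length
    }


def missing_from_reference(bam_contigs, ref_contigs):
    """Contig names that appear in the BAM header but not in the reference."""
    return [name for name in bam_contigs if name not in ref_contigs]


# Lengths taken from the header and error message in this thread.
BAM_HEADER = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
# Only name and length are used here; a real .fai has three more columns.
REFERENCE_FAI = "chrM\t16571\nchr1\t249250621\n"

if __name__ == "__main__":
    bam = parse_sam_header(BAM_HEADER)
    ref = parse_fai(REFERENCE_FAI)
    print("length mismatches:", length_mismatches(bam, ref))
    print("missing from reference:", missing_from_reference(bam, ref))
```

In practice the header text would come from `samtools view -H` on the BAM file and the index from `samtools faidx` on the reference FASTA. A same-name, different-length contig is exactly the condition the error message complains about.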

  • tommycarstensen (United Kingdom; Member)

    @NicolaC Check this page (4. What is the canonical ordering of human reference contigs in a BAM file?):

    Also check this page:

    Excellent answer by @pmint:
    "in hg19 version, chrM length = 16571; in b37 version, chrM length = 16569"

    So switch from hg19 to b37 and your problem should/might be sorted. I hope that helps.

    If you search for your error message, you will find that others have had the same problem.

  • NicolaC (Trento, IT; Member)
    edited September 2014

    @tommycarstensen Your suggestions are very useful. Thank you for your help.

    I tried using the b37 reference instead: human_g1k_v37.fasta

    The error reported is:

    MESSAGE: File associated with my_target.bed is malformed: Problem reading the interval file caused by Badly formed genome loc: Contig chr8 given as location, but this contig isn't present in the Fasta sequence dictionary

    This happens because the regions in the BED file are specified with the "chr" prefix:

    chr8    234370        234371        
    chr8    234389        234390       
    chr8    234392        234393       
    chr8    234469        234470

    Is simply removing the "chr" prefix from the target regions in the BED file enough to solve the problem without introducing any bias? I was wondering whether, for example, the region chr8:234370-234371 (hg19) corresponds exactly to 8:234370-234371 (b37).

    Thank you.
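
For illustration, here is a hedged sketch of what such a renaming could look like. It assumes that chr1 to chr22, chrX and chrY share coordinates between hg19 and b37, so only the name changes; chrM is set aside because its length differs between the two builds (16571 vs 16569, as quoted above), and any other contig is set aside as well because its b37 name is not a simple prefix removal. The function name is hypothetical.

```python
# Sketch: rename primary-chromosome contigs in a BED file from hg19 style
# ("chr8") to b37 style ("8"). Assumes primary chromosomes share coordinates
# between the two builds; everything else is reported instead of converted.

PRIMARY_CONTIGS = {"chr%d" % i for i in range(1, 23)} | {"chrX", "chrY"}


def strip_chr_prefix(bed_text):
    """Return (converted_bed_text, skipped_lines)."""
    converted = []
    skipped = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue  # drop blank lines
        if line.startswith(("#", "track", "browser")):
            converted.append(line)  # keep header/comment lines unchanged
            continue
        fields = line.split()
        if fields[0] in PRIMARY_CONTIGS:
            fields[0] = fields[0][len("chr"):]
            converted.append("\t".join(fields))
        else:
            # chrM, haplotype and unplaced contigs need manual review.
            skipped.append(line)
    return "\n".join(converted), skipped


if __name__ == "__main__":
    example = "chr8\t234370\t234371\nchr8\t234389\t234390\nchrM\t1\t100\n"
    bed, skipped = strip_chr_prefix(example)
    print(bed)
    print("skipped:", skipped)
```

Note that renaming the BED file alone would not make this run work: the BAM header shown earlier in the thread also uses "chr"-prefixed names, so the reads and a b37 reference would still disagree on contig names.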

  • NicolaC (Trento, IT; Member)
    edited September 2014

    Dear @Geraldine_VdAuwera, thank you for the more than exhaustive reply.
    I agree with you that "it looks like your reads were aligned to the b37 reference, but modified to have the 'chr' prefix in the contig names". I am processing BAM files that were sequenced and aligned by Ion Torrent, and I found out that it uses its own assembly of the human reference genome. I will ask for that specific assembly to resolve the compatibility problems or, as you proposed, re-align my reads.
    Many thanks for all the help!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah I see -- good to know that Ion uses their own reference build. Good luck!
