
ContEst error: mismatched contigs

Ruchi (Member, Broadie, Moderator, Dev admin)

Hello,

I've downloaded BAMs from dbGaP that were aligned to GRCh37. The GATK ContEst tool reports mismatched contigs when I run it with the hg19 reference.

  • Is there a different reference I should use? If so, which one?
  • If I replace the header of the downloaded BAMs with the header from an hg19 BAM, is that acceptable?
    Does the tool only look at the header to check that everything is OK?

Thanks!
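One way to see exactly which contigs disagree is to compare the @SQ lines of the BAM header (for example, the output of `samtools view -H`) with the first two columns (name, length) of the reference's `.fai` index. The Python below is a minimal sketch under those assumptions; the function names are hypothetical, and it only approximates the name, length and order checks GATK performs rather than reproducing them. It also bears on the header-swap question: BAM records refer to contigs by their position in the header's @SQ list, so a replacement header is only safe if it lists the same contigs, with the same lengths, in the same order.

```python
"""Hypothetical sketch: compare BAM @SQ header lines with a reference .fai index.

Assumes the BAM header has been dumped to text (e.g. `samtools view -H in.bam`)
and the reference .fai file is available. This approximates GATK's contig
checks; it is not GATK's implementation.
"""


def parse_sq_lines(header_text: str) -> list[tuple[str, int]]:
    """Return (name, length) for every @SQ line, in header order."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Tags look like SN:1 and LN:249250621; split on the first colon only.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def parse_fai(fai_text: str) -> list[tuple[str, int]]:
    """Return (name, length) from the first two columns of a .fai index."""
    contigs = []
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs.append((name, int(length)))
    return contigs


def compare_contigs(
    bam: list[tuple[str, int]], ref: list[tuple[str, int]]
) -> list[str]:
    """List problems; an empty list means names, lengths and order agree."""
    problems = []
    ref_lengths = dict(ref)
    for name, length in bam:
        if name not in ref_lengths:
            problems.append(f"{name}: not in reference")
        elif ref_lengths[name] != length:
            problems.append(
                f"{name}: length {length} in BAM vs {ref_lengths[name]} in reference"
            )
    # Contigs present in both must also appear in the same relative order.
    bam_names = {n for n, _ in bam}
    shared_in_bam_order = [n for n, _ in bam if n in ref_lengths]
    shared_in_ref_order = [n for n, _ in ref if n in bam_names]
    if shared_in_bam_order != shared_in_ref_order:
        problems.append("shared contigs appear in a different order")
    return problems
```

For a b37-style header (`1`, `MT`) checked against an hg19-style index (`chr1`, `chrM`), both contigs are reported as `not in reference`. Renaming would fix `1`, whose length (249250621) is the same in both builds, but `MT` would then fail the length check, since it is 16569 bp in b37 and `chrM` is 16571 bp in hg19.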


Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator admin)

    @Ruchi
    Hi Ruchi!

    You should use exactly the same reference your BAM file was aligned to. Does the dbGaP website say which reference it used? If not, can you try the b37 reference from our resource bundle?

    Can you post the BAM header here?

    Thanks,
    Sheila

  • Ruchi (Member, Broadie, Moderator, Dev admin)

    I believe @JakeC is using gs://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta, and he may be able to provide the BAM header as well.

    I ended up digging further and ran into this post:
    https://gatkforums.broadinstitute.org/gatk/discussion/63/errors-about-input-files-having-missing-or-incompatible-contigs

    That post recommends a few options for when the contigs in a BAM file don't match the reference, including this special case:

    Special case of b37 vs. hg19
    The b37 and hg19 human genome builds are very similar, and the canonical chromosomes (1 through 22, X and Y) only differ by their names (no prefix vs. chr prefix, respectively). If you only care about those, and don't give a flying fig about the decoys or the mitochondrial genome, you could just rename the contigs throughout your mismatching file and call it done, right?
    
    Well... this can work if you do it carefully and cleanly, but many things can go wrong during the editing process that can screw up your files even more, and it only applies to the canonical chromosomes. The mitochondrial contig is a slightly different length (see the error example in the linked post) in addition to having a different naming convention, and all the other contigs (decoys, herpes virus, etc.) don't have direct equivalents.
    
  • Ruchi (Member, Broadie, Moderator, Dev admin)

    I'm not sure whether Jake has had a chance to test the edited reference, but I think we can close this issue and reopen it if the edited reference doesn't work. Thanks @shlee @Sheila!
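The renaming approach quoted above can be sketched for the header text alone. This is a hypothetical Python helper, not a GATK or Picard tool: it adds the `chr` prefix to the canonical b37 contigs (1-22, X, Y) on `@SQ` lines and raises an error for anything else, because MT differs in length from hg19's chrM and the decoys have no hg19 equivalent. A b37 header that still lists MT or decoy contigs is therefore rejected rather than half-converted. It does not reorder contigs or touch other @SQ fields (such as `AS` or `M5`), and the edited text would still need to be written back into the BAM, for example with `samtools reheader`.

```python
"""Hypothetical sketch: rename b37 canonical contigs to hg19 style in SAM header text.

Only 1-22, X and Y map directly to hg19 (same sequence, "chr" prefix).
Anything else (MT, decoys, other contigs) raises instead of being guessed.
"""

CANONICAL_B37 = [str(i) for i in range(1, 23)] + ["X", "Y"]


def b37_to_hg19_name(name: str) -> str:
    """Map a canonical b37 contig name to its hg19 name, or refuse."""
    if name in CANONICAL_B37:
        return "chr" + name
    raise ValueError(f"no safe hg19 equivalent for b37 contig {name!r}")


def rename_sq_lines(header_text: str) -> str:
    """Rewrite SN: values on @SQ lines; all other header lines are unchanged."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = [
                "SN:" + b37_to_hg19_name(f[3:]) if f.startswith("SN:") else f
                for f in line.split("\t")
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Raising on the first unmappable contig is a deliberate choice here: silently dropping or guessing a name would leave reads pointing at the wrong @SQ entry.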
