Picard ReorderSam Error: Read CIGAR M operator maps off end of reference

nroak, Houston (Member; Posts: 30)
edited December 2015 in Ask the GATK team

Origin of the problem:
GATK detected that the contigs in the BAM file and the reference file are in a different order:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: The contig order in reads and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
ERROR reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT, NT_113887, ...]
ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]
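For illustration, the comparison behind this message can be sketched in Python. This is a minimal sketch, not GATK's actual implementation: the function name `first_order_mismatch` is hypothetical, and the two lists are rebuilt from the truncated contig lists printed in the error above (everything after NT_113887 is omitted).

```python
from typing import List, Optional, Tuple


def first_order_mismatch(
    reads: List[str], reference: List[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (index, reads_contig, reference_contig) for the first position
    where the two contig orders disagree, or None if they agree over the
    compared prefix. Hypothetical helper, not part of GATK or Picard."""
    for i, (a, b) in enumerate(zip(reads, reference)):
        if a != b:
            return (i, a, b)
    return None


autosomes = [str(i) for i in range(1, 23)]

# Order reported for the reads: 1-9, X, Y, 10-22, MT, NT_113887
reads_contigs = autosomes[:9] + ["X", "Y"] + autosomes[9:] + ["MT", "NT_113887"]
# Order reported for the reference: 1-22, X, Y, MT, NT_113887
reference_contigs = autosomes + ["X", "Y", "MT", "NT_113887"]

print(first_order_mismatch(reads_contigs, reference_contigs))
# The same names appear in both lists; only the order differs.
print(sorted(reads_contigs) == sorted(reference_contigs))
```

Over the contigs shown in the message, the two lists contain the same names and diverge only where X and Y are placed.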

I then followed the linked article, https://www.broadinstitute.org/gatk/guide/article?id=1328

and decided to use the Picard ReorderSam tool, which led to the issue reported here.

The Problem: Picard ReorderSam terminates with an error.
Command:
java -Xmx110g -Djava.io.tmpdir=$workDir/merged-bams/tmp -jar ./picard/1.115/ReorderSam.jar ALLOW_INCOMPLETE_DICT_CONCORDANCE=true \
TMP_DIR=$workDir/merged-bams/tmp \
I=$workDir/merged-bams/$sample.sorted.cleaned.bam \
R=$refGenome \
O=$workDir/merged-bams/$sample.sorted.reordered.bam

Error:
INFO 2015-12-09 12:24:39 ReorderSam Writing reads...
INFO 2015-12-09 12:24:39 ReorderSam Processing All reads
[Wed Dec 09 13:20:08 CST 2015] picard.sam.ReorderSam done. Elapsed time: 55.49 minutes.
Runtime.totalMemory()=15967387648
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name HWUSI-EAS1612_61FV6:6:91:1510:1207#0, Read CIGAR M operator maps off end of reference
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:247)
at htsjdk.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:460)
at htsjdk.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1235)
at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:1609)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:642)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
at picard.sam.ReorderSam.writeReads(ReorderSam.java:165)
at picard.sam.ReorderSam.doWork(ReorderSam.java:127)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
at picard.sam.ReorderSam.main(ReorderSam.java:85)
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.

Attempts to fix:
1. I ran Picard CleanSam to try to resolve this error:
java -Xmx56g -jar ./picard/1.115/CleanSam.jar I=$workDir/merged-bams/$sample.sorted.bam O=$workDir/merged-bams/$sample.sorted.cleaned.bam
The output is attached.
2. Reordering the cleaned BAM then threw the same error as above.
3. Next I ran Picard ValidateSamFile, and the log lists the read names with the error "Read CIGAR M operator maps off end of reference".
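To make the error concrete, here is a minimal Python sketch of the condition it describes: an alignment whose reference span, computed from POS and the CIGAR string, ends beyond the contig length in the sequence dictionary. It is not htsjdk's code. The function names and the contig length of 1000 are hypothetical; the set of reference-consuming operators (M, D, N, =, X) follows the SAM specification.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def maps_off_reference(pos: int, cigar: str, contig_length: int) -> bool:
    """True if an alignment starting at 1-based position `pos` ends past
    the last base of a contig of length `contig_length`."""
    alignment_end = pos + reference_span(cigar) - 1
    return alignment_end > contig_length


# Hypothetical contig of length 1000.
print(maps_off_reference(990, "36M", 1000))     # ends at 1025, past the contig end
print(maps_off_reference(965, "36M", 1000))     # ends exactly at 1000
print(maps_off_reference(975, "10S26M", 1000))  # soft clips do not consume reference
```

Because the check depends on the contig length, the same read can pass against one sequence dictionary and fail against another that lists a shorter length for the same contig name.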

Can you please help me get around this issue? All I really want is to proceed with GATK with the BAM and reference contigs in the same order.
I have read through several GATK discussions, but none addresses this issue directly or has helped me find a solution.

Attachment: cleansam.err.txt (480K)

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie; Posts: 11,727)

    How was your input BAM originally produced?

    Geraldine Van der Auwera, PhD

  • nroak, Houston (Member; Posts: 30)
    edited December 2015

    The input BAM was originally produced by aligning to a version of NCBI36 used by Wash U., St. Louis.
    As a disclaimer, I obtained the BAM files from the Wash U. St. Louis team, pre-aligned to this reference sequence. I also obtained the same reference they used and tried to use it with GATK.
    More interestingly, I get contrasting errors depending on the reference I use:
    Error 1: With GATK Reference: -R ./human_b36_both.fasta
    ERROR reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT, NT_113887, ...]
    ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]

    Error 2: With WUSTL Reference: -R ./NCBI36_WUGSC_variant.fa
    ERROR known contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]
    ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT, NT_113887, ...]

    Nothing other than the reference file changed between these two runs.
    I am also attaching the BAM headers here.
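A header dump like the attached ones can be turned into an ordered contig list for comparison. The following is a minimal Python sketch under stated assumptions: the header text is what a tool such as `samtools view -H` prints, the function name `contigs_from_header` is hypothetical, and the sample header uses made-up lengths rather than values from the attached files.

```python
from typing import List, Tuple


def contigs_from_header(header_text: str) -> List[Tuple[str, int]]:
    """Extract (name, length) pairs from the @SQ lines of a SAM header,
    preserving the order in which they appear."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


# Illustrative header; the lengths are placeholders, not real NCBI36 values.
sample_header = "\n".join([
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:9\tLN:3000",
    "@SQ\tSN:X\tLN:2000",
    "@SQ\tSN:10\tLN:1000",
    "@RG\tID:rg1\tSM:sample1",
])

print(contigs_from_header(sample_header))
```

Running this on the BAM header and on the reference's sequence dictionary gives two lists that can be compared for both order and per-contig length.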

    Attachments: SJINF001_G.header.txt (17K), SJTALL001_D.bam.bamheader.out.txt (35K)
  • Sheila, Broad Institute (Member, Broadie, Moderator; Posts: 4,843)

    @nroak
    Hi,

    It looks like the input BAM file is not sorted in the reference order. You can use Picard's ReorderSam to fix the issue.

    -Sheila

  • nroak, Houston (Member; Posts: 30)
    edited December 2015

    Yes, that's what I did, as described in my original post above.
    When I run ReorderSam, I hit the error where the CIGAR M operator maps off the end of the reference.
    To solve that, I used CleanSam, which just ignores all those reads that map off the reference.
    I then tried again to reorder the "cleaned" BAM file and got the same error message.
    Next I ran ValidateSamFile with MODE=SUMMARY and got an output file saying:
    HISTOGRAM java.lang.String
    Error Type Count
    ERROR:CIGAR_MAPS_OFF_REFERENCE 820

    I'm mostly going in circles trying to find an answer. Any help would be really great!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie; Posts: 11,727)

    My advice would be to revert the file and realign the data from scratch. It will be cleaner than trying to salvage what you have, and will give you the opportunity to use a reference build that is not so outdated as b36. This is going to be important down the road when you want to compare your results to other recent studies, unless you're specifically trying to replicate an older experiment.

    Geraldine Van der Auwera, PhD

  • nroak, Houston (Member; Posts: 30)
    edited December 2015

    I would be very happy if I could do that. But the BAM files I have were obtained from a large research consortium, so it would be extremely difficult to ask them to release BAM files realigned to a newer reference. I believed my only option was to lift over the variants, although that is not ideal. Is there any reliable way to do this: BAM (NCBI36) -> FASTQ -> BAM (GRCh37/38)?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie; Posts: 11,727)

    Sure, you can do it with Picard tools, specifically RevertSam, which is described in this tutorial. The tutorial leads you to generate an unmapped BAM because the Broad workflow uses uBAMs instead of FASTQ, but you can also modify the command to generate a FASTQ file if you prefer to keep things simple.

    Geraldine Van der Auwera, PhD
