
# Picard ReorderSam Error: Read CIGAR M operator maps off end of reference

HoustonMember
edited December 2015

Origin of the problem:
GATK detected that the BAM file and the reference file list their contigs in different orders:

    ##### ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]

I then consulted https://www.broadinstitute.org/gatk/guide/article?id=1328 and decided to use the Picard ReorderSam tool, which led to the issue reported here.

The Problem: Picard ReorderSam terminates with error.
Command:

    java -Xmx110g -Djava.io.tmpdir=$workDir/merged-bams/tmp -jar ./picard/1.115/ReorderSam.jar \
        ALLOW_INCOMPLETE_DICT_CONCORDANCE=true \
        TMP_DIR=$workDir/merged-bams/tmp \
        I=$workDir/merged-bams/$sample.sorted.cleaned.bam \
        R=$refGenome \
        O=$workDir/merged-bams/$sample.sorted.reordered.bam

Error:

    INFO 2015-12-09 12:24:39 ReorderSam Writing reads...
    INFO 2015-12-09 12:24:39 ReorderSam Processing All reads
    [Wed Dec 09 13:20:08 CST 2015] picard.sam.ReorderSam done. Elapsed time: 55.49 minutes.
    Runtime.totalMemory()=15967387648
    To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
    Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name HWUSI-EAS1612_61FV6:6:91:1510:1207#0, Read CIGAR M operator maps off end of reference
        at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
        at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:247)
        at htsjdk.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:460)
        at htsjdk.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1235)
        at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:1609)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:642)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
        at picard.sam.ReorderSam.writeReads(ReorderSam.java:165)
        at picard.sam.ReorderSam.doWork(ReorderSam.java:127)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
        at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
        at picard.sam.ReorderSam.main(ReorderSam.java:85)
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_index_core] Invalid BAM header.
    [bam_index_build2] fail to index the BAM file.

Attempts to fix:
1. I ran Picard CleanSam to try to resolve this error:

       java -Xmx56g -jar ./picard/1.115/CleanSam.jar I=$workDir/merged-bams/$sample.sorted.bam O=$workDir/merged-bams/$sample.sorted.cleaned.bam

   The output is attached.
2. Reordering the cleaned BAM then throws the same error as shown above.
3. Next I ran Picard ValidateSamFile, and the log lists the read names with the error "Read CIGAR M operator maps off end of reference".

Can you please help me get around this issue? All I really want is to proceed with GATK with the BAM and reference contigs in the same order.
I have read through several GATK discussions, but none addresses this issue directly or has helped me find a solution.
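For readers unfamiliar with the validation error quoted above, here is a minimal sketch of the condition it describes: the alignment end, computed from the start position and the reference-consuming CIGAR operations (M, D, N, =, X in the SAM specification), falls beyond the length of the contig recorded in the header. The function names and the example values are hypothetical and chosen for illustration; this is not the htsjdk implementation.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases, per the SAM specification.
REFERENCE_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(
        int(length)
        for length, op in CIGAR_ELEMENT.findall(cigar)
        if op in REFERENCE_CONSUMING
    )


def alignment_end(pos, cigar):
    """1-based inclusive end coordinate of the alignment."""
    return pos + reference_span(cigar) - 1


def maps_off_reference(pos, cigar, contig_length):
    """True if the alignment extends past the end of the contig."""
    return alignment_end(pos, cigar) > contig_length


# Hypothetical values: a 36-base read starting 10 bases before the end
# of a 1000-base contig.
print(maps_off_reference(990, "36M", 1000))     # fully matched: runs off the end
print(maps_off_reference(990, "11M25S", 1000))  # overhang soft-clipped: fits
```

The second call illustrates the kind of record one would expect after the overhanging bases are soft-clipped, since soft clips (S) do not consume reference bases.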

## Answers

• Cambridge, MAMember, Administrator, Broadie

How was your input bam originally produced?

• HoustonMember
edited December 2015

The input BAM was originally produced by aligning to a version of NCBI36 used by Wash U, St. Louis (WUSTL).
As a disclaimer, I obtained the BAM files from the Wash U team already aligned to this reference sequence. I also obtained the same reference they used and tried to use it with GATK.
More interestingly, I get contrasting errors depending on the reference I use.

Error 1: With GATK Reference: -R ./human_b36_both.fasta

    ERROR reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT, NT_113887, ...]
    ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]

Error 2: With WUSTL Reference: -R ./NCBI36_WUGSC_variant.fa

    ERROR known contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]
    ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT, NT_113887, ...]

No parameter other than the reference file differs between the two runs that produced these errors.
I am also attaching the header of the BAM file here.
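The two error messages list the same contig names in different orders. A minimal sketch of that comparison follows, using only the names visible in the messages (they are truncated after NT_113887, so the remaining contigs are left out). The function names are hypothetical, and the sketch compares names only; it does not look at contig lengths or any other sequence dictionary fields.

```python
def compare_contig_order(contigs_a, contigs_b):
    """Classify how two contig name lists relate to each other."""
    if list(contigs_a) == list(contigs_b):
        return "identical"
    if sorted(contigs_a) == sorted(contigs_b):
        return "same_set_different_order"
    return "different_sets"


def first_mismatch(contigs_a, contigs_b):
    """Return (index, name_a, name_b) at the first differing position, or None."""
    for index, (name_a, name_b) in enumerate(zip(contigs_a, contigs_b)):
        if name_a != name_b:
            return index, name_a, name_b
    return None


# Order reported for human_b36_both.fasta: autosomes, then X, Y, MT.
b36_order = [str(n) for n in range(1, 23)] + ["X", "Y", "MT", "NT_113887"]

# Order reported for the reads and for NCBI36_WUGSC_variant.fa:
# X and Y sit between 9 and 10.
wugsc_order = (
    [str(n) for n in range(1, 10)]
    + ["X", "Y"]
    + [str(n) for n in range(10, 23)]
    + ["MT", "NT_113887"]
)

print(compare_contig_order(wugsc_order, b36_order))
print(first_mismatch(wugsc_order, b36_order))
```

With these lists the comparison reports the same set in a different order, and the first mismatch is at the position right after contig 9.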

• Broad InstituteMember, Broadie, Moderator

@nroak
Hi,

It looks like the input BAM file is not sorted in the reference order. You can use Picard's ReorderSam to fix the issue.

-Sheila

• HoustonMember
edited December 2015

Yes, that's what I did, as per the discussion posted above.
Going back to ReorderSam, I hit the issue where the CIGAR M operator maps off the end of the reference.
To solve that, I used CleanSam, which just ignores all those reads that map off the reference.
I then tried again to reorder the "cleaned" BAM file and got the same error message.
Next I ran ValidateSamFile with MODE=SUMMARY and got an output file saying:

    HISTOGRAM java.lang.String
    Error Type                        Count
    ERROR:CIGAR_MAPS_OFF_REFERENCE    820

I'm mostly going in circles trying to find an answer. Any help would be really great!

• Cambridge, MAMember, Administrator, Broadie

My advice would be to revert the file and realign the data from scratch. It will be cleaner than trying to salvage what you have, and will give you the opportunity to use a reference build that is not so outdated as b36. This is going to be important down the road when you want to compare your results to other recent studies, unless you're specifically trying to replicate an older experiment.

• HoustonMember
edited December 2015

I would be very happy if I could do that. But the BAM files I have obtained are from a large research consortium, so it will be extremely difficult to ask them to release BAM files realigned to a newer reference. I believed my only option was to lift over the variants, although that is not ideal. Is there a reliable way to do this: BAM (NCBI36) -> FASTQ -> BAM (GRCh37/38)?

• Cambridge, MAMember, Administrator, Broadie

Sure, you can do it with Picard tools, specifically RevertSam, which is described in this tutorial. The tutorial leads you to generate an unmapped BAM because the Broad workflow uses uBAMs instead of FASTQ, but you can also modify the command to generate a FASTQ file if you prefer to keep things simple.
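As a rough sketch of the route described above, the snippet below only assembles the two Picard command lines (RevertSam to produce an unmapped BAM, then SamToFastq to write FASTQ) without running them. Several things here are assumptions rather than details from this thread: the single `picard.jar` invocation style (Picard 1.115, used earlier in the thread, ships one jar per tool), paired-end data, all file names, and the choice of options. Check the options against the documentation of the installed Picard version before use.

```python
def revert_sam_command(picard_jar, aligned_bam, unmapped_bam):
    """Argument list for Picard RevertSam (strips alignment information)."""
    return [
        "java", "-jar", picard_jar, "RevertSam",
        f"I={aligned_bam}",
        f"O={unmapped_bam}",
        "SORT_ORDER=queryname",
        "REMOVE_ALIGNMENT_INFORMATION=true",
        "RESTORE_ORIGINAL_QUALITIES=true",
    ]


def sam_to_fastq_command(picard_jar, unmapped_bam, fastq_1, fastq_2):
    """Argument list for Picard SamToFastq, assuming paired-end reads."""
    return [
        "java", "-jar", picard_jar, "SamToFastq",
        f"I={unmapped_bam}",
        f"FASTQ={fastq_1}",
        f"SECOND_END_FASTQ={fastq_2}",
    ]


# Hypothetical file names, for illustration only.
steps = [
    revert_sam_command("picard.jar", "sample.ncbi36.bam", "sample.unmapped.bam"),
    sam_to_fastq_command("picard.jar", "sample.unmapped.bam",
                         "sample_R1.fastq", "sample_R2.fastq"),
]

for step in steps:
    print(" ".join(step))
```

The resulting FASTQ files (or the unmapped BAM itself) would then be realigned to the newer reference with an aligner of choice; each list can be passed to `subprocess.run(step, check=True)` to execute it.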
