Fishy output from ReduceReads
I've been following the best practices guide and I'm getting some odd-looking output from ReduceReads. Here's a sample record:
C100 16 chrM 4934 60 1M * 0 0 T 7 BD:Z:E RG:Z:JC01_L1 BI:Z:L RR:B:c,1 RS:A:1
The odd part is the CIGAR string. Is "1M" a reasonable CIGAR string? Also, before ReduceReads, Picard's ValidateSamFile finished with no errors on the input BAM, but on the ReduceReads output it reports:
WARNING: Record 1, Read name 1, NM tag (nucleotide differences) is missing
That warning is reported for records 1 through 100, after which ValidateSamFile reports nothing further.
Here is the command line I used for ReduceReads:
java -Xmx2g -Djava.io.tmpdir=`pwd`/tmp -jar $GATK -T ReduceReads -R $genomes/hg19.fa -I $alignments/$lane.dedup.realn.recal.bam -o $alignments/$lane.dedup.realn.recal.reduced.bam
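For what it's worth, here is a rough sanity check I'd sketch for the sample record above. It assumes the standard SAM layout (11 mandatory columns followed by optional tags) and that the columns were tab-separated in the real file. The helper name cigar_query_length is my own, not part of GATK or Picard. It only tests whether the CIGAR is consistent with the sequence length and whether an NM tag is present. It says nothing about whether ReduceReads is supposed to produce records like this.

```python
import re

# Sample record from the post, split on whitespace (real SAM is tab-delimited).
record = "C100 16 chrM 4934 60 1M * 0 0 T 7 BD:Z:E RG:Z:JC01_L1 BI:Z:L RR:B:c,1 RS:A:1"

# CIGAR operations that consume bases of the read, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume read bases."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


fields = record.split()
cigar, seq = fields[5], fields[9]                   # columns 6 (CIGAR) and 10 (SEQ)
tags = [f.split(":", 1)[0] for f in fields[11:]]    # optional tag names

cigar_matches_seq = cigar_query_length(cigar) == len(seq)
has_nm = "NM" in tags

print(cigar, seq, cigar_matches_seq, has_nm)  # prints: 1M T True False
```

So by this check "1M" is at least internally consistent with a one-base sequence, and the record really does carry no NM tag, which matches the warning text.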