PICARD AlignmentSummaryMetrics

I used the following command to collect summary metrics for my BAM file, which was generated with Bowtie2 (via TopHat, to be specific):

java -jar /usr/share/picard-tools-1.136/picard.jar CollectAlignmentSummaryMetrics INPUT=Sample_DY10.tophat.bam OUTPUT=tmpmetrics/alignmentmetrics R=/mnt/storage/ref_genome/Homo_sapiens/UCSC_hg19/UCSC/hg19/Sequence/Bowtie2Index/genome.fa

The output file is attached.
My question is about the PF_HQ_MEDIAN_MISMATCHES metric, which has a very high value (66). When I look at the NM tag in the BAM file, the median is 1 and the maximum NM is 2.
How does Picard calculate this number?

Any help is appreciated.
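To reproduce the NM check described above, one option is a small Python sketch that reads SAM text (for example the output of `samtools view -h`) and takes the median of the NM tag over mapped records. It is only an illustration: the helper name `median_nm` and the toy records are made up. Keep in mind that the SAM specification defines NM as the edit distance to the reference, so it also counts inserted and deleted bases, not only mismatches.

```python
import statistics


def median_nm(sam_lines):
    """Median NM value over mapped records in SAM text lines.

    Header lines ('@...'), unmapped reads (FLAG bit 0x4) and records
    without an NM tag are skipped. Returns None if nothing is left.
    """
    nm_values = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # unmapped: no meaningful NM
            continue
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        for field in fields[11:]:
            if field.startswith("NM:i:"):
                nm_values.append(int(field[5:]))
                break
    return statistics.median(nm_values) if nm_values else None


records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGA\tIIII\tNM:i:1",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t300\t60\t4M\t*\t0\t0\tAGGA\tIIII\tAS:i:-4\tNM:i:2",
]
print(median_nm(records))  # → 1
```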


Best Answer


  • Sheila (Broad Institute; Member, Broadie, Moderator)


    This article defines the metric as "The median number of mismatches versus the reference sequence in reads that were aligned to the reference at high quality (i.e. PF_HQ_ALIGNED_READS)". However, I am not sure what a typical value for this metric is. I will check with the team and get back to you.


  • shlee (Cambridge; Member, Broadie, Moderator)

    Hi @newbie16,

    We've narrowed down the cause of your high PF_HQ_MEDIAN_MISMATCHES to three possibilities: soft-clipped bases (S in the CIGAR string) are counted as mismatches, reference-skip bases (N in the CIGAR string, e.g. bases spanning introns) are counted as mismatches, or both. Given how many such bases your alignment records contain, does the high median make sense?

    As for which reads the metric is calculated on, I believe they must have a MAPQ of at least 20 (and therefore must be aligned) and cannot be supplementary alignments. The tool takes the alignment blocks in each record, as defined by the CIGAR string, iterates over them, and adds to the mismatch count by comparing each read base directly with the reference base. Comparisons are case-insensitive.
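To make the procedure described above concrete, here is a minimal Python sketch. It follows that description rather than Picard's actual source: the function names are hypothetical, positions are 0-based (subtract 1 from the SAM POS), and the read filter (mapped, passing vendor QC, not secondary or supplementary, MAPQ of at least 20) is an assumption. In this sketch, soft-clipped (S) and skipped (N) bases fall outside the aligned blocks and are therefore never compared.

```python
import re
import statistics

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_blocks(cigar, ref_start):
    """Yield (read_offset, ref_offset, length) for each aligned block.

    M, = and X consume read and reference and form blocks; I and S consume
    only the read; D and N consume only the reference; H and P consume neither.
    """
    read_pos, ref_pos = 0, ref_start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            yield read_pos, ref_pos, n
            read_pos += n
            ref_pos += n
        elif op in "IS":
            read_pos += n
        elif op in "DN":
            ref_pos += n


def count_mismatches(seq, cigar, ref_start, reference):
    """Count case-insensitive base differences inside the aligned blocks."""
    mismatches = 0
    for read_off, ref_off, n in alignment_blocks(cigar, ref_start):
        for i in range(n):
            if seq[read_off + i].upper() != reference[ref_off + i].upper():
                mismatches += 1
    return mismatches


def hq_median_mismatches(records, reference, min_mapq=20):
    """Median mismatch count over assumed 'high-quality' records.

    records: iterable of (seq, cigar, ref_start_0based, mapq, flag).
    """
    counts = []
    for seq, cigar, ref_start, mapq, flag in records:
        # Skip unmapped, secondary, QC-fail or supplementary records, and low MAPQ.
        if flag & (0x4 | 0x100 | 0x200 | 0x800) or mapq < min_mapq:
            continue
        counts.append(count_mismatches(seq, cigar, ref_start, reference))
    return statistics.median(counts) if counts else None


reference = "ACGTACGTAC"
records = [
    ("ACGT", "4M", 0, 60, 0),        # perfect match
    ("acga", "4M", 0, 60, 0),        # one mismatch; lower case is fine
    ("ACGT", "2M4N2M", 0, 60, 0),    # spliced; skipped bases not compared
    ("TTACGT", "2S4M", 0, 60, 0),    # soft clip not compared
    ("TTTT", "4M", 0, 10, 0),        # MAPQ below 20, excluded
    ("GGGG", "4M", 0, 60, 0x800),    # supplementary, excluded
]
print(hq_median_mismatches(records, reference))  # → 0.0
```

Changing which CIGAR operators form blocks in `alignment_blocks` is a quick way to see how counting S or N bases would inflate the median.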

  • shlee (Cambridge; Member, Broadie, Moderator)


    Someone on the team tells me that RNA samples typically have a PF_HQ_MEDIAN_MISMATCHES value of around 0 to 2, so what I wrote above may be wrong. Can you post some of your alignment records so we can look at the SAM flag values, CIGAR strings, etc.?

  • Hi,
    Thanks for looking into it. I have uploaded a sample BAM to Google Drive at the link below. The PF_HQ_MEDIAN_MISMATCHES value for this file was 66.

  • Hi @shlee

    SplitNCigarReads emits the following error:

    ERROR MESSAGE: Unsupported CIGAR operator N in read HISEQ:262:C99J2ACXX:8:2206:7109:2528 at chr1:4776766. If you are working with RNA-Seq data, see for guidance. If you choose to disregard those instructions, or for other uses, you have the option of either filtering out all reads with operator N in their CIGAR string (add --filter_reads_with_N_cigar to your command line) or overriding this check (add -U ALLOW_N_CIGAR_READS to your command line). Notice however that the latter is unsupported, so if you use it and encounter any problems, the GATK support team not be able to help you.

    The command I used is this:
    java -jar ~/software/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T SplitNCigarReads -R /mnt/storage/ref_genome/mouse_mm10/mouse_mm10/genome.fa -I Sample_14011001.tophat.sub0p005.reorder.bam -o Sample_14011001.tophat.sub0p005.reorder.NoNs.bam

    Here is the aligned read (from the SAM file) mentioned in the error above:


    Could you please help?

  • Sheila (Broad Institute; Member, Broadie, Moderator)


    In this case, it is okay to use -U ALLOW_N_CIGAR_READS. We have added a note about this to the article that the error message points you to.


  • Thanks @shlee and @Sheila.
    Once I got rid of the N operators in the CIGAR strings, PF_HQ_MEDIAN_MISMATCHES looks fine, i.e. 0.
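The fix that worked here was removing the N operators before collecting metrics. To illustrate the idea behind SplitNCigarReads, which splits a spliced read into one piece per exon segment at each N, here is a simplified Python sketch. The function name `split_at_n` is hypothetical, and the sketch only partitions the read bases, CIGAR and start position; it does nothing else the real GATK tool may do to the resulting records.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_at_n(seq, cigar, pos):
    """Split one alignment at every N operator.

    Returns a list of (bases, cigar, pos) tuples, one per segment.
    pos is the 1-based SAM POS of the first aligned base.
    """
    segments = []
    ops = []                      # CIGAR ops of the current segment
    seg_read_start = read_pos = 0
    seg_pos = ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # Close the current segment; the next one starts after the skip.
            segments.append((seq[seg_read_start:read_pos], "".join(ops), seg_pos))
            ref_pos += n
            seg_read_start, seg_pos, ops = read_pos, ref_pos, []
            continue
        ops.append(f"{n}{op}")
        if op in "M=XIS":         # consumes read bases
            read_pos += n
        if op in "M=XD":          # consumes reference bases
            ref_pos += n
    segments.append((seq[seg_read_start:read_pos], "".join(ops), seg_pos))
    return segments


print(split_at_n("AACCGGTT", "2S2M100N4M", 1000))
# → [('AACC', '2S2M', 1000), ('GGTT', '4M', 1102)]
```

Each returned piece has no N in its CIGAR, which is the property that let the metrics come out as expected in this thread.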
