SortSam before MarkDuplicates?

Hi GATK team,

I'm setting up a GATK Best Practices workflow. The description here: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165 says that MarkDuplicates should be run right after mapping. I did the mapping like this:

bwa mem -M -t 8 Homo_sapiens.GRCh38.dna.primary_assembly.fa R1_001.fastq.gz R2_001.fastq.gz > unmarkedDuplicates.bam

Then I ran MarkDuplicates like this:

gatk MarkDuplicates \
    -I unmarkedDuplicates.bam \
    -O markedDuplicates.bam \
    -M DuplicationMetrics.txt

This fails with the following error:

picard.PicardException: This program requires input that are either coordinate or query sorted. Found unsorted
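
For context, Picard tools take the sort order from the SO tag on the @HD line of the SAM/BAM header, and a header without that tag is treated as unsorted. A minimal sketch for checking what a file declares, assuming samtools is installed (it is not mentioned in the original post) and using a hypothetical helper named sort_order:

```shell
# Hypothetical helper: read a SAM header on stdin and print the value of the
# SO: tag from the @HD line, or "unknown" if no such tag is present.
sort_order() {
  awk -F'\t' '
    /^@HD/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
      }
    }
    END { if (!found) print "unknown" }
  '
}

# Example usage (assumes samtools is on the PATH):
#   samtools view -H unmarkedDuplicates.bam | sort_order
```

If this prints unsorted or unknown, the file would need to be sorted (or its order declared) before MarkDuplicates accepts it.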

Am I doing something wrong, or should the order be reversed in the description/best practices? It is of course easy to just sort first, but I really want to follow your guides as closely as possible.
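
One way to satisfy the sort requirement is to run SortSam before MarkDuplicates. A minimal sketch, assuming GATK4 is on the PATH; the function name sort_then_mark, the GATK variable, and the intermediate file name sorted.bam are hypothetical and do not come from the original post or the Best Practices page:

```shell
# Launcher to call; override GATK to point at a specific installation.
GATK="${GATK:-gatk}"

# Hypothetical wrapper: coordinate-sort the aligned reads, then mark duplicates.
# Arguments: input BAM/SAM, sorted output, duplicate-marked output, metrics file.
sort_then_mark() {
  in_bam="$1"
  sorted_bam="$2"
  marked_bam="$3"
  metrics="$4"

  # SortSam writes a coordinate-sorted copy and records the order in the header.
  "$GATK" SortSam -I "$in_bam" -O "$sorted_bam" -SO coordinate &&
  # MarkDuplicates then sees a declared sort order and can proceed.
  "$GATK" MarkDuplicates -I "$sorted_bam" -O "$marked_bam" -M "$metrics"
}

# Example usage with the file names from the post:
#   sort_then_mark unmarkedDuplicates.bam sorted.bam markedDuplicates.bam DuplicationMetrics.txt
```

The error message says query-sorted input is accepted as well, so sorting by coordinate here is a choice made for the sketch, not a requirement stated in the thread.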

Highest regards,

