SortSam before MarkDuplicates?

Hi GATK team,

I'm setting up a GATK Best Practices workflow. The description at https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165 says that after mapping, which I did like this:

bwa mem -M -t 8 Homo_sapiens.GRCh38.dna.primary_assembly.fa R1_001.fastq.gz R2_001.fastq.gz > unmarkedDuplicates.bam

...I should run MarkDuplicates. I do so like this:

gatk MarkDuplicates \
    -I unmarkedDuplicates.bam \
    -O markedDuplicates.bam \
    -M DuplicationMetrics.txt

This fails with the following error:

picard.PicardException: This program requires input that are either coordinate or query sorted. Found unsorted

Am I doing something wrong, or should the order be reversed in the description/Best Practices? It is of course easy to just sort first, but I really want to follow your guides as closely as possible.
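
For concreteness, the sort-first workaround I have in mind would look roughly like the sketch below. It is only my assumption of how the two steps chain together, not something taken from the guide: the function name sort_then_mark and the intermediate file name sorted.bam are placeholders of mine, and I picked coordinate order simply because it is one of the two orders the error message accepts.

```shell
# Sketch: sort the aligner output, then mark duplicates.
# GATK defaults to the gatk launcher; set GATK=echo to print the commands as a dry run.
GATK="${GATK:-gatk}"

# Hypothetical helper. Arguments:
#   $1 unsorted input, $2 sorted intermediate, $3 marked output, $4 metrics file
sort_then_mark() {
    # Step 1: coordinate-sort the reads so MarkDuplicates accepts the input.
    "$GATK" SortSam -I "$1" -O "$2" --SORT_ORDER coordinate || return 1
    # Step 2: mark duplicates on the sorted file (same arguments as in my command above).
    "$GATK" MarkDuplicates -I "$2" -O "$3" -M "$4"
}
```

I would then call it as: sort_then_mark unmarkedDuplicates.bam sorted.bam markedDuplicates.bam DuplicationMetrics.txt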

Highest regards,

