SortSam before MarkDuplicates?

Hi GATK team,

I'm setting up a GATK Best Practices workflow. The description at https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165 says that after mapping, which I did like this:

bwa mem -M -t 8 Homo_sapiens.GRCh38.dna.primary_assembly.fa R1_001.fastq.gz R2_001.fastq.gz > unmarkedDuplicates.bam

...I should run MarkDuplicates. I run it like this:

gatk MarkDuplicates \
    -I unmarkedDuplicates.bam \
    -O markedDuplicates.bam \
    -M DuplicationMetrics.txt

This fails with the following error:

picard.PicardException: This program requires input that are either coordinate or query sorted. Found unsorted

Am I doing something wrong, or should the order of the steps be reversed in the description/Best Practices? It is of course easy to just sort first, but I really want to follow your guides as closely as possible.
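
For reference, the sort-first workaround I have in mind would look roughly like the sketch below. It is not taken from the Best Practices page. It assumes the error reflects the sort order recorded in the SO tag of the file's @HD header line; the helper name sam_sort_order and the intermediate file name sorted.bam are made up for illustration.

```shell
# Hypothetical helper: print the SO (sort order) tag from SAM header text.
# Prints "unknown" when no @HD line carries an SO tag (the SAM spec default).
sam_sort_order() {
    printf '%s\n' "$1" | awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }'
}

# Sort by coordinate, then mark duplicates on the sorted file.
# Guarded so the sketch does nothing when gatk or the input file is missing.
if command -v gatk >/dev/null 2>&1 && [ -f unmarkedDuplicates.bam ]; then
    gatk SortSam \
        -I unmarkedDuplicates.bam \
        -O sorted.bam \
        -SO coordinate

    gatk MarkDuplicates \
        -I sorted.bam \
        -O markedDuplicates.bam \
        -M DuplicationMetrics.txt
fi
```

If samtools is installed, the helper can be fed a real header with sam_sort_order "$(samtools view -H sorted.bam)", which should print coordinate after SortSam has run.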

Highest regards,

Freek
