SortSam before MarkDuplicates?
Hi GATK team,
I'm setting up a GATK best practices workflow. The description here: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165 says that after mapping I should run MarkDuplicates. I did the mapping like this:
bwa mem -M -t 8 Homo_sapiens.GRCh38.dna.primary_assembly.fa R1_001.fastq.gz R2_001.fastq.gz > unmarkedDuplicates.bam
Then I ran MarkDuplicates like this:
gatk MarkDuplicates \
    -I unmarkedDuplicates.bam \
    -O markedDuplicates.bam \
    -M DuplicationMetrics.txt
This fails with the following error:
picard.PicardException: This program requires input that are either coordinate or query sorted. Found unsorted
Am I doing something wrong, or should the order be reversed in the description/best practices? It is of course easy to just sort first, but I really want to follow your guides as closely as possible.
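For reference, the sort-first variant I have in mind would look roughly like the sketch below. It is only a sketch: the helper names run and sort_then_mark and the intermediate file name sorted.bam are made up for illustration, and choosing queryname order is my assumption (the error message says either coordinate or query sorted input is accepted).

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: sort first, then mark duplicates.
# $1 = aligner output, $2 = sorted intermediate, $3 = final BAM, $4 = metrics file
sort_then_mark() {
  # Sort by query name (assumption; coordinate order should also be accepted).
  run gatk SortSam -I "$1" -O "$2" -SO queryname &&
  # Mark duplicates on the sorted file.
  run gatk MarkDuplicates -I "$2" -O "$3" -M "$4"
}
```

I would call it as sort_then_mark unmarkedDuplicates.bam sorted.bam markedDuplicates.bam DuplicationMetrics.txt, with DRY_RUN=1 set first to check the two commands before running them for real.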