Error in MarkDuplicates (GATK4, Best Practices)


I'm trying to set up the GATK4 pipeline, but the "MarkDuplicates" step fails with the error below:


java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates INPUT=${SM}".fixedBAM" OUTPUT=${SM}".markdupBAM" METRICS_FILE=${SM}".dupmetrics.txt" VALIDATION_STRINGENCY=SILENT


16:40:10.529 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/picard-2.17.3/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Jan 23 16:40:10 CET 2018] MarkDuplicates INPUT=[P3.fixedBAM] OUTPUT=P3.duplBAM METRICS_FILE=P3.duplmetrics ASSUME_SORT_ORDER=queryname OPTICAL_DUPLICATE_PIXEL_DISTANC
[Tue Jan 23 16:40:10 CET 2018] Executing as manolis[email protected] on Linux 3.5.0-36-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14; Deflater: Intel; Inflater:
INFO 2018-01-23 16:40:10 MarkDuplicates Start of doWork freeMemory: 4989389904; totalMemory: 5024776192; maxMemory: 9544663040
INFO 2018-01-23 16:40:10 MarkDuplicates Reading input file and constructing read end information.
INFO 2018-01-23 16:40:10 MarkDuplicates Will retain up to 34582112 data points before spilling to disk.
INFO 2018-01-23 16:40:23 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 12s. Last read position: chr1:26,775,551
INFO 2018-01-23 16:40:23 MarkDuplicates Tracking 3446 as yet unmatched pairs. 307 records in RAM.
INFO 2018-01-23 16:51:52 MarkDuplicates Read 48,000,000 records. Elapsed time: 00:11:41s. Time for last 1,000,000: 18s. Last read position: chrX:149,775,12
INFO 2018-01-23 16:51:52 MarkDuplicates Tracking 11089 as yet unmatched pairs. 381 records in RAM.
INFO 2018-01-23 16:52:03 MarkDuplicates Read 48702879 records. 0 pairs never matched.
INFO 2018-01-23 16:52:29 MarkDuplicates After buildSortedReadEndLists freeMemory: 5887047584; totalMemory: 7838105600; maxMemory: 9544663040
INFO 2018-01-23 16:52:29 MarkDuplicates Will retain up to 298270720 duplicate indices before spilling to disk.
INFO 2018-01-23 16:52:30 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2018-01-23 16:52:38 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2018-01-23 16:53:12 MarkDuplicates Sorting list of duplicate records.
INFO 2018-01-23 16:53:13 MarkDuplicates After generateDuplicateIndexes freeMemory: 5780984880; totalMemory: 8239185920; maxMemory: 9544663040
INFO 2018-01-23 16:53:13 MarkDuplicates Marking 5693783 records as duplicates.
INFO 2018-01-23 16:53:13 MarkDuplicates Found 1249125 optical duplicate clusters.
INFO 2018-01-23 16:53:13 MarkDuplicates Reads are assumed to be ordered by: queryname
[Tue Jan 23 16:53:14 CET 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 13.06 minutes.
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///home/manolis/GATK4/IlluminaExome
at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:213)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:200)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:406)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

The previous steps were as follows (I started from FASTQ files and trimmed them with trim_galore):

java -jar ${ph3} FastqToSam F1=${fol2}/${val1} F2=${fol2}/${val2} O=${fol4}/${SM}".uBAM" SO=queryname RG=${PU1}"."${PU2} SM=${SM} LB=${LB} PL=${PL}

java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SamToFastq INPUT=${SM}".uBAM" FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true | ${ph4} mem -K 100000000 -p -v 3 -t 16 -Y ${gnm} /dev/stdin | samtools view -1 - > ${SM}".bwaBAM"


java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SortSam INPUT=${SM}".mergedBAM" O=/dev/stdout SORT_ORDER=coordinate CREATE_INDEX=false CREATE_MD5_FILE=false | java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SetNmAndUqTags INPUT=/dev/stdin O=${SM}".fixedBAM" CREATE_INDEX=true CREATE_MD5_FILE=true REFERENCE_SEQUENCE=${gnm}

Then comes MarkDuplicates.

Thank you for your time!
All the best,

  • manolis
  • SkyWarrior (Turkey, Member)

    You coordinate-sort your reads (SortSam with SORT_ORDER=coordinate) but then ask MarkDuplicates to assume the sort order is queryname (ASSUME_SORT_ORDER=queryname in your log). I guess that's the main issue. Change the ASSUME_SORT_ORDER parameter to coordinate and all shall be forgiven :smiley:
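
As a rough sketch of that suggestion: the snippet below reads the sort order from the `@HD` line of a SAM header and builds the MarkDuplicates call from the question with `ASSUME_SORT_ORDER` set to match. The helper names `sort_order_from_header` and `markdup_cmd` are made up for illustration, the sample header is a stand-in for what `SortSam SORT_ORDER=coordinate` would be expected to write, and the `-Dsamjdk.compression_level` option is left out for brevity. The command is only printed, not run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the SO: value from the @HD line of a SAM header
# read on stdin (e.g. piped from `samtools view -H file.bam`).
sort_order_from_header() {
  awk -F'\t' '$1 == "@HD" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) { print substr($i, 4); exit }
  }'
}

# Hypothetical helper: print the MarkDuplicates command line with the sort
# order stated explicitly. $1 = picard.jar path, $2 = sample name, $3 = sort order.
markdup_cmd() {
  printf 'java -Xms5000m -jar %s MarkDuplicates INPUT=%s.fixedBAM OUTPUT=%s.markdupBAM METRICS_FILE=%s.dupmetrics.txt ASSUME_SORT_ORDER=%s VALIDATION_STRINGENCY=SILENT\n' \
    "$1" "$2" "$2" "$2" "$3"
}

# Stand-in header line (assumption); on real data this would come from the BAM itself.
so=$(printf '@HD\tVN:1.5\tSO:coordinate\n' | sort_order_from_header)
markdup_cmd picard.jar P3 "$so"
```

On the real file, the header could be read with `samtools view -H P3.fixedBAM | sort_order_from_header`. If that reports `coordinate`, passing `ASSUME_SORT_ORDER=coordinate` (or simply dropping the `queryname` override) should keep MarkDuplicates consistent with the input.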

