
Error MarkDuplicates (GATK4, Best Practices)


I'm trying to build the GATK4 pipeline, but during the "MarkDuplicates" step I get the error below:


java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates INPUT=${SM}".fixedBAM" OUTPUT=${SM}".markdupBAM" METRICS_FILE=${SM}".dupmetrics.txt" VALIDATION_STRINGENCY=SILENT


16:40:10.529 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/picard-2.17.3/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Jan 23 16:40:10 CET 2018] MarkDuplicates INPUT=[P3.fixedBAM] OUTPUT=P3.duplBAM METRICS_FILE=P3.duplmetrics ASSUME_SORT_ORDER=queryname OPTICAL_DUPLICATE_PIXEL_DISTANC
[Tue Jan 23 16:40:10 CET 2018] Executing as manolis[email protected] on Linux 3.5.0-36-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14; Deflater: Intel; Inflater:
INFO 2018-01-23 16:40:10 MarkDuplicates Start of doWork freeMemory: 4989389904; totalMemory: 5024776192; maxMemory: 9544663040
INFO 2018-01-23 16:40:10 MarkDuplicates Reading input file and constructing read end information.
INFO 2018-01-23 16:40:10 MarkDuplicates Will retain up to 34582112 data points before spilling to disk.
INFO 2018-01-23 16:40:23 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 12s. Last read position: chr1:26,775,551
INFO 2018-01-23 16:40:23 MarkDuplicates Tracking 3446 as yet unmatched pairs. 307 records in RAM.
INFO 2018-01-23 16:51:52 MarkDuplicates Read 48,000,000 records. Elapsed time: 00:11:41s. Time for last 1,000,000: 18s. Last read position: chrX:149,775,12
INFO 2018-01-23 16:51:52 MarkDuplicates Tracking 11089 as yet unmatched pairs. 381 records in RAM.
INFO 2018-01-23 16:52:03 MarkDuplicates Read 48702879 records. 0 pairs never matched.
INFO 2018-01-23 16:52:29 MarkDuplicates After buildSortedReadEndLists freeMemory: 5887047584; totalMemory: 7838105600; maxMemory: 9544663040
INFO 2018-01-23 16:52:29 MarkDuplicates Will retain up to 298270720 duplicate indices before spilling to disk.
INFO 2018-01-23 16:52:30 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2018-01-23 16:52:38 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2018-01-23 16:53:12 MarkDuplicates Sorting list of duplicate records.
INFO 2018-01-23 16:53:13 MarkDuplicates After generateDuplicateIndexes freeMemory: 5780984880; totalMemory: 8239185920; maxMemory: 9544663040
INFO 2018-01-23 16:53:13 MarkDuplicates Marking 5693783 records as duplicates.
INFO 2018-01-23 16:53:13 MarkDuplicates Found 1249125 optical duplicate clusters.
INFO 2018-01-23 16:53:13 MarkDuplicates Reads are assumed to be ordered by: queryname
[Tue Jan 23 16:53:14 CET 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 13.06 minutes.
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///home/manolis/GATK4/IlluminaExome
at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:213)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:200)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:406)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

The previous steps were (I started from FASTQ files and performed the trimming step with trim_galore):

java -jar ${ph3} FastqToSam F1=${fol2}/${val1} F2=${fol2}/${val2} O=${fol4}/${SM}".uBAM" SO=queryname RG=${PU1}"."${PU2} SM=${SM} LB=${LB} PL=${PL}

java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SamToFastq INPUT=${SM}".uBAM" FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true | ${ph4} mem -K 100000000 -p -v 3 -t 16 -Y ${gnm} /dev/stdin | samtools view -1 - > ${SM}".bwaBAM"


java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SortSam INPUT=${SM}".mergedBAM" O=/dev/stdout SORT_ORDER=coordinate CREATE_INDEX=false CREATE_MD5_FILE=false | java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SetNmAndUqTags INPUT=/dev/stdin O=${SM}".fixedBAM" CREATE_INDEX=true CREATE_MD5_FILE=true REFERENCE_SEQUENCE=${gnm}

…and then MarkDuplicates.

Thank you for your time!
All the best,

Best Answers

  • manolis ✭✭
    Accepted Answer

    I think I found my error. Looking at the "call" steps, they list MarkDuplicates first and then SortAndFix, but if you look at the "task" definitions, the steps appear in the opposite order. I now run MarkDuplicates first and then SortAndFix, and it works correctly.

    Thank you SkyWarrior
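
    The reordering described above could be sketched roughly as follows, reusing the shell variables from the question (${ph3} = path to picard.jar, ${SM} = sample name, ${gnm} = reference FASTA, ${cl} = compression level); file names and parameters are illustrative assumptions, not the poster's exact script:

    ```shell
    # Sketch only: run MarkDuplicates while the BAM is still queryname-sorted,
    # and state that sort order explicitly.
    java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates \
        INPUT=${SM}".mergedBAM" \
        OUTPUT=${SM}".markdupBAM" \
        METRICS_FILE=${SM}".dupmetrics.txt" \
        ASSUME_SORT_ORDER=queryname \
        VALIDATION_STRINGENCY=SILENT

    # Only afterwards coordinate-sort and fix tags (the "SortAndFix" step).
    java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SortSam \
        INPUT=${SM}".markdupBAM" O=/dev/stdout SORT_ORDER=coordinate \
        CREATE_INDEX=false CREATE_MD5_FILE=false \
    | java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SetNmAndUqTags \
        INPUT=/dev/stdin O=${SM}".fixedBAM" \
        CREATE_INDEX=true CREATE_MD5_FILE=true REFERENCE_SEQUENCE=${gnm}
    ```

    This way the sort order the data actually has (queryname, straight out of the merge) matches the sort order MarkDuplicates is told to assume, which avoids the "Alignments added out of order" exception.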


  • SkyWarrior (Turkey) Member ✭✭✭

    You seem to coordinate-sort your reads but then ask MarkDuplicates to assume the sort order is queryname. I guess that's the main issue. Change the ASSUME_SORT_ORDER parameter to coordinate and all shall be forgiven :smiley:
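
    That suggestion might look roughly like this, reusing the variables from the question (${ph3}, ${SM}, ${cl}); the header check is an added sanity step, not part of the original post, and it assumes samtools is on the PATH:

    ```shell
    # First check what sort order the BAM header actually declares:
    samtools view -H ${SM}".fixedBAM" | grep '^@HD'
    # expect something like:  @HD  VN:1.5  SO:coordinate

    # If it really is coordinate-sorted, tell MarkDuplicates exactly that:
    java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates \
        INPUT=${SM}".fixedBAM" \
        OUTPUT=${SM}".markdupBAM" \
        METRICS_FILE=${SM}".dupmetrics.txt" \
        ASSUME_SORT_ORDER=coordinate \
        VALIDATION_STRINGENCY=SILENT
    ```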

