
Error MarkDuplicates (GATK4, Best Practices)

Hi!

I'm trying to build the GATK4 Best Practices pipeline, but during the "MarkDuplicates" step I get the error below:

CODE

java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates INPUT=${SM}".fixedBAM" OUTPUT=${SM}".markdupBAM" METRICS_FILE=${SM}".dupmetrics.txt" VALIDATION_STRINGENCY=SILENT \
OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 ASSUME_SORT_ORDER=queryname CREATE_MD5_FILE=true

ERROR

16:40:10.529 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/picard-2.17.3/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Jan 23 16:40:10 CET 2018] MarkDuplicates INPUT=[P3.fixedBAM] OUTPUT=P3.duplBAM METRICS_FILE=P3.duplmetrics ASSUME_SORT_ORDER=queryname OPTICAL_DUPLICATE_PIXEL_DISTANC
[Tue Jan 23 16:40:10 CET 2018] Executing as manolis@genemonster on Linux 3.5.0-36-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14; Deflater: Intel; Inflater:
INFO 2018-01-23 16:40:10 MarkDuplicates Start of doWork freeMemory: 4989389904; totalMemory: 5024776192; maxMemory: 9544663040
INFO 2018-01-23 16:40:10 MarkDuplicates Reading input file and constructing read end information.
INFO 2018-01-23 16:40:10 MarkDuplicates Will retain up to 34582112 data points before spilling to disk.
INFO 2018-01-23 16:40:23 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 12s. Last read position: chr1:26,775,551
INFO 2018-01-23 16:40:23 MarkDuplicates Tracking 3446 as yet unmatched pairs. 307 records in RAM.
...
...
...
INFO 2018-01-23 16:51:52 MarkDuplicates Read 48,000,000 records. Elapsed time: 00:11:41s. Time for last 1,000,000: 18s. Last read position: chrX:149,775,12
INFO 2018-01-23 16:51:52 MarkDuplicates Tracking 11089 as yet unmatched pairs. 381 records in RAM.
INFO 2018-01-23 16:52:03 MarkDuplicates Read 48702879 records. 0 pairs never matched.
INFO 2018-01-23 16:52:29 MarkDuplicates After buildSortedReadEndLists freeMemory: 5887047584; totalMemory: 7838105600; maxMemory: 9544663040
INFO 2018-01-23 16:52:29 MarkDuplicates Will retain up to 298270720 duplicate indices before spilling to disk.
INFO 2018-01-23 16:52:30 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2018-01-23 16:52:38 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2018-01-23 16:53:12 MarkDuplicates Sorting list of duplicate records.
INFO 2018-01-23 16:53:13 MarkDuplicates After generateDuplicateIndexes freeMemory: 5780984880; totalMemory: 8239185920; maxMemory: 9544663040
INFO 2018-01-23 16:53:13 MarkDuplicates Marking 5693783 records as duplicates.
INFO 2018-01-23 16:53:13 MarkDuplicates Found 1249125 optical duplicate clusters.
INFO 2018-01-23 16:53:13 MarkDuplicates Reads are assumed to be ordered by: queryname
[Tue Jan 23 16:53:14 CET 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 13.06 minutes.
Runtime.totalMemory()=8239185920
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///home/manolis/GATK4/IlluminaExome
at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:213)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:200)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:406)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

The preceding steps are as follows (I started from FASTQ files and performed trimming with trim_galore):

java -jar ${ph3} FastqToSam F1=${fol2}/${val1} F2=${fol2}/${val2} O=${fol4}/${SM}".uBAM" SO=queryname RG=${PU1}"."${PU2} SM=${SM} LB=${LB} PL=${PL}

java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SamToFastq INPUT=${SM}".uBAM" FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true | ${ph4} mem -K 100000000 -p -v 3 -t 16 -Y ${gnm} /dev/stdin | samtools view -1 - > ${SM}".bwaBAM"

java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MergeBamAlignment VALIDATION_STRINGENCY=SILENT ORIENTATIONS=FR ATTRIBUTES_TO_RETAIN=X0 UNMAPPED=${ubam} ALIGNED=${bwabam} O=${SM}".mergedBAM" R=${gnm} PE=true SO=unsorted IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false CLIP_ADAPTERS=false MAX_RECORDS_IN_RAM=2000000 MC=true MAX_GAPS=-1 PRIMARY_ALIGNMENT_STRATEGY=MostDistant UNMAPPED_READ_STRATEGY=COPY_TO_TAG ALIGNER_PROPER_PAIR_FLAGS=true UNMAP_CONTAM=true

java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SortSam INPUT=${SM}".mergedBAM" O=/dev/stdout SORT_ORDER=coordinate CREATE_INDEX=false CREATE_MD5_FILE=false | java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SetNmAndUqTags INPUT=/dev/stdin O=${SM}".fixedBAM" CREATE_INDEX=true CREATE_MD5_FILE=true REFERENCE_SEQUENCE=${gnm}

...and then MarkDuplicates, as shown above.

Thank you for your time!
All the best,
manolis

Best Answers

  • manolis Member
    Accepted Answer

    I think I found my error. Looking at the "call" steps, they list MarkDuplicates first and then SortAndFix, but if you look at the "task" definitions you will find the steps in the opposite order. I now run MarkDuplicates first and then SortAndFix, and it works correctly.

    Thank you SkyWarrior
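
    For reference, the reordered steps would look something like this. This is an untested sketch based on the commands in the question (same shell variables as the original post: ${cl}, ${ph3}, ${SM}, ${gnm}); with this ordering MarkDuplicates runs on the queryname-grouped merged BAM, so ASSUME_SORT_ORDER=queryname is actually true, and coordinate sorting happens afterwards:

    ```shell
    # 1) Mark duplicates on the merged (still queryname-grouped) BAM,
    #    before any coordinate sorting:
    java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates \
        INPUT=${SM}".mergedBAM" \
        OUTPUT=${SM}".markdupBAM" \
        METRICS_FILE=${SM}".dupmetrics.txt" \
        VALIDATION_STRINGENCY=SILENT \
        OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 \
        ASSUME_SORT_ORDER=queryname \
        CREATE_MD5_FILE=true

    # 2) Then coordinate-sort and fix NM/UQ tags (the former "SortAndFix" step):
    java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SortSam \
        INPUT=${SM}".markdupBAM" O=/dev/stdout SORT_ORDER=coordinate \
        CREATE_INDEX=false CREATE_MD5_FILE=false \
      | java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} SetNmAndUqTags \
        INPUT=/dev/stdin O=${SM}".fixedBAM" CREATE_INDEX=true CREATE_MD5_FILE=true \
        REFERENCE_SEQUENCE=${gnm}
    ```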

Answers

  • SkyWarrior (Turkey) Member

    You seem to coordinate-sort your reads and then ask MarkDuplicates to assume queryname sort order. I guess that's the main issue. Change the ASSUME_SORT_ORDER parameter to coordinate and all shall be forgiven :smiley:
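
    Under that diagnosis, the minimal change to the command in the question would be something like the following (an untested sketch using the same shell variables as the original post; only ASSUME_SORT_ORDER changes, since the input ${SM}".fixedBAM" was produced by SortSam with SORT_ORDER=coordinate):

    ```shell
    java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates \
        INPUT=${SM}".fixedBAM" \
        OUTPUT=${SM}".markdupBAM" \
        METRICS_FILE=${SM}".dupmetrics.txt" \
        VALIDATION_STRINGENCY=SILENT \
        OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 \
        ASSUME_SORT_ORDER=coordinate \
        CREATE_MD5_FILE=true
    ```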

