About running Picard's MarkDuplicatesWithMateCigar and SortSam

Bogdan (Palo Alto, CA), Member

Dear all, could you please advise on the following?

I have a BAM file that has been coordinate-sorted and indexed (with SortSam). Picard's ValidateSamFile reports no errors; however, when I run MarkDuplicatesWithMateCigar (MINIMUM_DISTANCE=200 VALIDATION_STRINGENCY=LENIENT), it fails with the error "Records were not found coordinate sort order". And indeed, the two records below do not seem to be properly sorted. Is there any way to fix this?

Thanks,

Bogdan

P.S. The records that were not properly sorted were:

INFO 2016-01-08 11:16:55 MarkDuplicatesWithMateCigar Read 433,000,000 records. Elapsed time: 01:28:56s. Time for last 1,000,000: 15s. Last read position: chr4:49,141,738
INFO 2016-01-08 11:16:55 MarkDuplicatesWithMateCigar freeMemory: 1600550872; totalMemory: 1637351424; maxMemory: 7404519424; output buffer size: 18950; duplicate queue size: 2
INFO 2016-01-08 11:17:10 MarkDuplicatesWithMateCigar Read 434,000,000 records. Elapsed time: 01:29:12s. Time for last 1,000,000: 15s. Last read position: chr4:49,151,453
INFO 2016-01-08 11:17:10 MarkDuplicatesWithMateCigar freeMemory: 1633995456; totalMemory: 1810890752; maxMemory: 7404519424; output buffer size: 168579; duplicate queue size: 47

Previous record: C38P6ACXX_0:6:1213:2060455:0 83 chr4 49151731 40 22M10I68M = 49151442 -379 CCATTCCATTCCATTCCATTCTATTTGGTTTAATTCCATTCCATTCCATTCGATTGCAATCGAGTTGATTCCATTCCATTCCATTCCATTCCATTCCATT <<7<<<BBBBB7<<<BBBFBFBBB<7'<000FBBF<FBBFFFFBFB<FF<BFBBFFFF0F<FBB7FFB<F<BFB0FBB0FIFF<F<FFFBBFFFFFBBBB MD:Z:4A36C6C33G4G2 RG:Z:tumor NM:i:15 AS:i:51 XS:i:48

Current record: C38P6ACXX_0:4:2206:2491027:0 147 chr4 49151612 40 33M5D53M14S = 49151439 -264 CATTCCATTCCATTCCATTCCATTCCATTCCATTCCTTTCCTTTCCATTCCATTCCATTCCATTCCATTCCGTTCCGTTCCGTTCCGTTCCGTTCCATTC B<0<<0'<BB<00<<00'''7'<<<<0<<<<<<<<'777<7'BBB<0'0'''BBFB<7BB<'07B7B<BBBB0700B0<FB<7B7FBBB<F<BBF7FBBB MD:Z:33^TGGGG3A4A44 RG:Z:tumor NM:i:7 AS:i:65 XS:i:69

[Fri Jan 08 11:17:13 PST 2016] picard.sam.markduplicates.MarkDuplicatesWithMateCigar done. Elapsed time: 89.26 minutes.
Runtime.totalMemory()=1810890752

To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Records were not found coordinate sort order
at picard.sam.markduplicates.MarkDuplicatesWithMateCigarIterator.next(MarkDuplicatesWithMateCigarIterator.java:228)
at picard.sam.markduplicates.MarkDuplicatesWithMateCigarIterator.next(MarkDuplicatesWithMateCigarIterator.java:47)
at picard.sam.markduplicates.MarkDuplicatesWithMateCigar.doWork(MarkDuplicatesWithMateCigar.java:132)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Comments

  • Bogdan (Palo Alto, CA), Member

    Of course, the header of the SAM file says it is sorted:

    @HD VN:1.5 SO:coordinate
    @SQ SN:chrM LN:16571
    @SQ SN:chr1 LN:249250621
    @SQ SN:chr2 LN:243199373
    @SQ SN:chr3 LN:198022430

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin

    Out of curiosity, can you check whether the records are actually out of order in the file? MarkDuplicatesWithMateCigar could be pulling them out in the wrong order.
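    One way to run that check, sketched under assumptions: stream the BAM as SAM text (for example with `samtools view -h sorted.bam`) and compare each record's (reference index, POS) with the previous record's, where the reference index follows the order of the `@SQ` header lines. That is the SAM specification's definition of coordinate order; it is not Picard's internal comparator. The function name `first_out_of_order` and the script name `check_coord_order.py` are hypothetical.

```python
import sys
from typing import Iterable, Optional, Tuple


def first_out_of_order(lines: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (line_number, previous_record, current_record) for the first
    record that breaks coordinate order, or None if the stream is sorted.

    Order follows the SAM spec: reference in @SQ header order, then POS.
    Records with RNAME '*' (no reference) are expected after all others.
    """
    ref_index = {}  # contig name -> rank among @SQ lines
    prev_key = None
    prev_line = ""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: only @SQ lines define the contig order.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_index[field[3:]] = len(ref_index)
            continue
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            ref = len(ref_index)  # unplaced reads sort after every contig
        elif rname in ref_index:
            ref = ref_index[rname]
        else:
            raise ValueError(f"line {lineno}: contig {rname!r} not in @SQ header")
        key = (ref, pos)
        if prev_key is not None and key < prev_key:
            return lineno, prev_line, line
        prev_key, prev_line = key, line
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: samtools view -h sorted.bam | python check_coord_order.py -
    src = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with src:
        hit = first_out_of_order(src)
    if hit is None:
        print("records are in coordinate order")
    else:
        lineno, prev, cur = hit
        print(f"out of order at SAM line {lineno}")
        print("  previous:", prev[:100])
        print("  current: ", cur[:100])
        sys.exit(1)
```

    If this reports no violation, the file itself is in coordinate order by the SAM definition, and the problem more likely lies in how the tool reads or reorders the records.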

  • Bogdan (Palo Alto, CA), Member

    Thanks, Geraldine, I will get back to you in a minute. I thought I could also ask: is there any major difference between MarkDuplicatesWithMateCigar and MarkDuplicates? Which tool would you advise using? ;) Thank you.

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin

    I would advise you to use vanilla MarkDuplicates. The MC version is intended to be superior because it detects more cases thanks to some slightly different internal logic, but it's not fully mature; there have been enough bugs reported and concerns about performance that I would advise against using it in production.

  • Bogdan (Palo Alto, CA), Member

    Great, thanks, Geraldine. You were right about the previous error with MarkDuplicatesWithMateCigar.

    The reads were in the correct order in the sorted and indexed BAM file: the read at chr4:49151612 comes before the read at chr4:49151731.

    So I assume the error comes from the mate-CIGAR (MC) logic of MarkDuplicatesWithMateCigar.

    I will check whether I get the same error with Picard's MarkDuplicates. Thanks, and have a happy and warm weekend.

  • Bogdan (Palo Alto, CA), Member

    I forgot to ask, though: does MarkDuplicates require all the reads to have the same length?

  • Bogdan (Palo Alto, CA), Member

    Thanks, Geraldine!
