RevertSam (SANITIZE=TRUE) via pipe leads to SAMException

FPBarthel Houston Member ✭✭
edited May 2018 in Ask the GATK team

I am experiencing a similar problem here as previously reported: https://gatkforums.broadinstitute.org/gatk/discussion/10406/strange-revertsam-error

Running

samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT}

works fine.

Adding additional parameters to RevertSam results in an error:

samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --RESTORE_ORIGINAL_QUALITIES=true --VALIDATION_STRINGENCY=SILENT --ATTRIBUTE_TO_CLEAR=AS --ATTRIBUTE_TO_CLEAR=FT --ATTRIBUTE_TO_CLEAR=CO --ATTRIBUTE_TO_CLEAR=XT --ATTRIBUTE_TO_CLEAR=XN --ATTRIBUTE_TO_CLEAR=OC --ATTRIBUTE_TO_CLEAR=OP --SANITIZE=true --SORT_ORDER=queryname --MAX_DISCARD_FRACTION=0.05

Exception in thread "main" htsjdk.samtools.SAMException: Cannot determine candidate qualities: no qualities found.

I have not tested each parameter individually, but I made the following observations:

samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --RESTORE_ORIGINAL_QUALITIES=true

Works fine

samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=true

Results in a SAMException

P.S. Tests were done using a small (~300 kb), publicly available BAM file. I am happy to share it so the error can be reproduced easily.

N.B. I am using GATK 4.0.2.1.

Answers

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    @FPBarthel, can you see if you get the same "Cannot determine candidate qualities: no qualities found." error if you run the Picard tool alone? I suspect this has to do with the --RESTORE_ORIGINAL_QUALITIES=true parameter. Do your reads have the OQ tag?
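
One way to check for the OQ tag is to scan a sample of records. The following is a minimal sketch, not part of the original exchange: the function name count_oq_reads is hypothetical, and it assumes headerless SAM text on stdin (for example from samtools view).

```shell
# Count SAM records that carry an OQ (original base qualities) tag.
# Reads headerless SAM text on stdin and prints the number of matching records.
# count_oq_reads is a hypothetical helper name used for illustration only.
count_oq_reads() {
    awk -F'\t' '
        {
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^OQ:Z:/) { n++; break }
            }
        }
        END { print n + 0 }
    '
}

# Example (assumes samtools is on PATH; SAMPLE_BAM is a placeholder):
# samtools view "$SAMPLE_BAM" | head -n 10000 | count_oq_reads
```

A count of 0 on a reasonably large sample would suggest the reads carry no original qualities to restore.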

  • FPBarthel Houston Member ✭✭
    edited May 2018

    Reading the error message, that's what I suspected too. Oddly enough, though, running RevertSam alone, or piped RevertSam with only --RESTORE_ORIGINAL_QUALITIES=true (as mentioned above), works without issue, regardless of whether the OQ tag is present. The issue seems to be caused by --SANITIZE=true.

    Does not work

    samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=true --RESTORE_ORIGINAL_QUALITIES=true 
    

    Does not work

    samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=true --RESTORE_ORIGINAL_QUALITIES=false 
    

    Works

    samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=false --RESTORE_ORIGINAL_QUALITIES=true 
    

    Works

    samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=false --RESTORE_ORIGINAL_QUALITIES=false 
    
  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @FPBarthel.

    This behavior is expected. According to the documentation, --SANITIZE=true "can only be enabled if the output sort order is queryname and will always cause sorting to occur", which means the process can no longer be streamed.

  • FPBarthel Houston Member ✭✭

    I'm not sure I understand your answer. I pipe the output of samtools into RevertSam, which is the final step. How would that be affected?

  • shlee Cambridge Member, Broadie ✭✭✭✭✭
    edited May 2018

    Sorry @FPBarthel, I don't know the internal workings of the tool, only that there is an issue ticket for the exact streaming error you are reporting: https://github.com/broadinstitute/picard/issues/1008. And yes, the behavior is apparently odd and unexpected. Feel free to add your two cents to the issue ticket.

  • @shlee, maybe this will help

    The behavior of the SANITIZE parameter is described at https://broadinstitute.github.io/picard/command-line-overview.html#RevertSam

    It says:

    SANITIZE (Boolean)

    WARNING: This option is potentially destructive. If enabled will discard reads in order to produce a consistent output BAM. Reads discarded include (but are not limited to) paired reads with missing mates, duplicated records, records with mismatches in length of bases and qualities. This option can only be enabled if the output sort order is queryname and will always cause sorting to occur. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

  • FPBarthel Houston Member ✭✭
    edited May 2018

    @shlee and @GeorgiiRozhnev, thank you both. Unfortunately, using --SANITIZE=true is necessary and important. In my opinion, this parameter is vital in any work dealing with final BAM files from public datasets. The problem with public datasets is that, due to different pipelines and (sometimes not very good) conventions, reads may have been discarded, duplicated, or left without mates, and this parameter helps clean things up. Aside from --RESTORE_ORIGINAL_QUALITIES, the --SANITIZE=true parameter is the whole reason I use RevertSam. In some cases in my workflow, not using --SANITIZE=true has led to missing mate exceptions and otherwise faulty BAM files.

    Reading the description copied above again, I do not see any reason why --SANITIZE=true should lead to a SAMException while --SANITIZE=false should not. Also, using --SANITIZE=true without /dev/stdin as input works fine, as follows:

    Works

    samtools view -br {READGROUP_ID} {SAMPLE_BAM} > tmp.bam
    gatk RevertSam --INPUT=tmp.bam --OUTPUT={OUTPUT} --SANITIZE=true
    

    Does not work

    samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=true
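
The temporary-file variant above can be wrapped so that the calling pipeline keeps its shape. This is only a sketch under assumptions: the function name revert_sanitized and the temp file name input.bam are hypothetical, gatk is assumed to be on PATH, and only the RevertSam options already shown in this thread are used.

```shell
# Spool a BAM stream from stdin to a temporary file, then run RevertSam on
# that file instead of /dev/stdin.
# Usage: <bam stream> | revert_sanitized OUTPUT.bam
# revert_sanitized is a hypothetical helper name used for illustration only.
revert_sanitized() {
    rs_output=$1
    rs_tmpdir=$(mktemp -d) || return 1
    rs_tmpbam="$rs_tmpdir/input.bam"
    # Write the piped BAM to disk so RevertSam reads a regular file.
    if cat > "$rs_tmpbam"; then
        gatk RevertSam --INPUT="$rs_tmpbam" --OUTPUT="$rs_output" --SANITIZE=true
        rs_status=$?
    else
        rs_status=1
    fi
    # Clean up the spooled copy whether or not RevertSam succeeded.
    rm -rf "$rs_tmpdir"
    return "$rs_status"
}

# Example (placeholders as in the commands above):
# samtools view -br {READGROUP_ID} {SAMPLE_BAM} | revert_sanitized {OUTPUT}
```

The cost is one extra copy of the read group on disk, which is removed again once RevertSam exits.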
    
  • FPBarthel Houston Member ✭✭

    OK that makes sense, thanks! I am using a work-around for now.

  • scalvo Member

    BTW, I had the exact same problem, i.e. samtools view BAM | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT} --SANITIZE=true led to the error

    Exception in thread "main" htsjdk.samtools.SAMException: Cannot determine candidate qualities: no qualities found.

    The error occurred with Picard version 2.18.5 but magically worked with Picard version 2.18.16.

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @scalvo,

    Thanks for reporting your observations, and especially that Picard v2.18.16 allows RevertSam to accept streamed input without error. Looks like the relevant code change was implemented for v2.18.9, in https://github.com/broadinstitute/picard/pull/1191.
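
A script that relies on streamed input could guard against older releases with a version comparison. This is a rough sketch: version_ge and picard_streaming_ok are hypothetical helper names, the 2.18.9 threshold is simply the release mentioned above, and sort -V is assumed to be available (it is in GNU coreutils). How the installed version string is obtained is left to the caller.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return success if the given Picard version is at or above 2.18.9, the
# release this thread associates with the streaming fix (an assumption
# taken from the discussion, not independently verified here).
picard_streaming_ok() {
    version_ge "$1" "2.18.9"
}

# Example:
# picard_streaming_ok "2.18.5" || echo "write the BAM to a temporary file first"
```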
