five-dollar-genome-analysis-pipeline running errors in CheckContamination

I cloned the five-dollar-genome-analysis-pipeline and uploaded my own WGS input data, but the run fails at the CheckContamination stage:

Job germline_single_sample_workflow.CheckContamination:NA:1 exited with return code 1

Workflow ID: 16497c6d-be4c-4579-9202-58960bbde32d

I checked CheckContamination-stderr.log and found the following error:

Traceback (most recent call last):
File "", line 6, in
File "/usr/local/lib/python3.6/csv.py", line 111, in __next__
self.fieldnames
File "/usr/local/lib/python3.6/csv.py", line 98, in fieldnames
self._fieldnames = next(self.reader)
File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 146: invalid start byte

I traced this back to line 357 of bam_processing.wdl, where the file is opened: with open('${output_prefix}.selfSM') as selfSM:

I tried two ways to fix this:

1. Change line 354 in bam_processing.wdl from python3 to python2. With python2, the input .selfSM file is handled correctly.

2. Change line 357 in bam_processing.wdl from
    with open('${output_prefix}.selfSM') as selfSM:
    to
    with open('${output_prefix}.selfSM', errors='ignore') as selfSM:
    so that the undecodable byte is ignored.
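
For anyone who wants to reproduce this outside the pipeline, here is a minimal sketch. It assumes the script reads the .selfSM file with csv.DictReader, as the traceback suggests. The two-column file content, the sample name, and the helper read_rows are made up for illustration and are not the real .selfSM layout or the pipeline's code. It compares strict UTF-8 decoding, errors='ignore', and reading the file as Latin-1.

```python
import csv
import os
import tempfile

# Hypothetical two-column stand-in for a .selfSM file. The sample name
# contains the lone byte 0xa0, which is not valid UTF-8 on its own.
raw = b"#SEQ_ID\tFREEMIX\nsample\xa01\t0.001\n"


def read_rows(path, **open_kwargs):
    """Read a tab-separated file into a list of dicts (illustrative helper)."""
    with open(path, newline="", **open_kwargs) as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


with tempfile.NamedTemporaryFile(delete=False, suffix=".selfSM") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Strict UTF-8 decoding fails as soon as the reader pulls data from the file.
    try:
        read_rows(path, encoding="utf-8")
        strict_error = None
    except UnicodeDecodeError as exc:
        strict_error = exc

    # errors='ignore' silently drops the bad byte, so the name changes.
    ignored = read_rows(path, encoding="utf-8", errors="ignore")

    # Latin-1 maps every byte to a character, so decoding cannot fail and
    # the byte is kept (as U+00A0).
    latin1 = read_rows(path, encoding="latin-1")
finally:
    os.remove(path)

print(type(strict_error).__name__)
print(repr(ignored[0]["#SEQ_ID"]))
print(repr(latin1[0]["#SEQ_ID"]))
```

Note that errors='ignore' changes the sample name that comes out of the file, which may matter if the name is compared against something else later. The numeric columns are plain ASCII and are unaffected either way.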

Answers

  • KateN (Cambridge, MA; Member, Broadie, Moderator, admin)

    You shouldn't need to modify any featured methods to make them work, which leads me to believe there might be another error going on here. Or, if you do need to modify them to make them work, then we need to update our method to fix that error.

    Could you share your workspace using the share button in FireCloud with [email protected], and post the workspace name and submission ID here in this thread?

  • memoryzpp (ucla; Member)

    Thank you for your reply! I shared the workspace.
    name: fccredits-thulium-gold-1705/five-dollar-genome-analysis-pipeline_copy_small
    submissionID: 8bf051fe-f028-40aa-8655-2f45d51e66c7

  • KateN (Cambridge, MA; Member, Broadie, Moderator, admin)

    @memoryzpp I apologize that I never got back to you on this; I'm looking into the error now, and looping in some other experts to take a look.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hi, it looks like you have a weird character in your sample name. For some reason, python3 is having a tough time picking it up -- but using python2 is a perfectly good workaround here.
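
To locate such a character, a rough sketch like the following scans raw bytes and reports every non-ASCII byte with its offset. The function name find_non_ascii and the sample bytes are hypothetical. In practice you would pass the bytes of the .selfSM file or of the sample name itself. Byte 0xa0 is a non-breaking space in Latin-1 and Windows-1252, so one plausible origin is a name that was copied from a document or spreadsheet.

```python
def find_non_ascii(data):
    """Return (offset, hex value) for every byte above the ASCII range."""
    return [(offset, hex(byte)) for offset, byte in enumerate(data) if byte > 0x7F]


# Hypothetical file content: a header line, then a row whose sample name
# contains the byte 0xa0.
content = b"#SEQ_ID\tRG\nNA\xa012878\tALL\n"

hits = find_non_ascii(content)
print(hits)

# To check a real file instead (path is a placeholder):
# with open("output_prefix.selfSM", "rb") as handle:
#     print(find_non_ascii(handle.read()))
```

If this reports a hit inside the sample name, renaming the sample to plain ASCII upstream should avoid the decode error without touching the workflow.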
