womtool fails on Python code in a task's command with certain formatted strings

myourshaw, University of Colorado, Member

Yet another problem with a Python <<CODE heredoc in a task: womtool sometimes rejects Python formatted strings whose interpolated values are dict lookups. For example:

Unrecognized token on line 399, column 139:

            'OUTPUT': f"{sample_metadata['RUN_ID']}.{lane}.{sample_metadata['BARCODE_ID']}.{sample_metadata['LIBRARY_NAME']}.unaligned.bam",
                                                                                                                                          ^

or

Unrecognized token on line 395, column 39:

        o = "{}.{}.{}.{}.unaligned.bam".format(sample_metadata['RUN_ID'], lane, sample_metadata['BARCODE_ID'], sample_metadata['LIBRARY_NAME'])
                                      ^

However, this is OK:

        o = sample_metadata['RUN_ID'] + '.' + str(lane) + '.' + sample_metadata['BARCODE_ID'] + '.' + sample_metadata['LIBRARY_NAME'] + '.unaligned.bam'

Also, this f-string later in the same command does not cause an error:

        'OUTPUT': f"{metadata['RUN_ID']}.{lane}.N.UNKNOWN.unaligned.bam",
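
For reference, here is a minimal sketch showing that the three spellings above build the same file name in plain Python, which suggests the difference lies in how womtool parses the command block rather than in the Python itself. The metadata values and the lane number are made up for illustration; the real ones come from the run metadata JSON.

```python
# Hypothetical values, for illustration only.
sample_metadata = {
    "RUN_ID": "RUN1",
    "BARCODE_ID": "BC01",
    "LIBRARY_NAME": "LIB_A",
}
lane = 1

# Form 1: f-string with dict lookups (womtool reports "Unrecognized token")
from_fstring = f"{sample_metadata['RUN_ID']}.{lane}.{sample_metadata['BARCODE_ID']}.{sample_metadata['LIBRARY_NAME']}.unaligned.bam"

# Form 2: str.format (womtool also reports "Unrecognized token")
from_format = "{}.{}.{}.{}.unaligned.bam".format(
    sample_metadata['RUN_ID'],
    lane,
    sample_metadata['BARCODE_ID'],
    sample_metadata['LIBRARY_NAME'],
)

# Form 3: plain concatenation (accepted by womtool)
from_concat = (
    sample_metadata['RUN_ID'] + '.' + str(lane) + '.'
    + sample_metadata['BARCODE_ID'] + '.'
    + sample_metadata['LIBRARY_NAME'] + '.unaligned.bam'
)

# All three spellings yield the same string.
assert from_fstring == from_format == from_concat
print(from_concat)  # RUN1.1.BC01.LIB_A.unaligned.bam
```

So the concatenation form can stand in for the other two as a workaround without changing the output names.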

The full task:

task CreateLibraryParamsFile {
  String python3_cmd
  File run_metadata
  Int lane

  command {
    ${python3_cmd} <<CODE
    import json

    # bind the WDL Int input to a Python name; the code below reads lane
    lane = ${lane}

    barcode_data = []
    ubams = []

    with open('${run_metadata}', 'r') as ifh:
        metadata = json.load(ifh)

    if metadata['NUM_INDICES'] == 1:
        header = ['BARCODE_1', 'SAMPLE_ALIAS', 'LIBRARY_NAME', 'OUTPUT', 'PM', 'PI', 'DS']
    else:
        header = ['BARCODE_1', 'BARCODE_2', 'SAMPLE_ALIAS', 'LIBRARY_NAME', 'OUTPUT', 'PM', 'PI', 'DS']

    lane_metadata = metadata['LANE_METADATA'].get(str(lane))
    for sample_metadata in lane_metadata:
        barcode_1 = sample_metadata['BARCODE_1']
        barcode_2 = sample_metadata.get('BARCODE_2', '')
        barcode_dict = {
            'SAMPLE_ALIAS': sample_metadata['SAMPLE_ALIAS'],
            'LIBRARY_NAME': sample_metadata['LIBRARY_NAME'],
            'OUTPUT': f"{sample_metadata['RUN_ID']}.{lane}.{sample_metadata['BARCODE_ID']}.{sample_metadata['LIBRARY_NAME']}.unaligned.bam",
            'PM': sample_metadata['PM'],
            'PI': sample_metadata['PI'],
            'DS': sample_metadata['DS'],
            'BARCODE_1': barcode_1,
            'BARCODE_2': barcode_2,
        }
        barcode_data.append([str(barcode_dict[_]) for _ in header])

        ubams.append(barcode_dict['OUTPUT'])

    # add a catchall row for unmatched barcodes
    # do not add this to the list of ubams
    unknown_barcode_dict = {
        'SAMPLE_ALIAS': 'UNKNOWN',
        'LIBRARY_NAME': 'UNKNOWN',
        'OUTPUT': f"{metadata['RUN_ID']}.{lane}.N.UNKNOWN.unaligned.bam",
        'PM': metadata['PM'],
        'PI': '',
        'DS': '',
        'BARCODE_1': 'N',
        'BARCODE_2': 'N',
    }
    barcode_data.append([str(unknown_barcode_dict[_]) for _ in header])

    print('\t'.join(header))

    for d in barcode_data:
        print('\t'.join(d))

    # list of ubam files
    with open('ubams', 'w') as ufh:
        for u in ubams:
            ufh.write(u + '\n')
    CODE
  }
  runtime {
    memory: "1G"
    cpu: 1
  }
  output {
    File library_params = stdout()
    Array[String] ubams = read_lines("./ubams")
  }
}
