

Picard IlluminaBasecallsToFastq: Row for barcode appears more than once in MULTIPLEX_PARAMS

dinvlad Member, Broadie, Dev
edited August 2018 in Ask the GATK team

Hi Team,

We're trying to use IlluminaBasecallsToFastq to de-multiplex dual-indexed BCL reads. Here's an excerpt from our MULTIPLEX_PARAMS file:

OUTPUT_PREFIX   BARCODE_1       BARCODE_2
CoPA_13546      TCGCTAGA        CGAATCGT
CoPA_13547      TCGCTAGA        CACTGGAT
CoPA_13543      ACAGTTGA        CACTGGAT
CoPA_13545      AGCATGGA        CACTGGAT

Here, we specify the molecular (Illumina) barcode as BARCODE_1 and our "sample" barcode as BARCODE_2. The corresponding read structure is READ_STRUCTURE=76T8M1S8B58T, and the full command is:

java -Xmx20g -jar ../../Software/picard.jar IlluminaBasecallsToFastq \
  READ_STRUCTURE=76T8M1S8B58T \
  BASECALLS_DIR=. \
  LANE=001 \
  MULTIPLEX_PARAMS=multiplex.tsv \
  RUN_BARCODE=run01 \
  READ_NAME_FORMAT=ILLUMINA
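For readers less familiar with the notation: a Picard read structure is a run of `<length><operator>` segments, where `T` is template, `B` is sample barcode, `M` is molecular barcode (UMI) and `S` is skip. The Python sketch below is a hypothetical helper (not Picard's own parser) that splits the string used above into segments, assuming only those four operators, and counts the `B` segments:

```python
import re

# Operators assumed here (Picard read-structure notation):
# T = template, B = sample barcode, M = molecular barcode, S = skip.
SEGMENT = re.compile(r"(\d+)([TBMS])")


def parse_read_structure(structure: str) -> list[tuple[int, str]]:
    """Split a read structure such as '76T8M1S8B58T' into (length, operator) pairs."""
    segments = []
    pos = 0
    for match in SEGMENT.finditer(structure):
        # Reject any characters that fall between recognised segments.
        if match.start() != pos:
            raise ValueError(f"unparseable read structure: {structure!r}")
        segments.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    # Reject trailing junk or an empty string.
    if pos != len(structure) or not segments:
        raise ValueError(f"unparseable read structure: {structure!r}")
    return segments


if __name__ == "__main__":
    segs = parse_read_structure("76T8M1S8B58T")
    print(segs)
    print("total cycles:", sum(n for n, _ in segs))
    print("sample-barcode (B) segments:", sum(1 for _, op in segs if op == "B"))
```

With this structure only one of the two 8-base index reads is a `B` segment; the other is declared `M`, so, as we understand the notation, Picard treats just one of them as a sample barcode.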

However, Picard fails with the following error:

WARNING 2018-08-14 21:17:44     IlluminaBasecallsToFastq        ADAPTERS_TO_CHECK is not used
21:17:44.756 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/data/Software/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Aug 14 21:17:44 UTC 2018] IlluminaBasecallsToFastq BASECALLS_DIR=. LANE=1 RUN_BARCODE=run01 READ_STRUCTURE=76T8M1S8B58T MULTIPLEX_PARAMS=multiplex.tsv READ_NAME_FORMAT=ILLUMINA    NUM_PROCESSORS=0 APPLY_EAMSS_FILTER=true FORCE_GC=true MAX_READS_IN_RAM_PER_TILE=1200000 MINIMUM_QUALITY=2 INCLUDE_NON_PF_READS=true IGNORE_UNEXPECTED_BARCODES=false COMPRESS_OUTPUTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Aug 14 21:17:44 UTC 2018] Executing as [email protected] on Linux 4.9.0-7-amd64 amd64; OpenJDK 64-Bit Server VM 9-Debian+0-9b181-4bpo91; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.11-SNAPSHOT
[Tue Aug 14 21:10:08 UTC 2018] picard.illumina.IlluminaBasecallsToFastq done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=494927872
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Row for barcode TCGCTAGA appears more than once in MULTIPLEX_PARAMS file multiplex.tsv
        at picard.illumina.IlluminaBasecallsToFastq.populateWritersFromMultiplexParams(IlluminaBasecallsToFastq.java:348)

We don't understand what the issue is here: by the very definition of de-multiplexing (if we understand it correctly), only each BARCODE_1/BARCODE_2 pair needs to be unique, not each individual barcode.

Thank you
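One plausible reading of the error, offered as an assumption rather than a statement of Picard internals: the lookup key appears to be built only from the `BARCODE_N` columns that correspond to `B` segments in `READ_STRUCTURE`. With `76T8M1S8B58T` there is one such segment, so only `BARCODE_1` would be used, and `TCGCTAGA` collides for `CoPA_13546` and `CoPA_13547`. The hypothetical Python sketch below mimics that duplicate check on the rows from the excerpt:

```python
# Rows from the MULTIPLEX_PARAMS excerpt: (OUTPUT_PREFIX, BARCODE_1, BARCODE_2).
ROWS = [
    ("CoPA_13546", "TCGCTAGA", "CGAATCGT"),
    ("CoPA_13547", "TCGCTAGA", "CACTGGAT"),
    ("CoPA_13543", "ACAGTTGA", "CACTGGAT"),
    ("CoPA_13545", "AGCATGGA", "CACTGGAT"),
]


def duplicate_keys(rows, num_sample_barcodes):
    """Return keys that occur more than once when only the first
    `num_sample_barcodes` barcode columns are concatenated into the key.

    Assumption: this mirrors how the key is formed; it is not Picard code.
    """
    seen = set()
    duplicates = []
    for _prefix, *barcodes in rows:
        key = "".join(barcodes[:num_sample_barcodes])
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


if __name__ == "__main__":
    print(duplicate_keys(ROWS, 1))  # one B segment: BARCODE_1 alone is the key
    print(duplicate_keys(ROWS, 2))  # two B segments: the pair is the key
```

With one sample-barcode segment the check reports `TCGCTAGA`; with two, every concatenated pair is unique, which is consistent with the fix reported later in the thread.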



Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @dinvlad
    Hi,

    I am checking with the team and will get back to you.

    -Sheila

  • dinvlad Member, Broadie, Dev

    Thank you @Sheila,

    Sorry I hadn't updated the post. We determined that we are in fact using both barcodes as sample barcodes, so we adjusted the read structure accordingly. We then had to run ExtractIlluminaBarcodes before IlluminaBasecallsToFastq, because the latter requires its output. So we're all set.
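    As a rough illustration of that extra step: ExtractIlluminaBarcodes reads a BARCODE_FILE rather than MULTIPLEX_PARAMS. The Python sketch below converts MULTIPLEX_PARAMS-style rows into such a file. The column headers (`barcode_sequence_1`, `barcode_sequence_2`, `barcode_name`, `library_name`) reflect our understanding of what the tool expects and should be checked against the Picard documentation; reusing `OUTPUT_PREFIX` as both barcode and library name, and assuming a tab-delimited input, are hypothetical choices.

```python
import csv
import io


def multiplex_to_barcode_file(multiplex_tsv: str) -> str:
    """Convert tab-delimited MULTIPLEX_PARAMS text into BARCODE_FILE-style text.

    Assumed output columns (verify against the Picard docs):
    barcode_sequence_1, barcode_sequence_2, barcode_name, library_name.
    """
    reader = csv.DictReader(io.StringIO(multiplex_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["barcode_sequence_1", "barcode_sequence_2",
                     "barcode_name", "library_name"])
    for row in reader:
        name = row["OUTPUT_PREFIX"]
        # Hypothetical choice: reuse the output prefix as barcode and library name.
        writer.writerow([row["BARCODE_1"], row["BARCODE_2"], name, name])
    return out.getvalue()


if __name__ == "__main__":
    example = (
        "OUTPUT_PREFIX\tBARCODE_1\tBARCODE_2\n"
        "CoPA_13546\tTCGCTAGA\tCGAATCGT\n"
        "CoPA_13547\tTCGCTAGA\tCACTGGAT\n"
    )
    print(multiplex_to_barcode_file(example), end="")
```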
