Picard ExtractIlluminaBarcodes Error

Hello,
I'm using ExtractIlluminaBarcodes (Picard version 2.18.7) for the first time and am encountering an error with the following command:

java -jar picard.jar ExtractIlluminaBarcodes \
BASECALLS_DIR=/project/JIY3012/work/data/BaseCalls/ \
LANE=1 \
READ_STRUCTURE=250T8B250T \
BARCODE_FILE=/project/JIY3012/work/data/barcode_file \
METRICS_FILE=250T8B250T_metrics_output.txt \
NUM_PROCESSORS=36 \
MAX_MISMATCHES=0

This produces:

11:03:10.765 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rosema1/BioInfo/bin/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jun 14 11:03:10 EDT 2018] ExtractIlluminaBarcodes BASECALLS_DIR=/project/JIY3012/work/data/BaseCalls LANE=1 READ_STRUCTURE=250T8B250T BARCODE_FILE=/project/JIY3012/work/data/barcode_file METRICS_FILE=250T8B250T_metrics_output.txt MAX_MISMATCHES=0 NUM_PROCESSORS=36 MIN_MISMATCH_DELTA=1 MAX_NO_CALLS=2 MINIMUM_BASE_QUALITY=0 MINIMUM_QUALITY=2 COMPRESS_OUTPUTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jun 14 11:03:10 EDT 2018] Executing as [email protected] on Linux 2.6.32-696.18.7.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.7-SNAPSHOT
INFO 2018-06-14 11:03:10 ExtractIlluminaBarcodes Processing with 36 PerTileBarcodeExtractor(s).
[Thu Jun 14 11:03:10 EDT 2018] picard.illumina.ExtractIlluminaBarcodes done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Expected CycledIlluminaFileMap to contain 8 cycles but only 0 were found!
at picard.illumina.parser.CycleIlluminaFileMap.assertValid(CycleIlluminaFileMap.java:66)
at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:407)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:292)
at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.<init>(ExtractIlluminaBarcodes.java:750)
at picard.illumina.ExtractIlluminaBarcodes.doWork(ExtractIlluminaBarcodes.java:317)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Perhaps this has something to do with my READ_STRUCTURE string (250T8B250T). These libraries were sequenced with dual unique barcodes and UMIs. I am interested in processing them using single indices (hence my attempted use of 250T8B250T), dual unique indices (250T8B8B250T), and dual unique indices with UMIs (250T8B9M8B250T). I am not confident that these READ_STRUCTURE values are correct, or whether they are the cause of the error.

Additionally, my barcode file looks like this:

barcode_sequence_1    barcode_sequence_2    barcode_name                 library_name
CTGATCGTNNNNNNNNN     ATATGCGC              Dual Index UMI Adapter 1     GAR2161A459
ACTCTCGANNNNNNNNN     TGGTACAG              Dual Index UMI Adapter 2     GAR2161A460
TGAGCTAGNNNNNNNNN     AACCGTTC              Dual Index UMI Adapter 3     GAR2161A461
GAGACGATNNNNNNNNN     TAACCGGT              Dual Index UMI Adapter 4     GAR2161A462
CTTGTCGANNNNNNNNN     GAACATCG              Dual Index UMI Adapter 5     GAR2161A463
TTCCAAGGNNNNNNNNN     CCTTGTAG              Dual Index UMI Adapter 6     GAR2161A464
CGCATGATNNNNNNNNN     TCAGGCTT              Dual Index UMI Adapter 7     GAR2161A465
ACGGAACANNNNNNNNN     GTTCTCGT              Dual Index UMI Adapter 8     GAR2161A466
CGGCTAATNNNNNNNNN     AGAACGAG              Dual Index UMI Adapter 9     9
ATCGATCGNNNNNNNNN     TGCTTCCA              Dual Index UMI Adapter 10    10
GCAAGATCNNNNNNNNN     CTTCGACT              Dual Index UMI Adapter 11    11
(etc.)

I included all 384 barcodes as I am interested in observing any cross-talk that occurs.

Thank you for your help

Mark

Best Answers

  • mark
    Accepted Answer

    OK, I have discovered the cause of this problem. I did not have the right information about the length of the reads in this run and so my READ_STRUCTURE was wrong. If you encounter this error, refer to the file RunInfo.xml in the base directory of the sequencing run. Mine for instance is:

    [[email protected] ~]$ cat /data/SBI_Illumina_GA2_runs/180323_M00831_0294_000000000-BKBM5/RunInfo.xml
    <?xml version="1.0"?>


    000000000-BKBM5
    M00831
    180323







    From this you can derive the following READ_STRUCTURE values (note that read number 2 above is, in my case, the 8 bp i7 index plus the 9 bp UMI):

    single index: 100T8B9S8S50T (or 100T8B17S50T)
    dual index: 100T8B9S8B50T
    dual index with UMI: 100T8B9M8B50T

    Hope this helps

    Mark
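
The fix described above can be sanity-checked by adding up the cycles in each READ_STRUCTURE and comparing the total with the run's cycle count. The following is a minimal sketch, not Picard's own validation logic: the function names are hypothetical, only the operators that appear in this thread (T, B, M, S) are accepted, and the run total of 100 + 17 + 8 + 50 cycles is taken from the RunInfo.xml quoted in this thread.

```python
import re

# One read-structure segment: a cycle count followed by an operator.
# Only the operators used in this thread are accepted (an assumption of
# this sketch): T = template, B = sample barcode, M = UMI, S = skip.
SEGMENT = re.compile(r"(\d+)([TBMS])")


def parse_read_structure(structure):
    """Split e.g. '100T8B9M8B50T' into [(100, 'T'), (8, 'B'), ...]."""
    segments = SEGMENT.findall(structure)
    # Re-joining the matches must reproduce the input exactly; otherwise
    # the string contained something the pattern skipped over.
    if not segments or "".join(n + op for n, op in segments) != structure:
        raise ValueError(f"unrecognised read structure: {structure!r}")
    return [(int(n), op) for n, op in segments]


def total_cycles(structure):
    """Total number of sequencing cycles the read structure describes."""
    return sum(length for length, _ in parse_read_structure(structure))


# Cycles per read as reported in this run's RunInfo.xml.
RUN_CYCLES = 100 + 17 + 8 + 50

attempted = ["250T8B250T", "250T8B8B250T", "250T8B9M8B250T"]
corrected = ["100T8B9S8S50T", "100T8B17S50T", "100T8B9S8B50T", "100T8B9M8B50T"]

for rs in attempted + corrected:
    cycles = total_cycles(rs)
    status = "matches" if cycles == RUN_CYCLES else "does NOT match"
    print(f"{rs}: {cycles} cycles, {status} the run ({RUN_CYCLES})")
```

A total that differs from the run's cycle count suggests the read structure was built from the wrong read lengths, which is what happened with the 250 bp assumption in the original question.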

  • mark
    Accepted Answer

    Sorry, the XML above did not paste correctly. Here it is again:

    [[email protected] ~]$ cat /data/SBI_Illumina_GA2_runs/180323_M00831_0294_000000000-BKBM5/RunInfo.xml
    <?xml version="1.0"?>
    <RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="2">
      <Run Id="180323_M00831_0294_000000000-BKBM5" Number="294">
        <Flowcell>000000000-BKBM5</Flowcell>
        <Instrument>M00831</Instrument>
        <Date>180323</Date>
        <Reads>
          <Read NumCycles="100" Number="1" IsIndexedRead="N" />
          <Read NumCycles="17" Number="2" IsIndexedRead="Y" />
          <Read NumCycles="8" Number="3" IsIndexedRead="Y" />
          <Read NumCycles="50" Number="4" IsIndexedRead="N" />
        </Reads>
        <FlowcellLayout LaneCount="1" SurfaceCount="2" SwathCount="1" TileCount="19" />
      </Run>
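
Reading the per-read cycle counts out of RunInfo.xml can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `read_cycles` is hypothetical, and the XML is embedded as a string copied from the post above, with a closing `</RunInfo>` tag added (an assumption, since the pasted output stops at `</Run>`) so that it parses.

```python
import xml.etree.ElementTree as ET

# RunInfo.xml content as posted in this thread. The closing </RunInfo>
# tag is added here as an assumption so the document is well-formed.
RUN_INFO = """<?xml version="1.0"?>
<RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="2">
  <Run Id="180323_M00831_0294_000000000-BKBM5" Number="294">
    <Flowcell>000000000-BKBM5</Flowcell>
    <Instrument>M00831</Instrument>
    <Date>180323</Date>
    <Reads>
      <Read NumCycles="100" Number="1" IsIndexedRead="N" />
      <Read NumCycles="17" Number="2" IsIndexedRead="Y" />
      <Read NumCycles="8" Number="3" IsIndexedRead="Y" />
      <Read NumCycles="50" Number="4" IsIndexedRead="N" />
    </Reads>
    <FlowcellLayout LaneCount="1" SurfaceCount="2" SwathCount="1" TileCount="19" />
  </Run>
</RunInfo>
"""


def read_cycles(run_info_xml):
    """Return (read number, cycle count, is_index_read) tuples, in read order."""
    root = ET.fromstring(run_info_xml)
    reads = [
        (
            int(read.get("Number")),
            int(read.get("NumCycles")),
            read.get("IsIndexedRead") == "Y",
        )
        for read in root.iter("Read")
    ]
    return sorted(reads)


reads = read_cycles(RUN_INFO)
for number, cycles, is_index in reads:
    kind = "index" if is_index else "template"
    print(f"read {number}: {cycles} cycles ({kind})")
print("total cycles:", sum(cycles for _, cycles, _ in reads))
```

The file only reports that read 2 is a 17-cycle index read. Splitting it into the 8 bp i7 index and the 9 bp UMI (8B9M, or 8B9S to ignore the UMI) still requires knowing the library design, as noted in the accepted answer.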

Answers

  • Sheila (Broad Institute) Member, Broadie admin

    @mark
    Hi Mark,

    Thanks for posting your solution.

    -Sheila
