Picard on GCS data

shbrief, New York, Member
edited February 14 in Ask the GATK team

I'm trying to build a .dict file for a reference FASTA on GCS (gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa).
I used picardcloud.jar, but it still can't seem to read the file on GCS.
Can you help me fix this issue? Thanks!

$ java -jar build/libs/picardcloud.jar CreateSequenceDictionary R=gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa O=GRCh38_Verily_v1.genome.fa.dict
INFO    2019-02-14 02:55:42     CreateSequenceDictionary

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
**********


02:55:42.726 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/shbrief/picard/build/libs/picardcloud.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Feb 14 02:55:42 EST 2019] CreateSequenceDictionary OUTPUT=GRCh38_Verily_v1.genome.fa.dict REFERENCE=gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Feb 14 02:55:42 EST 2019] Executing as [email protected] on Linux 4.14.74+ amd64; OpenJDK 64-Bit Server VM 1.8.0_181-8u181-b13-2~deb9u1-b13; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 2.18.27-SNAPSHOT
[Thu Feb 14 02:55:42 EST 2019] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=28442624
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Error opening file: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
        at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:637)
        at htsjdk.samtools.reference.FastaSequenceFile.<init>(FastaSequenceFile.java:64)
        at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:140)
        at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:96)
        at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:84)
        at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.nio.file.NoSuchFileException: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.Files.newByteChannel(Files.java:361)
        at java.nio.file.Files.newByteChannel(Files.java:407)
        at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
        at java.nio.file.Files.newInputStream(Files.java:152)
        at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:633)
        ... 8 more

Answers

  • AdelaideR, Unconfirmed, Member, Broadie, Moderator admin

    It appears that some instructions have been added in the warning:

    INFO    2019-02-14 02:55:42     CreateSequenceDictionary
    
    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    **********    CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    

    It appears that you may have used the old syntax. Try making this change and see if the same error repeats.

  • shbrief, New York, Member

    @AdelaideR
    Thanks for your answer! The problem is... neither of them seems to work.
    Here is the help page I got.

    $ java -jar build/libs/picardcloud.jar CreateSequenceDictionary -h
    USAGE: CreateSequenceDictionary [options]
    
    Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary
    
    Creates a sequence dictionary for a reference sequence.  This tool creates a sequence dictionary file (with ".dict"
    extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
    The output file contains a header but no SAMRecords, and the header contains only sequence records.
    
    The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
    Usage example:
    
    java -jar picard.jar CreateSequenceDictionary \
    R=reference.fasta \
    O=reference.dict
    
    
    Version: 2.18.27-SNAPSHOT
    

    So if I run my command with the new, suggested (?) syntax, this is what I get.

    $ java -jar build/libs/picardcloud.jar CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    ERROR: Invalid argument '-R'.
    
    USAGE: CreateSequenceDictionary [options]
    
    Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary
    
    Creates a sequence dictionary for a reference sequence.  This tool creates a sequence dictionary file (with ".dict"
    extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
    The output file contains a header but no SAMRecords, and the header contains only sequence records.
    
    The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
    Usage example:
    
    java -jar picard.jar CreateSequenceDictionary \
    R=reference.fasta \
    O=reference.dict
    
    
    Version: 2.18.27-SNAPSHOT
    

    Thoughts?

  • AdelaideR, Unconfirmed, Member, Broadie, Moderator admin

    Try replacing java -jar build/libs/picardcloud.jar with just

    gatk CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict

    That seemed to work for me.

  • shbrief, New York, Member
    edited February 15

    @AdelaideR
    Hmm... it's still not working. ;(

    gatk CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.0.0-local.jar CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    21:06:04.735 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Fri Feb 15 21:06:04 UTC 2019] CreateSequenceDictionary  --OUTPUT GRCh38_Verily_v1.genome.fa.dict --REFERENCE gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa  --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Fri Feb 15 21:06:05 UTC 2019] Executing as [email protected] on Linux 4.14.74+ amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.0.0
    [Fri Feb 15 21:06:05 UTC 2019] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=28442624
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.SAMException: Error opening file: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
            at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:637)
            at htsjdk.samtools.reference.FastaSequenceFile.<init>(FastaSequenceFile.java:64)
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:140)
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:96)
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:84)
            at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
            at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
            at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: java.nio.file.NoSuchFileException: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
            at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
            at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
            at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
            at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
            at java.nio.file.Files.newByteChannel(Files.java:361)
            at java.nio.file.Files.newByteChannel(Files.java:407)
            at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
            at java.nio.file.Files.newInputStream(Files.java:152)
            at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:633)
            ... 10 more
    
  • shbrief, New York, Member

    Thanks @AdelaideR! I was able to make the .dict file by downloading the reference file and running Picard locally.
    I'm building a FireCloud workflow and ideally want everything to happen in the cloud. I tried this in GCP Cloud Shell, and it didn't work there either. But now it makes sense that the authorization issue is with the tool itself. I'll keep an eye on the issue. :)
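
    A note on the stack traces in this thread: both report the path as gs:/genomics-public-data/... with a single slash and end in sun.nio.fs.UnixFileSystemProvider, which suggests the gs:// string was handled as a local file path instead of being passed to the GCS provider. The sketch below is not Picard or htsjdk code; it is a minimal, standalone illustration (the class and method names are made up for this example) of how java.io.File collapses the double slash on Linux/macOS, and of how to check which NIO file system providers a JVM has installed.

```java
import java.io.File;
import java.net.URI;
import java.nio.file.spi.FileSystemProvider;

// Hypothetical helper class, for illustration only (not part of Picard or htsjdk).
public class GcsPathCheck {

    // Returns the string as java.io.File sees it. On Unix-like systems,
    // File normalizes repeated slashes, so "gs://bucket/x" becomes "gs:/bucket/x".
    static String asLocalPath(String location) {
        return new File(location).getPath();
    }

    // Returns the URI scheme of the string ("gs" for a gs:// location).
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    // Reports whether an NIO file system provider for the given scheme
    // is installed in the running JVM.
    static boolean hasProvider(String scheme) {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if (provider.getScheme().equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String ref = "gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa";
        System.out.println("as local path: " + asLocalPath(ref));
        System.out.println("URI scheme:    " + schemeOf(ref));
        System.out.println("gs provider:   " + hasProvider("gs"));
    }
}
```

    On Linux, the first line printed shows the same single-slash gs:/ form that appears in the error messages. Whether a "gs" provider is reported depends on what is on the classpath of the JVM you run this in.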
