Picard on GCS data

shbrief (New York, Member)
edited February 14 in Ask the GATK team

I'm trying to build a .dict file for a reference FASTA stored on GCS (gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa).
I used picardcloud.jar, but it still seems unable to read the file on GCS.
Can you help me fix this issue? Thanks!

$ java -jar build/libs/picardcloud.jar CreateSequenceDictionary R=gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa O=GRCh38_Verily_v1.genome.fa.dict
INFO    2019-02-14 02:55:42     CreateSequenceDictionary

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
**********


02:55:42.726 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/shbrief/picard/build/libs/picardcloud.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Feb 14 02:55:42 EST 2019] CreateSequenceDictionary OUTPUT=GRCh38_Verily_v1.genome.fa.dict REFERENCE=gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Feb 14 02:55:42 EST 2019] Executing as [email protected] on Linux 4.14.74+ amd64; OpenJDK 64-Bit Server VM 1.8.0_181-8u181-b13-2~deb9u1-b13; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 2.18.27-SNAPSHOT
[Thu Feb 14 02:55:42 EST 2019] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=28442624
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Error opening file: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
        at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:637)
        at htsjdk.samtools.reference.FastaSequenceFile.<init>(FastaSequenceFile.java:64)
        at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:140)
        at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:96)
        at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:84)
        at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.nio.file.NoSuchFileException: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.Files.newByteChannel(Files.java:361)
        at java.nio.file.Files.newByteChannel(Files.java:407)
        at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
        at java.nio.file.Files.newInputStream(Files.java:152)
        at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:633)
        ... 8 more

Answers

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator admin)

    It appears that some instructions have been added in the warning:

    INFO    2019-02-14 02:55:42     CreateSequenceDictionary
    
    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    **********    CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    

    It appears that you may have used the old syntax. Try making this change and see if the same error repeats.

  • shbrief (New York, Member)

    @AdelaideR
    Thanks for your answer! The problem is... neither of them seems to work.
    Here is the help page I got.

    $ java -jar build/libs/picardcloud.jar CreateSequenceDictionary -h
    USAGE: CreateSequenceDictionary [options]
    
    Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary
    
    Creates a sequence dictionary for a reference sequence.  This tool creates a sequence dictionary file (with ".dict"
    extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
    The output file contains a header but no SAMRecords, and the header contains only sequence records.
    
    The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
    Usage example:
    
    java -jar picard.jar CreateSequenceDictionary \
    R=reference.fasta \
    O=reference.dict
    
    
    Version: 2.18.27-SNAPSHOT
    

    So if I run my command with the new, suggested (?) syntax, this is what I get.

    $ java -jar build/libs/picardcloud.jar CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    ERROR: Invalid argument '-R'.
    
    USAGE: CreateSequenceDictionary [options]
    
    Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary
    
    Creates a sequence dictionary for a reference sequence.  This tool creates a sequence dictionary file (with ".dict"
    extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
    The output file contains a header but no SAMRecords, and the header contains only sequence records.
    
    The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
    Usage example:
    
    java -jar picard.jar CreateSequenceDictionary \
    R=reference.fasta \
    O=reference.dict
    
    
    Version: 2.18.27-SNAPSHOT
    

    Thoughts?

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator admin)

    Try replacing java -jar build/libs/picardcloud.jar with just

    gatk CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict

    That seemed to work for me.

  • shbrief (New York, Member)
    edited February 15

    @AdelaideR
    Hmm... it's still not working. ;(

    gatk CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.0.0-local.jar CreateSequenceDictionary -R gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa -O GRCh38_Verily_v1.genome.fa.dict
    21:06:04.735 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Fri Feb 15 21:06:04 UTC 2019] CreateSequenceDictionary  --OUTPUT GRCh38_Verily_v1.genome.fa.dict --REFERENCE gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa  --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Fri Feb 15 21:06:05 UTC 2019] Executing as [email protected] on Linux 4.14.74+ amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.0.0
    [Fri Feb 15 21:06:05 UTC 2019] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=28442624
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.SAMException: Error opening file: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
            at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:637)
            at htsjdk.samtools.reference.FastaSequenceFile.<init>(FastaSequenceFile.java:64)
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:140)
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:96)
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:84)
            at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
            at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
            at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: java.nio.file.NoSuchFileException: gs:/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa
            at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
            at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
            at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
            at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
            at java.nio.file.Files.newByteChannel(Files.java:361)
            at java.nio.file.Files.newByteChannel(Files.java:407)
            at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
            at java.nio.file.Files.newInputStream(Files.java:152)
            at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:633)
            ... 10 more
    
  • shbrief (New York, Member)

    Thanks @AdelaideR! I was able to create the .dict file by downloading the reference file and running Picard locally.
    I'm building a FireCloud workflow and ideally want everything to happen in the cloud. I tried this in GCP Cloud Shell, and it didn't work there. But now it makes sense that the authorization issue is with the tool itself. I'll keep an eye on the issue. :)
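
A note on the logs above: in both runs the REFERENCE value and the exception message show gs:/genomics-public-data/... with a single slash, and the "Caused by" frames go through sun.nio.fs.UnixFileSystemProvider. One possible reading (an assumption, not confirmed anywhere in this thread) is that the argument was handled as a local file path rather than as a URI before it could reach the GCS provider. The sketch below only shows how the JDK itself treats such a string on a Unix-like system. The class and method names (GsPathDemo, asLocalPath, schemeOf) are hypothetical, and this is not Picard's or htsjdk's actual code.

```java
import java.net.URI;
import java.nio.file.Paths;

public class GsPathDemo {
    // Hypothetical helper: treats the string as a path on the default (local) file system.
    // On Unix-like systems the JDK collapses the repeated slash, so "gs://" becomes "gs:/".
    public static String asLocalPath(String location) {
        return Paths.get(location).toString();
    }

    // Hypothetical helper: parses the string as a URI, which keeps the scheme ("gs")
    // available for selecting a non-default file system provider.
    public static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        String ref = "gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa";
        System.out.println("As local path: " + asLocalPath(ref));
        System.out.println("URI scheme:    " + schemeOf(ref));
    }
}
```

If that reading is right, the single-slash form in the log would be a symptom of local-path handling, which fits the NoSuchFileException coming from the Unix file system provider. Downloading the reference and running Picard locally, as described above, sidesteps the problem either way.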
