GenomicsDBImport: Failed to create reader

Hi,

I am trying to use the reference joint-discovery WDL to run my samples in Google Cloud, but I get "Failed to create reader" errors. I checked that the gs:// path I provided does contain my GVCF files. Do you know why it is complaining about this?

https://github.com/gatk-workflows/gatk4-germline-snps-indels.git

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.298beb67
01:45:49.230 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/libs/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
01:45:49.390 INFO GenomicsDBImport - ------------------------------------------------------------
01:45:49.391 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.4.0
01:45:49.391 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
01:45:49.393 INFO GenomicsDBImport - Executing as [email protected] on Linux v4.9.0-0.bpo.6-amd64 amd64
01:45:49.393 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
01:45:49.393 INFO GenomicsDBImport - Start Date/Time: July 9, 2018 1:45:49 AM UTC
01:45:49.393 INFO GenomicsDBImport - ------------------------------------------------------------
01:45:49.393 INFO GenomicsDBImport - ------------------------------------------------------------
01:45:49.394 INFO GenomicsDBImport - HTSJDK Version: 2.14.3
01:45:49.394 INFO GenomicsDBImport - Picard Version: 2.18.2
01:45:49.394 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:45:49.394 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:45:49.395 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:45:49.395 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:45:49.395 INFO GenomicsDBImport - Deflater: IntelDeflater
01:45:49.396 INFO GenomicsDBImport - Inflater: IntelInflater
01:45:49.396 INFO GenomicsDBImport - GCS max retries/reopens: 20
01:45:49.396 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
01:45:49.396 INFO GenomicsDBImport - Initializing engine
01:45:52.368 INFO GenomicsDBImport - Shutting down engine
[July 9, 2018 1:45:52 AM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=4116185088


A USER ERROR has occurred: Failed to create reader from gs://pca-test/gvcf/CL100056740_B5EHUMjrjRAAALAAA.g.vcf.gz


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /gatk/build/libs/gatk-package-4.0.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -jar /gatk/build/libs/gatk-package-4.0.4.0-local.jar GenomicsDBImport --genomicsdb-workspace-path genomicsdb --batch-size 50 -L chr1:1-248956422 --sample-name-map /cromwell_root/pca-test/WGS.samples_map2 --reader-threads 5 -ip 500

Answers

  • mwlee (Member)
    edited July 9

    Using the default NA12878.sample_map file, I am able to run it in Google Cloud without issue. But when I substitute my own sample map file, I get the above error.

    The GVCF was created with the reference germline WDL at this link:
    https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline

    Working:
    "JointGenotyping.sample_name_map": "gs://gatk-test-data/joint_discovery/NA12878.sample_map",
    Failed:
    "JointGenotyping.sample_name_map": "gs://pca-test/WGS.samples_map2",
    WGS.samples_map2:
    C170508 gs://pca-test/gvcf/C170508.g.vcf.gz
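
One thing that may be worth ruling out here: the file passed to --sample-name-map is expected to have one sample per line, with the sample name and the GVCF path separated by a tab. The map line as pasted above shows a space, although the forum may simply have converted a tab when rendering, so this is only a guess. A minimal sketch of a format check follows; the function name check_sample_map is hypothetical, and it only inspects the text of the map, not whether the GVCF, its index, or the bucket permissions are in order.

```python
def check_sample_map(text):
    """Return (line_number, problem) tuples for the text of a sample map.

    Assumes the expected layout is: sample_name<TAB>path, one sample per line.
    """
    problems = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, "expected 2 tab-separated fields, found %d" % len(fields))
            )
            continue
        sample, path = fields
        if sample != sample.strip() or path != path.strip():
            problems.append((number, "leading or trailing whitespace in a field"))
        if sample in seen:
            problems.append((number, "duplicate sample name %r" % sample))
        seen.add(sample)
    return problems


# The same entry written with a tab and with a space between the two columns.
with_tab = "C170508\tgs://pca-test/gvcf/C170508.g.vcf.gz\n"
with_space = "C170508 gs://pca-test/gvcf/C170508.g.vcf.gz\n"

print(check_sample_map(with_tab))
print(check_sample_map(with_space))
```

If the check comes back clean, the format of the map is probably not the cause and the next things to look at would be the GVCF index and read access to the bucket.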

  • Yatros (Member), Seattle, WA, USA

    Hello,

    I'm experiencing a similar problem with the joint-discovery-gatk4-local.wdl pipeline. I'm trying to merge several samples with GenomicsDBImport + GenotypeGVCFs, but I always get the "A USER ERROR has occurred: Failed to create reader from file" error.

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/JointGenotyping/21af0807-e11c-435f-bc3c-befffcaefc9c/call-ImportGVCFs/shard-179/execution/tmp.3FPwiI
    23:42:45.044 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.0.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    23:42:45.383 INFO  GenomicsDBImport - ------------------------------------------------------------
    23:42:45.384 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.6.0
    23:42:45.384 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    23:42:45.385 INFO  GenomicsDBImport - Executing as [email protected] on Linux v4.4.0-133-generic amd64
    23:42:45.386 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11
    23:42:45.387 INFO  GenomicsDBImport - Start Date/Time: September 4, 2018 11:42:44 PM UTC
    23:42:45.387 INFO  GenomicsDBImport - ------------------------------------------------------------
    23:42:45.388 INFO  GenomicsDBImport - ------------------------------------------------------------
    23:42:45.389 INFO  GenomicsDBImport - HTSJDK Version: 2.16.0
    23:42:45.390 INFO  GenomicsDBImport - Picard Version: 2.18.7
    23:42:45.390 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:42:45.390 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:42:45.390 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:42:45.391 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:42:45.391 INFO  GenomicsDBImport - Deflater: IntelDeflater
    23:42:45.391 INFO  GenomicsDBImport - Inflater: IntelInflater
    23:42:45.391 INFO  GenomicsDBImport - GCS max retries/reopens: 20
    23:42:45.392 INFO  GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    23:42:45.392 INFO  GenomicsDBImport - Initializing engine
    23:42:45.467 INFO  GenomicsDBImport - Shutting down engine
    [September 4, 2018 11:42:45 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=4116185088
    ***********************************************************************
    
    A USER ERROR has occurred: Failed to create reader from file:///cromwell-executions/JointGenotyping/21af0807-e11c-435f-bc3c-befffcaefc9c/call-ImportGVCFs/shard-179/inputs/mnt/ND27_PANEL/genomicsDBimport/ND27_panel.sample_map
    
    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    Using GATK jar /gatk/gatk-package-4.0.6.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -jar /gatk/gatk-package-4.0.6.0-local.jar GenomicsDBImport --genomicsdb-workspace-path genomicsdb --batch-size 50 -L 9:35060697-35061276 --sample-name-map inputs.list --reader-threads 5 -ip 500
    

    As far as I understand, the error originates in the following Python code:

    python << CODE
        gvcfs = ['${sep="','" input_gvcfs}']
        sample_names = ['${sep="','" sample_names}']
    
        if len(gvcfs)!= len(sample_names):
          exit(1)
    
        with open("inputs.list", "w") as fi:
          for i in range(len(gvcfs)):
            fi.write(sample_names[i] + "\t" + gvcfs[i] + "\n") 
    
    CODE
    

    Instead of getting an inputs.list file with the sample names in the first column and their file paths in the second, I get a file with the following content:

    /mnt/ND27/genomicsDBimport/ND27_panel.samples /cromwell-executions/JointGenotyping/21af0807-e11c-435f-bc3c-befffcaefc9c/call-ImportGVCFs/shard-1/inputs/mnt/ND27/genomicsDBimport/ND27_panel.sample_map

    The file contains the paths to the two original 'samples' and 'sample_map' files. Can somebody explain to me why Python behaves this way? What should I change to get an inputs.list file with the format:

    Sample Path_to_sample_location

    If I generate the inputs.list file manually and run the script that was generated during the pipeline run, the genomicsdb folder is created correctly and I get the expected results.

    Thank you very much,

    Best Regards,

    Yatros
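
A possible reading of the inputs.list result in the post above: the `${sep="','" ...}` placeholders are filled in by Cromwell before Python ever runs, so Python only writes out whatever elements the two arrays hold. If each array was given a single element (the path of a list file rather than the sample names or GVCF paths themselves), the loop writes exactly one line pairing those two paths, which matches what was observed. The sketch below reproduces this in plain Python with the WDL placeholders replaced by ordinary lists. The function names, sample names and paths are hypothetical, and the guess about how the inputs JSON was filled in is an assumption that the post does not confirm.

```python
def build_inputs_list(sample_names, gvcfs):
    """Mimic the loop in the task: one "name<TAB>path" line per array element."""
    if len(gvcfs) != len(sample_names):
        raise ValueError("sample_names and gvcfs must have the same length")
    return "".join(
        name + "\t" + path + "\n" for name, path in zip(sample_names, gvcfs)
    )


def split_sample_map(text):
    """Split two-column, tab-separated sample-map text into two parallel lists."""
    names, paths = [], []
    for line in text.splitlines():
        if line.strip():
            name, path = line.split("\t")
            names.append(name)
            paths.append(path)
    return names, paths


# Assumed situation: each array has one element, the path of a list file
# (hypothetical paths, standing in for the ones in the post).
observed = build_inputs_list(["/data/panel.samples"], ["/data/panel.sample_map"])

# Intended situation: the arrays hold the names and GVCF paths themselves,
# here recovered from an existing two-column map (hypothetical content).
names, paths = split_sample_map("S1\t/data/S1.g.vcf.gz\nS2\t/data/S2.g.vcf.gz\n")
expected = build_inputs_list(names, paths)

print(observed, end="")
print(expected, end="")
```

If that assumption holds, listing the sample names and GVCF paths as array elements in the inputs JSON, rather than giving the path of a file that contains them, should produce the two-column inputs.list.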
