How to run GATK directly on SRA files

Hello, I recently watched an NCBI webinar, "Advanced Workshop on SRA and dbGaP Data Analysis" (ftp://ftp.ncbi.nlm.nih.gov/pub/education/public_webinars/2016/03Mar23_Advanced_Workshop/). They mentioned that they were able to run GATK directly on SRA files.

I downloaded the GenomeAnalysisTK-3.5 jar file to my computer and tried both of these commands:

java -jar /path/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R SRRFileName -I SRRFileName -stand_call_conf 30 -stand_emit_conf 10 -o SRRFileName.vcf

java -jar /path/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SRRFileName -R SRR1718738 -I SRRFileName -stand_call_conf 30 -stand_emit_conf 10 -o SRRFileName.vcf

Both commands gave this error:
ERROR MESSAGE: Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM/CRAM files with the .bam/.cram extension and lists of BAM/CRAM files with the .list extension, but the file SRR1718738 has neither extension. Please ensure that your BAM/CRAM file or list of BAM/CRAM files is in the correct format, update the extension, and try again.

I don't see any documentation about this here, so I wanted to check whether you or anyone else has had any experience with it.

Thanks
K

Answers

  • Thank you for the prompt response.

  • Ben_Busby (NCBI) Member

    Right now it works well if you launch the 'vdb' AMI in AWS.

  • Hello, I'd like to check if SRA support was added to GATK 3.6 or not.

    @Ben_Busby: We are trying to do this on our local machine, not in the cloud, so we cannot use your AMI. Thanks.

  • Dear GATK aficionados,

    Has there been any update on SRA compatibility in subsequent GATK releases?

    I can find very little reference to it in the user guide, but according to this thread it was supposed to be integrated in v3.8?

    Many thanks,

    james

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @Amoyzing
    Hi James,

    It does look like SRA files are now supported in GATK4. It looks like they should be supported in 3.8 as well.

    -Sheila

    Hi, thanks for getting back to me. I'm still unable to get GATK to read SRA files; any help would be much appreciated.

    I've tried running HaplotypeCaller on versions 3.8 and 4.0.3 and got this error:

    Exception in thread "main" gov.nih.nlm.ncbi.ngs.error.LibraryNotFoundError: Failed to load 'ngs-sdk' - No installed library was found, auto-download failed - connection problem
    Please check your network connection, and check if you need proxy configuration. Contact your IT department or email sra-tools@ncbi.nlm.nih.gov for assistance.

    Caused by: gov.nih.nlm.ncbi.ngs.error.cause.ConnectionProblemCause: auto-download failed - connection problem

    I've spoken to people at NCBI and they said:

    "GATK is either using an outdated version of NGS or incorrect configuration - it is trying to create an http connection rather than https."

    Is it possible to use a local copy of 'ngs-sdk' instead? I assume a proper fix will have to be issued in the next version of GATK.

    Thanks

    James

    The full command and output are here:

    gatk --java-options "-Dsamjdk.sra_libraries_download=true" HaplotypeCaller -R /mnt/scratch/DGE/MOPOPGEN/jstudd/jstudd/reference_genomes/human_g1k_v37.fasta -I SRR5115250.sra -O SRR5115250.vcf
    Using GATK jar /opt/gridware/apps/gatk/4.0.0.0/gatk-package-4.0.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsamjdk.sra_libraries_download=true -jar /opt/gridware/apps/gatk/4.0.0.0/gatk-package-4.0.0.0-local.jar HaplotypeCaller -R /mnt/scratch/DGE/MOPOPGEN/jstudd/jstudd/reference_genomes/human_g1k_v37.fasta -I SRR5115250.sra -O SRR5115250.vcf
    19:18:12.628 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gridware/apps/gatk/4.0.0.0/gatk-package-4.0.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    19:18:25.499 INFO HaplotypeCaller - ------------------------------------------------------------
    19:18:25.500 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.0.0
    19:18:25.501 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    19:18:25.502 INFO HaplotypeCaller - Executing as jstudd@dav001.prv.davros.compute.estate on Linux v3.10.0-327.3.1.el7.x86_64 amd64
    19:18:25.503 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_66-b17
    19:18:25.503 INFO HaplotypeCaller - Start Date/Time: 23 April 2018 19:18:12 BST
    19:18:25.503 INFO HaplotypeCaller - ------------------------------------------------------------
    19:18:25.503 INFO HaplotypeCaller - ------------------------------------------------------------
    19:18:25.504 INFO HaplotypeCaller - HTSJDK Version: 2.13.2
    19:18:25.504 INFO HaplotypeCaller - Picard Version: 2.17.2
    19:18:25.505 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    19:18:25.505 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    19:18:25.505 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    19:18:25.505 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    19:18:25.505 INFO HaplotypeCaller - Deflater: IntelDeflater
    19:18:25.506 INFO HaplotypeCaller - Inflater: IntelInflater
    19:18:25.506 INFO HaplotypeCaller - GCS max retries/reopens: 20
    19:18:25.506 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    19:18:25.506 INFO HaplotypeCaller - Initializing engine
    java.io.IOException: Server returned HTTP response code: 403 for URL: http://trace.ncbi.nlm.nih.gov/Traces/sratoolkit/sratoolkit.cgi
    ngs-java: Failed to download ngs-sdk from NCBI
    ngs-java: Loading of ngs-sdk library failed
    INFO 2018-04-23 19:18:26 SRAAccession SRA initialization failed. Will not be able to read from SRA
    19:18:26.586 INFO HaplotypeCaller - Shutting down engine
    [23 April 2018 19:18:26 BST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.23 minutes.
    Runtime.totalMemory()=1526202368
    Exception in thread "main" gov.nih.nlm.ncbi.ngs.error.LibraryNotFoundError: Failed to load 'ngs-sdk' - No installed library was found, auto-download failed - connection problem
    Please check your network connection, and check if you need proxy configuration. Contact your IT department or email sra-tools@ncbi.nlm.nih.gov for assistance.
    at gov.nih.nlm.ncbi.ngs.LibManager.loadLibrary(LibManager.java:335)
    at gov.nih.nlm.ncbi.ngs.Manager.(Manager.java:103)
    at gov.nih.nlm.ncbi.ngs.NGS.(NGS.java:120)
    at htsjdk.samtools.sra.SRAAccession.checkIfInitialized(SRAAccession.java:96)
    at htsjdk.samtools.sra.SRAAccession.isValid(SRAAccession.java:138)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.isSra(SamReaderFactory.java:438)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:403)
    at htsjdk.samtools.SamReaderFactory.open(SamReaderFactory.java:105)
    at org.broadinstitute.hellbender.engine.ReadsDataSource.(ReadsDataSource.java:227)
    at org.broadinstitute.hellbender.engine.ReadsDataSource.(ReadsDataSource.java:162)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeReads(GATKTool.java:318)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:556)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:160)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:275)
    Caused by: gov.nih.nlm.ncbi.ngs.error.cause.ConnectionProblemCause: auto-download failed - connection problem
    at gov.nih.nlm.ncbi.ngs.LibManager.searchLibrary(LibManager.java:658)
    at gov.nih.nlm.ncbi.ngs.LibManager.loadLibrary(LibManager.java:332)

  • Amoyzing Member

    Solution

    Download the NGS SDK from https://ftp-trace.ncbi.nlm.nih.gov/sra/ngs/1.3.0/ngs-sdk.1.3.0-linux.tar.gz

    Extract it, then run gatk with the following Java option (this assumes the archive was extracted into the current directory):
    --java-options "-Djava.library.path=ngs-sdk.1.3.0-linux/lib64"

    An example command would then be:

    gatk --java-options "-Djava.library.path=ngs-sdk.1.3.0-linux/lib64" HaplotypeCaller -R /GCF_000001405.25_GRCh37.p13_genomic_gencode_2.fna -I SRR1234.sra -O SRR1234.vcf

    Tested on GATK v4.0.3.0.
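
The steps in this solution can be collected into a small shell sketch. It is only an illustration of the workaround, not an official GATK or NCBI script: the helper names (`fetch_ngs_sdk`, `ngs_java_options`, `run_haplotypecaller`) and the `DRY_RUN` switch are hypothetical, and it assumes `curl`, `tar`, and `gatk` are on the PATH and that the archive unpacks to `ngs-sdk.1.3.0-linux` in the current directory, as described above.

```shell
#!/bin/sh
# Sketch of the workaround: point the JVM at a locally extracted ngs-sdk
# instead of relying on the auto-download that fails with HTTP 403.
set -eu

NGS_VERSION="1.3.0"
NGS_DIR="ngs-sdk.${NGS_VERSION}-linux"
NGS_URL="https://ftp-trace.ncbi.nlm.nih.gov/sra/ngs/${NGS_VERSION}/${NGS_DIR}.tar.gz"

# Hypothetical helper: download and unpack the SDK into the current
# directory unless the extracted directory is already present.
fetch_ngs_sdk() {
    if [ ! -d "$NGS_DIR" ]; then
        curl -fsSL -O "$NGS_URL"
        tar -xzf "${NGS_DIR}.tar.gz"
    fi
}

# Hypothetical helper: build the value passed to --java-options
# from the SDK directory given as the first argument.
ngs_java_options() {
    printf '%s' "-Djava.library.path=$1/lib64"
}

# Hypothetical helper: run HaplotypeCaller on an .sra file.
# With DRY_RUN=1 the command is printed instead of executed.
run_haplotypecaller() {
    ref=$1; sra=$2; vcf=$3
    opts=$(ngs_java_options "$NGS_DIR")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "gatk --java-options \"$opts\" HaplotypeCaller -R $ref -I $sra -O $vcf"
    else
        fetch_ngs_sdk
        gatk --java-options "$opts" HaplotypeCaller -R "$ref" -I "$sra" -O "$vcf"
    fi
}
```

After loading these definitions, `DRY_RUN=1 run_haplotypecaller ref.fasta SRR1234.sra SRR1234.vcf` prints the command without downloading or running anything, which is a cheap way to check the library path before starting a long job. The file names here are placeholders.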

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @Amoyzing
    Hi,

    Thanks for reporting your solution. Glad to hear you found a workaround!

    -Sheila

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @Amoyzing
    Hi again,

    We talked with Kurt and there are plans to move forward with SRA support again :smile:

    -Sheila
