Are there any plans to add multi-interval support to GenomicsDBImport?

TechnicalVault · Cambridge, UK · Member ✭✭✭

The reason I ask is that it's rather annoying when you're chunking your input data and one of your chunks crosses a chromosome boundary. According to the GitHub docs, GenomicsDB seems to support this with vcf2tiledb, but I'm not sure whether the result will then work with GenotypeGVCFs.
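
One way to sidestep the boundary problem described above is to build the chunks per contig, so that no chunk ever spans two chromosomes. The following is a minimal sketch of that idea, not something taken from GATK itself; the function name and the contig lengths are hypothetical and only there for illustration.

```python
from typing import Dict, Iterator


def chunk_within_contigs(contig_lengths: Dict[str, int], chunk_size: int) -> Iterator[str]:
    """Yield 1-based, inclusive 'contig:start-end' chunks that never span two contigs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for contig, length in contig_lengths.items():
        # Restart the chunking at position 1 for every contig, so the last
        # chunk of a contig is simply shorter instead of spilling over.
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            yield f"{contig}:{start}-{end}"


# Hypothetical contig lengths, for illustration only.
example_lengths = {"chrA": 250, "chrB": 120}
chunks = list(chunk_within_contigs(example_lengths, 100))
for chunk in chunks:
    print(chunk)
```

The trade-off is that chunks are no longer all the same size: the final chunk on each contig is truncated at the contig end.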

Issue · Github
by Sheila

Issue Number: 2863
State: closed
Closed By: chandrans

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator admin
    Accepted Answer

    @TechnicalVault
    Hi,

    Yes, there are plans to support more than one interval, but I need to check with the developers on the timeline.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator admin
    Accepted Answer

    @TechnicalVault
    Hi again,

    You can keep track of the issue here.

    -Sheila

  • gmauro72 · Italy · Member
    edited July 2018

    @Sheila
    Hi Sheila,
    the issue referenced above has been closed. Can you provide any info on how to use the multi-interval option with GenomicsDBImport?
    Thank you

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @gmauro72
    Hi,

    We need to update the docs. Thanks for pointing this out. I am not sure if the latest version contains this update, however. Can you try using a few intervals (with -L) in the tool? If that does not work, I will check with the team if the update is in (I think there should be a new release out soon that definitely contains it), or I will confirm how to use it.
    -Sheila

  • gmauro72 · Italy · Member

    @Sheila
    Hi Sheila,
    I have tried -L with a single file with several intervals inside (-L split/0000-scattered.intervals)

    Using GATK 4.0.5.2, I got this message: "A USER ERROR has occurred: More than one interval specified. The tool takes only one"

    Using GATK 4.0.6.0, it succeeded.

  • phh · Member

    Hi @Sheila

    I could not find updated GenomicsDBImport documentation on how to use the multi-interval option. For -L, how should I separate multiple intervals (e.g. chr1, chr2, ...)?

    @gmauro72
    Would you mind sharing what your interval file looks like?

    Thanks!

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @phh

    Here is a document you can follow to see the different ways to use intervals list: https://software.broadinstitute.org/gatk/documentation/article?id=11009

    Please let me know if this helps.

    Regards
    Bhanu

  • gmauro72 · Italy · Member

    @phh like this:
    chrM:1-16571
    chr1:10000-177417
    chr1:227417-267719
    chr1:317719-471368
    ...
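
The interval file shown above has one `contig:start-end` entry per line. As a rough illustration, the sketch below checks that each line follows that shape before the file is handed to GenomicsDBImport. It is an assumption-laden helper, not part of GATK: the function name is hypothetical, and the regular expression deliberately simplifies things by rejecting contig names that contain a colon or whitespace.

```python
import re
from typing import Iterable, List, Tuple

# Simplifying assumption: contig names contain neither ':' nor whitespace.
_INTERVAL_RE = re.compile(r"^(?P<contig>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_interval_lines(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Parse 'contig:start-end' lines into (contig, start, end) tuples."""
    intervals = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        match = _INTERVAL_RE.match(line)
        if match is None:
            raise ValueError(f"line {line_number}: not in contig:start-end form: {line!r}")
        start, end = int(match["start"]), int(match["end"])
        if start < 1 or start > end:
            raise ValueError(f"line {line_number}: invalid coordinates: {line!r}")
        intervals.append((match["contig"], start, end))
    return intervals


# The entries quoted in the post above.
parsed = parse_interval_lines([
    "chrM:1-16571",
    "chr1:10000-177417",
    "chr1:227417-267719",
    "chr1:317719-471368",
])
print(parsed)
```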

  • phh · Member

    @bhanuGandham @gmauro72
    Thanks! I know where my issues are.

  • phh · Member

    Hi @bhanuGandham

    I just tested the interval file with GATK 4.0.11.0, but the run was interrupted with the error "Cannot call query with different interval, expected:Chr1:1-2852272 queried with: Chr2:1-2745854". My goal is to combine the GVCFs over entire contigs. Thank you so much.

    My interval file looks like this:
    Chr1:1-2852272
    Chr2:1-2745854
    Chr3:1-2541747
    Chr4:1-1383544
    Chr5:1-1289138
    Mitochondrion:1-51340

    Some messages from GATK:
    java.util.concurrent.CompletionException: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:Chr1:1-2852272 queried with: Chr2:1-2745854
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:Chr1:1-2852272 queried with: Chr2:1-2745854
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:769)
    at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:165)
    at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:604)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    ... 3 more
    Using GATK jar /gatk/gatk-package-4.0.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx30g -jar /gatk/gatk-package-4.0.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path Chr_database3 -L interval.list --sample-name-map cohort.JGI_reseq_sample_map --reader-threads 12
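
One possible way to get whole-contig imports despite the error above is to run GenomicsDBImport once per interval, each into its own workspace. This thread does not confirm that this avoids the error in any given GATK version, so treat the following purely as a sketch. It only builds the command lines, using the options that appear in the command above; the function name and the `<prefix>_<contig>` workspace naming scheme are hypothetical.

```python
from typing import List, Optional, Sequence


def per_interval_import_commands(
    intervals: Sequence[str],
    sample_name_map: str,
    workspace_prefix: str,
    reader_threads: Optional[int] = None,
) -> List[List[str]]:
    """Build one GenomicsDBImport command line per interval.

    Assumes at most one interval per contig, because the workspace name
    is derived from the contig name alone (hypothetical naming scheme).
    """
    commands = []
    for interval in intervals:
        contig = interval.split(":", 1)[0]
        cmd = [
            "gatk", "GenomicsDBImport",
            "--genomicsdb-workspace-path", f"{workspace_prefix}_{contig}",
            "-L", interval,
            "--sample-name-map", sample_name_map,
        ]
        if reader_threads is not None:
            cmd += ["--reader-threads", str(reader_threads)]
        commands.append(cmd)
    return commands


commands = per_interval_import_commands(
    ["Chr1:1-2852272", "Chr2:1-2745854"],
    sample_name_map="cohort.JGI_reseq_sample_map",
    workspace_prefix="Chr_database",  # hypothetical prefix
    reader_threads=12,
)
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can be run by hand or passed to `subprocess.run`. Note that this produces a separate workspace for every interval.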
