Are there any plans to add multi-interval support to GenomicsDBImport?

TechnicalVault · Cambridge, UK · Member ✭✭✭

The reason I ask is that it's rather annoying when you're chunking your input data and one of your chunks crosses a chromosome boundary. According to the GitHub docs, it seems that GenomicsDB supports this with vcf2tiledb, but I'm not sure whether it will then work with GenotypeGVCFs.

Issue · Github (by Sheila): Issue Number 2863 · State: closed · Closed By: chandrans

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin
    Accepted Answer

    @TechnicalVault
    Hi,

    Yes, there are plans to support more than one interval, but I need to check with the developers on the timeline.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin
    Accepted Answer

    @TechnicalVault
    Hi again,

    You can keep track of the issue here.

    -Sheila

  • gmauro72 · Italy · Member
    edited July 2018

    @Sheila
    Hi Sheila,
    the issue referenced above has been closed. Can you provide any info on how to use the multi-interval option with GenomicsDBImport?
    Thank you

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @gmauro72
    Hi,

    We need to update the docs. Thanks for pointing this out. I am not sure whether the latest version contains this update, however. Can you try using a few intervals (with -L) in the tool? If that does not work, I will check with the team whether the update is in (I think there should be a new release out soon that definitely contains it), or I will confirm how to use it.
    -Sheila

  • gmauro72 · Italy · Member

    @Sheila
    Hi Sheila,
    I have tried -L with a single file with several intervals inside (-L split/0000-scattered.intervals)

    Using GATK 4.0.5.2, I got this message: "A USER ERROR has occurred: More than one interval specified. The tool takes only one"

    Using GATK 4.0.6.0, it succeeded.

  • phh · Member

    Hi @Sheila

    I could not find the updated manual of GenomicsDBImport about how to use the multi-interval option. For -L, how should I separate my input of multiple intervals (ex: chr1, chr2,...)?

    @gmauro72
    Would you mind sharing what your interval file looks like?

    Thanks!

  • bhanuGandham · Member, Administrator, Broadie, Moderator, admin

    Hi @phh

    Here is a document you can follow to see the different ways to use interval lists: https://software.broadinstitute.org/gatk/documentation/article?id=11009

    Please let me know if this helps.

    Regards
    Bhanu

  • gmauro72 · Italy · Member

    @phh like this:
    chrM:1-16571
    chr1:10000-177417
    chr1:227417-267719
    chr1:317719-471368
    ...
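
For anyone following along, here is a minimal sketch of writing a file in that shape and sanity-checking it before handing it to -L. The file name `example.intervals` and the helper `check_intervals` are hypothetical, and only the four intervals quoted above are used. The check confirms the `contig:start-end` shape of each line, not that the coordinates exist in your reference.

```shell
#!/bin/sh
# Hypothetical file name; the list holds one interval per line.
intervals_file="example.intervals"

# Write the four intervals quoted in the post above.
printf '%s\n' \
  'chrM:1-16571' \
  'chr1:10000-177417' \
  'chr1:227417-267719' \
  'chr1:317719-471368' > "$intervals_file"

# Print any line that does not look like contig:start-end.
# Returns 0 when every line is well formed, 1 otherwise.
check_intervals() {
  if grep -Ev '^[^:[:space:]]+:[0-9]+-[0-9]+$' "$1"; then
    return 1
  fi
  return 0
}

if check_intervals "$intervals_file"; then
  echo "ok: $(wc -l < "$intervals_file" | tr -d ' ') intervals"
fi
```

The file would then be passed the same way gmauro72 describes, for example `-L example.intervals`.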

  • phh · Member

    @bhanuGandham @gmauro72
    Thanks! I know where my issues are.

  • phh · Member

    Hi @bhanuGandham

    I just tested the interval file with GATK 4.0.11.0 but encountered an interruption saying "Cannot call query with different interval, expected:Chr1:1-2852272 queried with: Chr2:1-2745854". My purpose is to combine the GVCFs across entire contigs. Thank you so much.

    My interval file is like:
    Chr1:1-2852272
    Chr2:1-2745854
    Chr3:1-2541747
    Chr4:1-1383544
    Chr5:1-1289138
    Mitochondrion:1-51340

    Some messages from GATK:
    java.util.concurrent.CompletionException: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:Chr1:1-2852272 queried with: Chr2:1-2745854
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:Chr1:1-2852272 queried with: Chr2:1-2745854
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:769)
    at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:165)
    at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:604)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    ... 3 more
    Using GATK jar /gatk/gatk-package-4.0.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx30g -jar /gatk/gatk-package-4.0.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path Chr_database3 -L interval.list --sample-name-map cohort.JGI_reseq_sample_map --reader-threads 12
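
One thing that might be worth trying is a separate import per contig instead of one multi-interval run. This is an untested sketch, not a confirmed fix: nothing in the thread establishes that single-interval imports avoid the "different interval" error. The helper `print_import_commands` and the workspace naming scheme `db_<contig>` are hypothetical. The options are the ones from the command above, and the script only prints the commands (a dry run) so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print one GenomicsDBImport command per interval instead of a
# single multi-interval run. Assumption: one interval per contig, as in
# the list above, because the workspace name is derived from the contig.

print_import_commands() {
  # $1: interval list file, one contig:start-end per line
  # $2: sample name map file
  while IFS= read -r interval; do
    [ -n "$interval" ] || continue          # skip blank lines
    contig=${interval%%:*}                  # text before the first ':'
    printf 'gatk GenomicsDBImport --genomicsdb-workspace-path db_%s -L %s --sample-name-map %s\n' \
      "$contig" "$interval" "$2"
  done < "$1"
}

# Example with two of the intervals from the post above.
printf '%s\n' 'Chr1:1-2852272' 'Chr2:1-2745854' > interval.list
print_import_commands interval.list cohort.JGI_reseq_sample_map
```

The sketch leaves out `--reader-threads`. Whether that option interacts with multiple intervals in this version is an assumption worth testing, not something the thread confirms.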
