
Error while attempting to exclude intervals in GenotypeGVCFs

Hello, I've been trying to get HaplotypeCaller in GVCF mode to work on a combination of exome samples. Some are diploid individuals, and others are pools of individuals with ploidies of about 20.

During the GenotypeGVCFs step, a number of regions demand huge amounts of memory and fail, presumably because of the high ploidies.

These regions seem to be pretty small, and there are only a few of them, so my approach is simply to exclude the regions from analysis.

If I do this by telling GenotypeGVCFs to genotype all the non-problem regions using --intervals, this works just fine:

GenotypeGVCFs [etc] --intervals chr1:1-50000 --intervals chr1:60000-56000000

But if I tell it to exclude the problem regions, it fails:

GenotypeGVCFs [etc] -XL chr1:50000-60000

More details:

I am working with mosquito exome data (150 bp PE Illumina reads), GATK 4.1.4.0 and Java 1.8.0_222. I've been following the Broad best practices pretty closely, with a single round of bootstrapped base recalibration.

Here is the command:

gatk --java-options '-Xmx20G' GenotypeGVCFs \
  -R genome.fa \
  -V gendb://../vcfs/combined_gvcfs_br/NW_021837065.1 \
  -O ../vcfs/chromosome_vcfs_br/NW_021837065.1.vcf \
  -XL NW_021837065.1:0-100000

I get the following message:

16:23:43.946 INFO  GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
16:23:45.306 INFO  IntervalArgumentCollection - Initial include intervals span 2538371206 loci; exclude intervals span 100000 loci
16:23:45.307 INFO  IntervalArgumentCollection - Excluding 100000 loci from original intervals (0.00% reduction)
16:23:45.309 INFO  IntervalArgumentCollection - Processing 2538271206 bp from intervals
16:23:45.337 INFO  GenotypeGVCFs - Done initializing engine
16:23:45.459 INFO  ProgressMeter - Starting traversal
16:23:45.459 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
16:23:45.514 INFO  GenotypeGVCFs - Shutting down engine
[October 22, 2019 4:23:45 PM EDT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=476577792
java.lang.IllegalStateException: There are no sources based on those query parameters
        at org.genomicsdb.reader.GenomicsDBFeatureIterator.<init>(GenomicsDBFeatureIterator.java:132)
        at org.genomicsdb.reader.GenomicsDBFeatureReader.query(GenomicsDBFeatureReader.java:144)
        at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.queryNextInterval(FeatureIntervalIterator.java:135)
        at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextFeature(FeatureIntervalIterator.java:92)
        at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextNovelFeature(FeatureIntervalIterator.java:74)
        at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.<init>(FeatureIntervalIterator.java:47)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.iterator(FeatureDataSource.java:467)
        at java.lang.Iterable.spliterator(Iterable.java:101)
        at org.broadinstitute.hellbender.engine.VariantLocusWalker.getSpliteratorForDrivingVariants(VariantLocusWalker.java:58)
        at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:133)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
        at org.broadinstitute.hellbender.Main.main(Main.java:292)

Answers

  • jhb (Member)

    Additional note: I would really, really like to be able to exclude intervals rather than include the non-problematic regions. Several of these chromosomes have multiple sites that crash from high memory use, but you don't find out about the second and later ones until you have excluded the first. So several rounds of this are necessary, and scripting that is much simpler by excluding regions than by including them.
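
    One way to keep an exclusion-style workflow while still passing --intervals to GenotypeGVCFs is to compute the include intervals from the list of excluded regions. Below is a minimal bash sketch. It assumes the excluded regions are 1-based, inclusive START-END pairs on a single contig and that the contig length is known. complement_intervals is a hypothetical helper, not part of GATK:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): print the intervals that cover
# CONTIG:1-LENGTH minus the excluded regions, one "contig:start-end" per line.
# Assumes excluded regions are 1-based, inclusive, written as START-END.
complement_intervals() {
  local contig=$1 length=$2
  shift 2
  local pos=1 start end
  # Sort excluded regions by start so gaps are found left to right.
  while IFS=- read -r start end; do
    # Skip the empty line produced when no regions are given.
    [[ -z $start ]] && continue
    # Emit the kept gap before this excluded region, if there is one.
    if (( start > pos )); then
      printf '%s:%d-%d\n' "$contig" "$pos" "$((start - 1))"
    fi
    # Move past the excluded region; the comparison handles overlaps.
    if (( end + 1 > pos )); then
      pos=$((end + 1))
    fi
  done < <(printf '%s\n' "$@" | sort -t - -k 1,1n)
  # Emit the tail after the last excluded region.
  if (( pos <= length )); then
    printf '%s:%d-%d\n' "$contig" "$pos" "$length"
  fi
}
```

    To use it, read the contig length from column 2 of the reference index (genome.fa.fai) and pass each printed line as its own --intervals argument. For example: mapfile -t ivs < <(complement_intervals NW_021837065.1 "$len" 1-100000), then add --intervals "$iv" for each entry. Each newly discovered problem region then becomes one more argument to the helper.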

  • jhb (Member)

    Additional additional note: I realize that it is bad to have "0" in intervals, since GATK intervals are 1-based. I changed it to "1", but the error stayed the same.

  • Tiffany_at_Broad (Cambridge, MA) Member, Administrator, Broadie, Moderator

    Hi @jhb
    Are you excluding the same interval when running GenomicsDBImport? I've read that this should help, and I found this open ticket, which may describe the issue.

  • jhb (Member)

    Hmmm, I'm not using --all-sites, so it's at least not exactly the same thing as the open ticket.

    I am not using the same interval when running GenomicsDBImport. (I ran GenomicsDBImport on the entire chromosome, i.e., --intervals NW_021837065.1 or whatever). I suppose that this could be the source of the problem, although it is interesting in that case that specifying --intervals works but --exclude-intervals does not.
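
    A possible explanation, not confirmed in this thread: the log line "Initial include intervals span 2538371206 loci" suggests that with -XL alone, GATK starts from every contig in the reference. The workspace, however, was imported for NW_021837065.1 only, so GenomicsDB may be queried for contigs it holds no data for. If so, restricting traversal with -L to the imported contig and then subtracting the problem regions with -XL might avoid the error. The sketch below is untested. genotype_contig_excluding and the GATK override variable are hypothetical, and the paths mirror the command in the original post:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (untested against GATK): genotype one contig from its
# GenomicsDB workspace while excluding problem regions. -L limits traversal to
# the imported contig; each -XL then removes one region from it.
# Set GATK=echo to print the command instead of running it.
genotype_contig_excluding() {
  local contig=$1
  shift
  local args=(
    GenotypeGVCFs
    -R genome.fa
    -V "gendb://../vcfs/combined_gvcfs_br/${contig}"
    -O "../vcfs/chromosome_vcfs_br/${contig}.vcf"
    -L "$contig"
  )
  local region
  # One -XL per excluded region, given as START-END on this contig.
  for region in "$@"; do
    args+=(-XL "${contig}:${region}")
  done
  "${GATK:-gatk}" --java-options '-Xmx20G' "${args[@]}"
}
```

    With this form, each newly found problem region is just one more argument, for example genotype_contig_excluding NW_021837065.1 1-100000 250000-260000.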

  • Tiffany_at_Broad (Cambridge, MA) Member, Administrator, Broadie, Moderator

    Let us know if excluding them works for you.

  • jhb (Member)
    Accepted Answer

    Thanks for checking in, Tiffany. In the end, rewriting my pipeline to use a series of --intervals instead of --exclude-intervals was less work than going back and rerunning GenomicsDBImport. So I don't know whether that would have worked, but since it works with --intervals, that's good enough for me.

    Still, I think it does suggest a possible bug somewhere with --exclude-intervals, so I'm glad I brought it up.
