gatk4 error: java.lang.IllegalStateException: The covariates table is missing ReadGroup

Lavanya (Member)
edited March 2018 in Ask the GATK team

Hi,
I am trying to run the GATK4 best practices on an AWS Batch setup with Docker (using SSDs).
Most of the time, "ApplyBQSR" fails with the following error.

Oops... Pipeline execution stopped with the following message:
08:43:27.126 INFO ApplyBQSR - ------------------------------------------------------------
08:43:27.126 INFO ApplyBQSR - ------------------------------------------------------------
08:43:27.126 INFO ApplyBQSR - HTSJDK Version: 2.14.1
08:43:27.126 INFO ApplyBQSR - Picard Version: 2.17.2
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 1
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:43:27.127 INFO ApplyBQSR - Deflater: IntelDeflater
08:43:27.127 INFO ApplyBQSR - Inflater: IntelInflater
:
08:43:27.954 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
08:43:28.470 INFO ApplyBQSR - Shutting down engine
[February 28, 2018 8:43:28 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2112618496
java.lang.IllegalStateException: The covariates table is missing ReadGroup e5dae369 in RecalTable0
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:706)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)
Using GATK jar /opt/conda/share/gatk4-4.0.1.1-0/gatk-package-4.0.1.1-local.jar
Rerunning the same step works most of the time. Any suggestions? Thanks.


Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @Lavanya
    Hi,

    I am checking with the team and will get back to you soon.

    -Sheila

  • LouisB, Broad Institute (Member, Broadie, Dev)

    @Lavanya Is it possible that the recalibration table was computed on a different bam than the one you're applying it to? The error sounds like it's looking for a read group that isn't in the recalibration table.
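
One way to test the mismatch hypothesis above is to compare the read groups in the BAM header with those listed in the recalibration table. The Python sketch below is an illustration, not part of GATK: the function names are hypothetical, the header text is assumed to come from something like `samtools view -H`, and it assumes that the table lists one row per read group under a `#:GATKTable:RecalTable0:` marker and that GATK keys a read group by its PU tag when present and by its ID otherwise.

```python
def read_groups_in_header(sam_header):
    """Return one identifier per @RG line: PU if present, else ID (assumed GATK behaviour)."""
    groups = set()
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:value pairs.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        key = tags.get("PU") or tags.get("ID")
        if key:
            groups.add(key)
    return groups


def read_groups_in_recal_table(table_text):
    """Return the first column of every data row in the RecalTable0 section."""
    groups = set()
    lines = iter(table_text.splitlines())
    for line in lines:
        if line.startswith("#:GATKTable:RecalTable0:"):
            break
    else:
        return groups  # section not found: treat the table as empty
    next(lines, None)  # skip the column-name row (ReadGroup, EventType, ...)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            break  # a blank line or the next table marker ends the section
        groups.add(line.split()[0])
    return groups


def missing_read_groups(sam_header, table_text):
    """Read groups present in the BAM header but absent from RecalTable0."""
    return read_groups_in_header(sam_header) - read_groups_in_recal_table(table_text)
```

If `missing_read_groups` returns a non-empty set, the table was probably built from a different BAM, or it is empty or truncated. An empty result does not by itself prove that the table is valid.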

  • Lavanya (Member)

    @LouisB, I have checked those details. In fact, I downloaded the respective recalibration table and BAM (as given on the command line) and ran the analysis locally. The local run completed successfully.

  • palmerp (Member)
    edited December 2018
    I am also getting the same error.

    I am following GATK Best Practices and so ran bwa, sorted, marked duplicates and ran BaseRecalibrator beforehand.

    I have tried doing this for some small sample FASTQ files I found and it worked.

    However, when I tried the same steps on some FASTQ files from phase 3 of the 1000 Genomes Project, I got this java.lang.IllegalStateException error.

    I am running it on an AWS EC2 instance. Any ideas what might be the cause and how this can be resolved? Is there any more information I can supply? (I have checked that the BAM file and the table from BaseRecalibrator correspond.) I suspect the problem is (partly) with the data, specifically the read group, but I am not sure exactly what is causing the issue.

    Thanks in advance
  • bhanuGandham (Member, Administrator, Broadie, Moderator, admin)

    Hi @palmerp

    Please post the exact command you are using, the version of GATK, and the entire error log. Thank you.

  • palmerp (Member)
    edited December 2018
    Hi @bhanuGandham,

    Thanks for your reply. Here's the information you requested.

    Command executed:
    gatk ApplyBQSR -I SRR042441_MarkDup.bam -bqsr SRR042441_recal_data.table -O SRR042441_bqsr.bam

    Version:
    I am running this inside the docker container broadinstitute/gatk:latest which uses GATK jar /gatk/gatk-package-4.0.11.0-local.jar

    Command error:
    11:18:22.223 INFO ApplyBQSR - ------------------------------------------------------------
    11:18:22.223 INFO ApplyBQSR - ------------------------------------------------------------
    11:18:22.224 INFO ApplyBQSR - HTSJDK Version: 2.16.1
    11:18:22.224 INFO ApplyBQSR - Picard Version: 2.18.13
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:18:22.224 INFO ApplyBQSR - Deflater: IntelDeflater
    11:18:22.224 INFO ApplyBQSR - Inflater: IntelInflater
    11:18:22.224 INFO ApplyBQSR - GCS max retries/reopens: 20
    11:18:22.224 INFO ApplyBQSR - Requester pays: disabled
    11:18:22.225 INFO ApplyBQSR - Initializing engine
    11:18:22.567 INFO ApplyBQSR - Done initializing engine
    11:18:22.592 INFO ProgressMeter - Starting traversal
    11:18:22.592 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    11:18:22.705 INFO ApplyBQSR - Shutting down engine
    [December 16, 2018 11:18:22 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=1970798592
    java.lang.IllegalStateException: The covariates table is missing ReadGroup SRR042441 in RecalTable0
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:749)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:89)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /gatk/gatk-package-4.0.11.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.0.11.0-local.jar ApplyBQSR -I SRR042441_MarkDup.bam -bqsr SRR042441_recal_data.table -O SRR042441_bqsr.bam

    The original FASTQs I mapped were from here:
    s3://1000genomes/phase3/data/HG00097/sequence_read/SRR042441.filt.fastq.gz
    s3://1000genomes/phase3/data/HG00099/sequence_read/SRR042437.filt.fastq.gz
    Could it be to do with them being filtered reads?

    The following files are within the working directory:
    SRR042441_MarkDup.bam
    SRR042441_bqsr.bai
    SRR042441_bqsr.bam
    SRR042441_recal_data.table
    I have checked and none of them seem to be malformed, although only 12 lines were written to SRR042441_bqsr.bam before the command failed.

    Would you also happen to know where I can get some FASTQ files to use as test data that are known to work with the GATK germline best practices pipeline? That is why I am using this 1000 Genomes data; however, I sometimes get this error and sometimes a problem with VariantRecalibrator. I think the problem with VariantRecalibrator may just be because the files are not large enough.
  • bhanuGandham (Member, Administrator, Broadie, Moderator, admin)

    Hi @palmerp

    We apologize that we were unable to get to your question. Our team is on holiday until Feb 2nd 2019; we will get back to your question as soon as we return.
    Merry Christmas and Happy New Year!

  • bhanuGandham (Member, Administrator, Broadie, Moderator, admin)
    edited January 7

    Hi @palmerp

    It seems to me that the way the recal table file was created is causing this error. Would you please redo your BaseRecalibrator step as shown in this command:

    gatk BaseRecalibrator \
    -I my_reads.bam \
    -R reference.fasta \
    --known-sites sites_of_variation.vcf \
    --known-sites another/optional/setOfSitesToMask.vcf \
    -O recal_data.table
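
To reduce the chance of pairing a BAM with a table built from another file, both steps can be driven from one place so that they share the same input path. The Python sketch below only assembles and runs the two command lines using flags that already appear in this thread (`-I`, `-R`, `--known-sites`, `-O`, `-bqsr`). The function names and any file names passed in are hypothetical, and `gatk` is assumed to be on the PATH.

```python
import subprocess


def bqsr_commands(bam, reference, known_sites, table, out_bam):
    """Build BaseRecalibrator and ApplyBQSR commands that share one input BAM."""
    recal = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference]
    for vcf in known_sites:
        # One --known-sites flag per VCF, as in the command above.
        recal += ["--known-sites", vcf]
    recal += ["-O", table]
    # ApplyBQSR reads the same BAM and the table just written.
    apply_bqsr = ["gatk", "ApplyBQSR", "-I", bam, "-bqsr", table, "-O", out_bam]
    return recal, apply_bqsr


def run_bqsr(bam, reference, known_sites, table, out_bam):
    """Run both steps in order; check=True raises on the first failing step."""
    for cmd in bqsr_commands(bam, reference, known_sites, table, out_bam):
        subprocess.run(cmd, check=True)
```

Because the table is regenerated immediately before it is applied, a stale or mismatched table from an earlier run is less likely to be picked up. This does not address failures caused by incomplete file staging on a remote executor.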
