
gatk4 error: java.lang.IllegalStateException: The covariates table is missing ReadGroup

Lavanya (Member)
edited March 2018 in Ask the GATK team

Hi,
I am trying to run the GATK4 Best Practices pipeline on an AWS Batch setup with Docker (using SSDs).
Most of the time, the ApplyBQSR step fails with the following error.

Oops... Pipeline execution stopped with the following message:
08:43:27.126 INFO ApplyBQSR - ------------------------------------------------------------
08:43:27.126 INFO ApplyBQSR - ------------------------------------------------------------
08:43:27.126 INFO ApplyBQSR - HTSJDK Version: 2.14.1
08:43:27.126 INFO ApplyBQSR - Picard Version: 2.17.2
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 1
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:43:27.126 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:43:27.127 INFO ApplyBQSR - Deflater: IntelDeflater
08:43:27.127 INFO ApplyBQSR - Inflater: IntelInflater
:
08:43:27.954 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
08:43:28.470 INFO ApplyBQSR - Shutting down engine
[February 28, 2018 8:43:28 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2112618496
java.lang.IllegalStateException: The covariates table is missing ReadGroup e5dae369 in RecalTable0
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:706)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)
Using GATK jar /opt/conda/share/gatk4-4.0.1.1-0/gatk-package-4.0.1.1-local.jar
Rerunning the same step works most of the time. Any suggestions? Thanks.


Answers

  • Sheila (Broad Institute; Member, Broadie, admin)

    @Lavanya
    Hi,

    I am checking with the team and will get back to you soon.

    -Sheila

  • LouisB (Broad Institute; Member, Broadie, Dev)

    @Lavanya Is it possible that the recalibration table was computed on a different bam than the one you're applying it to? The error sounds like it's looking for a read group that isn't in the recalibration table.
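
One way to check this suggestion is to list the read groups that actually have rows in the recalibration table and compare them with the name in the error message. The Python sketch below is only an illustration. It assumes the table is a plain-text GATKReport in which a `#:GATKTable:RecalTable0` line is followed by a column-header row and then one row per read group, with the read group in the first column. The function names and the sample table are made up for this example.

```python
def recal_table_read_groups(report_text):
    """Return the set of values in the first (ReadGroup) column of RecalTable0."""
    groups = set()
    in_table = False
    header_seen = False
    for line in report_text.splitlines():
        if line.startswith("#:GATKTable:RecalTable0"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip() or line.startswith("#:"):
            # A blank line or the next table header ends RecalTable0.
            break
        if not header_seen:
            # Skip the column-header row ("ReadGroup  EventType ...").
            header_seen = True
            continue
        # Assumes read group names contain no whitespace.
        groups.add(line.split()[0])
    return groups


def missing_read_groups(expected, report_text):
    """Names in `expected` that have no row in RecalTable0, sorted."""
    return sorted(set(expected) - recal_table_read_groups(report_text))


# Made-up table in the assumed layout; it is not taken from any post in this thread.
SAMPLE_REPORT = """\
#:GATKTable:6:2:%s:%s:%.4f:%.4f:%d:%.2f:;
#:GATKTable:RecalTable0:
ReadGroup  EventType  EmpiricalQuality  EstimatedQReported  Observations  Errors
unit1      M          25.0000           25.0000             1000          3.00
unit2      M          26.0000           25.0000             2000          5.00

#:GATKTable:RecalTable1:
ReadGroup  QualityScore  EventType
unit3      30            M
"""

if __name__ == "__main__":
    print(missing_read_groups(["unit1", "SRR042441"], SAMPLE_REPORT))
```

For a real run, the text would come from the table file itself, for example `open("SRR042441_recal_data.table").read()`, and `expected` would hold the read group named in the exception.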

  • Lavanya (Member)

    @LouisB, I have checked those details. In fact, I downloaded the respective recalibration table and BAM (as given on the command line) and ran the analysis locally. The local run completed successfully.

  • palmerp (Member)
    edited December 2018
    I am also getting the same error.

    I am following GATK Best Practices and so ran bwa, sorted, marked duplicates and ran BaseRecalibrator beforehand.

    I have tried doing this for some small sample FASTQ files I found and it worked.

    However, when I tried doing the same steps for some FASTQ files from phase 3 of the 1000 Genomes Project, I got this java.lang.IllegalStateException error.

    I am running it on an AWS EC2 instance. Any ideas what might be the cause and how this can be resolved? Is there any more information I can supply? (I have checked and the BAM file and table from BaseRecalibrator correspond). I suspect the problem is (partly) with the data, to do with the read group but am not exactly sure what's causing the issue.

    Thanks in advance
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @palmerp

    Please post the exact command you are using, the version of gatk and the entire error log. Thank you.

  • palmerp (Member)
    edited December 2018
    Hi @bhanuGandham,

    Thanks for your reply. Here's the information you requested.

    Command executed:
    gatk ApplyBQSR -I SRR042441_MarkDup.bam -bqsr SRR042441_recal_data.table -O SRR042441_bqsr.bam

    Version:
    I am running this inside the docker container broadinstitute/gatk:latest which uses GATK jar /gatk/gatk-package-4.0.11.0-local.jar

    Command error:
    11:18:22.223 INFO ApplyBQSR - ------------------------------------------------------------
    11:18:22.223 INFO ApplyBQSR - ------------------------------------------------------------
    11:18:22.224 INFO ApplyBQSR - HTSJDK Version: 2.16.1
    11:18:22.224 INFO ApplyBQSR - Picard Version: 2.18.13
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:18:22.224 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:18:22.224 INFO ApplyBQSR - Deflater: IntelDeflater
    11:18:22.224 INFO ApplyBQSR - Inflater: IntelInflater
    11:18:22.224 INFO ApplyBQSR - GCS max retries/reopens: 20
    11:18:22.224 INFO ApplyBQSR - Requester pays: disabled
    11:18:22.225 INFO ApplyBQSR - Initializing engine
    11:18:22.567 INFO ApplyBQSR - Done initializing engine
    11:18:22.592 INFO ProgressMeter - Starting traversal
    11:18:22.592 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    11:18:22.705 INFO ApplyBQSR - Shutting down engine
    [December 16, 2018 11:18:22 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=1970798592
    java.lang.IllegalStateException: The covariates table is missing ReadGroup SRR042441 in RecalTable0
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:749)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:89)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /gatk/gatk-package-4.0.11.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.0.11.0-local.jar ApplyBQSR -I SRR042441_MarkDup.bam -bqsr SRR042441_recal_data.table -O SRR042441_bqsr.bam

    The original FASTQs I mapped were from here:
    s3://1000genomes/phase3/data/HG00097/sequence_read/SRR042441.filt.fastq.gz
    s3://1000genomes/phase3/data/HG00099/sequence_read/SRR042437.filt.fastq.gz
    Could it be to do with them being filtered reads?

    The following files are within the working directory:
    SRR042441_MarkDup.bam
    SRR042441_bqsr.bai
    SRR042441_bqsr.bam
    SRR042441_recal_data.table
    I have checked and none of them seem to be malformed, although only 12 lines were written to SRR042441_bqsr.bam before the command failed.

    Would you also happen to know where I can get some FASTQ files to use as test data that are known to work with the GATK germline Best Practices pipeline? That is why I am using this 1000 Genomes data; however, I sometimes get this error and sometimes a problem with VariantRecalibrator. I think the problem with VariantRecalibrator may just be that the files are not large enough.
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @palmerp

    We apologize that we were unable to get to your question. Our team is on holiday until Feb 2nd 2019; we will come back to your question as soon as possible.
    Merry Christmas and Happy New Year!

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator, admin)
    edited January 2019

    Hi @palmerp

    It seems to me that the way the recal table file was created is causing this error. Would you please redo your BaseRecalibrator step as shown in this command:

    gatk BaseRecalibrator \
    -I my_reads.bam \
    -R reference.fasta \
    --known-sites sites_of_variation.vcf \
    --known-sites another/optional/setOfSitesToMask.vcf \
    -O recal_data.table
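
Since the error arises when ApplyBQSR meets a read group that the table does not contain, it can help to build both commands from the same variables so that the BAM and the table cannot drift apart between the two steps. The Python sketch below is a hedged illustration of that idea. It only assembles argument lists and does not run GATK. The function name `bqsr_commands` and the output file name are hypothetical; the flags are the ones that appear in the commands posted in this thread.

```python
import shlex


def bqsr_commands(bam, reference, known_sites, table, out_bam):
    """Build BaseRecalibrator and ApplyBQSR argument lists sharing one BAM and one table."""
    base_recal = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference]
    for vcf in known_sites:
        base_recal += ["--known-sites", vcf]
    base_recal += ["-O", table]
    apply_bqsr = ["gatk", "ApplyBQSR", "-R", reference, "-I", bam,
                  "--bqsr-recal-file", table, "-O", out_bam]
    return base_recal, apply_bqsr


if __name__ == "__main__":
    # Placeholder inputs from the command above; the output name is hypothetical.
    first, second = bqsr_commands(
        "my_reads.bam", "reference.fasta",
        ["sites_of_variation.vcf"], "recal_data.table", "my_reads_bqsr.bam")
    print(shlex.join(first))
    print(shlex.join(second))
```

The two lists could then be passed to `subprocess.run(..., check=True)` one after the other, so that ApplyBQSR only starts if BaseRecalibrator exited cleanly.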

  • cricket (Member)
    I am starting with paired fastq WGS files with the goal of getting to a recalibrated bam file. My script fails on the ApplyBQSR step with an error stating "The covariates table is missing ReadGroup CBV1EANXX.2.89 in RecalTable0"; I am/was trying to trace back to the origin of the error and wonder if it might be my PU parameter in a previous step.

    Here is the workflow

    ```
    align (bwa mem -R): sam -> sort (picard SortSam): bam1 -> mark_dups (picard MarkDuplicates): bam2 -> recalibrate (gatk4 BaseRecalibrator): covTab -> applyCO (gatk4 ApplyBQSR): bam3
    ```

    The readgroup information is added in the bwa mem step, but I have also left out that parameter and used `AddOrReplaceReadGroups` later with similar success.

    I am using BI docker containers of the tools along with singularity; I am using the latest `picard` tools, `gatk version 4.1.3.0`, and `java version 1.8.0_66`. I am attempting to follow the best practices based on a melange of sources: "From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline", https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline, https://gatkforums.broadinstitute.org/gatk/discussion/6484/how-to-generate-an-unmapped-bam-from-fastq-or-aligned-bam, https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165, and https://software.broadinstitute.org/gatk/documentation/article?id=6484.

    Here is the traceback:
    ```
    17:49:22.575 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Oct 30, 2019 5:49:24 PM shaded.cloud_nio.com.google.auth....................
    INFO: Failed to detect whether we are running on Google Compute Engine.
    17:49:24.429 INFO ApplyBQSR - ------------------------------------------------------------
    17:49:24.429 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.3.0
    17:49:24.430 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    17:49:24.432 INFO ApplyBQSR - Executing as XXXXXX on Linux v3.10.0-000.00.0.el7.x86_64 amd64
    17:49:24.432 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    17:49:24.432 INFO ApplyBQSR - Start Date/Time: October 30, 2019 5:49:22 PM UTC
    17:49:24.432 INFO ApplyBQSR - ------------------------------------------------------------
    17:49:24.432 INFO ApplyBQSR - ------------------------------------------------------------
    17:49:24.433 INFO ApplyBQSR - HTSJDK Version: 2.20.1
    17:49:24.433 INFO ApplyBQSR - Picard Version: 2.20.5
    17:49:24.433 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    17:49:24.434 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    17:49:24.434 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    17:49:24.434 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    17:49:24.434 INFO ApplyBQSR - Deflater: IntelDeflater
    17:49:24.434 INFO ApplyBQSR - Inflater: IntelInflater
    17:49:24.434 INFO ApplyBQSR - GCS max retries/reopens: 20
    17:49:24.434 INFO ApplyBQSR - Requester pays: disabled
    17:49:24.434 INFO ApplyBQSR - Initializing engine
    WARNING: BAM index file data/endpoints/bam/wgs/hg38/sorted_RS107-01a.bai is older than BAM data/endpoints/bam/wgs/hg38/sorted_RS107-01a.bam
    17:49:25.452 INFO ApplyBQSR - Done initializing engine
    17:49:25.602 INFO ProgressMeter - Starting traversal
    17:49:25.603 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    17:49:26.150 INFO ApplyBQSR - Shutting down engine
    [October 30, 2019 5:49:26 PM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=3087007744
    java.lang.IllegalStateException: The covariates table is missing ReadGroup CBV1EANXX.2.89 in RecalTable0
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:749)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    ```

    Here is the exact command:
    ```
    java -Xmx3g -Xms3g -jar /gatk/gatk-package-4.1.3.0-local.jar ApplyBQSR -R ref-data/broad/ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz -I data/endpoints/bam/wgs/hg38/sorted_RS107-01a.bam --bqsr-recal-file reports/files/recal_data_RS107-01a.table -O data/endpoints/recal_bam/wgs/hg38/recalibrated_RS107-01a.bam 2> logs/alignment/ApplyBQSR_RS107-01a.gatk.log
    ```

    I have used ValidateSamFile (e.g., `java -Xmx5g -Xms5g -jar /usr/picard/picard.jar ValidateSamFile I=data/endpoints/bam/wgs/hg38/sorted_RS107-01a.bam MODE=SUMMARY 2> sorted_validate`) on all created sam/bam files and received `No errors found` for all three.

    As I mentioned previously, I am starting with paired fastq WGS files and wonder if "patient0" is my library prep identifier. If that is the case, I cannot obtain information about what library was used for prep, but to the best of my knowledge, each file represents a single sample (r1 or r2) and is the result of a single run.

    I added the read group information in the `bwa mem -R` step, but I also tried skipping that parameter and using `picard AddOrReplaceReadGroups`. I cobbled together some items from the fastq headers (e.g., @D00687:89:CBV1EANXX:2:2201:1088:2049 1:N:0:ACTATGCA), but I don't know if that is a copacetic library identifier when the original information is lacking (guidance is appreciated). Here is the header that was used:

    `@RG ID:CBV1EANXX.2.89 LB:89 PL:illumina SM:RS107-01a PU:CBV1EANXX.2.89`

    I don't know if it is a PU error, if not...guidance is appreciated. Also, apologies for the lack of formatting, I put in backticks but they are apparently on strike. Thank you in advance.
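
On the PU question: as far as I understand GATK's read group covariate, it keys each read group by its PU tag when one is present and falls back to ID otherwise. Treat that as an assumption to verify against the GATK source rather than a confirmed fact. In the header above, ID and PU are identical, so the name in the error message cannot show which tag was used. The Python sketch below parses an `@RG` line and returns the key under that assumption; both function names are made up for illustration.

```python
def parse_rg_line(rg_line):
    """Split an @RG header line into a {tag: value} dict.

    Real SAM headers are tab-separated; splitting on any whitespace also
    accepts the space-separated form pasted in the post, but it would
    break on values that contain spaces.
    """
    fields = rg_line.split()
    if not fields or fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    tags = {}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        tags[key] = value
    return tags


def assumed_recal_key(tags):
    """Key assumed to be used in the recalibration table: PU if set, else ID."""
    return tags.get("PU") or tags["ID"]


if __name__ == "__main__":
    header = "@RG ID:CBV1EANXX.2.89 LB:89 PL:illumina SM:RS107-01a PU:CBV1EANXX.2.89"
    print(assumed_recal_key(parse_rg_line(header)))
```

Comparing this key with the first column of RecalTable0 in the table file would show whether the table was built from reads carrying the same read group header.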
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @cricket

    Is it possible that the recalibration table was computed on a different bam than the one you're applying it to? The error sounds like it's looking for a read group that isn't in the recalibration table. Can you please confirm that the BQSR and ApplyBQSR tools were executed on the same bam?

  • xingaulag (Vietnam; Member)
    edited December 2019
    Hi @bhanuGandham, I got the same error as @Lavanya. I tried running the recalibration again and then ApplyBQSR, but it didn't work.
    Do you have any ideas?
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @xingaulag

    Please post
    1. the version of GATK you are using,
    2. the exact command and
    3. the entire error log

  • xingaulag (Vietnam; Member)
    Version: $gatk --version

    Using GATK jar /home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar --version
    The Genome Analysis Toolkit (GATK) v4.1.4.1
    HTSJDK Version: 2.21.0
    Picard Version: 2.21.2

    My command:

    $gatk ApplyBQSR -R refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.fasta -I mark38/VT1_sorted.bam --bqsr-recal-file recal38/recal_data1.table -O recal38/VT1_arr.bam

    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar ApplyBQSR -R refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.fasta -I mark38/VT1_sorted.bam --bqsr-recal-file recal38/recal_data1.table -O recal38/VT1_arr.bam
    16:05:00.658 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Dec 23, 2019 4:05:00 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:05:00.743 INFO ApplyBQSR - ------------------------------------------------------------
    16:05:00.743 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.4.1
    16:05:00.743 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:05:00.743 INFO ApplyBQSR - Executing as [email protected] on Linux v5.0.0-37-generic amd64
    16:05:00.743 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v11.0.5+10-post-Ubuntu-0ubuntu1.118.04
    16:05:00.744 INFO ApplyBQSR - Start Date/Time: December 23, 2019 at 4:05:00 PM ICT
    16:05:00.744 INFO ApplyBQSR - ------------------------------------------------------------
    16:05:00.744 INFO ApplyBQSR - ------------------------------------------------------------
    16:05:00.744 INFO ApplyBQSR - HTSJDK Version: 2.21.0
    16:05:00.744 INFO ApplyBQSR - Picard Version: 2.21.2
    16:05:00.744 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:05:00.744 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:05:00.744 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:05:00.744 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:05:00.745 INFO ApplyBQSR - Deflater: IntelDeflater
    16:05:00.745 INFO ApplyBQSR - Inflater: IntelInflater
    16:05:00.745 INFO ApplyBQSR - GCS max retries/reopens: 20
    16:05:00.745 INFO ApplyBQSR - Requester pays: disabled
    16:05:00.745 INFO ApplyBQSR - Initializing engine
    16:05:00.959 INFO ApplyBQSR - Done initializing engine
    16:05:01.002 INFO ProgressMeter - Starting traversal
    16:05:01.003 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    16:05:01.456 INFO ApplyBQSR - Shutting down engine
    [December 23, 2019 at 4:05:01 PM ICT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=257949696
    java.lang.IllegalStateException: The covariates table is missing ReadGroup unit1 in RecalTable0
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:750)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

    I added the read group (RG) in the bwa mem step. Before the ApplyBQSR step, I also ran BaseRecalibrator:

    gatk BaseRecalibrator -I mark38/VT1_sorted.bam -R refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.fasta --known-sites refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.dbsnp138.vcf -O recal38/recal_data1.table
    Using GATK jar /home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar BaseRecalibrator -I mark38/VT1_sorted.bam -R refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.fasta --known-sites refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.dbsnp138.vcf -O recal38/recal_data1.table
    16:02:39.233 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/thanh/Desktop/gatk/work/gatk/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Dec 23, 2019 4:02:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:02:39.814 INFO BaseRecalibrator - ------------------------------------------------------------
    16:02:39.814 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.4.1
    16:02:39.814 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:02:39.814 INFO BaseRecalibrator - Executing as [email protected] on Linux v5.0.0-37-generic amd64
    16:02:39.814 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v11.0.5+10-post-Ubuntu-0ubuntu1.118.04
    16:02:39.814 INFO BaseRecalibrator - Start Date/Time: December 23, 2019 at 4:02:39 PM ICT
    16:02:39.815 INFO BaseRecalibrator - ------------------------------------------------------------
    16:02:39.815 INFO BaseRecalibrator - ------------------------------------------------------------
    16:02:39.815 INFO BaseRecalibrator - HTSJDK Version: 2.21.0
    16:02:39.815 INFO BaseRecalibrator - Picard Version: 2.21.2
    16:02:39.815 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:02:39.815 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:02:39.815 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:02:39.815 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:02:39.815 INFO BaseRecalibrator - Deflater: IntelDeflater
    16:02:39.815 INFO BaseRecalibrator - Inflater: IntelInflater
    16:02:39.815 INFO BaseRecalibrator - GCS max retries/reopens: 20
    16:02:39.816 INFO BaseRecalibrator - Requester pays: disabled
    16:02:39.816 INFO BaseRecalibrator - Initializing engine
    16:02:40.111 INFO FeatureManager - Using codec VCFCodec to read file file:///home/thanh/Desktop/gatk/work/refgen/resources-broad-hg38-v0-Homo_sapiens_assembly38.dbsnp138.vcf
    16:02:40.288 INFO BaseRecalibrator - Done initializing engine
    16:02:40.304 INFO BaseRecalibrationEngine - The covariates being used here:
    16:02:40.304 INFO BaseRecalibrationEngine - ReadGroupCovariate
    16:02:40.304 INFO BaseRecalibrationEngine - QualityScoreCovariate
    16:02:40.304 INFO BaseRecalibrationEngine - ContextCovariate
    16:02:40.304 INFO BaseRecalibrationEngine - CycleCovariate
    16:02:40.353 INFO ProgressMeter - Starting traversal
    16:02:40.353 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    16:02:54.198 INFO ProgressMeter - chr5:177148604 0.2 4000 17349.8
    16:03:04.593 INFO ProgressMeter - chr10:102833979 0.4 8000 19802.8
    16:03:16.247 INFO ProgressMeter - chr17:50679707 0.6 13000 21730.7
    16:03:24.196 INFO BaseRecalibrator - 420966 read(s) filtered by: ((((((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND WellformedReadFilter)
    420966 read(s) filtered by: (((((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
    420966 read(s) filtered by: ((((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
    39915 read(s) filtered by: (((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter)
    39394 read(s) filtered by: ((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter)
    39394 read(s) filtered by: (MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter)
    39394 read(s) filtered by: MappingQualityNotZeroReadFilter
    521 read(s) filtered by: NotSecondaryAlignmentReadFilter
    381051 read(s) filtered by: NotDuplicateReadFilter

    16:03:24.197 INFO ProgressMeter - chr20:47381444 0.7 15776 21589.8
    16:03:24.197 INFO ProgressMeter - Traversal complete. Processed 15776 total reads in 0.7 minutes.
    16:03:24.278 INFO BaseRecalibrator - Calculating quantized quality scores...
    16:03:24.288 INFO BaseRecalibrator - Writing recalibration report...
    16:03:25.713 INFO BaseRecalibrator - ...done!
    16:03:25.713 INFO BaseRecalibrator - Shutting down engine
    [December 23, 2019 at 4:03:25 PM ICT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.78 minutes.
    Runtime.totalMemory()=283115520
    Tool returned:
    15776
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @xingaulag

    Can you please run ValidateSamFile using the latest GATK 4.1.4.1 to verify that the BAM files are not malformed? Take a look at this doc for info on how to resolve some common errors based on the output from ValidateSamFile: https://software.broadinstitute.org/gatk/documentation/article?id=11095

  • xingaulag Vietnam Member
    edited December 2019
    Hi @bhanuGandham, I ran ValidateSamFile. It reported "No errors found" along with this line: "WARNING 2019-12-24 10:29:21 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur."

    My command:
    java -jar picard.jar ValidateSamFile I=mark38/VT1_sorted.bam MODE=SUMMARY
    INFO 2019-12-24 10:59:04 ValidateSamFile

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    ********** ValidateSamFile -I mark38/VT1_sorted.bam -MODE SUMMARY
    **********


    10:59:04.977 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/thanh/Desktop/gatk/work/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Tue Dec 24 10:59:05 ICT 2019] ValidateSamFile INPUT=mark38/VT1_sorted.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Tue Dec 24 10:59:05 ICT 2019] Executing as [email protected] on Linux 5.0.0-37-generic amd64; OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.4-SNAPSHOT
    WARNING 2019-12-24 10:59:05 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
    No errors found
    [Tue Dec 24 10:59:08 ICT 2019] picard.sam.ValidateSamFile done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=309329920


    This warning is not among the 8 warnings listed in the linked doc.
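The NM warning in the output above appears because no reference was passed to ValidateSamFile. A minimal sketch of the same command with a reference supplied through Picard's `R=` (REFERENCE_SEQUENCE) argument is shown here; the FASTA path is an assumption made for illustration, not a path taken from this thread, so substitute the reference the BAM was aligned against.

```shell
# Hypothetical paths: adjust to your own layout.
PICARD_JAR="picard.jar"
BAM="mark38/VT1_sorted.bam"
REF="refgen/Homo_sapiens_assembly38.fasta"   # assumed reference location

# Supplying R= lets ValidateSamFile check NM tags against the reference.
CMD="java -jar $PICARD_JAR ValidateSamFile I=$BAM R=$REF MODE=SUMMARY"

# Print the command; run it only when the jar and inputs actually exist.
echo "$CMD"
if [ -f "$PICARD_JAR" ] && [ -f "$BAM" ] && [ -f "$REF" ]; then
    $CMD
fi
```

With the reference present, the NM check should run alongside the other validations rather than being skipped.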
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @xingaulag

    We are not monitoring questions posted on the forum at this time and cannot guarantee a response. Our team is out of office until Jan 2nd 2020. If you are not able to find a solution, please post your question again after Jan 2nd.

  • xingaulag Vietnam Member
    Hi @bhanuGandham, thank you and GATK team, I've already solved my problem now.
  • DerekCA Member, Administrator, Broadie, Moderator admin

    "...thank you and GATK team, I've already solved my problem now."

    Hi there, @xingaulag! For the benefit of others who may have encountered a similar issue, would you mind explaining how you got it to work?
