BaseRecalibrator : getContigNames(SequenceDictionaryUtils.java:463)

Paul_Arthur (Paris, Member)
edited July 11 in Ask the GATK team
Hi,

I'm trying to run the BaseRecalibrator tool on a BAM file, but the program fails before it finishes. The messages it prints were not enough for me to fix the error on my own. I am running GATK version 4.1.2.0.

Here is the complete message:

```
16:09:12.733 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data2/home/pamesl/miniconda3/envs/smk_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 11, 2019 4:09:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:09:14.487 INFO BaseRecalibrator - ------------------------------------------------------------
16:09:14.488 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.2.0
16:09:14.488 INFO BaseRecalibrator - For support and documentation go to
16:09:14.488 INFO BaseRecalibrator - Executing as [email protected] on Linux v2.6.32-573.7.1.el6.x86_64 amd64
16:09:14.489 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
16:09:14.489 INFO BaseRecalibrator - Start Date/Time: 11 juillet 2019 16:09:12 CEST
16:09:14.489 INFO BaseRecalibrator - ------------------------------------------------------------
16:09:14.489 INFO BaseRecalibrator - ------------------------------------------------------------
16:09:14.490 INFO BaseRecalibrator - HTSJDK Version: 2.19.0
16:09:14.490 INFO BaseRecalibrator - Picard Version: 2.19.0
16:09:14.490 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:09:14.491 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:09:14.491 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:09:14.491 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:09:14.491 INFO BaseRecalibrator - Deflater: IntelDeflater
16:09:14.491 INFO BaseRecalibrator - Inflater: IntelInflater
16:09:14.491 INFO BaseRecalibrator - GCS max retries/reopens: 20
16:09:14.491 INFO BaseRecalibrator - Requester pays: disabled
16:09:14.492 INFO BaseRecalibrator - Initializing engine
16:09:15.263 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/scratch/pamesl/projet_cbf/data/dbSNP/dbsnp_138.hg19.vcf.gz
16:09:15.411 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/scratch/pamesl/projet_cbf/data/mills_1000G/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
16:09:15.428 INFO BaseRecalibrator - Shutting down engine
[11 juillet 2019 16:09:15 CEST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2224553984
java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.getContigNames(SequenceDictionaryUtils.java:463)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.getCommonContigsByName(SequenceDictionaryUtils.java:457)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.compareDictionaries(SequenceDictionaryUtils.java:234)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:150)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:98)
at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:760)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:702)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:50)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /data2/home/pamesl/miniconda3/envs/smk_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data2/home/pamesl/miniconda3/envs/smk_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar BaseRecalibrator -I /data1/scratch/pamesl/projet_cbf/data/bam/SJCBF016_G-C0DG1ACXX.5_marked_duplicates.bam -R /data1/scratch/pamesl/projet_cbf/data/hg19_data/reference_hg19/ucsc.hg19.fasta.gz --known-sites /data1/scratch/pamesl/projet_cbf/data/dbSNP/dbsnp_138.hg19.vcf.gz --known-sites /data1/scratch/pamesl/projet_cbf/data/mills_1000G/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -O /data1/scratch/pamesl/projet_cbf/data/bam/recal_data_SJCBF016_G-C0DG1ACXX.5.table
```

I checked the validity of the BAM file SJCBF016_G-C0DG1ACXX.5_marked_duplicates.bam using the ValidateSamFile tool and got the following result:

```
No errors found
Tool returned:
0
```

I suspect the problem comes from one of my known-sites files (Mills_and_1000G_gold_standard.indels.hg19.sites.vcf or dbsnp_138.hg19.vcf.gz) or from my reference file ucsc.hg19.fasta.gz, but I don't know where to start looking.

Edit: I will run ValidateVariants on each VCF file and post the results tomorrow.

Best regards,

Paul-Arthur

Best Answer

  • Paul_Arthur (Paris)
    Accepted Answer
    After deleting and recreating the different files, the problem is solved. The defective file was the .dict file, although I don't understand where the initial error came from.
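
Since the culprit turned out to be the reference's .dict file, a quick sanity check of that file against the FASTA index (.fai) might have pointed to it sooner. The following is only a minimal sketch, not GATK's own validation: it assumes the .dict is a SAM-style text header with `@SQ` lines carrying `SN` and `LN` tags, and that the .fai lists contig name and length in its first two tab-separated columns. The function names and file paths are hypothetical.

```python
from pathlib import Path


def read_dict_contigs(dict_path):
    """Return [(name, length)] taken from the @SQ lines of a sequence dictionary."""
    contigs = []
    for line in Path(dict_path).read_text().splitlines():
        if not line.startswith("@SQ"):
            continue
        # Every field after the record type is a TAG:value pair (e.g. SN:chr1).
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        length = int(tags["LN"]) if "LN" in tags else None
        contigs.append((tags.get("SN"), length))
    return contigs


def read_fai_contigs(fai_path):
    """Return [(name, length)] taken from the first two columns of a .fai index."""
    contigs = []
    for line in Path(fai_path).read_text().splitlines():
        if line.strip():
            cols = line.split("\t")
            contigs.append((cols[0], int(cols[1])))
    return contigs


def check_dict(dict_path, fai_path):
    """Return a list of problems; an empty list means the two files agree."""
    dict_contigs = read_dict_contigs(dict_path)
    fai_contigs = read_fai_contigs(fai_path)
    problems = []
    if not dict_contigs:
        problems.append("no @SQ lines found in the .dict file")
    for name, length in dict_contigs:
        if name is None or length is None:
            problems.append(f"@SQ line missing SN or LN: {name!r}, {length!r}")
    if dict_contigs and dict_contigs != fai_contigs:
        problems.append("contig names/lengths differ between .dict and .fai")
    return problems
```

Calling `check_dict("reference.dict", "reference.fasta.fai")` (placeholder paths) returns an empty list when both files describe the same contigs in the same order. If it reports a problem, the dictionary can usually be regenerated with the CreateSequenceDictionary tool after deleting the old .dict.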

Answers

  • Paul_Arthur (Paris, Member)
    Here are the outputs of ValidateVariants on my files:

    ```
    09:42:13.154 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data2/home/pamesl/miniconda3/envs/gatk4_4.1.2.0_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 12, 2019 9:42:14 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:42:14.894 INFO ValidateVariants - ------------------------------------------------------------
    09:42:14.895 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
    09:42:14.895 INFO ValidateVariants - For support and documentation go to
    09:42:14.895 INFO ValidateVariants - Executing as [email protected] on Linux v2.6.32-573.7.1.el6.x86_64 amd64
    09:42:14.896 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    09:42:14.896 INFO ValidateVariants - Start Date/Time: 12 juillet 2019 09:42:13 CEST
    09:42:14.896 INFO ValidateVariants - ------------------------------------------------------------
    09:42:14.896 INFO ValidateVariants - ------------------------------------------------------------
    09:42:14.897 INFO ValidateVariants - HTSJDK Version: 2.19.0
    09:42:14.897 INFO ValidateVariants - Picard Version: 2.19.0
    09:42:14.898 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:42:14.898 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:42:14.898 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:42:14.898 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:42:14.898 INFO ValidateVariants - Deflater: IntelDeflater
    09:42:14.898 INFO ValidateVariants - Inflater: IntelInflater
    09:42:14.899 INFO ValidateVariants - GCS max retries/reopens: 20
    09:42:14.899 INFO ValidateVariants - Requester pays: disabled
    09:42:14.899 INFO ValidateVariants - Initializing engine
    09:42:15.555 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/scratch/pamesl/projet_cbf/data/dbSNP/dbsnp_138.hg19.vcf.gz
    09:42:15.921 INFO ValidateVariants - Done initializing engine
    09:42:15.921 INFO ProgressMeter - Starting traversal
    09:42:15.921 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    09:42:25.927 INFO ProgressMeter - chr1:85219337 0.2 1773000 10633746.5
    09:42:35.926 INFO ProgressMeter - chr1:203432070 0.3 3720000 11157768.4
    09:42:45.928 INFO ProgressMeter - chr2:41628100 0.5 5644000 11285366.7
    09:42:55.932 INFO ProgressMeter - chr2:139260385 0.7 7620000 11427143.2
    09:43:05.937 INFO ProgressMeter - chr2:236480298 0.8 9564000 11473128.6
    09:43:15.939 INFO ProgressMeter - chr3:82594206 1.0 11532000 11528541.4
    09:43:25.942 INFO ProgressMeter - chr3:181681522 1.2 13506000 11573099.5
    09:43:35.945 INFO ProgressMeter - chr4:72864798 1.3 15489000 11613266.0
    09:43:45.948 INFO ProgressMeter - chr4:170394397 1.5 17453000 11631843.8
    09:43:55.951 INFO ProgressMeter - chr5:73338396 1.7 19428000 11653304.0
    09:44:05.952 INFO ProgressMeter - chr5:168904051 1.8 21399000 11668999.4
    09:44:15.952 INFO ProgressMeter - chr6:76816892 2.0 23373000 11683481.8
    09:44:25.954 INFO ProgressMeter - chr6:170470765 2.2 25329000 11687340.9
    09:44:35.957 INFO ProgressMeter - chr7:90139660 2.3 27303000 11698277.6
    09:44:45.959 INFO ProgressMeter - chr8:16590512 2.5 29272000 11705834.5
    09:44:55.961 INFO ProgressMeter - chr8:114349378 2.7 31246000 11714321.4
    09:45:05.963 INFO ProgressMeter - chr9:81277834 2.8 33239000 11728514.1
    09:45:15.967 INFO ProgressMeter - chr10:29180143 3.0 35206000 11732400.2
    09:45:25.968 INFO ProgressMeter - chr10:126674848 3.2 37170000 11734991.9
    09:45:35.972 INFO ProgressMeter - chr11:81324426 3.3 39136000 11737865.5
    09:45:45.979 INFO ProgressMeter - chr12:38295747 3.5 41091000 11737044.1
    09:45:55.985 INFO ProgressMeter - chr12:132001225 3.7 43051000 11737767.2
    09:46:05.987 INFO ProgressMeter - chr13:110019237 3.8 45006000 11737327.5
    09:46:15.989 INFO ProgressMeter - chr14:104967730 4.0 46957000 11735924.8
    09:46:25.992 INFO ProgressMeter - chr16:5281264 4.2 48917000 11736793.7
    09:46:35.992 INFO ProgressMeter - chr17:7334268 4.3 50895000 11741793.6
    09:46:45.996 INFO ProgressMeter - chr18:22883277 4.5 52861000 11743626.8
    09:46:56.000 INFO ProgressMeter - chr19:35290321 4.7 54825000 11744900.5
    09:47:06.001 INFO ProgressMeter - chr20:62705429 4.8 56805000 11749557.9
    09:47:16.004 INFO ProgressMeter - chrX:22158309 5.0 58786000 11753948.1
    09:47:26.004 INFO ProgressMeter - chrY:14482729 5.2 60807000 11765946.5
    09:47:26.248 INFO ProgressMeter - chrY:59338394 5.2 60860307 11767002.0
    09:47:26.249 INFO ProgressMeter - Traversal complete. Processed 60860307 total variants in 5.2 minutes.
    09:47:26.249 INFO ValidateVariants - Shutting down engine
    [12 juillet 2019 09:47:26 CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 5.22 minutes.
    Runtime.totalMemory()=1924136960
    Using GATK jar /data2/home/pamesl/miniconda3/envs/gatk4_4.1.2.0_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data2/home/pamesl/miniconda3/envs/gatk4_4.1.2.0_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar ValidateVariants -V /data1/scratch/pamesl/projet_cbf/data/dbSNP/dbsnp_138.hg19.vcf.gz
    ```

    and

    ```
    09:43:43.281 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data2/home/pamesl/miniconda3/envs/gatk4_4.1.2.0_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 12, 2019 9:43:45 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:43:45.017 INFO ValidateVariants - ------------------------------------------------------------
    09:43:45.018 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
    09:43:45.018 INFO ValidateVariants - For support and documentation go to
    09:43:45.019 INFO ValidateVariants - Executing as [email protected] on Linux v2.6.32-573.7.1.el6.x86_64 amd64
    09:43:45.019 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    09:43:45.019 INFO ValidateVariants - Start Date/Time: 12 juillet 2019 09:43:43 CEST
    09:43:45.019 INFO ValidateVariants - ------------------------------------------------------------
    09:43:45.020 INFO ValidateVariants - ------------------------------------------------------------
    09:43:45.021 INFO ValidateVariants - HTSJDK Version: 2.19.0
    09:43:45.021 INFO ValidateVariants - Picard Version: 2.19.0
    09:43:45.021 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:43:45.021 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:43:45.021 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:43:45.021 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:43:45.021 INFO ValidateVariants - Deflater: IntelDeflater
    09:43:45.022 INFO ValidateVariants - Inflater: IntelInflater
    09:43:45.022 INFO ValidateVariants - GCS max retries/reopens: 20
    09:43:45.022 INFO ValidateVariants - Requester pays: disabled
    09:43:45.022 INFO ValidateVariants - Initializing engine
    09:43:45.692 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/scratch/pamesl/projet_cbf/data/mills_1000G/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
    09:43:45.823 INFO ValidateVariants - Done initializing engine
    09:43:45.824 INFO ProgressMeter - Starting traversal
    09:43:45.825 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    09:43:50.024 INFO ProgressMeter - chrX:151966000 0.1 1274580 18221300.9
    09:43:50.024 INFO ProgressMeter - Traversal complete. Processed 1274580 total variants in 0.1 minutes.
    09:43:50.024 INFO ValidateVariants - Shutting down engine
    [12 juillet 2019 09:43:50 CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.11 minutes.
    Runtime.totalMemory()=2469920768
    Using GATK jar /data2/home/pamesl/miniconda3/envs/gatk4_4.1.2.0_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data2/home/pamesl/miniconda3/envs/gatk4_4.1.2.0_env/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar ValidateVariants -V /data1/scratch/pamesl/projet_cbf/data/mills_1000G/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
    ```