GenotypeGVCFs stuck(?) after ProgressMeter - Starting traversal

I am running GenotypeGVCFs on ~1700 samples, using the intervals from https://console.cloud.google.com/storage/browser/_details/gatk-test-data/intervals/hg38.even.handcurated.20k.intervals?project=broad-dsde-outreach&organizationId=548622027621. The GenomicsDB was populated from HaplotypeCaller VCFs produced from RNA-seq data. I am using GATK version 4.1.0.0.

Runtime per interval ranges from 1 to 5 hours. However, some of the intervals have now been running for more than 8 hours. When I look at the log output, I see:

00:27:41.070 WARN  GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
00:27:41.122 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data/umcg-ndeklein/apps/software/GATK/4.1.0.0-foss-2015b-Python-3.6.3/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/n
Dec 06, 2019 12:27:42 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
WARNING: Failed to detect whether we are running on Google Compute Engine.
shaded.cloud_nio.com.google.api.client.http.HttpResponseException: 404 Not Found
00:27:43.167 INFO  GenotypeGVCFs - ------------------------------------------------------------
00:27:43.167 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.0.0
00:27:43.167 INFO  GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
00:27:43.168 INFO  GenotypeGVCFs - Executing as [email protected] on Linux v3.10.0-1062.4.1.el7.x86_64 amd64
00:27:43.168 INFO  GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_45-b14
00:27:43.168 INFO  GenotypeGVCFs - Start Date/Time: December 6, 2019 12:27:41 AM GMT-05:00
00:27:43.168 INFO  GenotypeGVCFs - ------------------------------------------------------------
00:27:43.168 INFO  GenotypeGVCFs - ------------------------------------------------------------
00:27:43.169 INFO  GenotypeGVCFs - HTSJDK Version: 2.18.2
00:27:43.169 INFO  GenotypeGVCFs - Picard Version: 2.18.25
00:27:43.169 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:27:43.169 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:27:43.169 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:27:43.169 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:27:43.169 INFO  GenotypeGVCFs - Deflater: IntelDeflater
00:27:43.169 INFO  GenotypeGVCFs - Inflater: IntelInflater
00:27:43.169 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
00:27:43.169 INFO  GenotypeGVCFs - Requester pays: disabled
00:27:43.169 INFO  GenotypeGVCFs - Initializing engine
00:27:43.620 INFO  FeatureManager - Using codec VCFCodec to read file file:///data/umcg-ndeklein/apps/data/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.biallelicSNP_only.with_chr.vcf.gz
WARNING: No valid combination operation found for INFO field DB - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DB - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
00:28:06.355 INFO  IntervalArgumentCollection - Processing 154451 bp from intervals
00:28:06.362 WARN  IndexUtils - Feature file "/data/umcg-ndeklein/apps/data/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.biallelicSNP_only.with_chr.vcf.gz" appears to contain no sequence di
00:28:06.582 INFO  GenotypeGVCFs - Done initializing engine
00:28:06.734 INFO  ProgressMeter - Starting traversal
00:28:06.735 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
WARNING: No valid combination operation found for INFO field DB - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records

No progress has been reported since the traversal started. The interval is not that large (chr1:118432037-118569219), so I would have expected this to go faster. Is there something in the settings that I am missing? I run it with:

gatk --java-options "-Xmx20G" GenotypeGVCFs \
    --reference GRCh38.primary_assembly.genome.fa \
    --dbsnp All_20180418.biallelicSNP_only.with_chr.vcf.gz \
    --output chr1_118432037_118569219.gg.vcf.gz \
    --variant gendb:///genomicDB//chr1 \
    --stand-call-conf 10.0 \
    -L chr1:118432037-118569219 \
    -G StandardAnnotation
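
To tell a job that never began traversal apart from one that started but has not yet reported a locus, the log can be scanned for ProgressMeter rows. This is only a minimal sketch: the names `traversal_status` and `PROGRESS_ROW` are hypothetical, and the expected layout of a progress row (a `contig:position` locus followed by three numeric columns) is an assumption inferred from the column header in the log above.

```python
import re

# Assumed shape of a ProgressMeter data row: a locus such as "chr1:118433037"
# followed by three numeric columns (elapsed minutes, variants processed,
# variants per minute). This is inferred from the header line in the log,
# not taken from GATK documentation.
PROGRESS_ROW = re.compile(
    r"INFO\s+ProgressMeter\s+-\s+\S+:\d+\s+[\d.]+\s+\d+\s+[\d.]+"
)


def traversal_status(log_lines):
    """Classify a GenotypeGVCFs log as not started, started, or progressing."""
    started = False
    rows = 0
    for line in log_lines:
        if "ProgressMeter - Starting traversal" in line:
            started = True
        elif started and PROGRESS_ROW.search(line):
            # Count only data rows; the column header has no "contig:position".
            rows += 1
    if not started:
        return "not started"
    return "progressing" if rows else "started, no progress rows yet"


if __name__ == "__main__":
    # Two lines copied from the log in this post.
    sample = [
        "00:28:06.734 INFO  ProgressMeter - Starting traversal",
        "00:28:06.735 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute",
    ]
    print(traversal_status(sample))
```

Run against the log excerpt above, this classifies the job as started with no progress rows yet, which matches what I see: the traversal header is printed and nothing follows it except the repeated INFO field warnings.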

Thanks for your help,
Niek
