
GATK v4.1.3.0: Error with SelectVariants

bsmith030465 ca Member
edited November 2019 in Ask the GATK team

Hi,

I was trying to make an hg38 biallelic variant set (to be used later with GetPileupSummaries). I downloaded the gnomAD hg38 VCF file from:
https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38?project=broad-dsde-outreach&pli=1

My command to get only biallelic variants is:
gatk SelectVariants \
    -R Homo_sapiens_assembly38.fasta \
    -V somatic-hg38_af-only-gnomad.hg38.vcf \
    -O somatic-hg38_af-only-gnomad.hg38.BIALLELIC.vcf \
    --restrict-alleles-to BIALLELIC

However, I get an exception:
GATKException: Error initializing feature reader for path somatic-hg38_af-only-gnomad.hg38.vcf

What am I doing wrong?

thanks!

Complete terminal output is:

INFO: Failed to detect whether we are running on Google Compute Engine.
10:44:11.212 INFO SelectVariants - ------------------------------------------------------------
10:44:11.213 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.3.0
10:44:11.214 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
10:44:11.215 INFO SelectVariants - Executing as [email protected] on Linux v3.10.0-1062.1.1.el7.x86_64 amd64
10:44:11.216 INFO SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_111-b14
10:44:11.216 INFO SelectVariants - Start Date/Time: October 31, 2019 10:44:07 AM EDT
10:44:11.217 INFO SelectVariants - ------------------------------------------------------------
10:44:11.218 INFO SelectVariants - ------------------------------------------------------------
10:44:11.224 INFO SelectVariants - HTSJDK Version: 2.20.1
10:44:11.225 INFO SelectVariants - Picard Version: 2.20.5
10:44:11.226 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:44:11.226 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:44:11.227 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:44:11.228 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:44:11.229 INFO SelectVariants - Deflater: IntelDeflater
10:44:11.229 INFO SelectVariants - Inflater: IntelInflater
10:44:11.230 INFO SelectVariants - GCS max retries/reopens: 20
10:44:11.230 INFO SelectVariants - Requester pays: disabled
10:44:11.231 INFO SelectVariants - Initializing engine
10:44:13.791 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/kpx/v1/home/data/references/gatk-4.1.3.0-bundle/somatic-hg38_af-only-gnomad.hg38.vcf
10:44:13.816 INFO SelectVariants - Shutting down engine
[October 31, 2019 10:44:13 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=2366636032
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path somatic-hg38_af-only-gnomad.hg38.vcf
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:375)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:327)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:277)
at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:706)

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @bsmith030465

    Please send the complete stack trace. We are looking for the keyword "Caused by" in it, which will help us narrow down the cause of this issue.
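A minimal sketch of how the relevant lines could be pulled out of a saved GATK log, assuming the job's stderr was written to a file. The function names `caused_by_lines` and `root_cause` are hypothetical helpers, not part of GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GATK) for reading a saved error log.

# Print every "Caused by:" line in the log, in order of appearance.
caused_by_lines() {
    grep -E '^[[:space:]]*Caused by:' "$1"
}

# Print only the last "Caused by:" line, which is the innermost exception
# in a Java stack trace and usually the most specific message.
root_cause() {
    caused_by_lines "$1" | tail -n 1
}
```

For example, `root_cause my_job.stderr` (a placeholder file name) would print the innermost exception message, if the log contains one.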

  • bsmith030465 ca Member
    edited November 2019

    It seems to be complaining about the header of the input file, which I downloaded from the GATK best practices link (please see the original post for the link). Is there a more up-to-date file available? Or what would you suggest? Thanks!!

    Here is the complete trace:

    ============
    Using GATK jar /mnt/beegfs/v1/kpx/sw/kpx/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/beegfs/v1/kpx/sw/kpx/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar SelectVariants -R /home/kpx/data/references/gatk-4.1.3.0-bundle/Homo_sapiens_assembly38.fasta -V /home/kpx/data/references/gatk-4.1.3.0-bundle/somatic-hg38_af-only-gnomad.hg38.vcf -O /home/kpx/data/references/gatk-4.1.3.0-bundle/somatic-hg38_af-only-gnomad.hg38.BIALLELIC.vcf --restrict-alleles-to BIALLELIC
    08:10:04.251 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/beegfs/v1/kpx/sw/kpx/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 05, 2019 8:10:07 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:10:07.412 INFO SelectVariants - ------------------------------------------------------------
    08:10:07.414 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.3.0
    08:10:07.415 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:10:07.417 INFO SelectVariants - Executing as [email protected] on Linux v3.10.0-1062.1.1.el7.x86_64 amd64
    08:10:07.419 INFO SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_111-b14
    08:10:07.420 INFO SelectVariants - Start Date/Time: November 5, 2019 8:10:03 AM EST
    08:10:07.421 INFO SelectVariants - ------------------------------------------------------------
    08:10:07.422 INFO SelectVariants - ------------------------------------------------------------
    08:10:07.430 INFO SelectVariants - HTSJDK Version: 2.20.1
    08:10:07.431 INFO SelectVariants - Picard Version: 2.20.5
    08:10:07.432 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:10:07.433 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:10:07.434 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:10:07.435 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:10:07.437 INFO SelectVariants - Deflater: IntelDeflater
    08:10:07.438 INFO SelectVariants - Inflater: IntelInflater
    08:10:07.439 INFO SelectVariants - GCS max retries/reopens: 20
    08:10:07.440 INFO SelectVariants - Requester pays: disabled
    08:10:07.441 INFO SelectVariants - Initializing engine
    08:10:10.518 INFO FeatureManager - Using codec VCFCodec to read file file:///home/kpx/data/references/gatk-4.1.3.0-bundle/somatic-hg38_af-only-gnomad.hg38.vcf
    08:10:10.565 INFO SelectVariants - Shutting down engine
    [November 5, 2019 8:10:10 AM EST] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.11 minutes.
    Runtime.totalMemory()=2130706432
    org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /home/kpx/data/references/gatk-4.1.3.0-bundle/somatic-hg38_af-only-gnomad.hg38.vcf
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:375)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:327)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:277)
    at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:706)
    at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /home/kpx/data/references/gatk-4.1.3.0-bundle/somatic-hg38_af-only-gnomad.hg38.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:372)
    ... 12 more
    Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
    ... 16 more

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @bsmith030465

    We looked into this, and the VCF appears to be malformed or truncated in some way. The underlying error is that the "#CHROM" line is missing. This line should always be the last header line in a VCF, with tab-separated column names, and looks like this: "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 ..."
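The check described above can be sketched as a small shell function, assuming an uncompressed VCF on local disk. The name `has_chrom_header` is a hypothetical helper, not a GATK command; it only reads the header block and stops at the first data record.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): succeed only if the header block
# of an uncompressed VCF contains the tab-separated "#CHROM" column line.
has_chrom_header() {
    awk '
        /^#CHROM\t/ { found = 1; exit }   # column header line reached
        !/^#/       { exit }              # first non-header line: stop looking
        END         { exit !found }       # status 0 if found, 1 otherwise
    ' "$1"
}
```

A possible use is `has_chrom_header somatic-hg38_af-only-gnomad.hg38.vcf && echo "header ok" || echo "#CHROM line missing"`. A file that is not plain text (for example, compressed data) would also be reported as missing the line, because its first line does not start with "#".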

  • bsmith030465 ca Member
    edited November 2019

    Thanks! However, I downloaded this VCF (the gnomAD hg38 file) from the GATK best practices bucket:
    https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38?project=broad-dsde-outreach&pli=1

    If this vcf is no longer current, could you please direct me to the correct one?

    thanks!!

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited November 2019

    Hi @bsmith030465

    I just checked this file and it does have the #CHROM line in the header.
    grep CHROM af-only-gnomad.hg38.vcf
    #CHROM POS ID REF ALT QUAL FILTER INFO

    Maybe it was truncated during the download. Please redo the download.
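Before re-downloading, it may help to look at how the local copy is actually stored. The sketch below is an assumption-laden diagnostic, not something confirmed in this thread: besides truncation, one other possibility is that the downloaded file is gzip-compressed but saved under a plain `.vcf` name, which would also hide the "#CHROM" line from a plain-text reader. The function name `vcf_storage_kind` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): report how a downloaded VCF is
# stored on disk. Prints "empty", "text", "gzip-ok", or "gzip-corrupt".
vcf_storage_kind() {
    local vcf="$1"
    if [ ! -s "$vcf" ]; then
        echo "empty"
        return
    fi
    # gzip and bgzip files both start with the magic bytes 1f 8b.
    local magic
    magic=$(head -c 2 "$vcf" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        # gzip -t verifies the compressed stream; a stream cut off
        # mid-download normally fails this test.
        if gzip -t < "$vcf" 2>/dev/null; then
            echo "gzip-ok"
        else
            echo "gzip-corrupt"
        fi
    else
        echo "text"
    fi
}
```

If this prints "gzip-ok" for a file named `.vcf`, renaming it with a `.gz` suffix (or decompressing it) is worth trying; "gzip-corrupt" or "empty" would support the advice to redo the download. A compressed file cut exactly at a block boundary could still pass `gzip -t`, so a passing result is not proof of a complete download.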
