GATK SelectVariants on VCF

I'm using GATK (v3.3) SelectVariants on the .vcf file of the ExAC data (downloaded from ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/).

I get the following error:

java -Xmx45g -XX:+AggressiveOpts -jar ~ngs/gatk/GenomeAnalysisTK.jar -T SelectVariants -R /home/ngs/data/tools/gatk/hg/broad_bundle_hg19_v2.5/ucsc.hg19.fasta --variant ExAC.r0.3.1.sites.vep.vcf -o exac_positions.vcf -L positions.bed

INFO 16:52:13,449 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:52:13,451 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
INFO 16:52:13,452 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 16:52:13,452 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 16:52:13,457 HelpFormatter - Program Args: -T SelectVariants -R /home/ngs/data/tools/gatk/hg/broad_bundle_hg19_v2.5/ucsc.hg19.fasta --variant ExAC.r0.3.1.sites.vep.vcf -o exac_positions.vcf -L positions.bed
INFO 16:52:13,462 HelpFormatter - Executing as ngs@bngs05b on Linux 3.4.33-2.24-desktop amd64; OpenJDK 64-Bit Server VM 1.7.0_45-b31.
INFO 16:52:13,462 HelpFormatter - Date/Time: 2016/08/17 16:52:13
INFO 16:52:13,463 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:52:13,463 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:52:13,526 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:52:16,120 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NumberFormatException: For input string: "R"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at htsjdk.variant.vcf.VCFCompoundHeaderLine.<init>(VCFCompoundHeaderLine.java:171)
at htsjdk.variant.vcf.VCFInfoHeaderLine.<init>(VCFInfoHeaderLine.java:46)
at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:205)
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:88)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:41)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.readHeader(IndexFactory.java:413)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.<init>(IndexFactory.java:401)
at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:312)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:402)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:288)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:225)
at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:148)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:997)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:779)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:290)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "R"
ERROR ------------------------------------------------------------------------------------------

Is this a problem with the ExAC .vcf file? There are of course a lot of 'R's in that .vcf, so I don't know which line I should look at ...
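
One way to narrow down where to look: the stack trace shows the failure inside the VCFInfoHeaderLine constructor, at Integer.valueOf, so the "R" was most likely read from the Number= field of an ##INFO header line rather than from a variant record. The Python sketch below lists header lines whose Number= value is not an integer. The function name and the set of accepted non-integer values (A, G and .) are assumptions made for illustration; they are not taken from the GATK or htsjdk source.

```python
import gzip
import re

# Captures the value of the first Number= field in a header line.
NUMBER_FIELD = re.compile(r"Number=([^,>]+)")


def find_unparsable_counts(vcf_path, allowed=("A", "G", ".")):
    """Return (line_number, header_line) pairs for INFO/FORMAT header lines
    whose Number= value is neither an integer nor one of `allowed`.

    `allowed` is an assumption about which symbolic counts an older parser
    accepts; adjust it if your parser behaves differently.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    hits = []
    with opener(vcf_path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.startswith("##"):
                break  # meta-information lines are over; stop before the records
            if not line.startswith(("##INFO=", "##FORMAT=")):
                continue
            match = NUMBER_FIELD.search(line)
            if match is None:
                continue
            value = match.group(1)
            if not value.isdigit() and value not in allowed:
                hits.append((line_number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for line_number, header in find_unparsable_counts("ExAC.r0.3.1.sites.vep.vcf"):
        print(line_number, header)
```

Only the header is scanned, so this returns quickly even on a large file.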

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Can you please check if this still happens with the latest version of GATK? I think this might be due to a bug that has been fixed.
  • I tested it with GATK 3.4, which we already had installed. The error disappeared, but the program is not advancing. It's stuck at:
    ...
    INFO 16:52:13,526 GenomeAnalysisEngine - Strictness is SILENT
    INFO 16:52:16,120 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000

    It has stayed like that for more than 12 hours, which I suppose is not normal.

    I've tried to use GATK 3.6, but there I immediately get an error, which I guess has something to do with our Java version (which is 1.7)?
    java -Xmx45g -XX:+AggressiveOpts -jar /home/ngs/installed/gatk/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar

    Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @ddaneels
    Hi,

    You need to have Java 1.8 to use GATK version 3.6. Have a look at this article which may help you switch between versions.

    -Sheila
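
For reference, the "major.minor version 52.0" in the error above is the class-file format version, and major version 52 corresponds to Java 8 (51 is Java 7). The Python sketch below reads that number straight from a class inside the jar, which can help confirm what a given GATK jar needs before trying to run it. The function names are hypothetical, and the entry path is simply the class named in the error message with ".class" appended.

```python
import struct
import zipfile


def class_major_version(jar_path, class_entry):
    """Return the class-file major version of `class_entry` inside `jar_path`."""
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(class_entry) as class_file:
            header = class_file.read(8)
    # Class files start with: 4-byte magic, 2-byte minor, 2-byte major (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header)
    if magic != 0xCAFEBABE:
        raise ValueError(f"{class_entry} does not look like a Java class file")
    return major


def minimum_java_release(major):
    """Map a class-file major version to the Java release that introduced it
    (for example 51 -> 7, 52 -> 8)."""
    return major - 44


if __name__ == "__main__":
    major = class_major_version(
        "/home/ngs/installed/gatk/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar",
        "org/broadinstitute/gatk/engine/CommandLineGATK.class",
    )
    print("class-file major version:", major)
    print("needs at least Java", minimum_java_release(major))
```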

  • We seem to be getting somewhere. I was able to install version 3.6. Now I get the error

    MESSAGE: Key AC_Adj0_Filter found in VariantContext field FILTER at chr1:1375207 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @ddaneels
    Hi,

    Hmm. It looks like an issue with the ExAC VCF. I think if you add this: ##FILTER=<ID=AC_Adj0_Filter,Description="AC_Adj == 0"> to your VCF header, you should be all set.

    -Sheila
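
The header fix suggested above can be applied without opening the file in an editor. The Python sketch below copies an uncompressed VCF and writes the ##FILTER line just before the #CHROM column-header line, unless an identical line is already present. The function name and the output file name are made up for illustration, and the sketch assumes a plain-text (not gzipped) VCF.

```python
FILTER_LINE = '##FILTER=<ID=AC_Adj0_Filter,Description="AC_Adj == 0">'


def add_filter_header(src_path, dst_path, filter_line=FILTER_LINE):
    """Copy src_path to dst_path, inserting `filter_line` immediately before
    the #CHROM line if the header does not already contain it.

    Returns True if the destination header contains the line afterwards.
    """
    present = False
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.rstrip("\n") == filter_line:
                present = True  # already defined; do not add a duplicate
            elif line.startswith("#CHROM") and not present:
                dst.write(filter_line + "\n")
                present = True
            dst.write(line)
    return present


if __name__ == "__main__":
    # Hypothetical output name; point --variant at the new file afterwards.
    add_filter_header("ExAC.r0.3.1.sites.vep.vcf", "ExAC.r0.3.1.sites.vep.fixed.vcf")
```

Writing to a new file rather than editing in place keeps the original download intact in case other undefined keys turn up later.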
