Annotation problem: not all variants are taken into account

Hello,

I use GATK version 4.1 to annotate a VCF with the following command:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar gatk-package-4.1.1.0-local.jar VariantAnnotator -R /sandbox/resources/species/human/ensembl/release-75/Homo_sapiens.GRCh37.75.dna.toplevel.fa -V GQPDOMB-stats.vcf -O rsID_GQPDOMB.vcf --dbsnp /sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz

However, it only looks up rs IDs for the first 5 variants. How can I solve this problem so that it finds the rs IDs for the entire file?

Thank you for your help.

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @amandineld,

    Would you please post the first 10 lines of your file?

  • amandineld (Member)
    I apologize for the late answer; a few small problems kept me from getting back to you sooner.


    This is the terminal output:

    21:51:08.917 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sandbox/users/alecerf-defer/Alloscore_work/GQPDOMB/GATK/gatk-package-4.1.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 11, 2019 9:51:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    21:51:14.679 INFO VariantAnnotator - ------------------------------------------------------------
    21:51:14.680 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.1.1.0
    21:51:14.680 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:51:14.680 INFO VariantAnnotator - Executing as [email protected] on Linux v3.10.0-957.1.3.el7.x86_64 amd64
    21:51:14.681 INFO VariantAnnotator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    21:51:14.681 INFO VariantAnnotator - Start Date/Time: April 11, 2019 9:51:07 PM CEST
    21:51:14.681 INFO VariantAnnotator - ------------------------------------------------------------
    21:51:14.681 INFO VariantAnnotator - ------------------------------------------------------------
    21:51:14.682 INFO VariantAnnotator - HTSJDK Version: 2.19.0
    21:51:14.682 INFO VariantAnnotator - Picard Version: 2.19.0
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:51:14.683 INFO VariantAnnotator - Deflater: IntelDeflater
    21:51:14.683 INFO VariantAnnotator - Inflater: IntelInflater
    21:51:14.683 INFO VariantAnnotator - GCS max retries/reopens: 20
    21:51:14.683 INFO VariantAnnotator - Requester pays: disabled
    21:51:14.684 WARN VariantAnnotator -

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Warning: VariantAnnotator is a BETA tool and is not yet ready for use in production

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


    21:51:14.684 INFO VariantAnnotator - Initializing engine
    21:51:17.427 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz
    21:51:18.386 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/users/alecerf-defer/Alloscore_work/GQPDOMB/GATK/GQPDOMB-stats.vcf
    21:51:18.429 WARN IndexUtils - Feature file "/sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
    21:51:18.845 INFO VariantAnnotator - Done initializing engine
    21:51:18.919 INFO ProgressMeter - Starting traversal
    21:51:18.919 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    21:51:20.434 INFO VariantAnnotator - Shutting down engine
    [April 11, 2019 9:51:20 PM CEST] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.22 minutes.
    Runtime.totalMemory()=1185939456
    htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 67: unparsable vcf record with allele A-GG, for input source: GQPDOMB-stats.vcf
    at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:801)
    at htsjdk.variant.vcf.AbstractVCFCodec.checkAllele(AbstractVCFCodec.java:593)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseSingleAltAllele(AbstractVCFCodec.java:621)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseAlleles(AbstractVCFCodec.java:560)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:358)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:299)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:277)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:373)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:354)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:315)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:99)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:984)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

    VCF File :
    ##fileformat=VCFv4.1
    ##Goby=development (20150227145140)
    ##INFO=<ID=BIOMART_COORDS,Number=1,Type=String,Description="Coordinates for use with Biomart.">
    ##INFO=<ID=INDEL,Number=1,Type=Flag,Description="Indicates that the variation is an indel.">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=BC,Number=5,Type=String,Description="Base counts in format A=?;T=?;C=?;G=?;N=?.">
    ##FORMAT=<ID=GB,Number=1,Type=String,Description="Number of bases that pass base filters in this sample, or ignore string.">
    ##FORMAT=<ID=FB,Number=1,Type=String,Description="Number of bases that failed base filters in this sample, or ignore string.">
    ##FORMAT=<ID=Zygosity,Number=1,Type=String,Description="Zygosity">
    ##INFO=<ID=VariantEffectPrediction,Number=1,Type=String,Description="Variant Effect Predictions">
    ##source_20150426.1=vcf-annotate(r797) -a output-21711.tsv.gz -d /scratchLocal/gobyweb/tmp/421024.1.rascals.q/attributes.lst -c CHROM,FROM,TO,INFO/VariantEffectPrediction
    ##INFO=<ID=GENE,Number=1,Type=String,Description="Ensembl gene identifier">
    ##INFO=<ID=GENE_NAME,Number=1,Type=String,Description="Gene name">
    ##source_20150426.2=vcf-annotate(r797) -a /scratchLocal/gobyweb/ARTIFACT_REPOSITORY-dev/artifacts/ENSEMBL_ANNOTATIONS/ANNOTATIONS/2.1.3/HOMO_SAPIENS/1000GENOMES/37/ref-start-end-gene-hgnc-sorted.tsv.gz -d /scratchLocal/gobyweb/tmp/421024.1.rascals.q/attributes.lst -c CHROM,FROM,TO,INFO/GENE,INFO/GENE_NAME
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AERMQNK-paris-8400326-P6-recipient BEGHUSZ-paris-CIN130255-P5-donor BNAGPIP-paris-SCE240246-P1-donor DKFEYNF-paris-9402755-P24-recipient DUWPYOX-paris-LAP201156-P2-donor ENERRIA-paris-7406853-P13-donor FSCDYAD-paris-08404250-P18-recipient GZISOJR-paris-7400781-P16-donor HDXJQEQ-paris-BOU231086-P8-recipient HPPXKJR-paris-DAS070568-P11-recipient IBMRISR-paris-1012026425-P14-donor JBQRXUU-paris-BAL040960-P4-donor JGYNLLJ-paris-9406986-P22-donor JOJLUPS-paris-CIN020578-P5-recipient KXQMHUU-paris-PAQ030369-P10-donor MFQDFBY-paris-6405545-P19-donor MMGEWHT-paris-7400697-P16-recipient MVDUSTB-paris-BAR131257-P10-recipient MVTISRC-paris-761011046144-P14-recipient NEYXWDZ-paris-RIZ120484-P9-recipient NFAMZIJ-paris-09406539-P22-recipient OENIGXI-paris-6405485-P17-donor ONDAMJK-paris-SCE260544-P1-recipient PAIFUYN-paris-JEA110683-P7-donor QGGDOUC-paris-GIL090756-P3-donor QNIODCU-paris-06405141-P19-recipient RGFUJSJ-paris-7405431-P15-donor RPUJEMI-paris-JEA050373-P7-recipient RZBCDKZ-paris-06400985-P15-recipient SNDXLDU-paris-06404984-P17-recipient TRPLSFT-paris-7401924-P13-recipient UFVLBVS-paris-GIL160150-P3-recipient VOCSMLM-paris-DEL240879-P2-recipient WKCQCMA-paris-9402754-P24-donor WKXMWKI-paris-8404252-P18-donor XUPTJWH-paris-RIZ310548-P9-donor YPOYSQD-paris-DAC250146-P11-donor YYMSZHP-paris-DEM300679-P6-donor ZCRWXBD-paris-BOU280575-P8-donor ZGAQHAH-paris-BAL251159-P4-recipient
    1 721427 . C . . . BIOMART_COORDS=1:721427:721427;GENE=ENSG00000230021,ENSG00000237491 GT:BC:GB:FB:Zygosity 0/0:C=187:187:0:homozygous 0/0:C=202:202:2:homozygous 0/0:C=157:157:0:homozygous 0/0:C=153:153:0:homozygous 0/0:C=196:196:0:homozygous 0/0:C=111:111:0:homozygous 0/0:C=176:176:0:homozygous 0/0:C=73:73:0:homozygous 0/0:C=195:195:0:homozygous 0/0:C=199:199:0:homozygous 0/0:C=78:78:0:homozygous 0/0:C=243:243:0:homozygous 0/0:C=65:65:0:homozygous 0/0:C=189:189:0:homozygous 0/0:C=166:166:0:homozygous 0/0:C=245:245:2:homozygous 0/0:C=60:60:0:homozygous 0/0:C=224:224:0:homozygous 0/0:C=84:84:0:homozygous 0/0:C=138:138:0:homozygous 0/0:C=156:156:0:homozygous 0/0:C=84:84:0:homozygous 0/0:C=211:211:0:homozygous 0/0:C=234:234:0:homozygous 0/0:C=247:247:0:homozygous 0/0:C=187:187:0:homozygous 0/0:C=124:124:0:homozygous 0/0:C=201:201:0:homozygous 0/0:C=155:155:0:homozygous 0/0:C=168:168:0:homozygous 0/0:C=99:99:0:homozygous 0/0:C=179:179:0:homozygous 0/0:C=209:209:2:homozygous 0/0:C=91:91:0:homozygous 0/0:C=189:189:0:homozygous 0/0:C=206:206:0:homozygous 0/0:C=167:167:2:homozygous 0/0:C=159:159:0:homozygous 0/0:C=220:220:0:homozygous 0/0:C=222:222:0:homozygous
    1 721428 . G . . . BIOMART_COORDS=1:721428:721428;GENE=ENSG00000230021,ENSG00000237491 GT:BC:GB:FB:Zygosity 0/0:G=187:187:0:homozygous 0/0:G=204:204:0:homozygous 0/0:G=157:157:0:homozygous 0/0:G=153:153:0:homozygous 0/0:G=196:196:0:homozygous 0/0:G=111:111:0:homozygous 0/0:G=177:177:0:homozygous 0/0:G=73:73:0:homozygous 0/0:G=195:195:0:homozygous 0/0:G=199:199:0:homozygous 0/0:G=78:78:0:homozygous 0/0:G=243:243:0:homozygous 0/0:G=65:65:0:homozygous 0/0:G=189:189:0:homozygous 0/0:G=164:164:2:homozygous 0/0:G=247:247:0:homozygous 0/0:G=60:60:0:homozygous 0/0:G=224:224:0:homozygous 0/0:G=84:84:0:homozygous 0/0:G=138:138:0:homozygous 0/0:G=156:156:0:homozygous 0/0:G=84:84:0:homozygous 0/0:G=209:209:2:homozygous 0/0:G=235:235:0:homozygous 0/0:G=248:248:0:homozygous 0/0:G=187:187:0:homozygous 0/0:G=124:124:0:homozygous 0/0:G=201:201:0:homozygous 0/0:G=155:155:0:homozygous 0/0:G=168:168:0:homozygous 0/0:G=99:99:0:homozygous 0/0:G=177:177:2:homozygous 0/0:G=211:211:0:homozygous 0/0:G=91:91:0:homozygous 0/0:G=189:189:0:homozygous 0/0:G=205:205:1:homozygous 0/0:G=169:169:0:homozygous 0/0:G=159:159:0:homozygous 0/0:G=219:219:1:homozygous 0/0:G=222:222:0:homozygous


    I also have a problem because the software says that the #CHROM POS ID line doesn't have enough columns for the file, but when I count, there are 49 columns everywhere.
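
For readers who run into the same exception: the trace above stops at an allele written as A-GG near line 67 of GQPDOMB-stats.vcf, so it may help to list every record whose REF or ALT is not a plain base string before re-running VariantAnnotator. Below is a minimal Python sketch of such a check. It is not part of GATK; the function name find_suspect_records and the sample records are hypothetical, and the allele pattern is a simplifying assumption that ignores symbolic alleles such as <DEL> and breakends. The sketch also compares each record's tab-separated column count with the #CHROM header. One possible cause of a "not enough columns" complaint is a file delimited by spaces instead of tabs, which this check would surface as well.

```python
import re

# Assumption: a "simple" allele is a run of bases, or the VCF placeholders "." and "*".
# Symbolic alleles (<DEL>) and breakends are deliberately not handled in this sketch.
SIMPLE_ALLELE = re.compile(r"^(?:[ACGTNacgtn]+|\.|\*)$")


def find_suspect_records(lines):
    """Return (line_number, reason) pairs for records that look malformed."""
    problems = []
    expected_columns = None
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")  # VCF columns are tab-separated
        if line.startswith("#CHROM"):
            expected_columns = len(fields)
            continue
        if expected_columns is not None and len(fields) != expected_columns:
            problems.append(
                (line_number, f"{len(fields)} columns, header has {expected_columns}")
            )
            continue
        if len(fields) < 5:
            problems.append((line_number, "fewer than 5 columns"))
            continue
        # REF is column 4; ALT is column 5 and may hold several comma-separated alleles.
        for allele in [fields[3]] + fields[4].split(","):
            if not SIMPLE_ALLELE.match(allele):
                problems.append((line_number, f"suspect allele {allele!r}"))
    return problems


# Hypothetical records for illustration; only the allele A-GG comes from the error message.
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t721427\t.\tC\t.\t.\t.\t.\n",
    "1\t721500\t.\tA\tA-GG\t.\t.\t.\n",
]
for line_number, reason in find_suspect_records(sample):
    print(f"line {line_number}: {reason}")
```

To scan a real file, pass an open file handle instead of the sample list, for example find_suspect_records(open("GQPDOMB-stats.vcf")). How the flagged records should be corrected depends on what the upstream tool meant by an allele like A-GG, so the sketch only reports them.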