
GATK 4.1.1.0 GenomicsDBImport error : Duplicate fields exist in vid attribute "fields" and 2 errors

Wonton (Macau), Member
Hello GATK team!
I am using Mutect2, FilterMutectCalls and GenomicsDBImport for somatic calling. The Mutect2 and FilterMutectCalls steps produced per-sample gVCFs without problems. However, when I try to combine all the gVCFs with GenomicsDBImport, the step fails with several errors, and I am running out of ideas. Thank you.
GATK version: 4.1.1.0
1. Mutect2, .bam to .g.vcf: seems OK and generates .vcf, .vcf.idx and .vcf.stats
gatk Mutect2 --reference .../hg19.fa --input ....bam --output ...g.vcf -ERC GVCF --tmp-dir ...
2. FilterMutectCalls, .g.vcf to .g.vcf: seems OK and generates .vcf, .vcf.idx and .vcf.filteringStats.tsv
gatk FilterMutectCalls --reference .../hg19.fa --variant ...g.vcf --intervals ...hg19.bed --output ...g.vcf --tmp-dir ...
3. GenomicsDBImport to combine the gVCFs:
gatk GenomicsDBImport --reference .../hg19.fa --sample-name-map ${sample_mapFile} --validate-sample-name-map true --intervals ...hg19.bed --genomicsdb-workspace-path ... --max-num-intervals-to-import-in-parallel 20 --consolidate true --batch-size 100 --merge-input-intervals true --tmp-dir ...
The same command works fine in my germline pipeline, and the VCF files can be read. But here I got this error:
Duplicate field name TLOD found in vid attribute "fields"
Duplicate field name TLOD found in vid attribute "fields"
terminate called after throwing an instance of 'FileBasedVidMapperException'
terminate called recursively
what(): FileBasedVidMapperException : Duplicate fields exist in vid attribute "fields"
4. I deleted this header line and re-ran:
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log odds ratio score for variant">
Then I got this error:
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 171: . is not a valid start position in the VCF format, for input source: file:///home/yb87626/breast/variantCalling/SRR8437498.postM2.g.vcf
at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:797)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:324)
...
5. I deleted this header line and re-ran:
##tumor_sample=SAMN10735600
Then I got this error:
[July 3, 2019 7:10:52 AM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=1243086848
htsjdk.tribble.TribbleException: Line 169: there aren't enough columns for line END=17447;STRANDQ=93 GT:DP:MIN_DP:TLOD 0/0:1:1:-4.765e-01 (we expected 9 tokens, and saw 3 ), for input source: file:///home/yb87626/breast/variantCalling/SRR8437498.postM2.g.vcf
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:296)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:277)
...
As you can see, the program treats 'END=17447;STRANDQ=93 GT:DP:MIN_DP:TLOD 0/0:1:1:-4.765e-01' as a whole line, although there are columns before it on the same line. The problem may not be this particular line, because the same error occurs at the next line when I delete this one.
I don't know how to solve this. Also, were the edits I made in steps 4 and 5 the right thing to do?
Part of the .g.vcf:
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMN10735589
chr1 1 . N <NON_REF> . PASS END=17405;STRANDQ=93 GT:DP:MIN_DP:TLOD 0/0:0:0:0.00
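
The post shows TLOD declared as an INFO field in the header and also used in the FORMAT column of the data rows, which may be what the "Duplicate field name TLOD" message is complaining about. A minimal sketch, assuming plain-text VCF header lines, for listing IDs that are declared in more than one header category. This is not a GATK tool: the function name `find_shared_ids` is hypothetical, and the FORMAT line in the example is an illustrative assumption rather than a line copied from the file.

```python
import re
from collections import defaultdict

# Matches header declarations such as ##INFO=<ID=TLOD,...> and captures
# the category (INFO/FORMAT/FILTER) and the ID.
HEADER_RE = re.compile(r"^##(INFO|FORMAT|FILTER)=<ID=([^,>]+)")


def find_shared_ids(header_lines):
    """Return {ID: sorted categories} for IDs declared in more than one category."""
    seen = defaultdict(set)
    for line in header_lines:
        match = HEADER_RE.match(line)
        if match:
            category, field_id = match.groups()
            seen[field_id].add(category)
    return {fid: sorted(cats) for fid, cats in seen.items() if len(cats) > 1}


if __name__ == "__main__":
    example_header = [
        "##fileformat=VCFv4.2",
        # INFO line quoted in the post:
        '##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log odds ratio score for variant">',
        # Hypothetical FORMAT declaration, assumed for illustration only:
        '##FORMAT=<ID=TLOD,Number=1,Type=Float,Description="illustrative">',
        '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="illustrative">',
    ]
    print(find_shared_ids(example_header))  # {'TLOD': ['FORMAT', 'INFO']}
```

Reading the real header into a list (every line starting with `##`) and passing it to the function would show whether TLOD is the only ID shared between categories before deciding how to proceed.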

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
    edited July 11

    Hi @Wonton

    Please try and run ValidateVariants on your gvcf files to determine if there is an issue with the vcf format.

  • Wonton (Macau), Member
    Hi bhanuGandham,
    Thank you for your help. No file or report was generated when I ran ValidateVariants on the gVCF file. Does that mean the gVCF file is OK?
    Here is the command:
    gatk ValidateVariants --reference .../hg19.fa --variant .../SRR8437498.postM2.raw.g.vcf --tmp-dir ... 1> .../report.txt 2>.../gatk.err
    Here is the "gatk.err":
    Using GATK jar /opt/conda/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools...
    16:51:13.393 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 11, 2019 4:51:15 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:51:15.656 INFO ValidateVariants - ------------------------------------------------------------
    16:51:15.656 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.1.0
    16:51:15.656 INFO ValidateVariants - For support and documentation go to
    ...
    16:51:15.657 INFO ValidateVariants - Executing as [email protected] on Linux v3.10.0-693.5.2.el7.x86_64 amd64
    16:51:15.658 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    16:51:15.658 INFO ValidateVariants - Start Date/Time: July 11, 2019 4:51:13 PM UTC
    16:51:15.658 INFO ValidateVariants - ------------------------------------------------------------
    16:51:15.658 INFO ValidateVariants - ------------------------------------------------------------
    16:51:15.658 INFO ValidateVariants - HTSJDK Version: 2.19.0
    16:51:15.658 INFO ValidateVariants - Picard Version: 2.19.0
    16:51:15.658 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:51:15.658 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:51:15.658 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:51:15.658 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:51:15.658 INFO ValidateVariants - Deflater: IntelDeflater
    16:51:15.658 INFO ValidateVariants - Inflater: IntelInflater
    16:51:15.658 INFO ValidateVariants - GCS max retries/reopens: 20
    16:51:15.658 INFO ValidateVariants - Requester pays: disabled
    16:51:15.658 INFO ValidateVariants - Initializing engine
    16:51:15.857 INFO FeatureManager - Using codec VCFCodec to read file file:///home/yb87626/validate/SRR8437498.postM2.raw.g.vcf
    16:51:15.888 INFO ValidateVariants - Done initializing engine
    16:51:15.888 INFO ProgressMeter - Starting traversal
    16:51:15.888 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    16:51:22.030 INFO ProgressMeter - chrUn_gl000219:99683 0.1 953167 9311302.5
    16:51:22.030 INFO ProgressMeter - Traversal complete. Processed 953167 total variants in 0.1 minutes.
    16:51:22.030 INFO ValidateVariants - Shutting down engine
    [July 11, 2019 4:51:22 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.14 minutes.
    Runtime.totalMemory()=1869086720
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @Wonton

    We recommend and support using gVCFs generated from Mutect2 only for Mitochondrial DNA.

  • Wonton (Macau), Member
    Hi bhanuGandham,
    I understand. Thank you for your reply.
  • gauthier (Member, Broadie, Dev)

    Hi @Wonton ,

    After you deleted the problematic header line did you reindex the VCF? I think if you run gatk IndexFeatureFile -F <vcfFilePath> and try again the parsing error will go away. We don't support manual editing of VCFs, but I've done it myself and I believe I've resolved the same problem by reindexing.

    -Laura

  • Wonton (Macau), Member
    Hi Laura,
    I understand your suggestion. I don't like to edit them manually either. Thank you very much for your reply.
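
The reindexing suggestion in this thread fits the errors in steps 4 and 5: a .vcf.idx index records positions within the VCF, so an index built before a header line was deleted may point the reader into the middle of a record. That is an inference, not something confirmed in the thread. A minimal sketch, assuming the index sits next to the VCF with an `.idx` suffix (as in the Mutect2 output listed in the question), for flagging a VCF that was modified after its index was written. The function name `index_is_stale` is hypothetical.

```python
import os
import sys


def index_is_stale(vcf_path, index_suffix=".idx"):
    """Return True if the index is missing or older than the VCF it belongs to."""
    idx_path = vcf_path + index_suffix
    if not os.path.exists(idx_path):
        return True
    # An index written before the last modification of the VCF may no
    # longer describe the file's current contents.
    return os.path.getmtime(idx_path) < os.path.getmtime(vcf_path)


if __name__ == "__main__":
    # Usage: python check_index.py sample1.g.vcf sample2.g.vcf ...
    for path in sys.argv[1:]:
        status = "stale or missing index" if index_is_stale(path) else "index looks current"
        print(f"{path}: {status}")
```

Modification times are only a heuristic (copying files can change them), so a file flagged here is a candidate for rerunning IndexFeatureFile rather than proof of a problem.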