Duplicate field error in GenomicsDBImport 126.96.36.199
I've got a question about an error generated using GenomicsDBImport.
The gVCF I'm trying to import has three field names that trigger the duplicate-field error (BR, MQ & QD). Checking the header, I notice that each of these names is declared in both an INFO line and a FILTER line (see below). I also notice other names (DP, for example) that are declared in both INFO and FORMAT lines but which don't seem to cause any problem.
My first question: why can the same name be shared between INFO and FORMAT, but not between INFO and FILTER?
Secondly, and more importantly, is there some way (other than rejigging all my headers and data) to tell GenomicsDBImport that the duplicate names belong to different fields when it creates the vid attributes file, or perhaps to switch off the check?
=== offending field ===
INFO=<ID=QD,Number=A,Type=Float,Description="Ratio of phred-scaled posterior probability (PP) to number of supporting reads for each allele (VC).">
FILTER=<ID=QD,Description="Quality over Depth: Indicates low quality relative to number of supporting reads (any of INFO::QD < 15 for Indels or INFO::QD < 15 otherwise).">
INFO=<ID=BR,Number=A,Type=Float,Description="The median of the per-read min base quality (within a interval of the locus) taken over reads supporting each allele.">
FILTER=<ID=BR,Description="Bad Reads: Indicates low quality base pairs on reads in the vicinity of variant locus (any of INFO::BR < 15).">
=== non-offending field ===
INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of read coverage at this locus.">
FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of reads overlapping the variant site (i.e. INFO::DP split out by sample). For reference calls the average depth (rounded to the nearest integer) over the region is reported.">
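For anyone wanting to repeat the header check described above on their own files, here is a minimal Python sketch that lists every ID declared in more than one of the INFO, FORMAT and FILTER header sections. It only inspects header text and says nothing about how GenomicsDBImport performs its own check. The function name `find_shared_ids` and the regex are my own illustrative choices, and the leading `##` is treated as optional because the lines pasted above appear without it.

```python
import re
from collections import defaultdict

# Matches structured header lines such as INFO=<ID=QD,...>.
# The leading "##" is optional (assumption: it may have been stripped when pasting).
HEADER_RE = re.compile(r"^(?:##)?(INFO|FORMAT|FILTER)=<ID=([^,>]+)")


def find_shared_ids(header_lines):
    """Return {ID: sorted list of sections} for IDs declared in more than one section."""
    sections = defaultdict(set)
    for line in header_lines:
        match = HEADER_RE.match(line.strip())
        if match:
            section, field_id = match.group(1), match.group(2)
            sections[field_id].add(section)
    return {fid: sorted(secs) for fid, secs in sections.items() if len(secs) > 1}


# Header lines copied from the post above.
header = [
    'INFO=<ID=QD,Number=A,Type=Float,Description="Ratio of phred-scaled posterior probability (PP) to number of supporting reads for each allele (VC).">',
    'FILTER=<ID=QD,Description="Quality over Depth: Indicates low quality relative to number of supporting reads (any of INFO::QD < 15 for Indels or INFO::QD < 15 otherwise).">',
    'INFO=<ID=BR,Number=A,Type=Float,Description="The median of the per-read min base quality (within a interval of the locus) taken over reads supporting each allele.">',
    'FILTER=<ID=BR,Description="Bad Reads: Indicates low quality base pairs on reads in the vicinity of variant locus (any of INFO::BR < 15).">',
    'INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of read coverage at this locus.">',
    'FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of reads overlapping the variant site (i.e. INFO::DP split out by sample). For reference calls the average depth (rounded to the nearest integer) over the region is reported.">',
]

shared = find_shared_ids(header)
for field_id in sorted(shared):
    print(field_id, "declared in:", ", ".join(shared[field_id]))
```

To run it against a real file, pass the lines of the gVCF that start with `##` instead of the hard-coded list. IDs reported with both FILTER and INFO are the ones matching the pattern that fails here.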