Duplicate field error in GenomicsDBImport 18.104.22.168
I've got a question about an error generated using GenomicsDBImport.
The gVCF I'm trying to import has three offending duplicate field names (BR, MQ & QD). Checking the header, I notice that each name is duplicated in both the INFO and FILTER fields (see below). I also notice other non-offending names (DP for example) that are duplicated in INFO and in FORMAT fields but which don't seem to cause any bother.
My question is: why are duplicate names allowed across the INFO and FORMAT fields but not across INFO and FILTER?
Secondly, and more importantly, is there some way (other than reworking all my headers and data) to tell GenomicsDBImport that the duplicate names belong to different fields when it creates the vid attributes file, or perhaps to switch off the check?
=== offending field ===
INFO=<ID=QD,Number=A,Type=Float,Description="Ratio of phred-scaled posterior probability (PP) to number of supporting reads for each allele (VC).">
FILTER=<ID=QD,Description="Quality over Depth: Indicates low quality relative to number of supporting reads (any of INFO::QD < 15 for Indels or INFO::QD < 15 otherwise).">
INFO=<ID=BR,Number=A,Type=Float,Description="The median of the per-read min base quality (within a interval of the locus) taken over reads supporting each allele.">
FILTER=<ID=BR,Description="Bad Reads: Indicates low quality base pairs on reads in the vicinity of variant locus (any of INFO::BR < 15).">
=== non-offending field ===
INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of read coverage at this locus.">
FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of reads overlapping the variant site (i.e. INFO::DP split out by sample). For reference calls the average depth (rounded to the nearest integer) over the region is reported.">
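For anyone wanting to check their own headers for the same kind of collision, here is a minimal Python sketch that lists every ID declared in more than one header section. The function name `find_shared_ids` and the regular expression are illustrative only and are not part of GATK. It assumes header lines look like the ones pasted above (with or without the leading `##` of a standard VCF header), and the descriptions in the sample are abbreviated. It only reports collisions; it does not change how GenomicsDBImport treats them.

```python
import re
from collections import defaultdict

# Matches INFO/FILTER/FORMAT declarations, with or without the leading "##".
HEADER_RE = re.compile(r'^(?:##)?(INFO|FILTER|FORMAT)=<ID=([^,>]+)')


def find_shared_ids(header_lines):
    """Return {ID: set of sections} for IDs declared in more than one section."""
    sections = defaultdict(set)
    for line in header_lines:
        match = HEADER_RE.match(line.strip())
        if match:
            kind, field_id = match.groups()
            sections[field_id].add(kind)
    return {fid: kinds for fid, kinds in sections.items() if len(kinds) > 1}


# Sample header lines taken from the post, descriptions shortened.
header = [
    '##INFO=<ID=QD,Number=A,Type=Float,Description="...">',
    '##FILTER=<ID=QD,Description="...">',
    '##INFO=<ID=BR,Number=A,Type=Float,Description="...">',
    '##FILTER=<ID=BR,Description="...">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="...">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="...">',
]

for field_id, kinds in sorted(find_shared_ids(header).items()):
    print(field_id, sorted(kinds))
```

On a real file the same function could be fed the lines of the gVCF that start with `##`, which would show at a glance which names are shared between INFO and FILTER and which between INFO and FORMAT.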