Filtering individual calls using CombineVariants
I was wondering if there is a way to filter individual genotype calls when using CombineVariants to merge single-sample VCF files. The behavior I would like is a hybrid of the KEEP_IF_ANY_UNFILTERED and KEEP_IF_ALL_UNFILTERED values of the -filteredRecordsMergeType argument. By this I mean that any site that is unfiltered in at least one input remains unfiltered in the output, but any genotype call that comes from a filtered input carries that filter in the "FT" field of the genotype. I will show a simplified example below (extraneous columns removed from the sample files):
Input for SAMPLE1:
#CHROM POS ID (...) FILTER FORMAT SAMPLE1
1 11916764 rs79387574 (...) PASS GT:DP 0/0:45

Input for SAMPLE2:
#CHROM POS ID (...) FILTER FORMAT SAMPLE2
1 11916764 rs79387574 (...) LowQ GT:DP 0/1:3

Desired merged output:
#CHROM POS ID (...) FILTER FORMAT SAMPLE1 SAMPLE2
1 11916764 rs79387574 (...) PASS GT:DP:FT 0/0:45:PASS 0/1:3:LowQ
The reason for this request is that occasionally a single sample has a bad call at a site. With "KEEP_IF_ALL_UNFILTERED", that one filtered call causes the other N-1 high-quality calls to be filtered as well. At the other extreme, if we use "KEEP_IF_ANY_UNFILTERED" and only a single sample passes the filters, we introduce N-1 low-quality calls and assert that they pass our requisite filters. The requested hybrid method would keep all of the filter information from the input samples and allow finer granularity.
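
To make the requested rule concrete, here is a rough Python sketch of the merge logic I have in mind for a single site. This is not GATK code: the function name merge_site and the dict layout are made up for illustration, and it assumes every input has the same FORMAT string and that "PASS" or "." in the FILTER column means unfiltered.

```python
# Hypothetical sketch of the requested hybrid merge for ONE site.
# Not part of GATK; names and data layout are illustrative assumptions.

UNFILTERED = {"PASS", "."}  # assumption: both values mean "not filtered"


def merge_site(inputs):
    """Merge single-sample records at the same site.

    inputs: list of dicts with keys
        "filter"   - FILTER column of that input (e.g. "PASS", "LowQ")
        "format"   - FORMAT column (e.g. "GT:DP")
        "genotype" - the sample column (e.g. "0/0:45")
    Returns (site_filter, merged_format, genotypes).
    """
    formats = {rec["format"] for rec in inputs}
    if len(formats) != 1:
        raise ValueError("this sketch assumes identical FORMAT fields")

    # Site level: behaves like KEEP_IF_ANY_UNFILTERED.
    if any(rec["filter"] in UNFILTERED for rec in inputs):
        site_filter = "PASS"
    else:
        # Every input is filtered: keep the union of filter names, in order.
        names = []
        for rec in inputs:
            for name in rec["filter"].split(";"):
                if name not in names:
                    names.append(name)
        site_filter = ";".join(names)

    # Genotype level: each call records its own input's filter status in FT.
    genotypes = []
    for rec in inputs:
        ft = "PASS" if rec["filter"] in UNFILTERED else rec["filter"]
        genotypes.append(rec["genotype"] + ":" + ft)

    return site_filter, formats.pop() + ":FT", genotypes


# The example from this post.
sample1 = {"filter": "PASS", "format": "GT:DP", "genotype": "0/0:45"}
sample2 = {"filter": "LowQ", "format": "GT:DP", "genotype": "0/1:3"}
print(merge_site([sample1, sample2]))
# ('PASS', 'GT:DP:FT', ['0/0:45:PASS', '0/1:3:LowQ'])
```

The site-level FILTER follows the "any unfiltered" rule, while the per-genotype FT value preserves what each input said about its own call, so no filter information is lost in the merge.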