Filtering individual calls using CombineVariants
I was wondering whether there is a way to filter individual genotype calls when using CombineVariants to merge VCF files that were each called on a single sample. The behavior I would like is a hybrid between the KEEP_IF_ANY_UNFILTERED and KEEP_IF_ALL_UNFILTERED values of the -filteredRecordsMergeType argument. That is, any site that is unfiltered in at least one input remains unfiltered in the output, but every genotype call that comes from a filtered input gets a filter annotation in the "FT" field of that genotype. A simplified example follows, showing two inputs and then the desired output (extraneous columns removed from the sample files):
#CHROM POS ID (...) FILTER FORMAT SAMPLE1
1 11916764 rs79387574 (...) PASS GT:DP 0/0:45

#CHROM POS ID (...) FILTER FORMAT SAMPLE2
1 11916764 rs79387574 (...) LowQ GT:DP 0/1:3

#CHROM POS ID (...) FILTER FORMAT SAMPLE1 SAMPLE2
1 11916764 rs79387574 (...) PASS GT:DP:FT 0/0:45:PASS 0/1:3:LowQ
The reason for requesting this is that occasionally a single sample has a bad call at a site. Using "KEEP_IF_ALL_UNFILTERED" then filters out the other N-1 high-quality calls. At the other extreme, if we use "KEEP_IF_ANY_UNFILTERED" and only a single sample passes the filters, we introduce N-1 low-quality calls and assert that they pass our requisite filters. The requested hybrid method would keep all the filter information from the input samples and allow finer granularity.
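
To make the requested rule concrete, here is a rough Python sketch of the merge logic for a single site. It is not GATK code and uses no GATK API: the function name `merge_hybrid`, the dictionary layout of the records, and the choice to treat a "." FILTER value as unfiltered are all assumptions made for illustration only.

```python
# Hypothetical illustration of the requested hybrid merge rule; not part of GATK.
PASSING = {"PASS", "."}  # assumption: "." (no filters applied) counts as unfiltered


def merge_hybrid(records):
    """Merge single-sample records that all describe the same site.

    Each record is a dict with keys:
      'sample'   - sample name
      'filter'   - site-level FILTER value from that input, e.g. 'PASS' or 'LowQ'
      'genotype' - dict mapping FORMAT keys to values, e.g. {'GT': '0/1', 'DP': '3'}
    """
    # Site level: behave like KEEP_IF_ANY_UNFILTERED.
    if any(r["filter"] in PASSING for r in records):
        site_filter = "PASS"
    else:
        # Every input is filtered: keep the union of filter names, in input order.
        names = []
        for r in records:
            for name in r["filter"].split(";"):
                if name not in names:
                    names.append(name)
        site_filter = ";".join(names)

    # Genotype level: copy each input's site filter into that sample's FT field.
    samples = {}
    for r in records:
        genotype = dict(r["genotype"])
        genotype["FT"] = "PASS" if r["filter"] in PASSING else r["filter"]
        samples[r["sample"]] = genotype

    return {"filter": site_filter, "samples": samples}


def format_sample(genotype, keys=("GT", "DP", "FT")):
    """Render one sample column in the order given by the FORMAT keys."""
    return ":".join(genotype[k] for k in keys)


example = [
    {"sample": "SAMPLE1", "filter": "PASS", "genotype": {"GT": "0/0", "DP": "45"}},
    {"sample": "SAMPLE2", "filter": "LowQ", "genotype": {"GT": "0/1", "DP": "3"}},
]
merged = merge_hybrid(example)
print(merged["filter"], "GT:DP:FT",
      format_sample(merged["samples"]["SAMPLE1"]),
      format_sample(merged["samples"]["SAMPLE2"]))
```

With the two example inputs above, the site stays unfiltered while SAMPLE2 keeps its LowQ annotation in FT, which is the merged record I am hoping CombineVariants could produce.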