The current GATK version is 3.3-0

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

US Holiday notice: this Thursday and Friday (Nov 25-26) the forum will be unattended. Normal service will resume Monday Nov 29. Happy Thanksgiving!

# weird multisample output

LondonPosts: 14Member

Hi, I have run GATK Unified genotyper with multisample and I am getting unexpected output. (Hard filters used)

Here are examples of the output format I get in the vcf for a sample: 0/1:43,4:PASS:24:24,0,4368, ./.:.:PASS or 0/0:2,0:2:DPFilter:6:0,6,47 as well as 0/0:12,0:37:0,37,830. The latter is the format I would expect. What does this mean? Also I get the filter PASS even when all but two samples have ./.:.:PASS and the other two have 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0 and 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0.

Also I ran HalpotypeCaller multisample and get similar output all though more of the 0/0:12,0:37:0,37,830 format....

Tagged:

Nothing to worry about. The samples that are non-reference have either PASS or the name of the filter they failed; the samples that can't be called because there is no coverage have no-call fields (./. etc); and the last are called reference so there is no filtering annotation.

Geraldine Van der Auwera, PhD

• LondonPosts: 14Member

Thanks for your feedback, but I don't understand why the final Filter category is PASS if across all samples there is not enough information to call at that position or where there is they do not individually pass filter. Is this a bug? Also I don't understand why where there is not enough info to call a variant i.e. ./. why they are given the PASS category e.g. ./.:.:PASS

• LondonPosts: 14Member

Also we get 0/0:3,0:PASS:9:0,9,96 - i.e. reference but still get the filter info....

Two points:

1. There's a difference between site-based filtering and sample genotype-based filtering.
2. Check your JEXL expressions -- some expressions evaluate as true when an annotation is missing.

Geraldine Van der Auwera, PhD

• LondonPosts: 14Member

Could you elaborate on 1? I am not sure how this would justify a passed call in the example I gave.

Here is my JEXL expression for this data - can you see an issue??

$Java7 -Xmx30g -jar$GATKs/GenomeAnalysisTK.jar \ -R $Ref \ -T VariantFiltration \ --downsample_to_coverage$dcovg \ --downsampling_type BY_SAMPLE \ -o $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.filtered.vcf \ --variant$MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.vcf \ --mask $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.indel.vcf \ --maskName InDel \ --filterExpression "QD < 2.0" \ --filterExpression "MQ < 40.0" \ --filterExpression "FS > 60.0" \ --filterExpression "MQRankSum < -12.5" \ --filterExpression "ReadPosRankSum < -8.0" \ --filterName QDFilter \ --filterName MQFilter \ --filterName FSFilter \ --filterName MQRankSumFilter \ --filterName ReadPosFilter\ --genotypeFilterExpression "DP < 4" \ --genotypeFilterName "DPFilter" \ --genotypeFilterExpression "GQ < 5.0" \ --genotypeFilterName "GQFilter" • Posts: 6,682Administrator, GATK Developer admin I was responding to this statement: Also I get the filter PASS even when all but two samples have ./.:.:PASS and the other two have 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0 and 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0. and: I don't understand why the final Filter category is PASS if across all samples there is not enough information to call at that position or where there is they do not individually pass filter. The main problem here is that you don't specify which level of filtering you mean in each case. I could be wrong in my interpretation but you seem to be conflating the two levels of filtering. It is quite possible to have a variant call pass a certain filter (at the site level) even if one or more samples fail some filters for this site. The two levels of filtering are independent. Geraldine Van der Auwera, PhD • LondonPosts: 14Member Ah, ok. Let me try and explain better. So in multisample output the overall Filter reports PASS, BUT all but two samples for this variant are reported as ./.:.:PASS and the other two have 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0 and 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0. • LondonPosts: 14Member I guess the question is two fold - why is there a PASS annotation for individual samples were there is not enough information to call (i.e. ./.:.:PASS) and why would the overall filter say PASS when it seems there is nothing to report. • Posts: 6,682Administrator, GATK Developer admin I see, thanks for clarifying your questions. why is there a PASS annotation for individual samples were there is not enough information to call (i.e. ./.:.:PASS) This is because the JEXL evaluation is myopic, in the sense that it only looks at the annotation you tell it to. This is completely independent of the calling mechanism. And unfortunately it's extremely literal-minded, so if the annotation is absent (because no call was made) the JEXL evaluator only sees that the filtering condition is not met (e.g. "." is not lower than 4) and therefore gives the sample a pass. This is documented as a gotcha of the JEXL evaluation system. The workaround is to add a check ("does the annotation exist?") before performing the evaluation, which acts sort of like a dead man switch. why would the overall filter say PASS when it seems there is nothing to report Same problem as above; the missing annotations lead the JEXL to pass the site because there is nothing there to make it fail. Remember that failing genotype-level filter has no bearing on the site-level filters; and you are not applying any filters at the site level that would fail that site. Does that make sense? Geraldine Van der Auwera, PhD • LondonPosts: 14Member Yes it does, thank you - how/where in the pipeline would I build in the "does annotation exist"? Also why are some sample genotype descriptors annotated with PASS or the filter they failed by and not others? • LondonPosts: 14Member Having thought more about it, I know the answer to the latter question. For multi sample, it looks like it is best to just use genotypexpression filters, right? • Posts: 6,682Administrator, GATK Developer admin Hi @ledwards, The basic filtering (choosing which variants are likely to be real) is typically done on the site-level annotations, not at the genotype level. The idea is that you first ask the question: "Is there convincing evidence that there is variation at this position in one or more samples in the cohort?". Then for each sample, you ask: "Given that there is variation here in the population, does this particular sample carry the variant?" Geraldine Van der Auwera, PhD • LondonPosts: 14Member My question still remains though, why do some sites have the genotype filter applied and others seem not to. see example output below (chr loaction missing) created from this command... Java7 -Xmx30g -jar$GATKs/GenomeAnalysisTK.jar \ -R $Ref \ -T VariantFiltration \ --downsample_to_coverage$dcovg \ --downsampling_type BY_SAMPLE \ -o $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.filtered.vcf \ --variant$MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.vcf \ --mask $MyOutDir/$TargetOut/\$FamName.UnifiedGenotyper.indel.vcf \ --maskName InDel \ --filterExpression "QD < 2.0" \ --filterExpression "MQ < 40.0" \ --filterExpression "FS > 60.0" \ --filterExpression "MQRankSum < -12.5" \ --filterExpression "ReadPosRankSum < -8.0" \ --filterName QDFilter \ --filterName MQFilter \ --filterName FSFilter \ --filterName MQRankSumFilter \ --filterName ReadPosFilter\ --genotypeFilterExpression "DP < 4" \ --genotypeFilterName "DPFilter" \ --genotypeFilterExpression "GQ < 5.0" \ --genotypeFilterName "GQFilter"