Bug Bulletin: The GenomeLocPArser error in SplitNCigarReads has been fixed; if you encounter it, use the latest nightly build.

weird multisample output

ledwardsledwards LondonPosts: 14Member

Hi, I have run GATK Unified genotyper with multisample and I am getting unexpected output. (Hard filters used)

Here are examples of the output format I get in the vcf for a sample: 0/1:43,4:PASS:24:24,0,4368, ./.:.:PASS or 0/0:2,0:2:DPFilter:6:0,6,47 as well as 0/0:12,0:37:0,37,830. The latter is the format I would expect. What does this mean? Also I get the filter PASS even when all but two samples have ./.:.:PASS and the other two have 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0 and 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0.

Also I ran HalpotypeCaller multisample and get similar output all though more of the 0/0:12,0:37:0,37,830 format....

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Nothing to worry about. The samples that are non-reference have either PASS or the name of the filter they failed; the samples that can't be called because there is no coverage have no-call fields (./. etc); and the last are called reference so there is no filtering annotation.

    Geraldine Van der Auwera, PhD

  • ledwardsledwards LondonPosts: 14Member

    Thanks for your feedback, but I don't understand why the final Filter category is PASS if across all samples there is not enough information to call at that position or where there is they do not individually pass filter. Is this a bug? Also I don't understand why where there is not enough info to call a variant i.e. ./. why they are given the PASS category e.g. ./.:.:PASS

  • ledwardsledwards LondonPosts: 14Member

    Also we get 0/0:3,0:PASS:9:0,9,96 - i.e. reference but still get the filter info....

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Two points:

    1. There's a difference between site-based filtering and sample genotype-based filtering.
    2. Check your JEXL expressions -- some expressions evaluate as true when an annotation is missing.

    Geraldine Van der Auwera, PhD

  • ledwardsledwards LondonPosts: 14Member

    Could you elaborate on 1? I am not sure how this would justify a passed call in the example I gave.

    Here is my JEXL expression for this data - can you see an issue??

    $Java7 -Xmx30g -jar $GATKs/GenomeAnalysisTK.jar \ -R $Ref \ -T VariantFiltration \ --downsample_to_coverage $dcovg \ --downsampling_type BY_SAMPLE \ -o $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.filtered.vcf \ --variant $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.vcf \ --mask $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.indel.vcf \ --maskName InDel \ --filterExpression "QD < 2.0" \ --filterExpression "MQ < 40.0" \ --filterExpression "FS > 60.0" \ --filterExpression "MQRankSum < -12.5" \ --filterExpression "ReadPosRankSum < -8.0" \ --filterName QDFilter \ --filterName MQFilter \ --filterName FSFilter \ --filterName MQRankSumFilter \ --filterName ReadPosFilter\ --genotypeFilterExpression "DP < 4" \ --genotypeFilterName "DPFilter" \ --genotypeFilterExpression "GQ < 5.0" \ --genotypeFilterName "GQFilter"

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    I was responding to this statement:

    Also I get the filter PASS even when all but two samples have ./.:.:PASS and the other two have 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0 and 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0.

    and:

    I don't understand why the final Filter category is PASS if across all samples there is not enough information to call at that position or where there is they do not individually pass filter.

    The main problem here is that you don't specify which level of filtering you mean in each case. I could be wrong in my interpretation but you seem to be conflating the two levels of filtering. It is quite possible to have a variant call pass a certain filter (at the site level) even if one or more samples fail some filters for this site. The two levels of filtering are independent.

    Geraldine Van der Auwera, PhD

  • ledwardsledwards LondonPosts: 14Member

    Ah, ok. Let me try and explain better. So in multisample output the overall Filter reports PASS, BUT all but two samples for this variant are reported as ./.:.:PASS and the other two have 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0 and 1/1:0,1:1:DPFilter;GQFilter:3:38,3,0.

  • ledwardsledwards LondonPosts: 14Member

    I guess the question is two fold - why is there a PASS annotation for individual samples were there is not enough information to call (i.e. ./.:.:PASS) and why would the overall filter say PASS when it seems there is nothing to report.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    I see, thanks for clarifying your questions.

    why is there a PASS annotation for individual samples were there is not enough information to call (i.e. ./.:.:PASS)

    This is because the JEXL evaluation is myopic, in the sense that it only looks at the annotation you tell it to. This is completely independent of the calling mechanism. And unfortunately it's extremely literal-minded, so if the annotation is absent (because no call was made) the JEXL evaluator only sees that the filtering condition is not met (e.g. "." is not lower than 4) and therefore gives the sample a pass. This is documented as a gotcha of the JEXL evaluation system. The workaround is to add a check ("does the annotation exist?") before performing the evaluation, which acts sort of like a dead man switch.

    why would the overall filter say PASS when it seems there is nothing to report

    Same problem as above; the missing annotations lead the JEXL to pass the site because there is nothing there to make it fail. Remember that failing genotype-level filter has no bearing on the site-level filters; and you are not applying any filters at the site level that would fail that site.

    Does that make sense?

    Geraldine Van der Auwera, PhD

  • ledwardsledwards LondonPosts: 14Member

    Yes it does, thank you - how/where in the pipeline would I build in the "does annotation exist"?

    Also why are some sample genotype descriptors annotated with PASS or the filter they failed by and not others?

  • ledwardsledwards LondonPosts: 14Member

    Having thought more about it, I know the answer to the latter question. For multi sample, it looks like it is best to just use genotypexpression filters, right?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Hi @ledwards,

    The basic filtering (choosing which variants are likely to be real) is typically done on the site-level annotations, not at the genotype level. The idea is that you first ask the question: "Is there convincing evidence that there is variation at this position in one or more samples in the cohort?". Then for each sample, you ask: "Given that there is variation here in the population, does this particular sample carry the variant?"

    Geraldine Van der Auwera, PhD

  • ledwardsledwards LondonPosts: 14Member

    My question still remains though, why do some sites have the genotype filter applied and others seem not to. see example output below (chr loaction missing) created from this command...

    Java7 -Xmx30g -jar $GATKs/GenomeAnalysisTK.jar \ -R $Ref \ -T VariantFiltration \ --downsample_to_coverage $dcovg \ --downsampling_type BY_SAMPLE \ -o $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.filtered.vcf \ --variant $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.snp.vcf \ --mask $MyOutDir/$TargetOut/$FamName.UnifiedGenotyper.indel.vcf \ --maskName InDel \ --filterExpression "QD < 2.0" \ --filterExpression "MQ < 40.0" \ --filterExpression "FS > 60.0" \ --filterExpression "MQRankSum < -12.5" \ --filterExpression "ReadPosRankSum < -8.0" \ --filterName QDFilter \ --filterName MQFilter \ --filterName FSFilter \ --filterName MQRankSumFilter \ --filterName ReadPosFilter\ --genotypeFilterExpression "DP < 4" \ --genotypeFilterName "DPFilter" \ --genotypeFilterExpression "GQ < 5.0" \ --genotypeFilterName "GQFilter"

    A T 542 PASS ABHet=0.591;ABHom=0.989;AC=1;AF=5.376e-03;AN=186;BaseQRankSum=-1.329;DB;DP=8946;Dels=0.00;FS=0.000;GC=44.55;HRun=0;HW=0.0;HaplotypeScore=68.9837;InbreedingCoeff=-0.0054;MLEAC=1;MLEAF=5.376e-03;MQ=45.25;MQ0=0;MQRankSum=-1.783;OND=0.091;QD=7.32;ReadPosRankSum=2.294 GT:AD:DP:GQ:PL 0/0:85,0:86:99:0,204,2166 0/0:41,0:43:99:0,114,1236 0/0:60,0:62:99:0,153,1619 0/0:57,0:59:99:0,162,1697 0/0:107,0:107:99:0,279,3023 C A 86.59 PASS ABHet=0.429;ABHom=0.982;AC=3;AF=0.018;AN=168;BaseQRankSum=-2.091;DB;DP=526;Dels=0.00;FS=1.712;GC=52.48;HRun=1;HW=14.4;HaplotypeScore=6.5558;InbreedingCoeff=0.0408;MLEAC=2;MLEAF=0.012;MQ=41.08;MQ0=0;MQRankSum=-0.477;OND=0.076;QD=8.66;ReadPosRankSum=1.035 GT:AD:DP:FT:GQ:PL 0/0:6,0:6:PASS:12:0,12,119 0/0:1,0:1:DPFilter;GQFilter:3:0,3,22 0/0:4,0:4:PASS:12:0,12,129 0/0:4,0:4:PASS:6:0,6,74 0/0:7,0:7:PASS:15:0,15,172

  • ebanksebanks Posts: 683GATK Developer mod

    When all genotypes within a record pass filters, it looks like as an optimization is just leaves out the PASS from each one (since it is implicit).

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

Sign In or Register to comment.