Interpreting '.' in AD field of

FabriceBesnard (Paris, Member)
edited May 2014 in Ask the GATK team

Hi,

I'm doing a variant analysis of genomic DNA from 2 related samples. I followed the up-to-date Best Practices, using HaplotypeCaller in GVCF mode for both samples, followed by GenotypeGVCFs to produce a joint VCF of variant loci.
I'm looking for variants that are sample2-specific (present in sample2 but not in sample1).

Here is a line of this file:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
chrIII 91124 . A AATAAGAGGAATTAGGCT 1132.42 . AC=2;AF=0.500;AN=4;DP=47;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=58.85;MQ0=0;QD=7.99 GT:AD:DP:GQ:PL 1/1:0,25:25:55:1167,55,0 0/0:.:22:33:0,33,495

In the genotype field, sample2.AD is a '.' (dot), meaning that no reads passed the quality filters. However, sample2.DP=22, meaning that 22 reads covered this position.
This line suggests that the variant is specific to sample1 (genotype HomVar, 1/1) and absent from sample2 (HomRef, 0/0). But given the biological relationship between sample1 and sample2 (the way they were generated), I doubt that this is true: the variant is very likely present in sample2 as well. It's a false
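To make the per-sample values in the record above easier to read, here is a minimal Python sketch that splits the two sample columns according to the FORMAT string. The helper name `parse_sample` is made up for illustration. The only assumption beyond the record itself is that '.' is the VCF marker for a missing value.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (GT, AD, DP, GQ, PL) to one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))

# FORMAT and sample columns copied from the record above.
fmt = "GT:AD:DP:GQ:PL"
sample1 = parse_sample(fmt, "1/1:0,25:25:55:1167,55,0")
sample2 = parse_sample(fmt, "0/0:.:22:33:0,33,495")

for name, s in (("Sample1", sample1), ("Sample2", sample2)):
    # Treat '.' as a missing value instead of trying to parse it as counts.
    ad = None if s["AD"] == "." else [int(x) for x in s["AD"].split(",")]
    print(name, s["GT"], "AD =", ad, "DP =", int(s["DP"]), "GQ =", int(s["GQ"]))
```

Read this way, Sample2 has a depth and a genotype quality but no allele depths at all, which is the discrepancy being asked about.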

I have 416 loci like this. For the vast majority of them, sample1 and sample2 likely share the same variant. But since it is not impossible that a very few of them are really sample1=HomVar and sample2=HomRef, could you suggest a way to detect them?
What about comparing sample1.PL(1/1) and sample2.PL(0/0)? For example, could you suggest a rule of thumb for their ratio?
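As a rough illustration of what the PL values in this record imply, the sketch below converts each sample's PL triplet into normalized genotype probabilities. It assumes that PL is the phred-scaled genotype likelihood (likelihood proportional to 10^(-PL/10)) and that all three genotypes have equal prior weight. The function name `pl_to_probs` is hypothetical, and this is not a recommended GATK filter.

```python
def pl_to_probs(pl):
    """Convert phred-scaled likelihoods (PL) to probabilities that sum to 1.

    Assumes a flat prior over the genotypes 0/0, 0/1 and 1/1.
    """
    raw = [10 ** (-p / 10) for p in pl]
    total = sum(raw)
    return [r / total for r in raw]

# PL values copied from the record above, ordered 0/0, 0/1, 1/1.
sample1_pl = [1167, 55, 0]
sample2_pl = [0, 33, 495]

p1 = pl_to_probs(sample1_pl)
p2 = pl_to_probs(sample2_pl)

print("Sample1 P(1/1) =", p1[2])
print("Sample2 P(0/0) =", p2[0])
```

In this record the GQ of each sample equals its second-smallest PL (55 and 33), so GQ already summarizes the gap between the best and the next-best genotype. Any cutoff on these numbers would still be an arbitrary choice, and the likelihoods only reflect the reads the caller actually used.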

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @FabriceBesnard

    Hi,

    Have you tried to use VQSR or manual filters to weed out the false positives?

    -Sheila

  • FabriceBesnard (Paris, Member)

    Hi Sheila,
    Thank you for replying.
    I work with a non-model organism that has no list of known variants, so I can't apply VQSR.

    I applied manual hard filters, mainly based on coverage, but I also look at QUAL, QD and PL. My reads are supposed to give ~20X average coverage, so you can see that the record I gave as an example is well covered, and its other parameters are not low either.

    Basically, I am looking for a better hard filter to help me filter my call set.
    In my experiment, sample2 is derived from sample1 by mutation accumulation. Most of the mutations present in sample1 are therefore background mutations that will also be present in sample2: when sample1 is HomVar, sample2 is very likely HomVar too. But it is still possible that a new mutation hit one of those background mutations and reverted it exactly to the reference sequence. If such unlikely reversions exist, I would like to identify them in my VCF!
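A hard filter of the kind described above could be sketched as follows. The function name `candidate_reversions` and the thresholds `min_dp` and `min_gq` are placeholders chosen for illustration, not values recommended by the GATK team. The sketch assumes tab-separated VCF data lines with exactly two samples, sample1 in column 10 and sample2 in column 11.

```python
def candidate_reversions(vcf_lines, min_dp=10, min_gq=30):
    """Return sites where sample1 is 1/1 and sample2 is a confident 0/0.

    min_dp and min_gq are illustrative thresholds applied to sample2.
    """
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        cols = line.rstrip("\n").split("\t")
        keys = cols[8].split(":")
        s1 = dict(zip(keys, cols[9].split(":")))
        s2 = dict(zip(keys, cols[10].split(":")))
        if s1.get("GT") != "1/1" or s2.get("GT") != "0/0":
            continue
        dp, gq = s2.get("DP", "."), s2.get("GQ", ".")
        if dp == "." or gq == ".":
            continue  # cannot judge a site with missing depth or quality
        if int(dp) >= min_dp and int(gq) >= min_gq:
            # Keep AD as reported so that missing values ('.') stay visible.
            hits.append((cols[0], int(cols[1]), int(dp), int(gq), s2.get("AD", ".")))
    return hits

# The example record from the question, rebuilt as a tab-separated line.
record = "\t".join([
    "chrIII", "91124", ".", "A", "AATAAGAGGAATTAGGCT", "1132.42", ".",
    "AC=2;AF=0.500;AN=4;DP=47;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=58.85;MQ0=0;QD=7.99",
    "GT:AD:DP:GQ:PL", "1/1:0,25:25:55:1167,55,0", "0/0:.:22:33:0,33,495",
])
hits = candidate_reversions([record])
print(hits)
```

With these placeholder thresholds the example record is kept as a candidate. Its AD is reported as '.', so sites like this one would still need manual inspection of the reads.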
