
Interpreting '.' in AD field of

FabriceBesnard (Paris), Member
edited May 2014 in Ask the GATK team


I'm doing a variant analysis of genomic DNA from 2 related samples. I followed the current Best Practices, running HaplotypeCaller in GVCF mode on each sample and then GenotypeGVCFs to produce a joint VCF of variant loci.
I'm looking at variants that would be sample2-specific (present in sample2 but not in sample1).

Here is a line of this file:

chrIII 91124 . A AATAAGAGGAATTAGGCT 1132.42 . AC=2;AF=0.500;AN=4;DP=47;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=58.85;MQ0=0;QD=7.99 GT:AD:DP:GQ:PL 1/1:0,25:25:55:1167,55,0 0/0:.:22:33:0,33,495

In the genotype field, sample2.AD is a . (dot), meaning that no reads passed the quality filters. However, sample2.DP=22, meaning that 22 reads covered this position.
This line suggests that this variation is specific to sample1 (genotype HomVar 1/1) and is not present in sample2 (HomRef 0/0). But given the biological relationship between sample1 and sample2 (the way they were generated), I doubt that this variation is real: it is very likely to be present in sample2 as well. It's a false
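For readers who want to inspect the per-sample fields of the record above, here is a minimal Python sketch that pairs the FORMAT keys with each sample column. The helper name `parse_sample` is hypothetical, and the record is split on whitespace only because the forum paste lost its tabs; a real VCF line should be split on tab characters.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. GT:AD:DP:GQ:PL) with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))

# The record quoted in the post, as pasted (whitespace-separated).
record = (
    "chrIII 91124 . A AATAAGAGGAATTAGGCT 1132.42 . "
    "AC=2;AF=0.500;AN=4;DP=47;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=58.85;MQ0=0;QD=7.99 "
    "GT:AD:DP:GQ:PL 1/1:0,25:25:55:1167,55,0 0/0:.:22:33:0,33,495"
)

cols = record.split()          # assumption: no field contains spaces
fmt = cols[8]                  # FORMAT column
samples = [parse_sample(fmt, s) for s in cols[9:]]

for i, s in enumerate(samples, start=1):
    # AD is "." when no per-allele depths are reported for that sample.
    ad = None if s["AD"] == "." else [int(x) for x in s["AD"].split(",")]
    print(f"sample column {i}: GT={s['GT']} AD={ad} DP={s['DP']} GQ={s['GQ']} PL={s['PL']}")
```

The second sample column is the one with a missing AD but a DP of 22, which is the discrepancy described above.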

I have 416 loci like this. For the vast majority of them, sample1 and sample2 likely share the same variation. But since it is not impossible that a very few of them are really sample1=HomVar and sample2=HomRef, could you suggest a way to detect them?
What about comparing sample1.PL(1/1) and sample2.PL(0/0)? For example, could you suggest a rule of thumb for their ratio?
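As background for the PL question: PL values are Phred-scaled genotype likelihoods normalized so that the most likely genotype has PL 0, so the called genotype's own PL is always 0 and the informative number is the PL of the competing genotype. The sketch below (hypothetical helper names, not a GATK-recommended rule) converts those gaps into relative likelihoods for the record quoted earlier.

```python
def pl_to_relative_likelihood(pl):
    """Likelihood of a genotype relative to the best one, from its Phred-scaled PL."""
    return 10 ** (-pl / 10)

def pl_gap(pl_string, gt_index):
    """PL of the genotype at gt_index minus the lowest PL.

    For a biallelic site the PL order is 0/0, 0/1, 1/1.
    """
    pls = [int(x) for x in pl_string.split(",")]
    return pls[gt_index] - min(pls)

# PL strings taken from the two sample columns of the quoted record.
homvar_column_pl = "1167,55,0"   # column called 1/1
homref_column_pl = "0,33,495"    # column called 0/0

# How strongly does each column argue against the *other* column's genotype?
gap_ref_in_homvar = pl_gap(homvar_column_pl, 0)   # PL of 0/0 in the 1/1 column
gap_var_in_homref = pl_gap(homref_column_pl, 2)   # PL of 1/1 in the 0/0 column

print(gap_ref_in_homvar, pl_to_relative_likelihood(gap_ref_in_homvar))
print(gap_var_in_homref, pl_to_relative_likelihood(gap_var_in_homref))
```

In this record the GQ of each column (55 and 33) equals its second-lowest PL, so GQ already summarizes the gap to the nearest competing genotype. Any cutoff applied to these gaps would be a choice to tune against the data rather than a fixed rule.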

Best Answer


  • Sheila (Broad Institute), Member, Broadie admin



    Have you tried to use VQSR or manual filters to weed out the false positives?


  • FabriceBesnard (Paris), Member

    Hi Sheila,
    Thank you for replying.
    I have a non-model organism with no list of known variants, so I can't apply VQSR.

    I applied manual hard filters, mainly based on coverage, but I also looked at QUAL, QD and PL. My reads should give ~20X average coverage, so you can see that the record I gave as an example is well covered, and its other parameters are not low either.

    Basically I am looking for a better hard filter that would help me filter my call set.
    In my experiment, sample2 is derived from sample1 by mutation accumulation. Most of the mutations present in sample1 are therefore background mutations that will also be present in sample2: when sample1 is HomVar, sample2 is also very likely HomVar. But it is still possible that a new mutation hit one of those background mutations and reverted it exactly to the reference sequence. If such cases exist, I would like to identify those putative, unlikely reversions in my VCF!
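One way to shortlist such candidate reversions is sketched below in Python: keep records where the parental column is 1/1 and the derived column is 0/0, require a minimum DP and GQ in both, and flag whether the derived sample's AD is missing. The function name, the column indices and the thresholds (`min_dp=10`, `min_gq=30`) are illustrative assumptions, not values recommended by the GATK team.

```python
def candidate_reversions(vcf_lines, parent=0, derived=1, min_dp=10, min_gq=30):
    """Yield (chrom, pos, derived_ad_missing) for parent 1/1, derived 0/0 sites.

    parent/derived are 0-based indices into the sample columns (assumption:
    unphased, biallelic genotypes written as "1/1" and "0/0").
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        keys = cols[8].split(":")
        samples = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
        p, d = samples[parent], samples[derived]
        if p.get("GT") != "1/1" or d.get("GT") != "0/0":
            continue
        try:
            passes = all(
                int(s["DP"]) >= min_dp and int(s["GQ"]) >= min_gq for s in (p, d)
            )
        except (KeyError, ValueError):
            continue  # DP or GQ missing or non-numeric: cannot evaluate
        if passes:
            yield cols[0], int(cols[1]), d.get("AD", ".") == "."

# The record quoted in the question, rebuilt with tab separators.
record = "\t".join([
    "chrIII", "91124", ".", "A", "AATAAGAGGAATTAGGCT", "1132.42", ".",
    "AC=2;AF=0.500;AN=4;DP=47;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=58.85;MQ0=0;QD=7.99",
    "GT:AD:DP:GQ:PL", "1/1:0,25:25:55:1167,55,0", "0/0:.:22:33:0,33,495",
])

hits = list(candidate_reversions([record]))
print(hits)
```

Records flagged with a missing AD in the derived sample are the ones worth checking against the aligned reads before treating them as real reversions.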
