How is a missing genotype defined for a sample in multi-sample calling?

sletort (France; Member)


We perform a GenotypeGVCFs + VQSR analysis following the Best Practices.

We saw the -stand_call_conf option, but it concerns the variant site as a whole.
We would like to know how a sample's genotype is set to missing at a called variant.

We suppose it involves DP and PL, but could you give details?

Best Answers


  • tommycarstensen (United Kingdom; Member) ✭✭✭

    @sletort Maybe I misunderstand, but either your site is marked as LowQual (if -stand_call_conf and -stand_emit_conf are different) or a missing genotype has GT ./. (see the VCF specification).

  • sletort (France; Member)
    edited March 2015

    I forgot to explain that we work on a multi-sample VCF.
    A site can be called thanks to the information from many samples.

    My question is about a site that was called and that PASSes the VQSR filter.
    At such a site, some samples have a genotype and others do not.
    For many missing genotypes this is due to "no data" in the BAM, but when a few reads are present, it is not clear to us which values are used to decide whether the sample gets a genotype or not.
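To see which values accompany each missing genotype, one could tabulate them per record. The following is a minimal Python sketch, not GATK's own logic: the function name `missing_genotypes` is hypothetical, and the QUAL and DP values in the usage line are invented for illustration. It relies only on the VCF convention that a missing genotype is written with `.` alleles.

```python
def missing_genotypes(vcf_line, sample_names):
    """Return {sample: DP or None} for samples whose GT is missing in one VCF record.

    Assumes a tab-separated VCF data line with FORMAT in column 9
    and one column per sample after it.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    result = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        # Trailing FORMAT values may be omitted, so zip() stops at the shorter list.
        values = dict(zip(format_keys, sample_field.split(":")))
        alleles = values.get("GT", ".").replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            dp = values.get("DP", ".")
            result[name] = int(dp) if dp.isdigit() else None
    return result


# Illustrative record only: QUAL and DP values are made up.
example = "1\t7\t.\tC\tA\t50\tPASS\t.\tGT:DP\t./.:0\t1/1:12"
print(missing_genotypes(example, ["SAMPLE1", "SAMPLE2"]))  # {'SAMPLE1': 0}
```

Listing DP (and, by extension, AD or PL) next to each missing genotype makes it easier to separate the "no reads at all" cases from the cases where a few reads are present.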

  • sletort (France; Member)

    I have some information to complete your answer Geraldine.

    I checked in depth all the missing genotypes for one of our samples.
    It appears that all the cases with reads fall within an overlapping deletion.

    It is the combination of indels and multi-sample calling that makes this difficult.

    In the VCF (showing SAMPLE1 and SAMPLE2):
    1 5 . TACATG T PASS ... GT 1/1 0/0
    1 7 . C A PASS ... GT ./. 1/1
    So SAMPLE1 has a missing genotype at position 7 because it is homozygous for the deletion starting at position 5: there is no data there!

    For more complex variants, there are some reads where there should be no data (misalignment), but I suppose that GenotypeGVCFs detects the proximity of the deletion and the inconsistency of placing a genotype where there should be no base.
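The reasoning in this post can be checked mechanically. Below is a minimal Python sketch, assuming a simple deletion record in which ALT is a prefix of REF; the helper names `deleted_positions` and `spanned_by_hom_deletion` are hypothetical, and this is an illustration of the idea rather than what GenotypeGVCFs does internally. The coordinates and genotypes come from the two records quoted in the post.

```python
def deleted_positions(pos, ref, alt):
    """Positions removed by a simple deletion record (ALT is a prefix of REF)."""
    if len(ref) > len(alt) and ref.startswith(alt):
        # The first len(alt) bases are kept; the rest of REF is deleted.
        return range(pos + len(alt), pos + len(ref))
    return range(0)


def spanned_by_hom_deletion(site_pos, deletion, gt):
    """True if site_pos lies inside the deletion and the sample carries it on every allele."""
    pos, ref, alt = deletion
    alleles = gt.replace("|", "/").split("/")
    in_deletion = site_pos in deleted_positions(pos, ref, alt)
    return in_deletion and all(allele == "1" for allele in alleles)


deletion = (5, "TACATG", "T")  # record at position 5 from the post
print(list(deleted_positions(*deletion)))            # [6, 7, 8, 9, 10]
print(spanned_by_hom_deletion(7, deletion, "1/1"))   # SAMPLE1: True
print(spanned_by_hom_deletion(7, deletion, "0/0"))   # SAMPLE2: False
```

SAMPLE1 is homozygous for the deletion, so it has no bases at position 7 and a `./.` genotype there is consistent; SAMPLE2 does not carry the deletion and can be genotyped normally.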

  • sletort (France; Member)

    I just discovered that sometimes these overlapping deletions do not produce a ./. genotype.
    I'll put data on your FTP, as @Sheila requested in another thread, "dp-0-and-dp-gt-with-no-read".
    -> same cause, different effect.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Thanks @sletort, I've put in an issue ticket; we'll try to look at this soon.

  • sletort (France; Member)

    Well, in fact this case is different from the one in the other thread (I made a separate bug report).
    I put this one on the ftp in the file

    As I said, all of these are variants overlapping a deletion, with a GT and a DP but no AD.

  • Sheila (Broad Institute; Member, Broadie, Moderator)


    I think the BAM snippet you submitted may be a little too small. It is better to test on a larger region surrounding the site of interest, so that HaplotypeCaller can find active regions and do reassembly. Can you please upload another BAM snippet with ~500 bases on either side of the site of interest?


  • Sheila (Broad Institute; Member, Broadie, Moderator)


    This issue seems to be fixed in the latest version.

