Hi GATK Users,

Happy Thanksgiving!
Our staff will be observing the holiday and will be unavailable from 22nd to 25th November. This will cause a delay in reaching out to you and answering your questions immediately. Rest assured we will get back to it on Monday November 26th. We are grateful for your support and patience.
Have a great holiday everyone!!!

GATK Staff

Genotype DP information missing in multisample vcf

prepagamprepagam Member
edited January 3 in Ask the GATK team

I used genotype gvcfs from 10 gvcf's - emitting all sites to get an all sites vcf (i'm interested in both variant and invariant sites).
I get some sites where there is DP in the INFO column, but DP information was not provided in genotype DP.
chr38 1326909 . T . 0.30 . DP=250 GT:AD ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0

Is this expected behaviour?

I'm using gatk 3.7
java -jar -Xmx16G ${GATK} \
-T GenotypeGVCFs \
-allSites \
-stand_call_conf 0 \
-L ${Chrom} \
-V [all the variant files]



  • shleeshlee CambridgeMember, Broadie, Moderator admin

    Hi @prepagam,

    The ./. genotypes indicate inconclusive genotypes. Despite the read coverage, DP=250, it appears no reads are considered towards genotyping for all of your samples. In this region, you also have a REF allele but no ALT allele. Do you expect variation in this region?

    Please checkout https://software.broadinstitute.org/gatk/documentation/article.php?id=4721 for a discussion of AD versus DP. Notice that for DP:

    Coverage (DP): outputs the filtered depth of coverage for each sample and the unfiltered depth of coverage across all samples.

    So your cohort level DP field counts total read depth for all samples before any filtering, e.g. by the NotPrimaryAlignmentFilter, whereas AD will only count those reads that pass filtering and that are considered informative towards one allele or another by HaplotypeCaller. You can see the list of filters for HaplotypeCaller at https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php.

    For this site in question, can you check on IGV what the read and allele coverage look like? It may be that all of the reads in the region are secondary alignments. This can occur in regions of the genome that share sequence with other regions and is not uncommon.

  • prepagamprepagam Member

    Thank you this is helpful. I guess I would have expected genotype fields i.e. GT:AD to be GT:AD:DP and the DP value to be 0 rather than completely absent (because after filtering there are no reads left).

  • shleeshlee CambridgeMember, Broadie, Moderator admin


    Sorry, I misunderstood what you meant by expected behavior. You are referring to the tool behavior of outputting GT:AD:DP versus GT:AD and not the existence of regions where it would be possible to have all uninformative reads.

    Can you do me a favor then and post a representative record for the same site from the per sample GVCF? Also, can you post your HaplotypeCaller command? We should confirm that the DP information is output in the mode you ran HaplotypeCaller in, presumably -ERC BP_RESOLUTION.

Sign In or Register to comment.