Genotype DP information missing in multisample vcf

prepagamprepagam Member
edited January 3 in Ask the GATK team

I used genotype gvcfs from 10 gvcf's - emitting all sites to get an all sites vcf (i'm interested in both variant and invariant sites).
I get some sites where there is DP in the INFO column, but DP information was not provided in genotype DP.
i.e.
chr38 1326909 . T . 0.30 . DP=250 GT:AD ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0

Is this expected behaviour?

I'm using gatk 3.7
Command
java -jar -Xmx16G ${GATK} \
-T GenotypeGVCFs \
-R ${REFERENCE} \
-allSites \
-stand_call_conf 0 \
-L ${Chrom} \
-V [all the variant files]

Tagged:

Answers

  • shleeshlee CambridgeMember, Broadie, Moderator

    Hi @prepagam,

    The ./. genotypes indicate inconclusive genotypes. Despite the read coverage, DP=250, it appears no reads are considered towards genotyping for all of your samples. In this region, you also have a REF allele but no ALT allele. Do you expect variation in this region?

    Please checkout https://software.broadinstitute.org/gatk/documentation/article.php?id=4721 for a discussion of AD versus DP. Notice that for DP:

    Coverage (DP): outputs the filtered depth of coverage for each sample and the unfiltered depth of coverage across all samples.

    So your cohort level DP field counts total read depth for all samples before any filtering, e.g. by the NotPrimaryAlignmentFilter, whereas AD will only count those reads that pass filtering and that are considered informative towards one allele or another by HaplotypeCaller. You can see the list of filters for HaplotypeCaller at https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php.

    For this site in question, can you check on IGV what the read and allele coverage look like? It may be that all of the reads in the region are secondary alignments. This can occur in regions of the genome that share sequence with other regions and is not uncommon.

  • Thank you this is helpful. I guess I would have expected genotype fields i.e. GT:AD to be GT:AD:DP and the DP value to be 0 rather than completely absent (because after filtering there are no reads left).

  • shleeshlee CambridgeMember, Broadie, Moderator

    @prepagam,

    Sorry, I misunderstood what you meant by expected behavior. You are referring to the tool behavior of outputting GT:AD:DP versus GT:AD and not the existence of regions where it would be possible to have all uninformative reads.

    Can you do me a favor then and post a representative record for the same site from the per sample GVCF? Also, can you post your HaplotypeCaller command? We should confirm that the DP information is output in the mode you ran HaplotypeCaller in, presumably -ERC BP_RESOLUTION.

Sign In or Register to comment.