What is the significance of "Depth across all samples" (DP) in INFO?

nkobmoo (Paris, Member)

Hi,

Although I have read through the related topics, I'm still quite confused about the meaning of "Depth across all samples" (DP) in the INFO field of the VCF file. Does "across all samples" mean that the read depths of all the samples are added together, or is it a mean over all the samples?
In my VCF file (after joint genotyping in GVCF mode), the DP values in INFO are between 30 and 99, while the per-sample DP values are generally much lower.
I think the DP in INFO is a sum of depths, am I right?
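
One way to test the "sum versus mean" hypothesis on your own data is to compare the INFO-level DP with the per-sample DP values record by record. The sketch below is only an illustration: the function name `depth_summary` and the example record are made up, and it assumes a tab-delimited VCF data line with a FORMAT column and DP present in both places. GATK's documentation describes the INFO-level DP as unfiltered depth and the sample-level DP as filtered depth, so on real data the INFO value may be somewhat larger than the exact sum.

```python
def depth_summary(vcf_line):
    """Return (INFO DP, sum of sample DP, mean of sample DP) for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")

    # INFO is column 8: semicolon-separated key=value pairs (flags have no "=").
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    info_dp = int(info["DP"]) if "DP" in info else None

    # FORMAT is column 9; sample columns follow it.
    keys = fields[8].split(":")
    sample_dps = []
    if "DP" in keys:
        dp_idx = keys.index("DP")
        for sample in fields[9:]:
            values = sample.split(":")
            # Trailing fields may be omitted, and "." marks a missing value.
            if dp_idx < len(values) and values[dp_idx] != ".":
                sample_dps.append(int(values[dp_idx]))

    total = sum(sample_dps)
    mean = total / len(sample_dps) if sample_dps else None
    return info_dp, total, mean


# Hypothetical three-sample record, invented for illustration only.
example = "\t".join([
    "chr1", "1000", ".", "A", "G", "50", ".", "AC=1;AN=6;DP=36",
    "GT:AD:DP:GQ", "0/0:12,0:12:30", "0/1:6,5:11:99", "0/0:13,0:13:36",
])
print(depth_summary(example))  # (36, 36, 12.0)
```

If the first two numbers track each other across your records while the third is far smaller, the INFO DP is behaving as a sum rather than a mean.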

Answers

  • Hi Sheila,

    I'm encountering a related problem: the per-sample depth (DP in FORMAT) reported after running GenotypeGVCFs is consistently much higher than the sample depths reported in the original gVCFs. For example:

    From the vcf file, looking at sample 68148-2:
    GT:AD:DP:GQ:PGT:PID:PL 0/0:522,0:522:0:.:.:0,0,13060

    From the same site in 68148-2.g.vcf
    GT:DP:GQ:MIN_DP:PL 0/0:13:36:12:0,36,376

    As you can see, the depth in the VCF file is far higher. This is the case for all samples at this site, and at a glance it seems to be occurring at most of the sites in the VCF. Why might the depth values be so inflated after GenotypeGVCFs?

    I don't know if this helps, but I checked a VCF produced by GenotypeGVCFs on a subset of my population, and I see the same inflated depths being reported.

    Thank you for your help!

  • Forgot to mention, this is with GATK 3.8. I'm re-running a couple jobs with 3.7 to see if the same thing occurs.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @twooldridge
    Hi,

    That is odd. Can you check if this still happens in GATK4 latest beta?

    Thanks,
    Sheila

  • Hi Sheila,

    Thank you for the reply. It will take a while to run GATK4, as I have to build a GenomicsDB database first (I haven't used GATK4 yet), and this step seems to be taking quite a long time. Is there anything else you would recommend trying in the meantime?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @twooldridge
    Hi,

    Can you simply use CombineGVCFs in GATK3 to produce a single GVCF, then use GenotypeGVCFs in GATK4? If that still produces the odd output, can you try running HaplotypeCaller in GATK4, CombineGVCFs in GATK3, and GenotypeGVCFs in GATK4? I am assuming the GenomicsDBImport step is the blocker in using GATK4.

    If not, can you submit a bug report? Development has pretty much halted in GATK3, but if this is a bug, I can make a ticket for someone to look into it.

    Thanks,
    Sheila

  • twooldridge (Member)
    edited December 2017

    Hi Sheila,

    I followed your advice, combining samples into a single gVCF first. After running GenotypeGVCFs in GATK4, I'm seeing the same phenomenon, with depths much higher than they should be according to the per-sample gVCF files. The depth counts at a given site are exactly the same between the GATK3 and GATK4 runs. I'm also seeing many no-calls where depth is quite high (e.g., ./.:70,0:70:.:.:.:0,0,0). I see that this problem is mentioned elsewhere in the forums and could be the result of other factors, but I thought it was worth pointing out. Do you have any suggestions?

    Thank you for your help!

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @twooldridge
    Hi,

    Interesting. I am not sure I have heard of this issue specifically, but if you can submit a bug report, I can take a look. Instructions are here.

    Thanks,
    Sheila
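
For anyone wanting to reproduce the comparison twooldridge describes (per-sample DP in the joint-genotyped VCF versus the same site in that sample's gVCF), here is a minimal sketch. It pairs the FORMAT keys with the sample values quoted in the thread; the helper names `parse_sample` and `dp_ratio` are made up for illustration, and the sketch only quantifies the discrepancy, it does not explain it.

```python
def parse_sample(format_keys, sample_values):
    """Pair colon-separated FORMAT keys with one sample's colon-separated values."""
    return dict(zip(format_keys.split(":"), sample_values.split(":")))


def dp_ratio(joint_record, gvcf_record):
    """Ratio of the joint-VCF sample DP to the gVCF sample DP at the same site."""
    return int(joint_record["DP"]) / int(gvcf_record["DP"])


# Records for sample 68148-2 exactly as quoted in the thread.
joint = parse_sample("GT:AD:DP:GQ:PGT:PID:PL", "0/0:522,0:522:0:.:.:0,0,13060")
gvcf = parse_sample("GT:DP:GQ:MIN_DP:PL", "0/0:13:36:12:0,36,376")

# AD holds per-allele read counts; here they add up to the reported DP.
ad_total = sum(int(x) for x in joint["AD"].split(","))

print(joint["DP"], gvcf["DP"], gvcf["MIN_DP"])  # 522 13 12
print(ad_total)                                 # 522
print(round(dp_ratio(joint, gvcf), 1))          # 40.2
```

Running the same comparison over many sites and samples would show whether the inflation factor is roughly constant or varies by site, which is the kind of detail worth including in a bug report.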
