
What is the significance of "Depth across all samples" (DP) in INFO?

nkobmoo, Paris (Member)

Hi,

Although I have read through the related topics, I'm still quite confused about the significance of "Depth across all samples" (DP) in INFO in the VCF file. Does "across samples" mean that it adds the read depths of all the samples together, or is it a mean over all the samples?
In my VCF file (after joint genotyping in GVCF mode), I obtained DP values in INFO between 30 and 99, while the per-sample DP values are generally much lower.
I think the DP in INFO is a sum of depths. Am I right?
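
One way to test the "sum versus mean" question directly is to compare the INFO DP with the sum of the per-sample FORMAT DP values on a record. The following is a minimal sketch, assuming an uncompressed, tab-delimited VCF data line; the function name `info_vs_sample_dp` and the example record are hypothetical and made up for illustration. The two numbers need not match exactly, since the site-level and sample-level depths may be computed from differently filtered reads.

```python
def info_vs_sample_dp(vcf_line):
    """Return (INFO DP, sum of per-sample FORMAT DP) for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    info, format_keys, samples = fields[7], fields[8].split(":"), fields[9:]

    # Pull DP out of the semicolon-separated INFO column.
    info_dp = None
    for entry in info.split(";"):
        if entry.startswith("DP="):
            info_dp = int(entry[3:])

    if "DP" not in format_keys:
        return info_dp, None

    # Sum the FORMAT DP over samples, skipping missing values (".").
    dp_index = format_keys.index("DP")
    total = 0
    for sample in samples:
        values = sample.split(":")
        if dp_index < len(values) and values[dp_index] not in (".", ""):
            total += int(values[dp_index])
    return info_dp, total


# Hypothetical three-sample record, not taken from any real call set.
line = "chr1\t100\t.\tA\tG\t50\t.\tAC=1;DP=42\tGT:AD:DP\t0/1:10,5:15\t0/0:20,0:20\t./.:.:."
print(info_vs_sample_dp(line))
```

If the INFO value tracks the sum rather than the average of the sample values across many records, that supports the "sum" reading.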

Answers

  • twooldridge (Member)

    Hi Sheila,

    I'm encountering a related problem: the per-sample depth (DP in FORMAT) reported after running GenotypeGVCFs is consistently much higher than the sample depths reported in the original gVCFs. For example:

    From the VCF file, looking at sample 68148-2:
    GT:AD:DP:GQ:PGT:PID:PL 0/0:522,0:522:0:.:.:0,0,13060

    From the same site in 68148-2.g.vcf
    GT:DP:GQ:MIN_DP:PL 0/0:13:36:12:0,36,376

    As you can see, the depth in the VCF file is WAY higher. This is the case for all samples at this site, and at a glance it seems to be occurring at most of the sites in the VCF. Why might the depth values be so inflated after GenotypeGVCFs?

    I don't know if this helps, but I checked a VCF produced by GenotypeGVCFs on a subset of my population, and I see the same inflated depths being reported.

    Thank you for your help!
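
The comparison described in the post above can be scripted so that it is easy to repeat across sites. This is a minimal sketch that only parses the FORMAT and sample strings quoted in the post; the helper name `format_field` is hypothetical and is not part of GATK.

```python
def format_field(format_keys, sample_values, key):
    """Look up one FORMAT key (e.g. "DP") in a colon-separated sample column."""
    keys = format_keys.split(":")
    values = sample_values.split(":")
    if key not in keys:
        return None
    index = keys.index(key)
    # Trailing fields may be dropped in a VCF sample column.
    return values[index] if index < len(values) else None


# Strings copied from the post: joint-genotyped VCF versus the per-sample gVCF.
joint_dp = int(format_field("GT:AD:DP:GQ:PGT:PID:PL", "0/0:522,0:522:0:.:.:0,0,13060", "DP"))
gvcf_dp = int(format_field("GT:DP:GQ:MIN_DP:PL", "0/0:13:36:12:0,36,376", "DP"))
print(joint_dp, gvcf_dp)  # 522 13
```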

  • twooldridge (Member)

    Forgot to mention, this is with GATK 3.8. I'm re-running a couple of jobs with 3.7 to see if the same thing occurs.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @twooldridge
    Hi,

    That is odd. Can you check if this still happens in the latest GATK4 beta?

    Thanks,
    Sheila

  • twooldridge (Member)

    Hi Sheila,

    Thank you for the reply. It'll take a little bit to run GATK4, as I have to make a GenomicsDB database first (I haven't used GATK4 yet), and this step seems to be taking quite a while. Is there anything else you would recommend trying in the meantime?

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @twooldridge
    Hi,

    Can you simply use CombineGVCFs in GATK3 to produce a single GVCF, then use GenotypeGVCFs in GATK4? If that still produces the odd output, can you try running HaplotypeCaller in GATK4, CombineGVCFs in GATK3, and GenotypeGVCFs in GATK4? I am assuming the GenomicsDBImport step is the blocker in using GATK4.

    If not, can you submit a bug report? Development has pretty much halted in GATK3, but if this is a bug, I can make a ticket for someone to look into it.

    Thanks,
    Sheila

  • twooldridge (Member)
    edited December 2017

    Hi Sheila,

    I followed your advice, combining samples into a single gVCF first. After running GenotypeGVCFs in GATK4, I'm seeing the same phenomenon, with depths much higher than they should be according to the per-sample gVCF files. The depth counts are exactly the same at a given site between runs. I'm also seeing many no-calls where depth is quite high (e.g., ./.:70,0:70:.:.:.:0,0,0). I see that the no-call problem is mentioned elsewhere in the forums and could be the result of other factors, but I thought it was worth pointing out nevertheless. Do you have any suggestions?

    Thank you for your help!

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @twooldridge
    Hi,

    Interesting. I am not sure I have heard of this issue specifically, but if you can submit a bug report, I can take a look. Instructions are here.

    Thanks,
    Sheila
