depth of coverage from gVCF files

Dear all,

I have a large number of gVCF files, either single-sample or combined, with approximately 100 samples per combined gVCF file.
I would like to compute something like the average depth for a set of regions of interest from the combined or single gVCF files.
I can see that I could try to get a VCF output at every position, and use that to infer the percentage with a given read depth, but that seems mightily cumbersome.

So my question: is there an equivalent of the GATK DepthOfCoverage module that takes gVCF files as input? Ideally it would take a combined gVCF and extract the per-sample average/median/minimum depth, but otherwise I can work with single-sample gVCF data.

Thank you in advance

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Vincent,

    I'm afraid there's nothing built-in to do that. I guess you could use the gVCF depth annotations, but you'd have to script that, maybe via VariantsToTable. Is there any reason why you can't just go back to the BAMs to get that info?

  • OK I suppose it's not too hard to write a script to do that. We'll manage.

    My motivation for using the gVCFs is that starting from tons of BAM files, not always well organised, with some excluded for various reasons, it's hard to keep track. On the other hand I know what gVCF files I used, the IDs must match what I have in my final VCF... much cleaner to derive any QC statistic using the combined gVCF files.

  • simonsanchezj, Germany (Member)

    Hello, I also need to run GATK DepthOfCoverage on some large datasets. Unfortunately, I do not have the BAM files with me, so I was wondering whether a script to do this from gVCFs has been written by the GATK team or any other collaborator. Thanks

    @vplagnol said:
    OK I suppose it's not too hard to write a script to do that. We'll manage.

    My motivation for using the gVCFs is that starting from tons of BAM files, not always well organised, with some excluded for various reasons, it's hard to keep track. On the other hand I know what gVCF files I used, the IDs must match what I have in my final VCF... much cleaner to derive any QC statistic using the combined gVCF files.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @simonsanchezj
    Hi,

    I don't think we have any scripts to share with you, but hopefully some others in the forum can help you.

    -Sheila
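
Since no script was shared in this thread, here is a minimal Python sketch of the approach discussed above: read the FORMAT/DP annotation of each gVCF record and summarise it per sample over a set of regions. It is not a GATK tool and not the GATK team's method. The function names (`open_text`, `gvcf_depth_stats`) are made up for illustration. It assumes that reference blocks carry an INFO/END tag, that the regions do not overlap each other, and that records do not overlap each other. Note that the DP of a reference block is a block-level summary rather than a true per-base depth, so the results are approximations, and positions with no record or no DP value are simply not counted.

```python
import gzip
from collections import defaultdict


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def _summarise(hist):
    """Turn a {depth: n_bases} histogram into summary statistics."""
    total = sum(hist.values())
    mean = sum(depth * n for depth, n in hist.items()) / total
    # Weighted (lower) median: first depth at which half the bases are reached.
    running = 0
    for depth in sorted(hist):
        running += hist[depth]
        if 2 * running >= total:
            median = depth
            break
    return {"bases": total, "mean": mean, "median": median, "min": min(hist)}


def gvcf_depth_stats(lines, regions):
    """Summarise FORMAT/DP per sample over 1-based inclusive regions.

    lines   -- iterable of gVCF text lines, header included
    regions -- list of non-overlapping (chrom, start, end) tuples
    Returns {sample: {"bases", "mean", "median", "min"}}.
    """
    samples = []
    hist = defaultdict(lambda: defaultdict(int))  # sample -> depth -> bases
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        if len(fields) < 10:
            continue
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        info, fmt = fields[7], fields[8]
        # Reference blocks carry INFO/END; other records span their REF allele.
        end = pos + len(ref) - 1
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
        # Number of bases of this record that fall inside the regions.
        n_bases = sum(
            max(0, min(end, r_end) - max(pos, r_start) + 1)
            for r_chrom, r_start, r_end in regions
            if r_chrom == chrom
        )
        keys = fmt.split(":")
        if n_bases == 0 or "DP" not in keys:
            continue
        dp_idx = keys.index("DP")
        for sample, column in zip(samples, fields[9:]):
            values = column.split(":")
            # Skip missing values such as "." or truncated sample columns.
            if dp_idx < len(values) and values[dp_idx].isdigit():
                hist[sample][int(values[dp_idx])] += n_bases
    return {sample: _summarise(h) for sample, h in hist.items()}
```

To use it on a file, pass an open handle, for example `gvcf_depth_stats(open_text("cohort.g.vcf.gz"), [("chr1", 1000, 2000)])`, where the file name and region are placeholders. The same function works for single-sample and combined gVCFs, because it reads the sample names from the `#CHROM` header line. It scans every record against every region, so for large files and many regions an indexed lookup would be a better fit.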
