# Structural Variation identification using DepthOfCoverage query

London (Member)
edited October 2012

Hi GATK Team

You are doing an amazing job, keep it up!

I apologise in advance if this question has come up and I've not found it within the forum, but I am quite new to all of this and would like to ask you a few questions regarding identifying structural variation from exome resequencing data:

I am trying to work out the best method for identifying potential structural variants from a single BAM file. One approach suggested to me was to look for sites with DP values (from UnifiedGenotyper) below 5, although this understandably has inherent confounders. So I also ran the same BAM file through the DepthOfCoverage tool to find regions of interest with zero coverage. However, when I overlaid the two outputs and mapped their coordinates to the human genome, I found that the overlap between the low-DP sites and the zero-coverage DoC regions was extremely small (<5%). Why could this be? Surely there should be more overlap? Are the two tools measuring different things? Have I done something wrong without realising it? I tried to consult the DepthOfCoverage documentation to make sense of it, but it seems to be unavailable on the website (http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html). Please could you advise?
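A minimal Python sketch of the overlap comparison described above, assuming the low-DP sites are (chromosome, 1-based position) pairs taken from the VCF and the zero-coverage regions are (chromosome, start, end) records. The helper names `bed_to_closed` and `fraction_sites_in_intervals` are hypothetical. One thing worth checking in any such comparison is the coordinate convention: VCF positions are 1-based, while BED intervals are 0-based and half-open, so mixing the two without conversion shifts every interval start by one base.

```python
from collections import defaultdict


def bed_to_closed(bed_records):
    """Convert BED records (0-based, half-open) to 1-based, closed intervals."""
    return [(chrom, start + 1, end) for chrom, start, end in bed_records]


def fraction_sites_in_intervals(sites, intervals):
    """Fraction of (chrom, pos) sites lying inside any (chrom, start, end) interval.

    Both sites and intervals are assumed to use 1-based, closed coordinates.
    Uses a linear scan per site: fine for a sketch, slow for a whole exome.
    """
    if not sites:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    hits = sum(
        1
        for chrom, pos in sites
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, []))
    )
    return hits / len(sites)


# Made-up example: one BED record and three low-DP sites.
bed = [("1", 100, 120)]
sites = [("1", 100), ("1", 120), ("2", 10)]
print(fraction_sites_in_intervals(sites, bed_to_closed(bed)))  # 1 of 3 sites
```

With the BED record `("1", 100, 120)`, the converted interval covers positions 101 to 120, so the site at position 100 no longer counts as overlapping; treating the raw BED values as 1-based would report 2 of 3.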

Below are the command lines I've been using:

```
java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -omitBaseOutput -omitLocusTable -R referencefilename.fa -I samplefilename.bam -L regionsofinterest.txt -o outputfile.coverage

java -jar GenomeAnalysisTK.jar -R referencefilename.fa -T UnifiedGenotyper -I samplefilename.bam --dbsnp dbsnpreferencefile.vcf --genotype_likelihoods_model SNP -o outputfilename.vcf --output_mode EMIT_ALL_SITES -stand_call_conf 50.0 -stand_emit_conf 0.0 -dcov 200 -L regionsofinterest.bed
```
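To get the sample-specific depth rather than the site-level INFO DP, the per-sample DP has to be read from the FORMAT and sample columns of each record. The sketch below assumes an uncompressed, single-sample VCF like the one the UnifiedGenotyper command above writes; `low_depth_sites` is a hypothetical helper, and the example records are made up.

```python
def low_depth_sites(vcf_lines, max_dp=5, sample_index=0):
    """Yield (chrom, pos, dp) for records whose per-sample DP is below max_dp.

    Reads DP from the FORMAT/sample columns (columns 9 and 10+), not from INFO.
    Records where the sample has no DP value are skipped rather than guessed.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        keys = fields[8].split(":")                    # e.g. GT:AD:DP:GQ:PL
        values = fields[9 + sample_index].split(":")   # matching sample values
        sample = dict(zip(keys, values))
        dp = sample.get("DP", ".")
        if dp in (".", ""):
            continue
        if int(dp) < max_dp:
            yield chrom, pos, int(dp)


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "1\t100\t.\tA\t.\t.\t.\tDP=3\tGT:DP\t0/0:3",
    "1\t101\t.\tC\t.\t.\t.\tDP=40\tGT:DP\t0/0:40",
    "1\t102\t.\tG\t.\t.\t.\tDP=0\tGT\t./.",
]
print(list(low_depth_sites(example)))  # [('1', 100, 3)]
```

The third record has no DP in its sample column, so it is skipped instead of being treated as zero depth.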

SL

Post edited by Geraldine_VdAuwera on

Ah, compliments will get you very far with us!

Thanks for pointing out the missing doc on DepthOfCoverage. The tech docs are generated automatically when we release a new version, so I'll need to dig into the system to find out why this one article was not. In the meantime, your best bet is to look at the comments in the source code for information on the different modes and arguments of the tool. You'll find it at this link. I apologize for the inconvenience.

To address the differences you're observing: several things could explain them, because the DoC annotator that generates the DP field and the DoC walker are quite different tools. For example, the DP field is sensitive to downsampling. There is also the question of whether you are measuring filtered or unfiltered depth, and whether you are measuring absolute depth per position or depth averaged over intervals. Incidentally, we also have a tool called CoverageBySample, which is a much simpler, more straightforward coverage-counting tool: less powerful, but also less tricky to use. It would also be easy for you to customize it to measure exactly what you want.
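The difference between per-position and interval-averaged depth, and between filtered and unfiltered depth, can be illustrated with a toy example. This is not how GATK computes either value: the reads are made up, and a simple mapping-quality cutoff (the hypothetical `min_mapq` parameter) stands in for whatever read filters a given tool applies.

```python
def per_base_depth(reads, start, end, min_mapq=0):
    """Depth at each 1-based position in [start, end].

    reads: (read_start, read_end, mapq) tuples, 1-based and closed.
    Reads with mapq below min_mapq are skipped (a stand-in for read filtering).
    """
    depth = {pos: 0 for pos in range(start, end + 1)}
    for r_start, r_end, mapq in reads:
        if mapq < min_mapq:
            continue
        for pos in range(max(r_start, start), min(r_end, end) + 1):
            depth[pos] += 1
    return depth


def interval_summary(depth):
    """Return (mean depth, number of zero-depth positions) for one interval."""
    values = list(depth.values())
    mean = sum(values) / len(values)
    zeros = sum(1 for v in values if v == 0)
    return mean, zeros


# Made-up reads: two well-mapped alignments and one with MAPQ 0.
reads = [(1, 6, 60), (1, 6, 60), (3, 8, 0)]
print(interval_summary(per_base_depth(reads, 1, 10)))               # (1.8, 2)
print(interval_summary(per_base_depth(reads, 1, 10, min_mapq=20)))  # (1.2, 4)
```

Here the interval's mean depth is 1.8 unfiltered and 1.2 filtered, yet it contains 2 and 4 zero-depth positions respectively, so an interval-level summary and a per-site threshold can flag quite different things.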

I hope this helps! Good luck.

London (Member)

Dear Geraldine
Thank you so much for your quick response, and more so for the advice given.
I am using the sample-specific DP values from UnifiedGenotyper (not the DP in the INFO column), since I want per-sample depth, and comparing them with regions whose total and average depth are zero (from DoC). Would they not be the same? If not, which depth values would you suggest I use when looking for potential structural variants?
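Note that the two criteria in this thread differ: the UnifiedGenotyper sites were selected with DP below 5, while the DoC regions were those with zero depth. If both tools counted depth the same way, sites with DP from 1 to 4 could not fall inside a zero-depth region. One way to compare like with like, sketched here as an assumption rather than a recommended method, is to keep only sites with DP equal to 0 and merge consecutive positions into runs that can be laid against the DoC intervals. `merge_positions` and its `max_gap` parameter are hypothetical.

```python
def merge_positions(sites, max_gap=1):
    """Merge (chrom, pos) sites into 1-based closed (chrom, start, end) runs.

    Sites need not be sorted. Positions on the same chromosome that are at
    most max_gap apart are joined into one run; duplicates are absorbed.
    """
    runs = []
    for chrom, pos in sorted(sites):
        if runs and runs[-1][0] == chrom and pos - runs[-1][2] <= max_gap:
            # Extend the current run on this chromosome.
            runs[-1] = (chrom, runs[-1][1], max(runs[-1][2], pos))
        else:
            # Start a new run (new chromosome or a gap larger than max_gap).
            runs.append((chrom, pos, pos))
    return runs


zero_dp_sites = [("1", 101), ("1", 100), ("1", 102), ("1", 110), ("2", 5)]
print(merge_positions(zero_dp_sites))
# [('1', 100, 102), ('1', 110, 110), ('2', 5, 5)]
```

The resulting runs use the same 1-based, closed convention as VCF positions, so they can be compared directly with interval coordinates in that convention.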