It looks like you're new here. If you want to get involved, click one of these buttons!
Hi GATK Team
You are doing an amazing job, keep it up!
I apologise in advance if this question has come up and I've not found it within the forum, but I am quite new to all of this and would like to ask you a few questions regarding identifying structural variation from exome resequencing data:
I am trying to assess the best method to identify potential structural variants from a single bam file: One way of doing this proposed to me was to look at DP values (using UnifiedGenotyper) that are less than 5 and understandably there are inherent confounders in doing so. So I ran the same bam file through the DepthOfCoverage tool to focus on regions of interest which have zero coverage. However, when I overlaid the data from both and mapped their co-ordinates to the human genome, I have found that the overlap between the DP values and DoC regions was extremely small (<5%) - why could this be? Surely there should be more overlap? Are they therefore measuring different things? Have I done something wrong somewhere and I don't know it? I have tried to access the documentation for DepthOfCoverage to try and make sense of it but it seems unavailable on the website (http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html). Please could you advise?
Below are the command lines I've been using:
java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -omitBaseOutput -omitLocusTable -R referencefilename.fa -I samplefilename.bam -L regionsofinterest.txt -o outputfile.coverage java -jar GenomeAnalysisTK.jar -R referencefilename.fa -T UnifiedGenotyper -I samplefilename.bam --dbsnp dbsnpreferencefile.vcf --genotype_likelihoods_model SNP -o outputfilename.vcf --output_mode EMIT_ALL_SITES -stand_call_conf 50.0 -stand_emit_conf 0.0 -dcov 200 -L regionsofinterest.bed
Thank you in advance for your help, it is much appreciated