The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.
Register now for the upcoming GATK Best Practices workshop, Feb 20-22 in Leuven, Belgium. Open to all comers! More info and signup at http://bit.ly/2i4mGxz

PICARD CalculateHsMetrics

merella.stefaniamerella.stefania ItalyMember Posts: 24

Dear all,
I have a question related to CalculateHsMetrics PICARD Tool.
I am analysing data coming from an Illumina MiSeq sequencer and we enriched DNA samples using the Illumina TruSight Cardio panel.
I aligned the data with BWA with default options and then I removed duplicates using MarkDuplicates PICARD Tool.
I am interested in finding regions that have a mean coverage less than 20X. To be more precise I divided the manifest regions in sub-regions of 50 bp using bedtools make windows:

bedtools makewindows -b /storage/genomes/hg19/panels/cardio_manifest.bed -w 50 -i src | sortBed -chrThenSizeA -i - > cardio_manifest_split50.bed

I then created the interval list file using BedToIntervaList with the UNIQUE option set to false:

java -jar /usr/local/cluster/bin/picard.jar BedToIntervalList INPUT=/storage/genomes/hg19/panels/cardio_manifest_split50.bed SEQUENCE_DICTIONARY=/storage/genomes/hg19/annotation/hg19.dict OUTPUT=/storage/genomes/hg19/panels/cardioSplit50_interval.list UNIQUE=false

The command used for HsMetrics is:

java -jar /usr/local/cluster/bin/picard.jar CalculateHsMetrics BAIT_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list REFERENCE_SEQUENCE=/storage/genomes/hg19/fa/hg19.fa TARGET_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list PER_TARGET_COVERAGE=16-0162_S8_PT METRIC_ACCUMULATION_LEVEL=ALL_READS INPUT=16-0162_S8_RGSorted.bam OUTPUT=16-0162_S8_HS

The problem is that looking at the alignments in the gene HCN4 there is a portion of ~200 bp that is covered less than 20X (~16-18 reads per bp) but it is not detected by CalculateHsMetrics . To verify if the problem was related with the interval list created as explained before, I created an interval list with only two entries, the ones that are exactly in the portion that from IGV I see having a low coverage. But again the coverage evaluated from CalculateHsMetrics is higher than the one expected.
I don't understand if I am doing something wrong using the tool or if I am missing something in what is the mean coverage evaluated by PICARD. Could anyone help me in this?

Thank you so much!!

Stefania

Best Answer

Answers

Sign In or Register to comment.