This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
I have a question related to CalculateHsMetrics PICARD Tool.
I am analysing data coming from an Illumina MiSeq sequencer and we enriched DNA samples using the Illumina TruSight Cardio panel.
I aligned the data with BWA with default options and then I removed duplicates using MarkDuplicates PICARD Tool.
I am interested in finding regions that have a mean coverage less than 20X. To be more precise I divided the manifest regions in sub-regions of 50 bp using bedtools make windows:
bedtools makewindows -b /storage/genomes/hg19/panels/cardio_manifest.bed -w 50 -i src | sortBed -chrThenSizeA -i - > cardio_manifest_split50.bed
I then created the interval list file using BedToIntervaList with the UNIQUE option set to false:
java -jar /usr/local/cluster/bin/picard.jar BedToIntervalList INPUT=/storage/genomes/hg19/panels/cardio_manifest_split50.bed SEQUENCE_DICTIONARY=/storage/genomes/hg19/annotation/hg19.dict OUTPUT=/storage/genomes/hg19/panels/cardioSplit50_interval.list UNIQUE=false
The command used for HsMetrics is:
java -jar /usr/local/cluster/bin/picard.jar CalculateHsMetrics BAIT_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list REFERENCE_SEQUENCE=/storage/genomes/hg19/fa/hg19.fa TARGET_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list PER_TARGET_COVERAGE=16-0162_S8_PT METRIC_ACCUMULATION_LEVEL=ALL_READS INPUT=16-0162_S8_RGSorted.bam OUTPUT=16-0162_S8_HS
The problem is that looking at the alignments in the gene HCN4 there is a portion of ~200 bp that is covered less than 20X (~16-18 reads per bp) but it is not detected by CalculateHsMetrics . To verify if the problem was related with the interval list created as explained before, I created an interval list with only two entries, the ones that are exactly in the portion that from IGV I see having a low coverage. But again the coverage evaluated from CalculateHsMetrics is higher than the one expected.
I don't understand if I am doing something wrong using the tool or if I am missing something in what is the mean coverage evaluated by PICARD. Could anyone help me in this?
Thank you so much!!