To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

PICARD CalculateHsMetrics

Dear all,
I have a question related to CalculateHsMetrics PICARD Tool.
I am analysing data coming from an Illumina MiSeq sequencer and we enriched DNA samples using the Illumina TruSight Cardio panel.
I aligned the data with BWA with default options and then I removed duplicates using MarkDuplicates PICARD Tool.
I am interested in finding regions that have a mean coverage less than 20X. To be more precise I divided the manifest regions in sub-regions of 50 bp using bedtools make windows:

bedtools makewindows -b /storage/genomes/hg19/panels/cardio_manifest.bed -w 50 -i src | sortBed -chrThenSizeA -i - > cardio_manifest_split50.bed

I then created the interval list file using BedToIntervaList with the UNIQUE option set to false:

java -jar /usr/local/cluster/bin/picard.jar BedToIntervalList INPUT=/storage/genomes/hg19/panels/cardio_manifest_split50.bed SEQUENCE_DICTIONARY=/storage/genomes/hg19/annotation/hg19.dict OUTPUT=/storage/genomes/hg19/panels/cardioSplit50_interval.list UNIQUE=false

The command used for HsMetrics is:

java -jar /usr/local/cluster/bin/picard.jar CalculateHsMetrics BAIT_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list REFERENCE_SEQUENCE=/storage/genomes/hg19/fa/hg19.fa TARGET_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list PER_TARGET_COVERAGE=16-0162_S8_PT METRIC_ACCUMULATION_LEVEL=ALL_READS INPUT=16-0162_S8_RGSorted.bam OUTPUT=16-0162_S8_HS

The problem is that looking at the alignments in the gene HCN4 there is a portion of ~200 bp that is covered less than 20X (~16-18 reads per bp) but it is not detected by CalculateHsMetrics . To verify if the problem was related with the interval list created as explained before, I created an interval list with only two entries, the ones that are exactly in the portion that from IGV I see having a low coverage. But again the coverage evaluated from CalculateHsMetrics is higher than the one expected.
I don't understand if I am doing something wrong using the tool or if I am missing something in what is the mean coverage evaluated by PICARD. Could anyone help me in this?

Thank you so much!!

Stefania

Best Answer

Answers

Sign In or Register to comment.