If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
I have a question related to CalculateHsMetrics PICARD Tool.
I am analysing data coming from an Illumina MiSeq sequencer and we enriched DNA samples using the Illumina TruSight Cardio panel.
I aligned the data with BWA with default options and then I removed duplicates using MarkDuplicates PICARD Tool.
I am interested in finding regions that have a mean coverage less than 20X. To be more precise I divided the manifest regions in sub-regions of 50 bp using bedtools make windows:
bedtools makewindows -b /storage/genomes/hg19/panels/cardio_manifest.bed -w 50 -i src | sortBed -chrThenSizeA -i - > cardio_manifest_split50.bed
I then created the interval list file using BedToIntervaList with the UNIQUE option set to false:
java -jar /usr/local/cluster/bin/picard.jar BedToIntervalList INPUT=/storage/genomes/hg19/panels/cardio_manifest_split50.bed SEQUENCE_DICTIONARY=/storage/genomes/hg19/annotation/hg19.dict OUTPUT=/storage/genomes/hg19/panels/cardioSplit50_interval.list UNIQUE=false
The command used for HsMetrics is:
java -jar /usr/local/cluster/bin/picard.jar CalculateHsMetrics BAIT_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list REFERENCE_SEQUENCE=/storage/genomes/hg19/fa/hg19.fa TARGET_INTERVALS=/storage/genomes/hg19/panels/cardioSplit50_interval.list PER_TARGET_COVERAGE=16-0162_S8_PT METRIC_ACCUMULATION_LEVEL=ALL_READS INPUT=16-0162_S8_RGSorted.bam OUTPUT=16-0162_S8_HS
The problem is that looking at the alignments in the gene HCN4 there is a portion of ~200 bp that is covered less than 20X (~16-18 reads per bp) but it is not detected by CalculateHsMetrics . To verify if the problem was related with the interval list created as explained before, I created an interval list with only two entries, the ones that are exactly in the portion that from IGV I see having a low coverage. But again the coverage evaluated from CalculateHsMetrics is higher than the one expected.
I don't understand if I am doing something wrong using the tool or if I am missing something in what is the mean coverage evaluated by PICARD. Could anyone help me in this?
Thank you so much!!