Fake GC content?
I tried the GATK DiagnoseTargets tool to analyze metrics for my intervals and found that the GC INFO field sometimes differs from the actual GC content of the reference genome. More than 1000 intervals have GC=0.00. I assumed that this field is constant for a given interval and is computed from the reference sequence, but it looks like DiagnoseTargets uses different logic. Could you explain how DiagnoseTargets computes GC content?
I am using GATK Version=3.1-1
The tool was started with default options:
java -jar GenomeAnalysisTK.jar -T DiagnoseTargets -R hg19.fa -I sample1.bam -I sample2.bam -I sample3.bam -I sample4.bam -I sample5.bam -I sample6.bam -I sample7.bam -L panel.bed -o output.vcf -missing missing.intervals
Example row from the output VCF where GC=0.00 although the actual GC content is not zero (there are 61 G/C bases in this interval):
chr11 75112674 . C <DT> . NO_READS END=75112787;GC=0.00;IDP=0.00
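For comparison, here is a minimal sketch of the reference-based calculation the question has in mind. It is not DiagnoseTargets' own logic, which is exactly what is being asked about. The helper names (`gc_fraction`, `parse_info`, `interval_length`) are hypothetical, and the sketch assumes VCF coordinates are 1-based with an inclusive END, and that GC is counted over unambiguous A/C/G/T bases only.

```python
def gc_fraction(seq):
    """Fraction of G/C bases among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Assumption: an interval with no A/C/G/T bases (e.g. all N) reports 0.0
    return gc / acgt if acgt else 0.0


def parse_info(info):
    """Parse a VCF INFO string such as 'END=75112787;GC=0.00;IDP=0.00' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def interval_length(pos, info):
    """Length of the interval, assuming 1-based POS and inclusive END."""
    return int(parse_info(info)["END"]) - pos + 1


if __name__ == "__main__":
    info = "END=75112787;GC=0.00;IDP=0.00"
    length = interval_length(75112674, info)
    reported = float(parse_info(info)["GC"])
    # 61 is the G/C count quoted in the post for this interval
    expected = 61 / length
    print(f"length={length} reported GC={reported:.2f} reference GC={expected:.2f}")
```

Under these assumptions the interval above is 114 bases long, so 61 G/C bases correspond to a reference GC fraction of about 0.54 rather than the reported 0.00.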
Thank you for your consideration!