Fake GC content?
I used the GATK DiagnoseTargets tool to analyze metrics for my intervals and found that the GC value in the INFO field sometimes differs from the actual GC content of the reference genome. More than 1000 intervals have GC=0.00. I assumed this field is constant for a given interval and is computed from the reference sequence, but DiagnoseTargets appears to use different logic. Could you explain how DiagnoseTargets computes GC content?
I am using GATK Version=3.1-1
The tool was started with default options:
java -jar GenomeAnalysisTK.jar -T DiagnoseTargets -R hg19.fa -I sample1.bam -I sample2.bam -I sample3.bam -I sample4.bam -I sample5.bam -I sample6.bam -I sample7.bam -L panel.bed -o output.vcf -missing missing.intervals
Example row from the output VCF where GC=0.00 although the actual GC content is not zero (there are 61 G/C bases in this interval):
chr11 75112674 . C <DT> . NO_READS END=75112787;GC=0.00;IDP=0.00
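For comparison, here is a minimal sketch of how I would expect GC content to be computed from the reference sequence of an interval. This is only my assumption about what the field should mean, not the DiagnoseTargets implementation; the function name `gc_fraction` and the choice to skip ambiguous bases such as N are mine.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G/C bases among unambiguous A/C/G/T bases.

    Assumption: ambiguous bases (e.g. N) are excluded from the denominator.
    Returns 0.0 when the sequence has no unambiguous bases.
    """
    seq = sequence.upper()  # reference FASTA may contain soft-masked lowercase
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Toy sequence, not taken from hg19: 6 G/C out of 8 unambiguous bases.
    print(f"{gc_fraction('ACGTNNGGcc'):.2f}")
```

By this definition, 61 G/C bases over the 114 positions of the interval above (75112674 to 75112787, assuming both ends are inclusive) would give roughly 0.54, not 0.00.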
Thank you for your consideration!