HaplotypeCaller on SOLiD samples reports lower coverage per variant than the --bamOutput BAM
I'm using the full GATK workflow to analyze targeted resequencing data from the SOLiD platform. I followed the Best Practices and used the SOLiD-specific flags when running BaseRecalibrator (--solid_recal_mode SET_Q_ZERO_BASE_N --solid_nocall_strategy PURGE_READ). However, when I look at the VCF files produced by HaplotypeCaller, something does not add up.
I checked several variants in some of my samples and found that the DP field does not match the per-base coverage shown in IGV for the BAM written by HaplotypeCaller (produced with --bamOutput). As far as I understand, reads are downsampled at each position, but the DP value I'm seeing is lower than the coverage stored in the BAM.
I'm attaching an IGV screenshot of one of the variants where I see this problem. I disabled all alignment filtering options in IGV, as well as downsampling. Here is the line reported in the VCF for this variant:
chr17 45249306 rs62077265 T C 11069.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.010;ClippingRankSum=-0.616;DB;DP=375;FS=90.048;MLEAC=1;MLEAF=0.500;MQ=59.56;MQRankSum=1.319;QD=29.52;ReadPosRankSum=2.229;SOR=0.016 GT:AD:DP:GQ:PL 0/1:150,224:374:99:11098,0,5080
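For reference, here is a minimal Python sketch that pulls the depth fields out of the line above, so they can be set side by side with the read counts IGV shows at chr17:45249306. It assumes the columns are whitespace-separated and that there is a single sample; the helper name `depth_summary` is just an illustration, not part of GATK or any library.

```python
# Minimal sketch: extract the depth-related fields from one VCF data line.
# Assumptions: whitespace-separated columns, exactly one sample column.
VCF_LINE = (
    "chr17 45249306 rs62077265 T C 11069.77 . "
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.010;ClippingRankSum=-0.616;DB;"
    "DP=375;FS=90.048;MLEAC=1;MLEAF=0.500;MQ=59.56;MQRankSum=1.319;"
    "QD=29.52;ReadPosRankSum=2.229;SOR=0.016 "
    "GT:AD:DP:GQ:PL 0/1:150,224:374:99:11098,0,5080"
)


def depth_summary(line):
    """Return site-level DP, sample-level DP, and per-allele AD counts."""
    cols = line.split()

    # INFO column: key=value pairs separated by ';' (flags such as DB have no value).
    info = {}
    for item in cols[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value

    # FORMAT keys paired with the values of the single sample column.
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    ad = [int(x) for x in sample["AD"].split(",")]

    return {
        "info_dp": int(info["DP"]),      # DP in the INFO column
        "format_dp": int(sample["DP"]),  # DP in the sample column
        "ad": ad,                        # reads counted per allele (ref, alt)
        "ad_sum": sum(ad),
    }


summary = depth_summary(VCF_LINE)
print(summary)
# {'info_dp': 375, 'format_dp': 374, 'ad': [150, 224], 'ad_sum': 374}
```

So the VCF reports 375 reads at the site and 374 for the sample, split as 150 reference and 224 alternate. These are the numbers I am comparing against the coverage track in IGV.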
As you can see from the screenshot, not only does the coverage differ, but many of the reads that match the reference are missing.
Does anybody have an idea of what happened to the coverage reported in the VCF?
Thanks a lot for your time!