HaplotypeCaller on SOLiD samples reports lower per-variant coverage than the --bamOutput BAM

Hello Everyone!

I'm using the whole GATK workflow to analyze targeted resequencing data from SOLiD platforms. I followed the Best Practices and used the appropriate SOLiD flags with BaseRecalibrator (--solid_recal_mode SET_Q_ZERO_BASE_N --solid_nocall_strategy PURGE_READ). However, when I look at the VCF files produced by HaplotypeCaller, something does not add up.

I checked several variants in some of my samples and found that the DP field does not report the same per-base coverage that I see in IGV for the BAM produced with --bamOutput (a BAM written by HaplotypeCaller itself). As far as I understand, reads are downsampled at each position, but the DP values in the VCF are lower than the coverage stored in that BAM.
I'm attaching an IGV screenshot of one of the variants where I see this problem. I disabled all alignment filtering options in IGV, as well as downsampling. Here is the line reported in the VCF for this variant:

chr17 45249306 rs62077265 T C 11069.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.010;ClippingRankSum=-0.616;DB;DP=375;FS=90.048;MLEAC=1;MLEAF=0.500;MQ=59.56;MQRankSum=1.319;QD=29.52;ReadPosRankSum=2.229;SOR=0.016 GT:AD:DP:GQ:PL 0/1:150,224:374:99:11098,0,5080
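
To make the numbers in this record easier to compare, here is a minimal Python sketch (not part of GATK; the function name depth_summary is hypothetical) that pulls out the site-level INFO DP, the per-sample FORMAT DP and the AD values. It assumes a single-sample, tab-separated VCF line, so the record is rebuilt with tabs below.

```python
def depth_summary(vcf_line: str) -> dict:
    """Extract INFO DP, FORMAT DP and per-allele AD from a single-sample VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    info, fmt_keys, sample = fields[7], fields[8], fields[9]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags (e.g. DB).
    info_map = dict(
        kv.split("=", 1) if "=" in kv else (kv, True) for kv in info.split(";")
    )
    # FORMAT keys and sample values are ':'-separated and aligned by position.
    sample_map = dict(zip(fmt_keys.split(":"), sample.split(":")))
    ad = [int(x) for x in sample_map["AD"].split(",")]
    return {
        "info_dp": int(info_map["DP"]),
        "format_dp": int(sample_map["DP"]),
        "ad": ad,
        "ad_sum": sum(ad),
    }


# The record from the post, with the tab separators restored.
record = "\t".join([
    "chr17", "45249306", "rs62077265", "T", "C", "11069.77", ".",
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.010;ClippingRankSum=-0.616;DB;DP=375;"
    "FS=90.048;MLEAC=1;MLEAF=0.500;MQ=59.56;MQRankSum=1.319;QD=29.52;"
    "ReadPosRankSum=2.229;SOR=0.016",
    "GT:AD:DP:GQ:PL",
    "0/1:150,224:374:99:11098,0,5080",
])
summary = depth_summary(record)
print(summary)
```

For this site the AD values (150 and 224) sum to 374, which equals the per-sample FORMAT DP and is one read fewer than the site-level INFO DP=375.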

As you can see from the screenshot, not only does the coverage differ, but many reads that match the reference are missing.
Does anybody have an idea of what happened to the coverage in the VCF?

Thanks a lot for your time!

Daniele


  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @wariobrega
    Hi Daniele,

    The DP in the VCF is the filtered depth, so I suspect the 111 missing reads have either low mapping quality or low base quality at the site.

    The AD field contains unfiltered depths, but it does not include uninformative reads. Uninformative reads do not statistically favor one allele over another, so they are not counted in AD.

    I hope this helps.

    -Sheila
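
    The distinction between the two counts can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the Read fields, the quality thresholds and the likelihood-margin rule are hypothetical and do not reproduce HaplotypeCaller's internal logic. They simply show how, following the explanation above, a read can count toward DP but not AD, or toward AD but not DP.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; NOT HaplotypeCaller's actual settings.
MIN_MAPQ = 20
MIN_BASEQ = 10
MIN_LOG10_MARGIN = 0.2  # how strongly a read must favour one allele to be "informative"


@dataclass
class Read:
    mapq: int         # mapping quality of the read
    baseq: int        # base quality at the variant site
    log10_ref: float  # log10 likelihood of the read given the REF haplotype
    log10_alt: float  # log10 likelihood of the read given the ALT haplotype


def filtered_depth(reads):
    """DP-like count: reads passing the (hypothetical) mapping/base-quality filters."""
    return sum(1 for r in reads if r.mapq >= MIN_MAPQ and r.baseq >= MIN_BASEQ)


def allele_depths(reads):
    """AD-like count: no quality filter, but uninformative reads are skipped."""
    ref = alt = 0
    for r in reads:
        margin = r.log10_alt - r.log10_ref
        if margin >= MIN_LOG10_MARGIN:
            alt += 1
        elif margin <= -MIN_LOG10_MARGIN:
            ref += 1
        # otherwise the read does not clearly favour either allele: left out of AD
    return ref, alt


reads = [
    Read(60, 30, -0.1, -3.0),   # supports REF, passes filters
    Read(60, 30, -3.0, -0.1),   # supports ALT, passes filters
    Read(60, 30, -1.0, -1.05),  # uninformative, passes filters
    Read(5, 30, -0.1, -3.0),    # supports REF, but low mapping quality
    Read(60, 30, -2.0, -2.0),   # uninformative, passes filters
]
print(filtered_depth(reads), allele_depths(reads))
```

    With these five reads the DP-like count is 4 and the AD-like count is (2, 1): the two uninformative reads raise DP without adding to AD, while the low-MAPQ read adds to AD but not to DP.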

  • wariobrega (Rome; Member)

    Hi Sheila, and thanks for the quick reply!

    Your answer clears up a lot of the doubts I had! However, I still don't understand why the BAM generated with HaplotypeCaller's --bamOutput option would contain reads that HC itself filters out. How does this option work, then? I assumed that the BAM generated with this option (which I'm using only for debugging) was supposed to show how HC realigned the reads when calling the variants.

    Thanks again for your quick reply and your kindness!

    Daniele

  • wariobrega (Rome; Member)

    @Sheila said:
    @wariobrega
    Hi Daniele,

    The DP in the VCF is the filtered depth. So, I suspect the missing 111 reads either have a low mapping quality or low base quality at the site.

    The AD field contains unfiltered depths, but it does not contain uninformative reads. Uninformative reads do not statistically favor one allele over another allele so they are not included in AD.

    I hope this helps.

    -Sheila

    Also, something else just occurred to me: you said that AD counts the UNFILTERED allele depth, yet its sum is lower than DP, which you said is filtered. How is that possible?

  • wariobrega (Rome; Member)
    edited August 2015

    @Sheila

    Thanks a lot for your quick reply! Now it's much clearer :D I'll dig into the articles as well ASAP!

    Thanks again,

    Daniele
