On Monday and Tuesday, November 12-13, the communications team will be out of the office for a U.S. federal holiday and a team event. We will be back in action on November 14th and apologize for any inconvenience this may cause. Thank you for using the forum.

RGQ returning 0 for high numbers of aligned reads

simono101simono101 London, UKMember
edited May 2016 in Ask the GATK team

Hi,
Just wondering what the possible reasons could be for Haplotype Caller (version: 3.5-0-g36282e4) to declare a reference genotype quality of 0 for positions where the read depth is relatively high, such as the following region:

SC1 1628 . T . . PASS DP=50;GC=33.33 GT:AD:DP:RGQ 0/0:50:50:99 SC1 1629 . T . . PASS DP=50;GC=38.1 GT:AD:DP:RGQ 0/0:50:50:99 SC1 1630 . T . . FAIL_RGQ DP=50;GC=38.1 GT:AD:DP:RGQ 0/0:45:50:5 SC1 1631 . C . . PASS DP=51;GC=33.33 GT:AD:DP:RGQ 0/0:51:51:99 SC1 1632 . A . . PASS DP=51;GC=33.33 GT:AD:DP:RGQ 0/0:51:51:96

In this instance the RGQ of the failed position above (at SC1:1630) was actually 5 (which I set as the threshold for filtering in this example), but I have plenty of instances where the read depth and resultant RGQ are like:

SC1 1630 . T . . FAIL_RGQ DP=50;GC=38.1 GT:AD:DP:RGQ ./.:45:50:5 SC1 1640 . T . . FAIL_RGQ DP=48;GC=38.1 GT:AD:DP:RGQ ./.:34:48:0 SC1 1805 . T . . FAIL_RGQ DP=36;GC=33.33 GT:AD:DP:RGQ ./.:32:36:0 SC1 2046 . A . . FAIL_RGQ DP=37;GC=19.05 GT:AD:DP:RGQ ./.:33:37:2 SC1 2345 . A . . FAIL_RGQ DP=105;GC=23.81 GT:AD:DP:RGQ ./.:90:105:0 SC1 2352 . A . . FAIL_RGQ DP=116;GC=19.05 GT:AD:DP:RGQ ./.:103:116:0 SC1 2356 . C . . FAIL_RGQ DP=112;GC=23.81 GT:AD:DP:RGQ ./.:100:112:0 SC1 2359 . G . . FAIL_RGQ DP=111;GC=28.57 GT:AD:DP:RGQ ./.:99:111:0

It feels like something funny is going on. Should it be possible for RGQ to be so low with such high depth? Also, I thought the AD format tag gave the count of unfiltered reads, whilst the format DP tag gave the filtered read depth (i.e. reads HC finds informative). Therefore shouldn't the AD count always be at least as high as the DP count?

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @simono101
    Hi,

    Can you please post the GVCF records for those questionable sites?

    As for AD vs DP, have a look at this article.

    -Sheila

  • simono101simono101 London, UKMember

    Dear Sheila,
    Thanks for the speedy response. I thought it might be more helpful if I included the raw gvcf and header so I have attached the gvcf of the first 2400 positions (which include those above). As it also contains the header info you can also see if I maybe did something upstream that was really dumb and caused the strange RGQ I see after genotyping.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @simono101
    Hi,

    Okay. What was the exact command you ran to produce the final VCF? And, how many samples are in your final VCF?

    It is indeed possible to have bad GQ at sites with high depth. Some reasons could be low mapping quality or low base quality reads at the site. Also, in the article I pointed you to above, the uninformative reads could also cause a low quality call. You can also compare the original BAM file to the bamout file and see what is happening after reassembly. Can you post a few BAM and bamout IGV screenshots of the questionable sites above? Please include about 100 bases before and after the sites of interest.

    Thanks,
    Sheila

Sign In or Register to comment.