To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

Many reference reads found with GQ=0 and cannot be visualized

xin.huangxin.huang Washington, DCMember

Dear GATK Team,

I've seen a phenomenon that I've been struggling to explain:

Using HaplotypeCaller, at one particular locus, there seems to be 192 reference reads, however, the GQ is zero, with the output GT:DP:GQ:MIN_DP:PL 0/0:192:0:192:0,0,0

I used the -ERC GVCF flag to get reference coverage as well. I also tried the --emitDroppedReads with the -bamout flags to get any reads that aligned to this region, but still couldn't see anything from IGV.

Would you have any insight into the cause of this GQ=0 with so many reads?

Many thanks,

Xin

Tagged:

Answers

  • xin.huangxin.huang Washington, DCMember

    To be more specific, I used the following command:
    $ java -Xmx10g -jar /pathto/GenomeAnalysisTK-3.6.jar -T HaplotypeCaller --dontUseSoftClippedBases -R /pathto/genome -I input.bam -o output.vcf -L JXUM01S006308 -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 --emitDroppedReads -bamout bamout/30-3H_JXUM01S006308_allReads.bam

    The region I'm interested in has the following GVCF output (coordinate 41290 has the GQ=0 output):
    JXUM01S006308 40800 . C . . END=40824 GT:DP:GQ:MIN_DP:PL 0/0:188:99:179:0,120,1800
    JXUM01S006308 40825 . C . . END=40825 GT:DP:GQ:MIN_DP:PL 0/0:181:0:181:0,0,2139
    JXUM01S006308 40826 . A . . END=40834 GT:DP:GQ:MIN_DP:PL 0/0:179:99:175:0,120,1800
    JXUM01S006308 40835 . A . . END=40835 GT:DP:GQ:MIN_DP:PL 0/0:173:0:173:0,0,1027
    JXUM01S006308 40836 . T . . END=40863 GT:DP:GQ:MIN_DP:PL 0/0:172:99:165:0,120,1800
    JXUM01S006308 40864 . T . . END=40864 GT:DP:GQ:MIN_DP:PL 0/0:168:0:168:0,0,2263
    JXUM01S006308 40865 . T . . END=40875 GT:DP:GQ:MIN_DP:PL 0/0:169:99:168:0,120,1800
    JXUM01S006308 40876 . G . . END=40876 GT:DP:GQ:MIN_DP:PL 0/0:170:0:170:0,0,1386
    JXUM01S006308 40877 . T . . END=40890 GT:DP:GQ:MIN_DP:PL 0/0:165:99:155:0,120,1800
    JXUM01S006308 40891 . T . . END=40891 GT:DP:GQ:MIN_DP:PL 0/0:151:0:151:0,0,1429
    JXUM01S006308 40892 . T . . END=40893 GT:DP:GQ:MIN_DP:PL 0/0:150:99:150:0,120,1800
    JXUM01S006308 40894 . T . . END=40894 GT:DP:GQ:MIN_DP:PL 0/0:146:0:146:0,0,1069
    JXUM01S006308 40895 . G . . END=40897 GT:DP:GQ:MIN_DP:PL 0/0:145:99:144:0,120,1800
    JXUM01S006308 40898 . A . . END=40898 GT:DP:GQ:MIN_DP:PL 0/0:141:0:141:0,0,908
    JXUM01S006308 40899 . C . . END=40900 GT:DP:GQ:MIN_DP:PL 0/0:141:99:140:0,120,1800
    JXUM01S006308 40901 . G . . END=40901 GT:DP:GQ:MIN_DP:PL 0/0:132:0:132:0,0,1137
    JXUM01S006308 40902 . T . . END=40902 GT:DP:GQ:MIN_DP:PL 0/0:131:99:131:0,120,1800
    JXUM01S006308 40903 . G . . END=40903 GT:DP:GQ:MIN_DP:PL 0/0:130:0:130:0,0,1215
    JXUM01S006308 40904 . C . . END=40919 GT:DP:GQ:MIN_DP:PL 0/0:127:99:123:0,120,1800
    JXUM01S006308 40920 . T . . END=41096 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
    JXUM01S006308 41097 . T . . END=41106 GT:DP:GQ:MIN_DP:PL 0/0:35:99:35:0,105,890
    JXUM01S006308 41107 . T . . END=41107 GT:DP:GQ:MIN_DP:PL 0/0:40:0:40:0,0,0
    JXUM01S006308 41108 . C . . END=41109 GT:DP:GQ:MIN_DP:PL 0/0:51:99:51:0,120,1800
    JXUM01S006308 41110 . A . . END=41110 GT:DP:GQ:MIN_DP:PL 0/0:51:0:51:0,0,0
    JXUM01S006308 41111 . C . . END=41118 GT:DP:GQ:MIN_DP:PL 0/0:62:99:60:0,120,1800
    JXUM01S006308 41119 . G . . END=41119 GT:DP:GQ:MIN_DP:PL 0/0:63:0:63:0,0,0
    JXUM01S006308 41120 . C . . END=41128 GT:DP:GQ:MIN_DP:PL 0/0:80:99:80:0,120,1800
    JXUM01S006308 41129 . G . . END=41129 GT:DP:GQ:MIN_DP:PL 0/0:83:0:83:0,0,1383
    JXUM01S006308 41130 . A . . END=41136 GT:DP:GQ:MIN_DP:PL 0/0:92:99:84:0,120,1800
    JXUM01S006308 41137 . T . . END=41138 GT:DP:GQ:MIN_DP:PL 0/0:91:0:90:0,0,0
    JXUM01S006308 41139 . T . . END=41142 GT:DP:GQ:MIN_DP:PL 0/0:100:99:100:0,120,1800
    JXUM01S006308 41143 . T . . END=41143 GT:DP:GQ:MIN_DP:PL 0/0:101:0:101:0,0,0
    JXUM01S006308 41144 . G . . END=41148 GT:DP:GQ:MIN_DP:PL 0/0:111:99:111:0,120,1800
    JXUM01S006308 41149 . T . . END=41149 GT:DP:GQ:MIN_DP:PL 0/0:111:0:111:0,0,0
    JXUM01S006308 41150 . G . . END=41163 GT:DP:GQ:MIN_DP:PL 0/0:116:99:111:0,120,1800
    JXUM01S006308 41164 . A . . END=41164 GT:DP:GQ:MIN_DP:PL 0/0:114:0:114:0,0,0
    JXUM01S006308 41165 . C . . END=41175 GT:DP:GQ:MIN_DP:PL 0/0:119:99:117:0,120,1800
    JXUM01S006308 41176 . G . . END=41176 GT:DP:GQ:MIN_DP:PL 0/0:117:0:117:0,0,1878
    JXUM01S006308 41177 . C . . END=41187 GT:DP:GQ:MIN_DP:PL 0/0:118:99:116:0,120,1800
    JXUM01S006308 41188 . T . . END=41188 GT:DP:GQ:MIN_DP:PL 0/0:112:0:112:0,0,1755
    JXUM01S006308 41189 . G . . END=41193 GT:DP:GQ:MIN_DP:PL 0/0:119:99:118:0,120,1800
    JXUM01S006308 41194 . T . . END=41194 GT:DP:GQ:MIN_DP:PL 0/0:107:0:107:0,0,0
    JXUM01S006308 41195 . C . . END=41196 GT:DP:GQ:MIN_DP:PL 0/0:120:99:118:0,120,1800
    JXUM01S006308 41197 . G . . END=41197 GT:DP:GQ:MIN_DP:PL 0/0:118:0:118:0,0,1360
    JXUM01S006308 41198 . A . . END=41204 GT:DP:GQ:MIN_DP:PL 0/0:122:99:121:0,120,1800
    JXUM01S006308 41205 . A . . END=41205 GT:DP:GQ:MIN_DP:PL 0/0:119:0:119:0,0,981
    JXUM01S006308 41206 . C . . END=41208 GT:DP:GQ:MIN_DP:PL 0/0:120:99:118:0,120,1800
    JXUM01S006308 41209 . A . . END=41209 GT:DP:GQ:MIN_DP:PL 0/0:113:0:113:0,0,0
    JXUM01S006308 41210 . A . . END=41220 GT:DP:GQ:MIN_DP:PL 0/0:125:99:122:0,120,1800
    JXUM01S006308 41221 . A . . END=41221 GT:DP:GQ:MIN_DP:PL 0/0:122:0:122:0,0,0
    JXUM01S006308 41222 . G . . END=41229 GT:DP:GQ:MIN_DP:PL 0/0:122:99:121:0,120,1800
    JXUM01S006308 41230 . C . . END=41230 GT:DP:GQ:MIN_DP:PL 0/0:124:0:124:0,0,2046
    JXUM01S006308 41231 . G . . END=41250 GT:DP:GQ:MIN_DP:PL 0/0:147:99:134:0,120,1800
    JXUM01S006308 41251 . T . . END=41251 GT:DP:GQ:MIN_DP:PL 0/0:150:0:150:0,0,0
    JXUM01S006308 41252 . A . . END=41262 GT:DP:GQ:MIN_DP:PL 0/0:169:99:157:0,120,1800
    JXUM01S006308 41263 . G . . END=41263 GT:DP:GQ:MIN_DP:PL 0/0:183:0:183:0,0,2558
    JXUM01S006308 41264 . A . . END=41289 GT:DP:GQ:MIN_DP:PL 0/0:189:99:184:0,120,1800
    JXUM01S006308 41290 . T . . END=41290 GT:DP:GQ:MIN_DP:PL 0/0:192:0:192:0,0,0
    JXUM01S006308 41291 . G . . END=41292 GT:DP:GQ:MIN_DP:PL 0/0:196:99:193:0,120,1800
    JXUM01S006308 41293 . T . . END=41293 GT:DP:GQ:MIN_DP:PL 0/0:189:0:189:0,0,2457
    JXUM01S006308 41294 . C . . END=41307 GT:DP:GQ:MIN_DP:PL 0/0:190:99:188:0,120,1800
    JXUM01S006308 41308 . C . . END=41308 GT:DP:GQ:MIN_DP:PL 0/0:190:0:190:0,0,0
    JXUM01S006308 41309 . C . . END=41332 GT:DP:GQ:MIN_DP:PL 0/0:198:99:187:0,120,1800

    And the bamout visualized in IGV is attached. 40820-41313 is missing any coverage, however, there's clearly sufficient reference coverage for this interval seen in the GVCF.

    Thank you for any insight you might have.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @xin.huang
    Hi Xin,

    For reference sites, the original BAM file depth is used. I suspect the coverage is high in the original BAM file.

    -Sheila

  • xin.huangxin.huang Washington, DCMember

    @Sheila
    Hi Sheila,

    Thanks for your clarification. I've looked at this site in the original BAM files and indeed, there's ample coverage there. Please see the screenshot attached from IGV. 23-2 and 30-3 are two hybrid lines that I have, and H (high) and L (low) indicates two extremes of a phenotype.

    In the SNP calling results, there's a fixed SNP called in both Ls, and all reference in both Hs, with a lot of coverage (>130), but a GQ of 0. The ratio kinda made sense in the original BAM, but there seems to be quite a lot of alternative reads in the Hs as well. I assume that these would change after HaplotypeCaller de novo assembling the reads and those alternative reads would be aligned elsewhere. Is that reasonable? It seems like using the original read depth for reference sites could be potentially an issue in this kind of situations, because we don't know how many actual reference reads are aligned to this locus after HaplotypeCaller. And the GQ of 0 could be resulted by this kind of re-assembly of the reads, and the program is not confident at the genotype calling at this locus after HaplotypeCaller? And would it be reasonable to assume that if there's a GQ of 0, read coverage values are not that reliable anymore?

    Thanks for your help and I will submit a bug report for when the GQ=0 but still there's a genotype call in the joint genotype calling in GATK 3.6.

    Sincerely,

    Xin

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @xin.huang
    Hi Xin,

    I will have a look at your test files and get back to you.

    Thanks for sharing.

    -Sheila

  • xin.huangxin.huang Washington, DCMember

    @Sheila

    Hi Sheila,

    Thanks for getting back to me. I'll have to submit the bug report after August, as my defense is less than a month away...

    Best,

    Xin

Sign In or Register to comment.