HaplotypeCaller: StrandAlleleCountsBySample (SAC) and multiallelic sites

I'm running HaplotypeCaller (latest nightly build) with the -A StrandAlleleCountsBySample parameter to get strand-specific read counts (SAC). At sites with more candidate alt alleles than the default maximum of 6, the SAC field is wrong:

2 47641559 . TAAAAAAAAAAA T,TA,TAA,TAAA,TAAAA,TAAAAAA,<NON_REF> 1308.73 . BaseQRankSum=0.434;ClippingRankSum=0.768;DP=105;ExcessHet=3.0103;MLEAC=0,0,0,0,0,1,1;MLEAF=0.00,0.00,0.00,0.00,0.00,0.500,0.500;MQRankSum=-1.704;RAW_MQ=378000.00;ReadPosRankSum=1.971 GT:AD:DP:GQ:PL:SAC:SB 6/7:3,0,0,3,4,5,16,9:40:99:1346,1509,3479,1488,3459,3455,1204,2706,2706,2585,989,2303,2303,2215,2132,692,1723,1720,1604,1576,1507,277,1002,983,781,714,657,745,268,447,447,355,313,232,0,147:3,0,0,0,0,0,0,3,1,3,0,5,0,16,0,0:3,0,1,27

So 9 reads support an allele other than the listed alt alleles (AD assigns them to <NON_REF>), but the SAC field leaves them out: its <NON_REF> pair is 0,0. This is especially annoying when one of the alleles folded into <NON_REF> is selected as the most likely allele once the sample is combined with others in GenotypeGVCFs.
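The missing reads can be found mechanically by comparing each AD entry with the sum of that allele's two SAC counts. The sketch below assumes SAC stores one (forward, reverse) pair per allele in the same order as AD (REF first, <NON_REF> last); the helper names parse_sample and sac_vs_ad are hypothetical and exist only for illustration.

```python
# Hedged sketch: compare AD with per-allele SAC totals for one gVCF sample.
# Assumption: SAC holds one (forward, reverse) pair per allele, in the same
# allele order as AD (REF first, then each ALT, ending with <NON_REF>).

def parse_sample(format_keys, sample):
    """Map FORMAT keys (e.g. 'AD:SAC') to the matching sample column values."""
    return dict(zip(format_keys.split(":"), sample.split(":")))


def sac_vs_ad(alleles, format_keys, sample):
    """Return (allele, AD, forward+reverse from SAC, AD minus SAC) per allele."""
    fields = parse_sample(format_keys, sample)
    ad = [int(x) for x in fields["AD"].split(",")]
    sac = [int(x) for x in fields["SAC"].split(",")]
    if len(ad) != len(alleles) or len(sac) != 2 * len(ad):
        raise ValueError("expected one AD value and two SAC values per allele")
    report = []
    for i, allele in enumerate(alleles):
        strand_total = sac[2 * i] + sac[2 * i + 1]
        report.append((allele, ad[i], strand_total, ad[i] - strand_total))
    return report


# Only the AD and SAC columns of the first record, copied from the post.
alleles = ["TAAAAAAAAAAA", "T", "TA", "TAA", "TAAA", "TAAAA", "TAAAAAA", "<NON_REF>"]
report = sac_vs_ad(alleles, "AD:SAC",
                   "3,0,0,3,4,5,16,9:3,0,0,0,0,0,0,3,1,3,0,5,0,16,0,0")

for allele, ad_n, sac_n, missing in report:
    if missing:
        print(f"{allele}: AD={ad_n}, SAC total={sac_n}, missing={missing}")
# prints: <NON_REF>: AD=9, SAC total=0, missing=9
```

Every real allele balances exactly; only <NON_REF> differs, by the 9 reads AD gives it.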

Another example, where the 3 reads AD assigns to <NON_REF> are likewise missing from SAC:
11 108141955 . CTTTT C,CT,CTT,CTTT,ATTTT,TTTTT,<NON_REF> 1552.73 . BaseQRankSum=-0.227;DP=704;ExcessHet=3.0103;MLEAC=0,0,0,1,0,0,0;MLEAF=0.00,0.00,0.00,0.500,0.00,0.00,0.00;MQ=60.02;MQRankSum=-0.254;ReadPosRankSum=1.249 GT:AD:DP:GQ:PL:SAC:SB 0/4:431,5,4,27,127,4,3,3:604:99:1590,3247,26394,3043,24416,23841,2156,18595,18550,17063,0,11232,11190,10498,9517,3572,20205,18965,15617,10454,20237,3558,20037,18797,15420,10362,19931,19926,2484,13421,13344,12357,9074,12834,12837,11563:213,218,2,3,2,2,14,13,54,73,0,4,0,3,0,0:213,218,72,98
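The SB field seems to be affected in the same way. Assuming SB is (ref forward, ref reverse, alt forward, alt reverse), it equals SAC collapsed to REF versus all ALTs for both records in this post, so it also appears to omit the <NON_REF> reads. A minimal sketch of that check for the record above; collapse_sac_to_sb is a hypothetical helper, not a GATK function.

```python
# Hedged sketch: collapse per-allele SAC into an SB-style 4-tuple.
# Assumptions: SAC is (forward, reverse) per allele with REF first, and SB is
# (ref_fwd, ref_rev, alt_fwd, alt_rev) summed over every ALT incl. <NON_REF>.

def collapse_sac_to_sb(sac):
    """Sum all ALT forward counts and all ALT reverse counts from SAC."""
    ref_fwd, ref_rev = sac[0], sac[1]
    alt_fwd = sum(sac[2::2])  # forward counts of every ALT allele
    alt_rev = sum(sac[3::2])  # reverse counts of every ALT allele
    return [ref_fwd, ref_rev, alt_fwd, alt_rev]


# AD, SAC and SB copied from the second record.
ad = [431, 5, 4, 27, 127, 4, 3, 3]
sac = [213, 218, 2, 3, 2, 2, 14, 13, 54, 73, 0, 4, 0, 3, 0, 0]
sb = [213, 218, 72, 98]

collapsed = collapse_sac_to_sb(sac)
alt_reads_ad = sum(ad[1:])                    # all ALT reads per AD
alt_reads_sb = collapsed[2] + collapsed[3]    # all ALT reads per SAC/SB
print(collapsed == sb)                        # True
print(alt_reads_ad - alt_reads_sb)            # 3, the <NON_REF> count in AD
```

For the first record the same collapse gives 3,0,1,27, again matching its SB, with 9 ALT reads short of AD.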

GitHub issue 341, by Sheila (closed by chandrans)