HaplotypeCaller: StrandAlleleCountsBySample (SAC) and multiallelic sites
I'm running HaplotypeCaller (latest nightly build) with the -A StrandAlleleCountsBySample option to get strand-specific read counts (SAC). For sites with more alt alleles than the default maximum of 6, there is a problem with the SAC field:
2 47641559 . TAAAAAAAAAAA T,TA,TAA,TAAA,TAAAA,TAAAAAA,<NON_REF> 1308.73 . BaseQRankSum=0.434;ClippingRankSum=0.768;DP=105;ExcessHet=3.0103;MLEAC=0,0,0,0,0,1,1;MLEAF=0.00,0.00,0.00,0.00,0.00,0.500,0.500;MQRankSum=-1.704;RAW_MQ=378000.00;ReadPosRankSum=1.971 GT:AD:DP:GQ:PL:SAC:SB 6/7:3,0,0,3,4,5,16,9
So there are 9 reads supporting an allele other than the listed alt alleles (= NON_REF), but the SAC field is missing these reads. This gets especially annoying if one of the NON_REF alleles is selected as the most likely one when the sample is combined with others in GenotypeGVCFs:
11 108141955 . CTTTT C,CT,CTT,CTTT,ATTTT,TTTTT,<NON_REF> 1552.73 . BaseQRankSum=-0.227;DP=704;ExcessHet=3.0103;MLEAC=0,0,0,1,0,0,0;MLEAF=0.00,0.00,0.00,0.500,0.00,0.00,0.00;MQ=60.02;MQRankSum=-0.254;ReadPosRankSum=1.249 GT:AD:DP:GQ:PL:SAC:SB 0/4:431,5,4,27,127,4,3,3
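A rough way to spot affected records is to compare the number of SAC values against the number of alleles at the site. This is only a sketch: it assumes SAC holds one forward/reverse count pair per allele (REF plus every ALT, including <NON_REF>), and the function name `sac_length_check` and the sample values are made up for illustration, not taken from the output above.

```python
def sac_length_check(alt, fmt, sample):
    """Return (expected, found) counts of SAC values for one sample column.

    Assumption: SAC carries two values (forward, reverse) per allele,
    counting REF and every ALT allele including <NON_REF>.
    """
    n_alleles = 1 + len(alt.split(","))  # REF plus all ALT alleles
    expected = 2 * n_alleles

    # Pair each FORMAT key with its value in the sample column.
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    sac = fields.get("SAC")
    if sac is None or sac == ".":
        return expected, 0
    return expected, len(sac.split(","))


# Hypothetical record fragments, for illustration only.
expected, found = sac_length_check("T,<NON_REF>", "GT:AD:SAC", "0/1:5,4,1:3,2,2,2")
if expected != found:
    print(f"SAC has {found} values, expected {expected}")
```

Running this over the ALT, FORMAT and sample columns of each gVCF line would flag the sites where the SAC counts for one or more alleles are missing.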