We observe an unexpected deletion call (GATK UnifiedGenotyper 3.0 and 2.5) in a setup with three samples called together. The call is in the file call.txt. From the pileup (pileup.txt, position 19:57325555), we would expect the indel to be called only for the first sample.
Is there another source of information that makes GATK believe the deletion also occurs in the second and third samples?

  • ebanks, Broad Institute (Member, Broadie, Dev)

    If you care about indels then I'd recommend that you switch over to using the Haplotype Caller. The Unified Genotyper's variation model isn't sophisticated enough to perform really well for indels.

  • That might be right, but we (and, I suppose, many others who read results from published studies) need to know whether the UnifiedGenotyper indel calls can be trusted.
    So, is this a bug, or do I just not understand what's going on there?

  • Strange, somehow the attached files are gone. I'll paste their contents here instead:

    the strange call:
    19 57325555 . TTGGCTCAGCAGCCTCCACTTC T 1806.41 . AC=3;AF=0.500;AN=6;BaseQRankSum=1.959;DP=129;FS=1.065;MLEAC=3;MLEAF=0.500;MQ=59.35;MQ0=0;MQRankSum=1.803;QD=2.00;RPA=2,1;RU=TGGCTCAGCAGCCTCCACTTC;ReadPosRankSum=0.950;STR GT:AD:DP:GQ:PL 0/1:11,16:43:99:1286,0,1342 0/1:17,9:38:99:556,0,1762 0/1:32,2:48:2:2,0,3911

    and the pileup:
    19 57325555 T 43 .$......,.,,.-21TGGCTCAGCAGCCTCCACTTC.-21TGGCTCAGCAGCCTCCACTTC.-21TGGCTCAGCAGCCTCCACTTC,,..-21TGGCTCAGCAGCCTCCACTTC,,,,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc.-21TGGCTCAGCAGCCTCCACTTC,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc,C,,,..-21TGGCTCAGCAGCCTCCACTTC,.-21TGGCTCAGCAGCCTCCACTTC..-21TGGCTCAGCAGCCTCCACTTC.-21TGGCTCAGCAGCCTCCACTTC,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc., ;@AAAAABBCCBBB@?BBABBBBBBBC???@BB;A;@!'!!A@ 37 ...,..,.........,,.,,........,,.,,,,, 9>?BAAB??-B<ABBBCBABCA?>BABAB@BAB>>?A 48 ...,$.......,.,.....,,,..,,...,,,.,,,,,,.,,,,,,,, >?A;A@AA@AABBBBABBBBBBBBCC8B=BBCBBBBBA@BBB>ABA@A

    As you can see, the second and third samples have no evidence for the deletion in their reads. Still, GATK reports AD values of 17,9 and 32,2 for them, i.e. 9 and 2 reads supporting the indel. So the only explanation would be soft-clipped bases at the ends of some reads in the second and third samples?
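The comparison between the pileup and the AD field can be checked mechanically. The following is a minimal sketch, not a GATK or samtools tool: it assumes the standard samtools pileup encoding (`^` plus one mapping-quality character for a read start, `$` for a read end, `-N` followed by N bases for a deletion) and uses a hypothetical helper name, `count_reads_and_deletions`. The three base strings are copied from the pileup line above.

```python
import re

def count_reads_and_deletions(bases):
    """Return (reads, reads_followed_by_a_deletion) for one pileup base string."""
    reads = 0
    deletions = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":      # read start marker, followed by one mapping-quality char
            i += 2
        elif c == "$":    # read end marker
            i += 1
        elif c in "+-":   # indel: sign, decimal length, then that many bases
            digits = re.match(r"\d+", bases[i + 1:]).group()
            if c == "-":
                deletions += 1
            i += 1 + len(digits) + int(digits)
        else:             # '.', ',', a mismatch base, or '*': one read
            reads += 1
            i += 1
    return reads, deletions

# Sample 1 base string from the pileup above, split into pieces for readability.
SAMPLE1 = (
    ".$......,.,,"
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ",,."
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ",,,"
    ",-21tggctcagcagcctccacttc"
    ",-21tggctcagcagcctccacttc"
    ",-21tggctcagcagcctccacttc"
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ",-21tggctcagcagcctccacttc"
    ",-21tggctcagcagcctccacttc"
    ",C,,,."
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ","
    ".-21TGGCTCAGCAGCCTCCACTTC"
    "."
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ".-21TGGCTCAGCAGCCTCCACTTC"
    ",-21tggctcagcagcctccacttc"
    ",-21tggctcagcagcctccacttc"
    ".,"
)
SAMPLE2 = "...,..,.........,,.,,........,,.,,,,,"
SAMPLE3 = "...,$.......,.,.....,,,..,,...,,,.,,,,,,.,,,,,,,,"

# Alt-allele counts taken from the AD field of the VCF record above.
AD_ALT = [16, 9, 2]

for name, bases, ad_alt in zip(("sample 1", "sample 2", "sample 3"),
                               (SAMPLE1, SAMPLE2, SAMPLE3), AD_ALT):
    reads, dels = count_reads_and_deletions(bases)
    print(f"{name}: {reads} reads, {dels} with the deletion, AD alt = {ad_alt}")
```

For sample 1 the pileup shows 16 deletion-carrying reads among 43, in line with the alt count of 16 in its AD field. Samples 2 and 3 show none, although their AD alt counts are 9 and 2, which is exactly the discrepancy in question.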


  • ebanks, Broad Institute (Member, Broadie, Dev)

    Right, either that or, as I said, the UG is just not good with large indels.

  • OK, I just looked at the reads in IGV again, and indeed I found soft-clipped reads there. Now I'm confident that we can keep using UnifiedGenotyper until HaplotypeCaller is stable and we have the capacity to recompute everything.
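The IGV check can also be approximated in a script. This is a rough sketch under stated assumptions: the helper names (`soft_clipped_ends`, `clip_boundary_in_region`) and the example reads are hypothetical, not taken from the actual BAM files, and the region is simply the span of the reported deletion (19:57325555 to 57325576, from the VCF POS and the 22-base REF allele). It only parses SAM CIGAR strings and flags reads whose soft clip begins or ends inside that span.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_ends(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips sit outside soft clips
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right

def clip_boundary_in_region(pos, cigar, start, end):
    """True if a soft clip begins or ends at an aligned base within [start, end].

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    """
    left, right = soft_clipped_ends(cigar)
    # Operations that consume reference bases: M, D, N, = and X.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    last_aligned = pos + ref_len - 1
    return (left > 0 and start <= pos <= end) or \
           (right > 0 and start <= last_aligned <= end)

DEL_START, DEL_END = 57325555, 57325576  # span of the REF allele in the VCF record

# Hypothetical reads (POS, CIGAR), for illustration only.
reads = [
    (57325480, "80M21S"),  # alignment ends inside the deletion, then a soft clip
    (57325500, "101M"),    # spans the site with no clipping
    (57325570, "15S86M"),  # alignment starts inside the deletion after a soft clip
]
for pos, cigar in reads:
    print(pos, cigar, clip_boundary_in_region(pos, cigar, DEL_START, DEL_END))
```

Reads flagged this way are candidates for the hidden indel support discussed above; whether UnifiedGenotyper actually counts them toward AD is not confirmed by this sketch.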

  • ebanks, Broad Institute (Member, Broadie, Dev)
