Unexpected indel call

Hi,
We are seeing an unexpected deletion call (GATK UnifiedGenotyper 3.0 and 2.5) in a setup where three samples are called together. The call is in the file call.txt. From the pileup (pileup.txt, position 19:57325555), we would expect the indel to be called only in the first sample.
Is there another source of information that makes GATK believe the deletion is also present in the second and third samples?

Thanks in advance,
Johannes
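The expectation in the question (only the first sample shows the deletion in its reads) can be checked mechanically from the pileup posted further down this thread. Below is a minimal Python sketch, assuming the standard samtools pileup encoding of the read-base column: `^` plus one mapping-quality character at a read start, `$` at a read end, and `+N`/`-N` followed by N bases for an indel. The function name `parse_pileup_bases` is hypothetical and not part of GATK or samtools.

```python
import re
from collections import Counter

# Digits giving an indel's length, e.g. the "21" in "-21TGGC...".
_INDEL_LEN = re.compile(r"\d+")


def parse_pileup_bases(bases: str) -> tuple[int, Counter]:
    """Parse one sample's read-base column from a samtools pileup line.

    Returns (depth, indels): depth is the number of reads covering the
    position; indels counts events keyed by (sign, length, sequence),
    where sign is '+' (insertion) or '-' (deletion). Sequences are
    upper-cased so forward-strand and reverse-strand (lower-case) reads
    reporting the same event are counted together.
    """
    depth = 0
    indels = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: '^' is followed by one mapping-quality character,
            # which may itself look like '+', '-' or '$', so skip both.
            i += 2
        elif c == "$":
            # Read-end marker; it belongs to the preceding base.
            i += 1
        elif c in "+-":
            # Indel following the previous base: sign, length, then bases.
            m = _INDEL_LEN.match(bases, i + 1)
            if m is None:
                raise ValueError(f"indel without length at offset {i}")
            n = int(m.group())
            seq = bases[m.end():m.end() + n]
            indels[(c, n, seq.upper())] += 1
            i = m.end() + n
        else:
            # '.', ',', mismatch bases, '*', '#', '<', '>': one read here.
            depth += 1
            i += 1
    return depth, indels


if __name__ == "__main__":
    # Second sample's base column from the pileup in this thread.
    depth, indels = parse_pileup_bases("...,..,.........,,.,,........,,.,,,,,")
    print(depth, dict(indels))  # 37 {}
```

Run over the three per-sample base columns of that pileup, this should give depths of 43, 37 and 48, with 16 reads carrying the 21-bp deletion in the first sample (matching the alternate count in its AD of 11,16) and none in the other two.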

Answers

  • ebanks, Broad Institute

    Hi there,

    If you care about indels, I'd recommend switching to the HaplotypeCaller. The UnifiedGenotyper's variation model isn't sophisticated enough to perform really well on indels.

  • That might be right, but we (and, I suppose, many others who rely on results from published studies) need to know whether the UnifiedGenotyper's indel calls can be trusted.
    So, is this a bug, or do I just not understand what's going on here?

  • Strange, the attached files seem to be gone, so I'll paste them here instead:

    the strange call:
    19 57325555 . TTGGCTCAGCAGCCTCCACTTC T 1806.41 . AC=3;AF=0.500;AN=6;BaseQRankSum=1.959;DP=129;FS=1.065;MLEAC=3;MLEAF=0.500;MQ=59.35;MQ0=0;MQRankSum=1.803;QD=2.00;RPA=2,1;RU=TGGCTCAGCAGCCTCCACTTC;ReadPosRankSum=0.950;STR GT:AD:DP:GQ:PL 0/1:11,16:43:99:1286,0,1342 0/1:17,9:38:99:556,0,1762 0/1:32,2:48:2:2,0,3911

    and the pileup (for each sample: read depth, then read bases; the base-quality strings are not reproduced):
    19 57325555 T
      sample 1: 43 .$......,.,,.-21TGGCTCAGCAGCCTCCACTTC.-21TGGCTCAGCAGCCTCCACTTC.-21TGGCTCAGCAGCCTCCACTTC,,..-21TGGCTCAGCAGCCTCCACTTC,,,,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc.-21TGGCTCAGCAGCCTCCACTTC,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc,C,,,..-21TGGCTCAGCAGCCTCCACTTC,.-21TGGCTCAGCAGCCTCCACTTC..-21TGGCTCAGCAGCCTCCACTTC.-21TGGCTCAGCAGCCTCCACTTC,-21tggctcagcagcctccacttc,-21tggctcagcagcctccacttc.,
      sample 2: 37 ...,..,.........,,.,,........,,.,,,,,
      sample 3: 48 ...,$.......,.,.....,,,..,,...,,,.,,,,,,.,,,,,,,,

    As you can see, the reads of the second and third samples show no evidence of the deletion. Still, GATK reports AD values of 17,9 and 32,2 for the indel in those samples. So would the only explanation be soft-clipped bases at the ends of some reads in the second and third samples?

    Thanks,
    Johannes

  • ebanks, Broad Institute

    Right, either that or, as I said, the UG just isn't good with large indels.

  • OK, I just looked in IGV again and did find soft-clipped reads there. Now I'm confident that we can keep using the UnifiedGenotyper until the HaplotypeCaller is stable and we have the capacity to recompute everything.

    Thank you so much for your help!
    Johannes

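The soft-clip explanation discussed in this thread can also be screened for outside IGV. Below is a minimal Python sketch, assuming you already have each read's POS and CIGAR (for example columns 4 and 6 of `samtools view` output for the region); `parse_cigar` and `clip_near_event` are hypothetical helper names, and the 5-bp window is an arbitrary choice. It flags reads whose soft clip begins next to the deletion. Because the call's `RU` and `RPA=2,1` fields indicate that the 21 deleted bases lie in a two-copy tandem repeat, the interval may need widening to span the whole repeat.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# A whole CIGAR string must be one or more such operations.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
# Operations that consume reference bases.
_REF_CONSUMING = frozenset("MDN=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '76M24S' into [(76, 'M'), (24, 'S')]."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def clip_near_event(pos: int, cigar: str, event_start: int,
                    event_end: int, window: int = 5) -> bool:
    """True if a read's soft-clip boundary falls within `window` bp of
    the reference interval [event_start, event_end] (1-based, inclusive).

    `pos` is the SAM POS field: the 1-based reference position of the
    first aligned base, i.e. after any leading soft clip.
    """
    # Hard clips carry no bases, so ignore them when looking at read ends.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return False
    ref_len = sum(n for n, op in ops if op in _REF_CONSUMING)
    aln_end = pos + ref_len - 1  # last reference base covered by the alignment
    lo, hi = event_start - window, event_end + window
    leading = ops[0][1] == "S" and lo <= pos <= hi
    trailing = ops[-1][1] == "S" and lo <= aln_end <= hi
    return leading or trailing


if __name__ == "__main__":
    # Reference interval of the call: VCF POS (anchor base) through the
    # last of the 21 deleted bases.
    start, end = 57325555, 57325555 + 21
    # Hypothetical reads, not taken from the thread's data.
    for pos, cigar in [(57325480, "76M24S"), (57325577, "22S78M"),
                       (57325480, "100M")]:
        print(pos, cigar, clip_near_event(pos, cigar, start, end))
```

With these hypothetical reads, the first two are flagged (clipped right after the anchor base, and clipped just before the base following the deleted copy) and the fully aligned read is not.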