Multiallelic indel with HaplotypeCaller-GenotypeGivenAlleles

dkolbe (Iowa), Member
edited August 2017 in Ask the GATK team

I'm seeing unusual behavior when trying to force genotyping of a polymorphic repeat expansion/contraction variant - rs34983651. The reference sequence starting here is CATATATATATATATAA... and the known variants insert AT, insert ATAT, or delete AT. The sample I'm using as an example here is het for the AT insertion.
Discovery genotyping result:
2 234668879 rs34983651 C CAT 3377.73 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=0.523;ClippingRankSum=0;DB;DP=301;ExcessHet=3.0103;FS=4.243;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=13.14;ReadPosRankSum=-0.735;SOR=0.48; GT:AD:DP:GQ:PL 0/1:143,114:266:99:3415,0,4642

For the analysis I'm doing, explicit genotyping of all known alleles is important, so I'm running with Genotype Given Alleles. If my reference VCF for this site contains:
2 234668879 . CAT C,CATAT,CATATAT . . .
I get this inaccurate result (the sample is called 0/0, homozygous reference):
2 234668879 . CAT C,CATAT,CATATAT 0 LowQual AC=0,0,0;AF=0.00,0.00,0.00;AN=2;BaseQRankSum=-0.009;ClippingRankSum=0.000;DP=290;ExcessHet=3.0103;FS=3.104;MLEAC=0,0,0;MLEAF=0.00,0.00,0.00;MQ=60.00;MQRankSum=0.000;ReadPosRankSum=0.610;SOR=1.367 GT:AD:DP:GQ:PL 0/0:254,9,0,0:263:99:0,386,6080,888,6968,2147483647,888,6968,2147483647,2147483647

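If I split the deletion and the insertions onto separate lines in the reference file, the result is much better: the sample is called 0/2 (CAT/CATAT, het for the AT insertion), which matches the discovery result.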
Input:
2 234668879 . CAT C . . .
2 234668879 . C CAT,CATAT . . .
Output:
2 234668879 . CAT C,CATAT,CATATAT 3204.73 . AC=0,1,0;AF=0.00,0.500,0.00;AN=2;BaseQRankSum=0.719;ClippingRankSum=0.000;DP=290;ExcessHet=3.0103;FS=4.354;MLEAC=0,1,0;MLEAF=0.00,0.500,0.00;MQ=60.00;MQRankSum=0.000;QD=12.62;ReadPosRankSum=-0.724;SOR=0.481 GT:AD:DP:GQ:PL 0/2:139,9,105,1:254:99:3242,3627,9321,0,4541,4519,3555,8697,5153,10106
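To compare the calls above, it helps to decode the PL field. The VCF specification orders diploid genotype likelihoods so that genotype j/k (with j <= k) sits at index k(k+1)/2 + j, so four alleles give ten PL values. The minimal Python sketch below picks the lowest-PL genotype; the helper names are hypothetical and not part of GATK.

```python
def genotype_order(n_alleles):
    """Diploid genotypes (j, k), j <= k, in VCF PL order: index = k*(k+1)/2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def best_genotype(pl, n_alleles):
    """Return the genotype string ("j/k") with the lowest phred-scaled likelihood."""
    order = genotype_order(n_alleles)
    if len(pl) != len(order):
        raise ValueError(f"expected {len(order)} PL values, got {len(pl)}")
    best = min(range(len(pl)), key=lambda i: pl[i])  # first index with the minimum PL
    j, k = order[best]
    return f"{j}/{k}"


# PL vectors copied from the records in this post (REF CAT; ALT C,CATAT,CATATAT)
gga_combined = [0, 386, 6080, 888, 6968, 2147483647, 888, 6968, 2147483647, 2147483647]
gga_split = [3242, 3627, 9321, 0, 4541, 4519, 3555, 8697, 5153, 10106]
print(best_genotype(gga_combined, 4), best_genotype(gga_split, 4))
```

Run on the two Genotype Given Alleles records, this gives 0/0 for the single-line input and 0/2 for the split input, consistent with the GT fields shown. The value 2147483647 is the largest 32-bit signed integer.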

This input/output combination works, but it really seems like I should be able to put all the variants on the same line (as they are in the output!) and get the right result. If I then use this output file as the reference file for another sample, it behaves like the first example! This is with GATK version 3.6-0-g89b7209.
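The split input and the single-line input describe the same set of alleles. The minimal sketch below illustrates that, assuming all records share the same POS and each REF is a prefix of the longest REF. The helper name `merge_records` is hypothetical, and this is a common way to combine alleles at one site, not necessarily what GATK does internally. Each shorter REF is padded with the missing suffix, and the same suffix is appended to its ALTs.

```python
def merge_records(records):
    """Merge (ref, alts) pairs at one POS into a single (ref, alts) record.

    Assumes every REF is a prefix of the longest REF (records anchored on the
    same base). The missing suffix is appended to each shorter record's ALTs
    so that all alleles are expressed against one shared REF.
    """
    longest_ref = max((ref for ref, _ in records), key=len)
    merged_alts = []
    for ref, alts in records:
        if not longest_ref.startswith(ref):
            raise ValueError(f"REF {ref} is not a prefix of {longest_ref}")
        suffix = longest_ref[len(ref):]
        for alt in alts:
            padded = alt + suffix
            # Skip alleles that collapse to REF or duplicate an earlier ALT
            if padded != longest_ref and padded not in merged_alts:
                merged_alts.append(padded)
    return longest_ref, merged_alts


# The two split input lines: CAT>C and C>CAT,CATAT
print(merge_records([("CAT", ["C"]), ("C", ["CAT", "CATAT"])]))
```

This yields REF CAT with ALT C,CATAT,CATATAT, the same allele set as the single-line input that produced the 0/0 call.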
