Multiallelic indel with HaplotypeCaller-GenotypeGivenAlleles
I'm seeing unusual behavior when trying to force genotyping of a polymorphic repeat expansion/contraction variant - rs34983651. The reference sequence starting here is CATATATATATATATAA... and the known variants insert AT, insert ATAT, or delete AT. The sample I'm using as an example here is het for the AT insertion.
Discovery genotyping result:
2 234668879 rs34983651 C CAT 3377.73 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=0.523;ClippingRankSum=0;DB;DP=301;ExcessHet=3.0103;FS=4.243;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=13.14;ReadPosRankSum=-0.735;SOR=0.48; GT:AD:DP:GQ:PL 0/1:143,114:266:99:3415,0,4642
For the analysis I'm doing, explicit genotyping of all known alleles is important, so I'm running with Genotype Given Alleles. If my reference VCF for this site contains:
2 234668879 . CAT C,CATAT,CATATAT . . .
I get this result (inaccurate):
2 234668879 . CAT C,CATAT,CATATAT 0 LowQual AC=0,0,0;AF=0.00,0.00,0.00;AN=2;BaseQRankSum=-0.009;ClippingRankSum=0.000;DP=290;ExcessHet=3.0103;FS=3.104;MLEAC=0,0,0;MLEAF=0.00,0.00,0.00;MQ=60.00;MQRankSum=0.000;ReadPosRankSum=0.610;SOR=1.367 GT:AD:DP:GQ:PL 0/0:254,9,0,0:263:99:0,386,6080,888,6968,2147483647,888,6968,2147483647,2147483647
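For anyone reading the PL field above, here is a minimal sketch of how the ten values map to diploid genotypes, assuming the standard VCF ordering (genotype j/k with j <= k sits at index k*(k+1)/2 + j). The function names are hypothetical helpers, not part of GATK.

```python
def genotype_order(n_alleles):
    """Diploid genotype order for PL under the VCF convention:
    genotype j/k (j <= k) is stored at index k*(k+1)//2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def best_genotype(pl, n_alleles):
    """Return the genotype string whose PL is smallest (the most likely call)."""
    order = genotype_order(n_alleles)
    if len(pl) != len(order):
        raise ValueError("PL length does not match the number of alleles")
    idx = min(range(len(pl)), key=pl.__getitem__)
    j, k = order[idx]
    return f"{j}/{k}"


# PL values copied from the record above (REF plus three ALT alleles = 4 alleles).
pl_joint = [0, 386, 6080, 888, 6968, 2147483647, 888, 6968, 2147483647, 2147483647]

print(genotype_order(4))
print(best_genotype(pl_joint, 4))
```

Under that ordering the smallest PL is the first entry, which corresponds to 0/0 and matches the GT that was emitted. The repeated value 2147483647 equals 2**31 - 1, the largest 32-bit signed integer.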
If I split out the insertions and deletions on separate lines in the reference file, the result is much better. The first two lines below are the reference file entries and the third is the output:
2 234668879 . CAT C . . .
2 234668879 . C CAT,CATAT . . .
2 234668879 . CAT C,CATAT,CATATAT 3204.73 . AC=0,1,0;AF=0.00,0.500,0.00;AN=2;BaseQRankSum=0.719;ClippingRankSum=0.000;DP=290;ExcessHet=3.0103;FS=4.354;MLEAC=0,1,0;MLEAF=0.00,0.500,0.00;MQ=60.00;MQRankSum=0.000;QD=12.62;ReadPosRankSum=-0.724;SOR=0.481 GT:AD:DP:GQ:PL 0/2:139,9,105,1:254:99:3242,3627,9321,0,4541,4519,3555,8697,5153,10106
This input/output combination works, but it really seems like I should be able to put all the variants on the same line (as they are in the output!) and get the right result. If I turn around and use this output file as the reference file for another sample, it behaves like the first example! This is with GATK version 3.6-0-g89b7209.
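The manual split of the reference record into a deletion line and an insertion line can be sketched in Python as follows. This is only an illustration of the workaround described above, not a GATK feature. The function names are hypothetical, and the sketch assumes the alleles are plain indels anchored on the same first base (same-length alleles are skipped).

```python
def trim_shared_suffix(ref, alts):
    """Drop trailing bases shared by REF and every ALT,
    keeping at least one base in each allele."""
    while min(len(a) for a in [ref] + alts) > 1 and all(a[-1] == ref[-1] for a in alts):
        ref = ref[:-1]
        alts = [a[:-1] for a in alts]
    return ref, alts


def split_indels(chrom, pos, ref, alts):
    """Split one multiallelic record into a deletion record and an
    insertion record, each trimmed to its shortest representation."""
    deletions = [a for a in alts if len(a) < len(ref)]
    insertions = [a for a in alts if len(a) > len(ref)]
    records = []
    for group in (deletions, insertions):
        if group:
            new_ref, new_alts = trim_shared_suffix(ref, group)
            records.append((chrom, pos, new_ref, new_alts))
    return records


def to_vcf_line(chrom, pos, ref, alts):
    """Format a sites-only VCF line (tab-separated, empty ID/QUAL/FILTER/INFO)."""
    return "\t".join([chrom, str(pos), ".", ref, ",".join(alts), ".", ".", "."])


# The combined record from the post: CAT -> C, CATAT, CATATAT
records = split_indels("2", 234668879, "CAT", ["C", "CATAT", "CATATAT"])
for rec in records:
    print(to_vcf_line(*rec))
```

For the combined record in this post, the sketch produces the same two lines I wrote by hand: `CAT` to `C` for the deletion, and `C` to `CAT,CATAT` for the insertions.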