Calling Multi-allelic sites in pooled samples

Hi,

Is the GATK UnifiedGenotyper able to call multi-allelic positions in a single pooled sample? Our case is a pool of 13 samples, which we run through UG with ploidy set to 26. If I understand the supplementary material of the original publications correctly, UG will never call three alleles at a single position in single-sample calling. Or does this not hold for high-ploidy analysis?

If needed, we can call multiple pools together, but this becomes computationally intensive.

In short, we would like to be able to make a call such as 14xG, 6xA, 6xT.
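Allowing a third allele makes the genotype space at this ploidy considerably larger. The counting sketch below is a standard stars-and-bars count, not GATK code, and `num_genotypes` is a hypothetical helper; it only shows how many distinct allele-count genotypes exist at ploidy 26 and checks that the example call above adds up to that ploidy.

```python
from math import comb


def num_genotypes(ploidy: int, n_alleles: int) -> int:
    """Number of unordered genotypes (allele-count vectors) at one site.

    Distributing `ploidy` allele copies among `n_alleles` alleles gives
    C(ploidy + n_alleles - 1, n_alleles - 1) possibilities (stars and bars).
    """
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


# Pool of 13 diploid samples -> ploidy 26, as in the question.
PLOIDY = 2 * 13

# The example call from the question: 14 x G, 6 x A, 6 x T.
example_call = {"G": 14, "A": 6, "T": 6}
assert sum(example_call.values()) == PLOIDY

print(num_genotypes(PLOIDY, 2))  # biallelic site   -> 27
print(num_genotypes(PLOIDY, 3))  # triallelic site  -> 378
print(num_genotypes(PLOIDY, 4))  # all four bases   -> 3654
```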

Also, how does UG take noise (sequencing errors) into account when genotyping? For example, when 3% of the reads at a position are aberrant, this could correspond to a single alternate allele copy (~1/26).
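One way to see why an aberrant-read fraction near 1/26 is hard to separate from sequencing noise is a toy two-hypothesis binomial model: "no alternate copy, only errors" versus "one alternate copy out of 26". This is a sketch under loudly simplifying assumptions, not UG's actual likelihood model (which derives per-read error probabilities from base qualities rather than using one flat rate): it assumes a single error rate `err` and that every error produces the alternate base. The function names are hypothetical. It uses the read counts from the example VCF line later in the thread (AD=1070,10; DP=1080).

```python
from math import log


def read_alt_prob(alt_fraction: float, err: float) -> float:
    """Probability that one read shows the alt base.

    Assumption (toy model): every sequencing error turns into this alt base.
    """
    return alt_fraction * (1 - err) + (1 - alt_fraction) * err


def log_likelihood_ratio(depth: int, alt_reads: int, ploidy: int, err: float) -> float:
    """log L(one alt copy) - log L(no alt copy) under a binomial read model.

    Positive favours a real single-copy allele, negative favours noise.
    The binomial coefficient is the same under both hypotheses and cancels.
    """
    ref_reads = depth - alt_reads
    p_noise = read_alt_prob(0.0, err)
    p_one_copy = read_alt_prob(1.0 / ploidy, err)
    return (alt_reads * log(p_one_copy / p_noise)
            + ref_reads * log((1 - p_one_copy) / (1 - p_noise)))


# Read counts from the example VCF line (AD=1070,10), ploidy 26.
for err in (1e-2, 1e-3, 1e-4):
    llr = log_likelihood_ratio(depth=1080, alt_reads=10, ploidy=26, err=err)
    print(f"assumed error rate {err:g}: log-likelihood ratio {llr:+.1f}")
```

With these counts the ratio is negative at an assumed 1% error rate and positive at 0.01%, so in this toy model whether ~1% alternate reads look like one allele copy depends heavily on the error rate being assumed.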

Thanks for any guidance,

geert

Answers

  • Hi, thanks for that information. It's good to know that we're not missing multi-allelic sites if they are present in the data.

    The noise modeling is still not completely clear to me. As UG is a locus walker, does it take sample-wide noise estimates into account when calculating likelihoods? We have a significant number of variants called as 1/26 (~3.8%) aberrant alleles with read fractions of just 1%, while a quick look at the noise distribution (non-reference bases / coverage depth at each position in the target region) tells us that we would need > 5% aberrant reads in the total coverage to reach a significant signal.

    An example VCF line:

    chr22 33040147 . C A 42.88 . AC=1;AF=0.038;AN=26;BaseQRankSum=-3.430;DP=1080;Dels=0.00;FS=0.000;HaplotypeScore=20.6747;MLEAC=1;MLEAF=0.038;MQ=60.00;MQ0=0;MQRankSum=-0.279;QD=0.04;ReadPosRankSum=-0.572 GT:AD:DP:GQ:MLPSAC:MLPSAF 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:1070,10:1080:67:1:0.038

    Or does UG rather call the variants anyway and leave it up to filtering/VQSR to decide whether we trust the variant or not?

  • OK, thank you for that explanation.

    Geert
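The gap described in the first reply (a called AF of 1/26 against roughly 1% supporting reads, versus a noise-based cutoff of about 5%) can be checked directly from the posted VCF record. The sketch below is illustrative only: `parse_site` and `noise_cutoff` are hypothetical helpers, not GATK tools. `noise_cutoff` mirrors the per-position "non-reference bases / coverage depth" calculation described in the thread using a simple sorted-index quantile; since the poster's pileup data is not in the thread, the demo uses the posted 5% figure as the cutoff.

```python
def noise_cutoff(pileup, quantile=0.95):
    """Empirical noise cutoff from per-position (depth, non_ref_bases) counts.

    Computes non-reference bases / depth at each covered position and returns
    the value below which roughly `quantile` of positions fall (a simple
    sorted-index quantile, not an interpolated one).
    """
    fractions = sorted(non_ref / depth for depth, non_ref in pileup if depth > 0)
    if not fractions:
        raise ValueError("no covered positions")
    idx = min(len(fractions) - 1, int(quantile * len(fractions)))
    return fractions[idx]


def parse_site(line):
    """Pull ploidy, called AF and observed alt-read fraction from one VCF record."""
    fields = line.split()  # VCF is tab-separated; split() also tolerates spaces
    chrom, pos, _vid, ref, alt, _qual, _filt, info, fmt, sample = fields[:10]
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    ad = [int(x) for x in sample_map["AD"].split(",")]  # ref, alt read counts
    return {
        "site": f"{chrom}:{pos} {ref}>{alt}",
        "ploidy": len(sample_map["GT"].split("/")),
        "called_af": float(info_map["AF"]),
        "alt_read_fraction": ad[1] / sum(ad),
    }


# The record from the thread, rebuilt with tab separators.
VCF_LINE = "\t".join([
    "chr22", "33040147", ".", "C", "A", "42.88", ".",
    "AC=1;AF=0.038;AN=26;BaseQRankSum=-3.430;DP=1080;Dels=0.00;FS=0.000;"
    "HaplotypeScore=20.6747;MLEAC=1;MLEAF=0.038;MQ=60.00;MQ0=0;"
    "MQRankSum=-0.279;QD=0.04;ReadPosRankSum=-0.572",
    "GT:AD:DP:GQ:MLPSAC:MLPSAF",
    "/".join(["0"] * 25 + ["1"]) + ":1070,10:1080:67:1:0.038",
])

site = parse_site(VCF_LINE)
# Stand-in for noise_cutoff(<target-region pileup>): the "> 5%" from the post.
cutoff = 0.05
print(site["site"], "ploidy", site["ploidy"])
# -> chr22:33040147 C>A ploidy 26
print(f"called AF {site['called_af']:.1%}, alt reads {site['alt_read_fraction']:.1%}, "
      f"above noise cutoff: {site['alt_read_fraction'] > cutoff}")
# -> called AF 3.8%, alt reads 0.9%, above noise cutoff: False
```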