UnifiedGenotyper misses some alleles when using GENOTYPE_GIVEN_ALLELES mode

jchoo Member
edited July 2017 in Ask the GATK team

Dear GATK team,
We are using UnifiedGenotyper in GENOTYPE_GIVEN_ALLELES mode to do genotyping, but we found that not all of the given alleles were genotyped.
For example, the input VCF is:

    13      20763485        .       AG      A       30      PASS    AC=1;AF=0.500;AN=2;set=variant2 GT:GQ   ./.     0/1:30
    13      20763485        .       A       G       30      PASS    AC=1;AF=0.500;AN=2;set=variant  GT:GQ   0/1:30  ./.

In the output VCF, we only got a genotype for the first record:

    13  20763485    rs80338943  AG  A   0   LowQual AC=0;AF=0.00;AN=2;BaseQRankSum=1.075;DB;DP=820;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=59.95;MQ0=0;MQRankSum=0.074;RPA=3,2;RU=G;ReadPosRankSum=1.087;SOR=1.546;STR  GT:AD:DP:GQ:PL  0/0:819,1:820:99:0,2462,36389

The other allele (the A to G SNP from the second input record) was missed.
The command we used is:

    java -jar GenomeAnalysisTK-3.6.jar -T UnifiedGenotyper -mbq 10 -stand_call_conf 20 -dt NONE -R hs37d5.fa -I S44-EL-20-1.recal.bam -D dbsnp147_GRCH37_All_20160601.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles input.vcf -L input.vcf -o output.vcf --output_mode EMIT_ALL_SITES

Are there any options that would make UnifiedGenotyper output all alleles?
Thanks a lot
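
The two input records above describe the same site (same CHROM and POS) with different REF alleles, AG and A. As a minimal sketch, and not something GATK itself provides, the following Python shows how such records could be rewritten as a single multi-allelic record by padding every ALT to the longest REF. The function name `merge_same_site` and the tuple layout are hypothetical, chosen only for illustration, and the sketch assumes each input record carries exactly one ALT allele.

```python
from collections import defaultdict


def merge_same_site(records):
    """Merge single-ALT records that share CHROM and POS.

    records: iterable of (chrom, pos, ref, alt) tuples (hypothetical layout).
    Returns a list of (chrom, pos, ref, [alts]) with one entry per site.
    """
    by_site = defaultdict(list)
    for chrom, pos, ref, alt in records:
        by_site[(chrom, pos)].append((ref, alt))

    merged = []
    for (chrom, pos), alleles in by_site.items():
        # The longest REF covers every other REF that starts at this POS.
        longest_ref = max((ref for ref, _ in alleles), key=len)
        alts = []
        for ref, alt in alleles:
            if not longest_ref.startswith(ref):
                raise ValueError(f"inconsistent REF alleles at {chrom}:{pos}")
            # Append the reference bases this record did not spell out.
            padded = alt + longest_ref[len(ref):]
            if padded not in alts:
                alts.append(padded)
        merged.append((chrom, pos, longest_ref, alts))
    return merged


records = [
    ("13", 20763485, "AG", "A"),  # deletion record from the input VCF
    ("13", 20763485, "A", "G"),   # SNP record from the input VCF
]
print(merge_same_site(records))
```

With the two records from this post, the SNP A to G is re-expressed against REF AG as ALT GG, next to the deletion ALT A. Existing tools such as bcftools norm (with its --multiallelics option) perform this kind of joining on real VCF files; the sketch only illustrates the allele padding.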


Comments

  • shlee Cambridge Member, Administrator, Broadie, Moderator admin

    Hi @jchoo,

    When using --genotyping_mode GENOTYPE_GIVEN_ALLELES I believe the allele representations must match exactly. Is it possible that the --alleles file only has the first variant but not the second?

    Also, is there a reason why you are not using HaplotypeCaller?

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @jchoo
    Hi,

    Can you also post an IGV Screenshot of the BAM file at that position? If the A/G SNP is not present in the BAM file, it will not be output in the VCF.

    -Sheila

  • @shlee @Sheila I found that if we combine the two variants into one line (a multi-allelic record), GATK outputs results for all three alleles. But another question is why the GQ of this variant is 0:
    9 34648361 rs111033738 GC AC,G 0 LowQual AC=0,0;AF=0.00,0.00;AN=2;DB;DP=663;ExcessHet=3.0103;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:DP:GQ:PL 0/0:0,0,0:662:0:0,0,0,1992,1992,29179

  • Hi, my question is pretty simple. I have a batch of sites (some may have multiple alleles at one site), and I would like to genotype all of these sites and output the depth of every allele, the genotype quality, and the genotypes. Which GATK tool can do this?
    As far as I know, HC may miss parts of the sites. UG can call all sites, but at a multi-allelic site the GQ is zero.

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @jchoo
    Hi,

    It looks like the tool is not confident in any of the reads supporting the alleles. Notice the DP of 662, but ADs of 0. Have a look at this article.

    We recommend HaplotypeCaller for germline variant calling. I am not sure what you mean by "miss parts of the sites". In some cases, when the tool is not confident in a variant call, it will not be emitted. If you need to emit low quality sites, you can lower the --standard_min_confidence_threshold_for_calling.

    -Sheila
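
The DP versus AD comparison mentioned in the last reply can be checked across many records with a small script. The following is a minimal sketch in Python, assuming single-sample records whose FORMAT column contains GT, AD, DP and GQ with no missing (".") values. The function name `depth_summary` and the `ad_fraction` key are hypothetical and are not part of GATK.

```python
def depth_summary(format_field, sample_field):
    """Pair FORMAT keys with sample values and summarise AD against DP.

    Assumes GT, AD, DP and GQ are all present and not missing (".").
    """
    sample = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = [int(x) for x in sample["AD"].split(",")]
    dp = int(sample["DP"])
    return {
        "GT": sample["GT"],
        "AD": ad,
        "DP": dp,
        "GQ": int(sample["GQ"]),
        # Share of the reported depth that was assigned to some allele.
        "ad_fraction": sum(ad) / dp if dp else 0.0,
    }


# Sample columns copied from the two output records quoted in this thread.
first = depth_summary("GT:AD:DP:GQ:PL", "0/0:819,1:820:99:0,2462,36389")
multi = depth_summary("GT:AD:DP:GQ:PL", "0/0:0,0,0:662:0:0,0,0,1992,1992,29179")
print(first["ad_fraction"], multi["ad_fraction"])
```

For the first record the allele depths account for all 820 reads, while for the multi-allelic record none of the 662 reads are assigned to any allele, which is the pattern pointed out above.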
