UnifiedGenotyper misses some alleles when using GENOTYPE_GIVEN_ALLELES mode

jchoo Member
edited July 2017 in Ask the GATK team

Dear GATK team,
We are using UnifiedGenotyper in GENOTYPE_GIVEN_ALLELES mode to do genotyping, but we found that not all of the given alleles were genotyped.
For example, the input VCF is:

    13      20763485        .       AG      A       30      PASS    AC=1;AF=0.500;AN=2;set=variant2 GT:GQ   ./.     0/1:30
    13      20763485        .       A       G       30      PASS    AC=1;AF=0.500;AN=2;set=variant  GT:GQ   0/1:30  ./.

In the output VCF, we got only this record:

    13  20763485    rs80338943  AG  A   0   LowQual AC=0;AF=0.00;AN=2;BaseQRankSum=1.075;DB;DP=820;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=59.95;MQ0=0;MQRankSum=0.074;RPA=3,2;RU=G;ReadPosRankSum=1.087;SOR=1.546;STR  GT:AD:DP:GQ:PL  0/0:819,1:820:99:0,2462,36389

The other allele (A>G) was missed.
The command we used is:

    java -jar GenomeAnalysisTK-3.6.jar -T UnifiedGenotyper -mbq 10 -stand_call_conf 20 -dt NONE -R hs37d5.fa -I S44-EL-20-1.recal.bam -D dbsnp147_GRCH37_All_20160601.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles input.vcf -L input.vcf -o output.vcf --output_mode EMIT_ALL_SITES

Is there any option that makes UnifiedGenotyper output all alleles?
Thanks a lot

Comments

  • shlee Cambridge Member, Broadie, Moderator

    Hi @jchoo,

    When using --genotyping_mode GENOTYPE_GIVEN_ALLELES I believe the allele representations must match exactly. Is it possible that the --alleles file only has the first variant but not the second?

    Also, is there a reason why you are not using HaplotypeCaller?

  • Sheila Broad Institute Member, Broadie, Moderator

    @jchoo
    Hi,

    Can you also post an IGV Screenshot of the BAM file at that position? If the A/G SNP is not present in the BAM file, it will not be output in the VCF.

    -Sheila

  • @shlee @Sheila I found that if we combine the two variants into one multi-allelic line, GATK outputs results for all of the alleles, but this raises another question: the GQ of this variant is 0.
    9 34648361 rs111033738 GC AC,G 0 LowQual AC=0,0;AF=0.00,0.00;AN=2;DB;DP=663;ExcessHet=3.0103;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:DP:GQ:PL 0/0:0,0,0:662:0:0,0,0,1992,1992,29179
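
The workaround described above (combining same-position records into one multi-allelic line before passing them to --alleles) could be sketched roughly as follows. This is only an illustration, not a GATK feature: the function name `merge_to_multiallelic` and the tuple layout are hypothetical, and the sketch assumes every input record is biallelic and that the shorter REF is a prefix of the longest REF at that position.

```python
from collections import OrderedDict


def merge_to_multiallelic(records):
    """Merge biallelic (chrom, pos, ref, alt) tuples that share a position.

    Hypothetical helper: returns (chrom, pos, ref, [alts]) tuples in which
    every ALT is rewritten against the longest REF seen at that position.
    """
    grouped = OrderedDict()
    for chrom, pos, ref, alt in records:
        grouped.setdefault((chrom, pos), []).append((ref, alt))

    merged = []
    for (chrom, pos), alleles in grouped.items():
        # The merged record must use the longest REF among the inputs.
        longest_ref = max((ref for ref, _ in alleles), key=len)
        alts = []
        for ref, alt in alleles:
            if not longest_ref.startswith(ref):
                raise ValueError(
                    "REF %r is not a prefix of %r at %s:%s"
                    % (ref, longest_ref, chrom, pos)
                )
            # Pad the ALT with the reference bases this record did not cover.
            padded = alt + longest_ref[len(ref):]
            if padded not in alts:
                alts.append(padded)
        merged.append((chrom, pos, longest_ref, alts))
    return merged


if __name__ == "__main__":
    # The two input records from the original post.
    site = [("13", 20763485, "AG", "A"), ("13", 20763485, "A", "G")]
    for chrom, pos, ref, alts in merge_to_multiallelic(site):
        print(chrom, pos, ".", ref, ",".join(alts), sep="\t")
```

For the two records in the original post, the deletion keeps its representation (AG to A) and the SNP is rewritten against the longer REF (AG to GG), which is the same style of representation as the GC to AC,G record shown above.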

  • Hi, my question is pretty simple. I have a batch of sites (some may have multiple alleles), and I would like to genotype all of these sites and output the depth of every allele, the genotype quality, and the genotypes. Which GATK tool can do this?
    As far as I know, HC may miss some of the sites. UG can call all sites, but at a multi-allelic site the GQ is zero for all alleles.

  • Sheila Broad Institute Member, Broadie, Moderator

    @jchoo
    Hi,

    It looks like the tool is not confident in any of the reads supporting the alleles. Notice the DP of 662, but ADs of 0. Have a look at this article.

    We recommend HaplotypeCaller for germline variant calling. I am not sure what you mean by "miss parts of the sites". In some cases, when the tool is not confident in a variant call, it will not be emitted. If you need to emit low quality sites, you can lower the --standard_min_confidence_threshold_for_calling.

    -Sheila
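
The DP versus AD comparison mentioned in this reply can be checked across many records with a few lines of Python. This is a minimal sketch that only parses the FORMAT and sample columns of a VCF line; the function name `informative_fraction` is hypothetical, and it assumes the sample column carries both AD and DP.

```python
def informative_fraction(format_keys, sample_values):
    """Return sum(AD) / DP for one sample column of a VCF record.

    Hypothetical helper: a value near 0 means almost none of the reads
    counted in DP were assigned to any allele in AD.
    """
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    # Skip missing per-allele depths written as "."
    ad = [int(x) for x in fields["AD"].split(",") if x != "."]
    dp = int(fields["DP"])
    return sum(ad) / dp if dp else 0.0


if __name__ == "__main__":
    # Sample columns copied from the two records posted in this thread.
    print(informative_fraction("GT:AD:DP:GQ:PL", "0/0:819,1:820:99:0,2462,36389"))
    print(informative_fraction("GT:AD:DP:GQ:PL", "0/0:0,0,0:662:0:0,0,0,1992,1992,29179"))
```

On the two records posted in this thread, the first gives a fraction of 1.0 (AD 819,1 with DP 820) and the multi-allelic one gives 0.0 (AD 0,0,0 with DP 662), which matches the observation that no reads were counted toward any allele at that site.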
