VariantAnnotator requirements for some annotations

AlexanderV, Berlin, Member
edited September 2015 in Ask the GATK team

Hi @Team,

I found that VariantAnnotator sometimes does not add some of the annotations that are requested.

A ) The Rank Sum Test annotations MQRankSum & BaseQRankSum
I was not able to identify the requirements that have to be met for them to be calculated for a variant.

B ) InbreedingCoeff
This one seems to be connected to the number of total called alleles (AN).
In my data, at least 10% of the alleles needed to be called (19/186).
The doc for that says [1] "at least 10 founder samples". Maybe this has to be updated to 10%?

These are the ones I observed.
Can someone tell me more about that?

Thanks,
Alexander

[1] https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @AlexanderV
    Hi Alexander,

    Can you post the exact command you ran? I suspect you did not provide the BAM files, which are needed to calculate BaseQRankSum and MQRankSum.

    For Inbreeding Coefficient, can you tell me why you think more than 10 samples may be needed? Did the sites where fewer than 10% of the samples had variants not get a value for InbreedingCoeff?

    Thanks,
    Sheila

  • AlexanderV, Berlin, Member

    Dear @Sheila,

    I did in fact provide the BAM files for all individuals that have genotyping information.
    The command I used is below.

    The RankSum annotations are in fact added, but only for ~26% of the variants.
    Before running VariantAnnotator, these annotations were completely absent, so VariantAnnotator must have added them for some of the variants.
    That's why I am wondering why the others are not being annotated.

    For the InbreedingCoeff:
    The 10 SAMPLES restriction I got from the linked article.
    My 10 PERCENT restriction I derived from the fact that all variants that didn't get the annotation have AN less than 10% of the maximal number of alleles.

    Best,
    Alexander

    java -Xmx4g -jar /path/GATK/3.4-46/GenomeAnalysisTK.jar \
      -T VariantAnnotator \
      -R /path/PGSC_DM_v4.03_all.renamed.fasta \
      -I /path/761005_1.bam -I /path/761005_2.bam -I /path/761005_3.bam \
      -I /path/761159_2.bam -I /path/761159_3.bam \
      -I /path/761967_1.bam -I /path/761967_2.bam -I /path/761967_3.bam \
      -I /path/762333_1.bam -I /path/762333_2.bam -I /path/762333_3.bam \
      -I /path/762836_1.bam -I /path/762836_2.bam \
      -I /path/763644_1.bam -I /path/763644_2.bam -I /path/763644_3.bam \
      -I /path/760029_1.bam -I /path/760029_2.bam -I /path/760029_3.bam \
      -I /path/760034_1.bam -I /path/760034_2.bam -I /path/760034_3.bam \
      -I /path/760235_1.bam -I /path/760235_2.bam -I /path/760235_3.bam \
      -I /path/760467_1.bam -I /path/760467_2.bam -I /path/760467_3.bam \
      -I /path/760530_1.bam -I /path/760530_2.bam -I /path/760530_3.bam \
      -I /path/760673_1.bam -I /path/760673_2.bam -I /path/760673_3.bam \
      -I /path/761159_1.bam \
      -I /path/761384_1.bam -I /path/761384_2.bam -I /path/761384_3.bam \
      -I /path/761830_1.bam -I /path/761830_2.bam -I /path/761830_3.bam \
      -I /path/761948_1.bam -I /path/761948_2.bam -I /path/761948_3.bam \
      -I /path/762068_1.bam -I /path/762068_2.bam -I /path/762068_3.bam \
      -I /path/762089_1.bam -I /path/762089_2.bam -I /path/762089_3.bam \
      -I /path/762183_1.bam -I /path/762183_2.bam -I /path/762183_3.bam \
      -I /path/762320_1.bam -I /path/762320_2.bam -I /path/762320_3.bam \
      -I /path/762418_1.bam -I /path/762418_2.bam -I /path/762418_3.bam \
      -I /path/762495_1.bam -I /path/762495_2.bam -I /path/762495_3.bam \
      -I /path/762617_1.bam -I /path/762617_2.bam -I /path/762617_3.bam \
      -I /path/762799_1.bam -I /path/762799_2.bam -I /path/762799_3.bam \
      -I /path/762832_1.bam -I /path/762832_2.bam -I /path/762832_3.bam \
      -I /path/762836_3.bam \
      -I /path/762845_1.bam -I /path/762845_2.bam -I /path/762845_3.bam \
      -I /path/762867_1.bam -I /path/762867_2.bam -I /path/762867_3.bam \
      -I /path/762897_1.bam -I /path/762897_2.bam -I /path/762897_3.bam \
      -I /path/762978_1.bam -I /path/762978_2.bam -I /path/762978_3.bam \
      -I /path/763010_1.bam -I /path/763010_2.bam -I /path/763010_3.bam \
      -I /path/763693_1.bam -I /path/763693_2.bam -I /path/763693_3.bam \
      -I /path/763932_1.bam -I /path/763932_2.bam -I /path/763932_3.bam \
      -o out.vcf.gz \
      -V in.vcf.gz \
      -A BaseQualityRankSumTest -A ChromosomeCounts -A Coverage -A FisherStrand \
      -A HaplotypeScore -A InbreedingCoeff -A MappingQualityRankSumTest \
      -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A StrandOddsRatio \
      -L in.vcf.gz
    
  • Kurt, Member ✭✭✭

    Do the sites where they are not being calculated have only non-reference reads (for example, all samples homozygous non-reference)?

  • AlexanderV, Berlin, Member

    Hi @Kurt,

    thank you for your answer.

    No, this is not the case.
    Here are some examples.
    They include annotated and unannotated variants with medium to high AN and low AC.
    I also found one that has the rank sum annotations even though AC==AN (your theory).

    Best,
    Alexander

    1       142669  .       T       A       1171.83 .       AC=10;AF=0.238;AN=42;DP=161;FS=0.000;InbreedingCoeff=0.5695;MLEAC=11;MLEAF=0.262;MQ=25.15;QD=22.98;SOR=0.191
    1       227759  .       T       C       135.52  .       AC=3;AF=0.022;AN=134;DP=166;FS=0.000;InbreedingCoeff=-0.0308;MLEAC=3;MLEAF=0.022;MQ=59.85;QD=13.55;SOR=0.003
    1       227807  .       T       A,G     1384.06 .       AC=20,4;AF=0.156,0.031;AN=128;BaseQRankSum=-1.583;DP=170;FS=0.000;InbreedingCoeff=0.5800;MLEAC=15,2;MLEAF=0.117,0.016;MQ=59.78;MQRankSum=0.528;QD=35.07;ReadPosRankSum=-0.528;SOR=0.118
    1       227897  .       G       A       47.06   .       AC=2;AF=0.017;AN=118;BaseQRankSum=-0.796;DP=187;FS=0.000;InbreedingCoeff=0.0324;MLEAC=1;MLEAF=8.475e-03;MQ=59.73;MQRankSum=0.265;QD=23.53;ReadPosRankSum=0.796;SOR=0.145
    1       227947  .       C       T       113.18  .       AC=2;AF=1.00;AN=2;BaseQRankSum=-0.755;DP=148;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.97;MQRankSum=-0.044;QD=26.62;ReadPosRankSum=0.844;SOR=0.013
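
To make the pattern in these five records easier to see, here is a minimal Python sketch that tabulates, per site, the AN value and whether the rank sum and InbreedingCoeff annotations are present. The INFO strings are copied from the records above; the helper names (`parse_info`, `summarize`) are made up for illustration and are not part of GATK.

```python
# INFO columns of the five example records, keyed by position on chromosome 1.
RECORDS = [
    ("142669", "AC=10;AF=0.238;AN=42;DP=161;FS=0.000;InbreedingCoeff=0.5695;MLEAC=11;MLEAF=0.262;MQ=25.15;QD=22.98;SOR=0.191"),
    ("227759", "AC=3;AF=0.022;AN=134;DP=166;FS=0.000;InbreedingCoeff=-0.0308;MLEAC=3;MLEAF=0.022;MQ=59.85;QD=13.55;SOR=0.003"),
    ("227807", "AC=20,4;AF=0.156,0.031;AN=128;BaseQRankSum=-1.583;DP=170;FS=0.000;InbreedingCoeff=0.5800;MLEAC=15,2;MLEAF=0.117,0.016;MQ=59.78;MQRankSum=0.528;QD=35.07;ReadPosRankSum=-0.528;SOR=0.118"),
    ("227897", "AC=2;AF=0.017;AN=118;BaseQRankSum=-0.796;DP=187;FS=0.000;InbreedingCoeff=0.0324;MLEAC=1;MLEAF=8.475e-03;MQ=59.73;MQRankSum=0.265;QD=23.53;ReadPosRankSum=0.796;SOR=0.145"),
    ("227947", "AC=2;AF=1.00;AN=2;BaseQRankSum=-0.755;DP=148;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.97;MQRankSum=-0.044;QD=26.62;ReadPosRankSum=0.844;SOR=0.013"),
]

RANK_SUM_KEYS = ("BaseQRankSum", "MQRankSum", "ReadPosRankSum")


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def summarize(records):
    """Report AN and which annotation groups are present for each record."""
    rows = []
    for pos, info in records:
        fields = parse_info(info)
        rows.append({
            "pos": pos,
            "AN": int(fields["AN"]),
            "rank_sum": all(key in fields for key in RANK_SUM_KEYS),
            "inbreeding": "InbreedingCoeff" in fields,
        })
    return rows


for row in summarize(RECORDS):
    print(row)
```

In this small sample the only record without InbreedingCoeff is the one with AN=2, while the two records without rank sum values have AN=42 and AN=134, so the two gaps do not appear to share a cause.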
    
  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @AlexanderV
    Hmm. Okay. Can you post some examples with the FORMAT field present?

    Thanks,
    Sheila

  • AlexanderV, Berlin, Member

    Interesting point.
    Here are the FORMAT fields for the positions above:

    GT:AD:DP:GQ:PL
    GT:AD:DP:GQ:PL
    GT:AD:DP:GQ:PGT:PID:PL
    GT:AD:DP:GQ:PGT:PID:PL
    GT:AD:DP:GQ:PGT:PID:PL
    

    If that's not it, I attached a subset of the vcf with genotype info.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @AlexanderV
    Hi Alexander,

    Geraldine gave some good insight into this one. @Geraldine_VdAuwera

    As for the missing Rank Sum Test annotations, a possible explanation is that some reads do not qualify to be included in the test, so there are not enough reference and alternate reads left to run it. This is more likely to happen at low depth. You can check whether the per-sample depth at the sites missing the annotations is indeed very low.
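
To illustrate why a rank sum test needs reads on both sides, here is a minimal Python sketch of a Mann-Whitney style z-score comparing a per-read value (for example base quality) between reference-supporting and alternate-supporting reads. It is a simplified stand-in, not GATK's implementation: the function name `rank_sum_z` is hypothetical, it uses the normal approximation without tie correction, and it applies none of GATK's read filters.

```python
import math


def rank_sum_z(ref_values, alt_values):
    """Z-score of the rank sum of alt values versus ref values.

    Returns None when either group is empty, because the test is
    undefined without observations on both sides.
    """
    n_ref, n_alt = len(ref_values), len(alt_values)
    if n_ref == 0 or n_alt == 0:
        return None

    # Pool both groups and sort by value, remembering the group label.
    pooled = sorted(
        [(v, "ref") for v in ref_values] + [(v, "alt") for v in alt_values],
        key=lambda pair: pair[0],
    )

    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    alt_rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == "alt")
    u_alt = alt_rank_sum - n_alt * (n_alt + 1) / 2
    mean_u = n_ref * n_alt / 2
    sd_u = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12)
    return (u_alt - mean_u) / sd_u


print(rank_sum_z([40, 41, 42], [10, 11, 12]))  # negative: alt values rank lower
print(rank_sum_z([], [30, 31]))                # None: no reference reads left
```

With this sign convention a negative value means the alternate-supporting reads have lower values than the reference-supporting reads. If filtering leaves one group empty at a site, there is nothing to compare and no value can be reported.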

    As for Inbreeding Coefficient, did you input a pedigree file? There have to be at least 10 founders in your cohort.
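
The "10 samples" and "10% of alleles" observations in this thread may be two views of the same threshold. As a rough sketch, assuming diploid genotypes and assuming the minimum of 10 is counted over samples with a called genotype at the site (neither assumption is confirmed in the thread), 10 called samples contribute AN=20, which is about 10.8% of the 186 alleles in this cohort. The function names below are hypothetical, and `inbreeding_f` is the textbook hard-call formula F = 1 - observed hets / expected hets, not GATK's likelihood-based calculation.

```python
MIN_CALLED_SAMPLES = 10  # threshold quoted in the thread; how it is counted is an assumption


def called_samples(an, ploidy=2):
    """Number of samples with a called genotype, derived from AN."""
    return an // ploidy


def meets_threshold(an, ploidy=2, minimum=MIN_CALLED_SAMPLES):
    """True if enough samples are called for the statistic to be attempted."""
    return called_samples(an, ploidy) >= minimum


def inbreeding_f(n_hom_ref, n_het, n_hom_alt):
    """F = 1 - observed hets / expected hets under Hardy-Weinberg.

    Returns None when no samples are called or the site is monomorphic,
    because the expected heterozygote count is then zero.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return None
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    expected_het = 2 * p * (1 - p) * n
    if expected_het == 0:
        return None
    return 1 - n_het / expected_het


# AN values taken from the example records posted earlier in the thread.
for an in (42, 134, 128, 118, 2):
    print(an, called_samples(an), meets_threshold(an))

print(round(20 / 186, 3))  # share of all alleles contributed by 10 diploid samples
```

Under these assumptions the record with AN=2 is the only one of the five posted examples that falls below the threshold, which matches the one record that lacks InbreedingCoeff.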

    -Sheila
