Different AS_QD values when genotyping a multi-allelic site alone or chromosome wide

Hi,

I have found that AS_QD values differ at multi-allelic sites when genotyping a multi-allelic site alone or when genotyping the whole chromosome on 4 batches of around 250 WGS samples using GATK-3.8.

For example, when performing the GenotypeGVCFs for the whole chromosome the AS_QD values for a multi-allelic site are : AS_QD=29.59,0.02

When genotyping only this multi-allelic site (using -L XX:XXXXXX) the AS_QD values are : AS_QD=34.24,0.02

All other values are the same, and all genotypes and their information (AD, DP, PL, etc.) are also the same. The only difference is the AS_QD value for the first alternate allele, which changes from 29.59 to 34.24, a substantial difference.

When genotyping this multi-allelic site alone with a padding of 10,000 bases, the AS_QD value for the first alternate allele is 32.17. With a padding of 20,000 bases, it is 32.52. With a padding of 30,000 bases, the AS_QD values match those obtained from the whole-chromosome genotyping (29.59).

I tested two other multi-allelic sites, and each time the AS_QD values differ for some alternate allele(s) when genotyping only the site. Adding a large enough padding around the site seems to bring the AS_QD value back to the value obtained chromosome-wide.
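A comparison like the one above can be scripted by pulling the AS_QD annotation for one site out of each output VCF. The following is a minimal sketch, not part of GATK: `extract_as_qd` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF on standard input with AS_QD in the INFO column (column 8).

```shell
# Hypothetical helper (not a GATK tool): print "CHROM:POS<TAB>AS_QD values"
# for one site, reading an uncompressed VCF from standard input.
extract_as_qd() {
    # $1 = chromosome name, $2 = position
    awk -F'\t' -v chr="$1" -v pos="$2" '
        /^#/ { next }                          # skip header lines
        $1 == chr && $2 == pos {
            n = split($8, info, ";")           # INFO is the 8th VCF column
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^AS_QD=/) {
                    sub(/^AS_QD=/, "", info[i])
                    print $1 ":" $2 "\t" info[i]
                }
        }'
}
```

For a compressed output, something like `zcat ${res} | extract_as_qd ${chr} ${pos}` could be run once per GenotypeGVCFs run (whole chromosome, site alone, each padding) and the printed lines compared.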

Here is the GenotypeGVCFs command line I used on the combined GVCF files:

# For -L, use ${chr}, or ${chr}:${pos}, or ${chr}:${pos} together with -ip ${padding}
java \
 -Xmx22g \
 -jar GenomeAnalysisTK-3.8.jar \
 -R ${ref} \
 -T GenotypeGVCFs \
 -G AS_StandardAnnotation \
 -G StandardAnnotation \
 -A AS_InsertSizeRankSum \
 -A AS_MQMateRankSumTest \
 -A FractionInformativeReads \
 -A LikelihoodRankSumTest \
 -A GCContent \
 -L ${chr} \
 -o ${res} \
 -V batch1.g.vcf.gz \
 -V batch2.g.vcf.gz \
 -V batch3.g.vcf.gz \
 -V batch4.g.vcf.gz \
 --annotateNDA \
 --max_alternate_alleles 20 \
 -newQual

I wonder why AS_QD values differ depending on the size of the region to genotype since it should only use information at the current site.

Best,

Issue · Github
by Sheila

Issue Number: 2705
State: closed
Closed By: chandrans

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @bgrenier
    Hi,

    My first thought was downsampling, or different reads being used in the active regions. But you say the AD and DP values are the same in the records. Can you post some IGV screenshots of the original BAM file and the bamout files for each padding you used?

    Can you also check whether this happens in the latest GATK4 beta?

    Thanks,
    Sheila

  • bgrenier · France · Member

    @Sheila
    Hi,

    Since this is at the GenotypeGVCFs step, the input files are combined GVCF files, so I don't see the relationship with the BAM files. Also, I think GenotypeGVCFs does not perform any downsampling.

    I also found some discussions on the forum reporting differences in QD values (not AS_QD, but maybe linked) when using multithreading. But I don't use any multithreading in my tests.

    These differences in AS_QD depending on the region size seem to occur at every multi-allelic site, at least with my files. I didn't see any difference at bi-allelic sites. Also, it seems that these differences only appear when there are enough individuals to genotype (the issue does not occur when using 10 individuals, for example).

    At the moment, I can't try with GATK4 because GATK3 (CombineGVCFs) and GATK4 (GenomicsDB) handle combined files differently, and I have many individuals to include.

    Also, I don't think I will have the right to extract such a large region with many individuals and upload it for you to test.

    Best,

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @bgrenier
    Hi,

    Alright. Let me check with the team and get back to you.

    -Sheila

  • bgrenier · France · Member

    @Sheila ,

    Hi,

    So large QD values are replaced with a random value near 30, and this happens most frequently at multi-allelic sites.

    Thank you for the answer.

    Best,
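
The behaviour summarised in the last reply can be illustrated with a small sketch. This is not GATK's code: the threshold of 35, the mean of 30 and the standard deviation of 3 are assumed values that are not stated in this thread, `jitter_qd` is a hypothetical name, and awk's `rand()` stands in for whatever seeded random generator the tool shares across sites. If the replacement values come from one seeded generator, the value drawn at a given site would depend on how many draws were made earlier in the run, which would fit the observation that AS_QD changes with the size of the genotyped region.

```shell
# Sketch only, under the assumptions named above (threshold 35, mean 30, sigma 3).
# Input: one raw QD value per line, in genomic order.
# Output: the QD that would be reported, with too-high values replaced by a random draw.
jitter_qd() {
    # $1 = seed of the single generator shared by all sites in the run
    awk -v seed="$1" '
        function gauss() {   # Box-Muller transform on two uniform draws
            return sqrt(-2 * log(1 - rand())) * cos(6.283185307179586 * rand())
        }
        BEGIN { srand(seed) }
        {
            qd = $1
            if (qd >= 35) qd = 30 + 3 * gauss()   # consumes shared random state
            printf "%.2f\n", qd
        }'
}
```

For example, `printf '40\n' | jitter_qd 47` and `printf '38\n41\n40\n' | jitter_qd 47 | tail -n 1` both report the same site (raw QD 40) with the same seed, but will normally print different values, because the second run has already consumed random draws for two earlier sites. Values below the threshold pass through unchanged, which would also fit the absence of differences at sites with moderate QD.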
