
MQRankSum and ReadPosRankSum for SNPs in a haploid organism?

rorycraig (Edinburgh, Member)

Hi,

Apologies if this has been addressed previously. I'm working with genomic resequencing data for a haploid organism. I created a VCF file with GenotypeGVCFs from 33 gVCFs generated by HaplotypeCaller, following the Best Practices. As directed, I did not set the ploidy option when running GenotypeGVCFs. My final aim is to filter for a subset of high-quality SNPs for downstream analysis.

As I understand it, the MQRankSum and ReadPosRankSum annotations can only be calculated if at least one individual has a heterozygous genotype (ref and alt alleles) at that position. Around 15% of my SNPs have values for these annotations. Can anyone explain what this means for a haploid? Are these sites good candidates to filter outright?
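To check a figure like the 15% above, a minimal sketch can count the fraction of SNP records whose INFO field defines both annotations. It assumes an uncompressed, tab-separated VCF and treats a record as a SNP when REF and every ALT allele are one base long; the function name `ranksum_fraction` is hypothetical.

```python
def ranksum_fraction(vcf_lines, keys=("MQRankSum", "ReadPosRankSum")):
    """Fraction of SNP records whose INFO field defines every key in `keys`.

    Hypothetical helper: header lines ('#') and non-SNP records are skipped.
    """
    total = with_keys = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts, info = fields[3], fields[4].split(","), fields[7]
        # Only count single-base REF/ALT records (SNPs).
        if len(ref) != 1 or any(len(a) != 1 for a in alts):
            continue
        present = {entry.split("=", 1)[0] for entry in info.split(";")}
        total += 1
        if all(k in present for k in keys):
            with_keys += 1
    return with_keys / total if total else 0.0
```

For a file on disk, `with open("cohort.vcf") as fh: ranksum_fraction(fh)` would work, where `cohort.vcf` is a placeholder name; a bgzipped VCF could be opened with `gzip.open(path, "rt")` instead.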

An example SNP is below:

chromosome_1 3316 . G A 492.42 . AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;ReadPosRankSum=0.731;SOR=1.308 GT:AD:DP:GQ:PL 0:86,0:86:99:0,1800 0:89,0:89:99:0,1800 0:147,0:147:99:0,1800 0:51,5:56:99:0,1800 1:1,4:5:80:80,0 0:271,0:271:99:0,1800 1:0,8:8:99:247,0 0:21,1:22:99:0,814 .:0,0 1:5,11:16:99:211,0 0:140,72:212:99:0,1800 0:242,13:255:99:0,1800 0:252,0:252:99:0,1800 .:0,0 .:0,0 .:0,0 0:1,0:1:44:0,44 .:0,0 .:0,0 0:3,0:3:99:0,112 0:1,0:1:39:0,39 .:0,0 .:0,0 .:0,0 .:0,0 0:17,0:17:99:0,360 0:1,0:1:37:0,37 0:4,0:4:99:0,135 0:10,0:10:99:0,270 0:3,0:3:99:0,119 0:3,0:3:99:0,111 0:3,0:3:99:0,119 .:0,0
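Rank-sum annotations compare reads carrying the reference allele with reads carrying the alternate allele. One plausible reason this haploid site carries values is that several samples have reads supporting both alleles, even though each is called with a single allele. The sketch below parses the example record, recomputes AC/AN from the haploid GT calls, and lists the called samples whose AD contains both ref and alt reads. It only inspects AD; it does not reproduce how GATK itself computes the annotations, and the helper names are hypothetical.

```python
# The example record from the post (whitespace-separated here; a real VCF is tab-separated).
RECORD = (
    "chromosome_1 3316 . G A 492.42 . "
    "AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;"
    "FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;"
    "ReadPosRankSum=0.731;SOR=1.308 GT:AD:DP:GQ:PL "
    "0:86,0:86:99:0,1800 0:89,0:89:99:0,1800 0:147,0:147:99:0,1800 "
    "0:51,5:56:99:0,1800 1:1,4:5:80:80,0 0:271,0:271:99:0,1800 "
    "1:0,8:8:99:247,0 0:21,1:22:99:0,814 .:0,0 1:5,11:16:99:211,0 "
    "0:140,72:212:99:0,1800 0:242,13:255:99:0,1800 0:252,0:252:99:0,1800 "
    ".:0,0 .:0,0 .:0,0 0:1,0:1:44:0,44 .:0,0 .:0,0 0:3,0:3:99:0,112 "
    "0:1,0:1:39:0,39 .:0,0 .:0,0 .:0,0 .:0,0 0:17,0:17:99:0,360 "
    "0:1,0:1:37:0,37 0:4,0:4:99:0,135 0:10,0:10:99:0,270 "
    "0:3,0:3:99:0,119 0:3,0:3:99:0,111 0:3,0:3:99:0,119 .:0,0"
)


def parse_record(line):
    """Split one VCF data line into an INFO dict and per-sample FORMAT dicts."""
    fields = line.split()
    info = {}
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    keys = fields[8].split(":")
    # Missing samples such as ".:0,0" only fill GT and AD; zip stops early.
    samples = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return info, samples


def allele_counts(samples):
    """Recompute AC and AN from haploid GT calls, skipping missing calls."""
    called = [s["GT"] for s in samples if s["GT"] != "."]
    return called.count("1"), len(called)


def mixed_read_samples(samples):
    """1-based indices of called samples whose AD has both ref and alt reads."""
    hits = []
    for i, s in enumerate(samples, start=1):
        if s["GT"] == ".":
            continue
        ref_reads, alt_reads = (int(x) for x in s["AD"].split(","))
        if ref_reads > 0 and alt_reads > 0:
            hits.append(i)
    return hits


info, samples = parse_record(RECORD)
ac, an = allele_counts(samples)
print(f"AC={ac} AN={an} AF={ac / an:.3f}")  # AC=3 AN=22 AF=0.136
print("samples with ref+alt reads:", mixed_read_samples(samples))  # [4, 5, 8, 10, 11, 12]
```

Six of the 22 called samples have mixed read support here, including sample 11, which is called reference despite 72 alt reads out of 212.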

Cheers,
Rory

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rorycraig
    Hi Rory,

    Can you confirm that the RankSum annotations do not appear in the final VCF if you set ploidy to 1 in GenotypeGVCFs? I don't think you should use the RankSum annotations for haploid samples, as they are designed for diploid samples.

    -Sheila

  • rorycraig (Edinburgh, Member)

    Hi Sheila, sorry for the slow reply. I can confirm that these annotations still appear when ploidy is set to 1 in the GenotypeGVCFs command. Do you have any insight into whether it's best to ignore these annotations or to actively filter on them? Thanks!

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rorycraig
    Hi,

    We don't have any recommendations for using or not using rank sum annotations in haploid (or non-diploid) samples. I think the best thing to do is try both ways (filtering with and without the rank sum annotations) and see which works best for your dataset.

    -Sheila
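Sheila's suggestion to filter both with and without the RankSum annotations can be sketched as a simple toggle. The thresholds below are GATK's generic hard-filtering starting points for SNPs (QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0), which were written with diploid data in mind, so treat them as assumptions to tune for a haploid cohort. The function names are hypothetical, and in this sketch an absent annotation never causes a failure.

```python
# Assumed thresholds: GATK's generic SNP hard-filter starting points (diploid-oriented).
BASE_THRESHOLDS = {
    "QD": ("<", 2.0),
    "FS": (">", 60.0),
    "MQ": ("<", 40.0),
    "SOR": (">", 3.0),
}
RANKSUM_THRESHOLDS = {
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def parse_info(info_field):
    """Turn a VCF INFO string into a dict of key -> raw string value."""
    out = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        out[key] = value
    return out


def failed_filters(info, use_ranksum):
    """Names of annotations that fail; annotations missing from `info` are skipped."""
    rules = dict(BASE_THRESHOLDS)
    if use_ranksum:
        rules.update(RANKSUM_THRESHOLDS)
    failed = []
    for key, (op, cutoff) in rules.items():
        if key not in info:
            continue
        value = float(info[key])
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(key)
    return failed


EXAMPLE_INFO = (
    "AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;"
    "FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;"
    "ReadPosRankSum=0.731;SOR=1.308"
)
info = parse_info(EXAMPLE_INFO)
print(failed_filters(info, use_ranksum=True))   # ['MQ']
print(failed_filters(info, use_ranksum=False))  # ['MQ']
```

On Rory's example site both settings flag only MQ, since 31.59 is below the generic cutoff of 40. Any difference between the two settings would show up when the comparison is run across the whole callset.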
