Filtering based on BaseQRankSum

davetang (Australia, Member)

Hello,

I was plotting the distribution of BaseQRankSum and noticed that a large number of variants have a BaseQRankSum Z-score outside +/- 2, which suggests that many variants have significant base quality differences between the REF and ALT alleles. I also plotted the distributions of ClippingRankSum, MQRankSum, and ReadPosRankSum, and for those the majority of variants had Z-scores within +/- 2.

Is this typical, and what does it suggest? I followed the best practices for DNA sequencing using GATK3.

I found this post (http://gatkforums.broadinstitute.org/discussion/2035/z-scores-for-baseqranksum), which is similar to what I'm asking but has a different distribution of Z-scores.

Thank you in advance.

Dave

Answers

  • timflutre (Montpellier, France, Member)

    Your question prompted me to check this on my data, but I don't see the same pattern (image below). Your distribution does at least look symmetric. Did you run tools such as FastQC and CutAdapt before the alignment step?

  • davetang (Australia, Member)

    I did run FastQC and all the base qualities looked normal; the only flags were for GC content in some samples. In fact, all four Z-score distributions (BaseQRankSum, ClippingRankSum, MQRankSum, and ReadPosRankSum) were symmetrical, but only BaseQRankSum had a large share of values outside +/- 2.

    I also applied various filters to my variant data, but even among high-quality variants, half of the BaseQRankSum Z-scores were outside +/- 2.
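
The check discussed in this thread, the share of variants whose rank-sum annotation falls outside +/- 2, could be reproduced with something like the sketch below. It is only an illustration, not the method either poster used: the helper names `rank_sum_values` and `fraction_outside` are made up for this example, and it assumes an uncompressed, tab-delimited VCF whose INFO column (the eighth) carries the annotation as `KEY=value`.

```python
def rank_sum_values(vcf_lines, key="BaseQRankSum"):
    """Yield the numeric value of `key` from the INFO column of each record.

    Records that lack the annotation are skipped, since the rank-sum
    annotations are not present at every site.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            name, _, value = entry.partition("=")
            if name == key and value:
                try:
                    yield float(value)
                except ValueError:
                    pass  # non-numeric value, ignore


def fraction_outside(values, threshold=2.0):
    """Return the fraction of values with |z| > threshold, or None if empty."""
    values = list(values)
    if not values:
        return None
    return sum(abs(v) > threshold for v in values) / len(values)


# Small made-up records, only to show the call pattern.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tBaseQRankSum=2.5;MQRankSum=0.1",
    "chr1\t200\t.\tC\tT\t60\tPASS\tBaseQRankSum=-0.3;MQRankSum=-2.4",
    "chr1\t300\t.\tG\tA\t70\tPASS\tDP=10",
    "chr1\t400\t.\tT\tC\t80\tPASS\tBaseQRankSum=-3.1",
]
base_q_fraction = fraction_outside(rank_sum_values(example, "BaseQRankSum"))
mq_fraction = fraction_outside(rank_sum_values(example, "MQRankSum"))
```

On real data, the list of lines would be replaced by an open file handle, and the same two calls could be repeated for ClippingRankSum and ReadPosRankSum to compare the four annotations side by side.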