
Is the SOR the natural logarithm of the odds ratio?

Hi,

I'm trying to figure out a good SOR threshold for hard-filtering a SNP data set. The best practices documentation suggests a cutoff of 4, but from reading a bit more about odds ratios, it seems that taking the natural logarithm of the odds ratio may be preferable. Is the SOR reported in the VCF file actually the natural logarithm of the odds ratio? This information is not provided in the page on statistical tests or in the documentation for the SOR.

Thanks,
Jen

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Jen, I'm not sure but will ask the developer of this annotation to clarify.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ah, the developer answered that, unfortunately, the name is a bit misleading because the implementation changed somewhere between the start of development and the release of this annotation, so SOR isn't really an odds ratio anymore. The goal was to separate certain cases of data without penalizing variants that occur at the ends of exons, which tend to be covered by reads in only one direction (depending on which end of the exon they're on). For example, if a variant has 10 ref reads in the + direction, 1 ref read in the - direction, 9 alt reads in the + direction and 2 alt reads in the - direction, it's actually not strand biased, but the FS score is pretty bad. The resulting implementation was derived in part from empirically testing read count tables of various sizes and ratios and deciding from there. But the reported SOR value is indeed ln-scaled. We'll clarify this in the documentation.
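
To make the example table above concrete, here is a minimal Python sketch of an ln-scaled symmetric strand ratio. It assumes the formula given in the GATK StrandOddsRatio documentation (a symmetric odds ratio, rescaled by the ref and alt strand ratios, then natural-log transformed) and a pseudocount of 1 added to every cell. Neither detail is spelled out in this thread, and the function name `strand_odds_ratio` is hypothetical, so treat this as an illustration and not as the GATK implementation.

```python
import math


def strand_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev, pseudocount=1.0):
    """Sketch of an ln-scaled symmetric strand ratio (assumed formula)."""
    # Assumption: add a pseudocount to every cell to avoid division by zero.
    rf = ref_fwd + pseudocount
    rr = ref_rev + pseudocount
    af = alt_fwd + pseudocount
    ar = alt_rev + pseudocount

    # Symmetric ratio: R + 1/R, so bias in either direction scores the same.
    r = (rf * ar) / (rr * af)
    symmetric_ratio = r + 1.0 / r

    # Strand ratios for ref and alt reads. Dividing by the alt ratio means a
    # site where ref and alt reads are skewed the same way is not penalized.
    ref_ratio = min(rf, rr) / max(rf, rr)
    alt_ratio = min(af, ar) / max(af, ar)

    return math.log(symmetric_ratio * ref_ratio / alt_ratio)


# Read counts from the example above: 10 ref +, 1 ref -, 9 alt +, 2 alt -.
exon_end_example = strand_odds_ratio(10, 1, 9, 2)
print(f"{exon_end_example:.3f}")
```

Under these assumptions the exon-end example scores well below the cutoff of 4 mentioned in the question, while a table in which ref reads fall on one strand and alt reads on the other scores well above it.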

  • jenmod (durham; Member)

    Thanks for looking into this for me! ...but that raises a whole new set of questions, as in, does this change the recommendations for hard filtering cutoffs for the SOR or FS values? I was going to use the SOR as a filtering criterion instead of the FS, because it seemed like the better option. Is this true?

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    Accepted Answer

    @jenmod
    Hi Jen,

    We do recommend using SOR over FS because SOR is more stable at various depths (FS is better for lower depth regions). However, it is easier to use SOR in VQSR than in hard filtering. Have a look at this thread for more information on how to use SOR in hard filtering: http://gatkforums.broadinstitute.org/discussion/5533/strandoddsratio-computation

    -Sheila
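
For readers who want to see what a hard filter on SOR amounts to, here is a minimal Python sketch that checks the SOR value in a VCF INFO string against a cutoff. The helper names `parse_info` and `fails_sor_filter` are hypothetical, and the default cutoff of 4 is simply the value quoted in the question, not a recommendation from this thread.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'DP=30;FS=2.1;SOR=0.31' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            # Flag-style INFO entries carry no value.
            fields[item] = True
    return fields


def fails_sor_filter(info, max_sor=4.0):
    """Return True when the record's SOR exceeds the cutoff."""
    value = parse_info(info).get("SOR")
    if value is None:
        # Records without an SOR annotation are left unfiltered here.
        return False
    return float(value) > max_sor


print(fails_sor_filter("DP=30;FS=2.1;SOR=5.2"))
print(fails_sor_filter("DP=30;FS=0.0;SOR=0.31"))
```

In practice this kind of check is normally done with the GATK filtering tools themselves; the sketch only shows the comparison being applied.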

  • jenmod (durham; Member)

    Great, thanks, and sorry, I didn't mean to WTF anyone, I just hit that accidentally!

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @jenmod
    haha I have done that before. No worries, you will only be the third person to give me that honor :wink:

    -Sheila
