
Help choosing truth sensitivity

sp580 (Germany, Member)
edited March 10 in Ask the GATK team

Hello,

I am trying to decide which set of SNPs to use for my downstream analyses. I need more than 1 SNP per kbp to detect signatures of selection (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4611237/). The tranches that achieve this SNP density start at 80% truth sensitivity.
Looking at the tranches plot, tranches 80-90 look reasonable based on the novel TiTv ratio for the species I am working with.

However, the fraction of false positives (Target TiTv ratio = 2) might be too high for these tranches (0.2-0.25):

    | targetSensitivity|     nTP|     nFP| FP_fraction|
    |-----------------:|-------:|-------:|-----------:|
    |                60|  140491|    5357|       0.037|
    |                65|  200758|   10581|       0.050|
    |                70|  289547|   23544|       0.075|
    |                75|  303540|   76741|       0.202|
    |                80|  311006|   81051|       0.207|
    |                85|  322111|   90922|       0.220|
    |                90|  340943|  110478|       0.245|
    |                95|  377469|  340152|       0.474|
    |               100| 1479090| 2361373|       0.615|

But I am not really sure how to interpret this information. The way it makes sense to me is that if I use, for instance, tranche 85 as my final SNP subset, I accept that each novel SNP has a 22% chance of being a false positive. However, this corresponds to 90922 SNPs and 1% of the total SNP set, which I am willing to live with so that I can move on with the analysis.

I would like to know if this interpretation is correct and if you have any suggestions (e.g. would it make a big difference to choose tranche 90 instead of 85?).
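
To make the question concrete, here is a minimal Python sketch of how I am reading the table. It rests on two assumptions of mine, not on anything confirmed by GATK: that FP_fraction is nFP / (nTP + nFP), which reproduces the values above, and that the counts are cumulative, so each tranche contains every SNP from the stricter tranches. The function names are my own.

```python
# Rows copied from the table above: (targetSensitivity, nTP, nFP).
TRANCHES = [
    (60, 140491, 5357),
    (65, 200758, 10581),
    (70, 289547, 23544),
    (75, 303540, 76741),
    (80, 311006, 81051),
    (85, 322111, 90922),
    (90, 340943, 110478),
    (95, 377469, 340152),
    (100, 1479090, 2361373),
]


def fp_fraction(n_tp, n_fp):
    """Estimated share of false positives among all SNPs in a tranche."""
    return n_fp / (n_tp + n_fp)


def marginal_fp_fraction(lower, upper):
    """FP share among only the SNPs gained by relaxing from `lower` to `upper`.

    Assumes the counts are cumulative (my assumption, see the text above).
    """
    added_tp = upper[1] - lower[1]
    added_fp = upper[2] - lower[2]
    return added_fp / (added_tp + added_fp)


by_sensitivity = {row[0]: row for row in TRANCHES}

# Reproduce the FP_fraction column, plus the total SNP count per tranche.
for sensitivity, n_tp, n_fp in TRANCHES:
    print(f"{sensitivity:>3}  {n_tp + n_fp:>8}  {fp_fraction(n_tp, n_fp):.3f}")

# How clean are the extra SNPs gained by choosing tranche 90 over 85?
gain_85_to_90 = marginal_fp_fraction(by_sensitivity[85], by_sensitivity[90])
print(f"85 -> 90 marginal FP fraction: {gain_85_to_90:.3f}")
```

If both assumptions hold, going from tranche 85 to 90 adds 38388 SNPs, of which about 51% would be estimated false positives, even though the overall FP fraction only rises from 0.220 to 0.245.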

Thanks!
