Simulating reads to test methods

evolvedmicrobe (MGH) | Posts: 14 | Member
edited January 2013 in Ask the GATK team

I just wrote a walker to look for particular types of low-frequency mutations, and I wanted to verify that the methods work. I was hoping to simulate some Illumina data containing the variants and then run the methods against it.

However, I don't know what a realistic error model is for typical Illumina data, so I am not sure how realistic my simulations are (proportion of gaps, A->C versus A->G substitution rates, etc.). Does the GATK include a read simulator? I saw one walker in the documentation, but it seemed to rely on settings I didn't know about and looked a bit out of date.

Any help appreciated.

Answers

  • Geraldine_VdAuwera | Posts: 7,163 | Administrator, GATK Developer
    edited January 2013

    We don't normally work with simulated data, so we don't have any up-to-date tools to do this, sorry. Perhaps someone in the community can suggest a good package that does this?

    EDIT: I stand corrected. See Mark's remark for pointers, although this only applies to folks within Broad. Suggestions from the community are still welcome to help external folks!

    Geraldine Van der Auwera, PhD

  • evolvedmicrobe (MGH) | Posts: 14 | Member

    Found a program that accounts for the error model (if in a somewhat unspecified way)...
    http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
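
For a quick sanity check before reaching for a dedicated simulator, the kind of simulation described in the question can be roughed out in a few lines of Python. This is only a toy sketch under stated assumptions: the function names (`substitute`, `simulate_reads`) are made up for illustration, the 2:1 transition-to-transversion weighting and the flat per-base error rate are placeholders rather than measured Illumina values, and gaps, quality scores, and FASTQ output are left out entirely.

```python
import random

BASES = "ACGT"

# Transition partners (A<->G, C<->T). ASSUMPTION: each transition is weighted
# twice as heavily as each transversion; this is illustrative, not a measured rate.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def substitute(base, rng):
    """Return a base different from `base`, favouring its transition partner."""
    others = [b for b in BASES if b != base]
    weights = [2.0 if b == TRANSITION[base] else 1.0 for b in others]
    return rng.choices(others, weights=weights, k=1)[0]


def simulate_reads(reference, n_reads, read_len, variant_pos, variant_base,
                   variant_freq, error_rate, seed=0):
    """Sample fixed-length reads uniformly from `reference`.

    A read covering `variant_pos` carries `variant_base` with probability
    `variant_freq`; afterwards every base is independently replaced with
    probability `error_rate`. Returns a list of (start, sequence) tuples.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        seq = list(reference[start:start + read_len])

        # Spike in the low-frequency variant if this read covers the site.
        offset = variant_pos - start
        if 0 <= offset < read_len and rng.random() < variant_freq:
            seq[offset] = variant_base

        # Apply substitution errors only (no insertions or deletions).
        for i, base in enumerate(seq):
            if rng.random() < error_rate:
                seq[i] = substitute(base, rng)

        reads.append((start, "".join(seq)))
    return reads


if __name__ == "__main__":
    rng = random.Random(1)
    reference = "".join(rng.choice(BASES) for _ in range(200))
    alt = "A" if reference[100] != "A" else "C"
    reads = simulate_reads(reference, n_reads=1000, read_len=50,
                           variant_pos=100, variant_base=alt,
                           variant_freq=0.05, error_rate=0.01)
    covering = [(s, r) for s, r in reads if 0 <= 100 - s < 50]
    with_alt = sum(1 for s, r in covering if r[100 - s] == alt)
    print(f"{with_alt} of {len(covering)} covering reads show the alternate base")
```

Because sequencing errors can also produce the alternate base at the variant site, the observed count mixes true variant reads with errors, which is the confusion a low-frequency caller has to resolve. A dedicated simulator such as the one linked above is still the better choice when the error profile itself matters.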
