VQSR on exome of small specific population

Hello,
I asked this question at the workshop in Brussels, and I would like to post it here as well:
I'm working on an exome analysis of a trio, and I would like to use VQSR to filter the data. Since this is an exome project, there are not many variants, and therefore, as I understand it, VQSR will not be accurate. You suggest adding more data from 1000 Genomes or other published datasets.
The families that I'm working on belong to a very small and specific population, and I'm afraid that adding published data will add a lot of noise.
What do you think: should I add more published data, change parameters such as maxGaussians, or use hard filtering?

Thanks,
Maya

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Maya,

    I've thought about your case some more and I think you would still benefit from adding exomes from the 1kG dataset to your cohort. While this won't do anything to boost discovery of any rare variants that are unique to your population, it will allow you to use VQSR, which is always a good thing. It won't reduce sensitivity to the rare variants, and it won't increase the false positive discovery rate, if that's what you're worried about. Then once you have your recalibrated variants for the whole cohort, you can subset them to the variants found in your own samples, to get rid of all the stuff you're not interested in.
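
The final subsetting step described above can be sketched in plain Python. This is only an illustration of the idea, not part of the original answer: in a real workflow the step would normally be done with a dedicated tool such as GATK's SelectVariants. The function names and the sample names (`father`, `mother`, `child`, `kg_sample_1`) are hypothetical, and the sketch assumes an uncompressed, tab-separated VCF in which GT is the first FORMAT key.

```python
def carries_alt(sample_field):
    """Return True if the genotype in a VCF sample column has a non-reference allele."""
    genotype = sample_field.split(":")[0]           # assumes GT is the first FORMAT key
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def subset_vcf_lines(lines, keep_samples):
    """Keep only the columns of `keep_samples` and drop sites none of them carries.

    Note: INFO fields such as AC/AF are left untouched, so they still describe
    the full cohort rather than the subset.
    """
    out = []
    keep_idx = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):                   # meta-information lines pass through
            out.append(line)
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):               # column header: locate the samples
            samples = fields[9:]
            missing = [s for s in keep_samples if s not in samples]
            if missing:
                raise ValueError(f"samples not in VCF: {missing}")
            keep_idx = [9 + samples.index(s) for s in keep_samples]
            out.append("\t".join(fields[:9] + [fields[i] for i in keep_idx]))
            continue
        if keep_idx is None:
            raise ValueError("data line found before the #CHROM header")
        kept = [fields[i] for i in keep_idx]
        if any(carries_alt(g) for g in kept):       # site is variant in at least one kept sample
            out.append("\t".join(fields[:9] + kept))
    return out


# Hypothetical recalibrated cohort: a trio plus one 1kG sample.
cohort = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tfather\tmother\tchild\tkg_sample_1",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t0/1\t0/0",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/0\t0/0\t1/1",
]
trio = subset_vcf_lines(cohort, ["father", "mother", "child"])
print("\n".join(trio))
```

In this made-up input, the site at position 200 is variant only in the 1kG sample, so it is dropped from the trio output, while the site at position 100 is kept.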

  • mayaab (Israel; Member)

    Thanks. I don't understand how it won't reduce sensitivity to rare variants and won't increase false positives. Can you explain it?
    And one important thing: I have very few samples, the two parents and a child, meaning only three samples, so the 1kG data will make up the majority of the cohort.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    VQSR does not track how many samples in a cohort show evidence for the variant. It performs site-level, not sample-level filtering, using annotations that are not expected to be influenced by the frequency of variants in the population. What it does, effectively, is detect trends in the data that indicate artifactual calls and filter those out. As long as the annotations of rare variants follow the same distribution as the rest (and there is no reason I can think of why they would not), they will be treated the same way. Variants do not need to be represented in the truth and training resources in order to be classified appropriately; if they were, it would defeat the purpose of the modeling.

    Does this help?
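
The point about site-level filtering can be illustrated with a toy model. This is a deliberately simplified sketch and not GATK's implementation: VQSR fits a Gaussian mixture model, whereas this example fits a single Gaussian per annotation and treats the annotations as independent. QD and FS are real GATK annotation names, but every value, the `carriers` field, and the threshold are made up for illustration.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(training_sites, annotations):
    """Fit one independent Gaussian (mean, std dev) per annotation on trusted sites."""
    model = {}
    for name in annotations:
        values = [site[name] for site in training_sites]
        model[name] = (mean(values), max(pstdev(values), 1e-6))  # avoid zero std dev
    return model


def log_likelihood(site, model):
    """Score a site from its annotations only; carrier counts are never consulted."""
    total = 0.0
    for name, (mu, sigma) in model.items():
        z = (site[name] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total


# Hypothetical training sites (for example, sites overlapping a truth resource).
training = [
    {"QD": 18.0, "FS": 1.0},
    {"QD": 22.0, "FS": 2.0},
    {"QD": 20.0, "FS": 3.0},
    {"QD": 19.0, "FS": 2.5},
    {"QD": 21.0, "FS": 1.5},
]
model = fit_gaussian(training, ["QD", "FS"])

THRESHOLD = -10.0  # arbitrary cutoff chosen for this toy example

# A rare variant with typical annotations and a common one that looks like an artifact.
rare_good = {"QD": 20.5, "FS": 2.2, "carriers": 1}
common_bad = {"QD": 3.0, "FS": 40.0, "carriers": 150}

for label, site in [("rare_good", rare_good), ("common_bad", common_bad)]:
    verdict = "PASS" if log_likelihood(site, model) >= THRESHOLD else "FILTER"
    print(label, verdict)
```

Because the score depends only on the annotation values, changing `carriers` has no effect on the outcome: in this toy setup the variant seen in a single sample passes, and the widely shared call with artifact-like annotations is filtered.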

  • mayaab (Israel; Member)

    OK, now I understand. Thanks, Geraldine.