
VQSR on specific genomic region

Dear GATK Team,

I have exome data from many individuals (>2000), called with HaplotypeCaller, but only for a specific set of genes. I would like to use VQSR to recalibrate my variants, but (as expected) I get the error 'No data found'.
I know there is an option to 'pad' the data with other exomes, but those would have to be generated with a method comparable to my dataset, and whole-exome sequencing is not.
I was therefore wondering whether there is an option to 'focus' VQSR on specific regions of the genome/exome only.
I am sure that if only my regions were considered in the recalibration, I would have enough variants to build a recalibration model.

Thank you for your help in advance,
Kind regards,

rosannevd

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @rosannevd
    Hi rosannevd,

    I am confused. Do you have exome data and are only interested in specific genes? Or, do you only have data from specific genes? If you only have data from specific genes, you cannot pad with other samples. The tool needs many different sites to make a good model. You will need to hard filter.

    -Sheila
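For readers unsure what hard filtering involves, here is a minimal Python sketch of the idea: each variant's INFO annotations are compared against fixed thresholds, and the record is flagged if any of them fails. The thresholds are the generic SNP starting values commonly cited in GATK's hard-filtering guidance and are an assumption here, not something stated in this thread; the filter names and the helper functions `parse_info`, `failed_filters`, and `filter_vcf_line` are hypothetical. In practice GATK's VariantFiltration tool does this job, so treat this as an illustration only.

```python
# Assumed generic SNP hard-filter thresholds; tune them for your own data.
# Each entry: filter name -> (INFO key, predicate that is True when the site fails).
SNP_FILTERS = {
    "QD2": ("QD", lambda v: v < 2.0),
    "FS60": ("FS", lambda v: v > 60.0),
    "MQ40": ("MQ", lambda v: v < 40.0),
    "MQRankSum-12.5": ("MQRankSum", lambda v: v < -12.5),
    "ReadPosRankSum-8": ("ReadPosRankSum", lambda v: v < -8.0),
    "SOR3": ("SOR", lambda v: v > 3.0),
}


def parse_info(info):
    """Turn a VCF INFO string such as 'QD=1.5;FS=10.0' into a dict of strings."""
    annotations = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            annotations[key] = value
    return annotations


def failed_filters(info):
    """Return the names of all filters that the given INFO string fails."""
    annotations = parse_info(info)
    failed = []
    for name, (key, is_bad) in SNP_FILTERS.items():
        if key not in annotations:
            continue  # a missing annotation does not fail the filter
        try:
            value = float(annotations[key])
        except ValueError:
            continue  # skip non-numeric values (e.g. multi-valued fields)
        if is_bad(value):
            failed.append(name)
    return failed


def filter_vcf_line(line):
    """Set the FILTER column of one VCF record; header lines pass through unchanged."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    failed = failed_filters(fields[7])  # column 8 of a VCF record is INFO
    fields[6] = ";".join(failed) if failed else "PASS"  # column 7 is FILTER
    return "\t".join(fields) + "\n"
```

Applying `filter_vcf_line` to every line of a VCF marks failing records in the FILTER column rather than deleting them, so the thresholds can be revisited later. Indels are usually given different thresholds, which this sketch does not cover.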
