Generating a VCF with information for specific genome positions (hotspots)

Hello,

I'm developing a pipeline that needs to report on the variants at a list of genomic hotspots, because my final analysis relies on the information contained in the VCF.

I'm working with WES data. Although I can supply a BED file with the regions to be analyzed, a hotspot position where the sample shows the same base as the reference genome won't appear in my VCF.

So when I filter my VCF for the specific variants on my list, I can't tell whether a missing variant is absent because the sample matches the reference genome at that position or because the sample had low coverage in that region.

I was wondering if the GATK Best Practices pipeline could offer the option to input the list of hotspots so it could generate a VCF with the information for these variants.

If not, what would be the best approach to solve this while still using GATK's pipeline?

I appreciate the attention,

Gabriel

Answers

  • bhanuGandham Member, Administrator, Broadie, Moderator admin
    edited December 2018

    Hi @gabrielgoncalves

    I am not quite clear on what the question is here.

    I was wondering if the GATK Best Practices pipeline could offer the option to input the list of hotspots so it could generate a VCF with the information for these variants.

    Would you please explain this statement in more detail? What information do you expect to see?

  • GabrielSGoncalves Brazil Member

    Hi Bhanu,

    What I wanted to know is whether the GATK variant caller has a parameter I can use to provide a list of variants I want to see in the VCF.

    e.g.

    I have the following variants in a BED file:

    chr10   96535172    96535173    rs41291556
    chr10   96535209    96535210    rs72552267
    chr10   96535245    96535246    rs17884712
    chr10   96540409    96540410    rs4986893
    

    So the variant caller would take these positions into account and add them to the VCF file, even if the sample doesn't have mutations at the respective hotspots.
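When matching a list like this against VCF records, the coordinate systems differ: standard BED intervals are 0-based with an exclusive end, while VCF POS is 1-based. The following is a minimal sketch of that conversion, assuming the hotspot file follows the standard BED convention; the function name `bed_to_positions` is hypothetical and not part of GATK.

```python
def bed_to_positions(bed_text):
    """Expand BED records into (chrom, 1-based position, name) tuples."""
    positions = []
    for line in bed_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and BED header lines.
        if len(fields) < 3 or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        # BED start is 0-based and end is exclusive, so the covered
        # 1-based positions run from start + 1 to end inclusive.
        for pos in range(start + 1, end + 1):
            positions.append((chrom, pos, name))
    return positions


# The four records from the post above.
hotspots_bed = """\
chr10   96535172    96535173    rs41291556
chr10   96535209    96535210    rs72552267
chr10   96535245    96535246    rs17884712
chr10   96540409    96540410    rs4986893
"""

hotspots = bed_to_positions(hotspots_bed)
for chrom, pos, name in hotspots:
    print(chrom, pos, name)
```

Under that assumption, each single-base record maps to the VCF position equal to its BED end coordinate.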

    Do you think that's possible?

    Thank you for your attention,

    Gabriel

  • SChaluvadi Member, Broadie, Moderator admin

    @GabrielSGoncalves
    If I understand correctly, you want your VCF file to contain records for your hotspots regardless of whether they are actual variants. I think you can do this with HaplotypeCaller by using --emitRefConfidence (-ERC) in BP_RESOLUTION mode. Here is a post that also explores calling variant and invariant sites using the BP_RESOLUTION mode.
    Please let us know if this is not what you are looking for, and if not, we can help you further!
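As a rough illustration of how the suggestion above could answer the original question, the sketch below does two things. It assembles a HaplotypeCaller command restricted to the hotspot BED file with `-L`, and it classifies a per-site record from the resulting output as variant, confident reference, or low coverage. It assumes the GATK4 `gatk` launcher (GATK3 uses a different invocation and `-o` for output). The file names, the function names, and the depth and GQ thresholds are hypothetical placeholders, and the example records are made up for illustration rather than taken from real data.

```python
def build_haplotypecaller_cmd(reference, bam, hotspots_bed, out_gvcf):
    """Return a HaplotypeCaller argument list (GATK4-style launcher assumed)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", hotspots_bed,        # restrict calling to the hotspot intervals
        "-ERC", "BP_RESOLUTION",   # emit one record per site, variant or not
        "-O", out_gvcf,
    ]


def classify_site(record_line, min_depth=10, min_gq=20):
    """Classify one single-sample record; thresholds are illustrative only."""
    fields = record_line.rstrip("\n").split("\t")
    # Pair the FORMAT keys (column 9) with the sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    depth = sample.get("DP", ".")
    gq = sample.get("GQ", ".")
    alleles = sample.get("GT", "./.").replace("|", "/").split("/")

    # Depth is checked first: too few reads means no reliable statement.
    if depth == "." or int(depth) < min_depth:
        return "low_coverage"
    if "." in alleles:
        return "no_call"
    if any(allele != "0" for allele in alleles):
        return "variant"
    if gq == "." or int(gq) < min_gq:
        return "low_confidence_ref"
    return "hom_ref"


# Made-up records in the style of BP_RESOLUTION output.
example_records = [
    "chr10\t100\t.\tG\t<NON_REF>\t.\t.\t.\tGT:AD:DP:GQ:PL\t0/0:35,0:35:99:0,102,1530",
    "chr10\t200\t.\tC\t<NON_REF>\t.\t.\t.\tGT:AD:DP:GQ:PL\t0/0:2,0:2:6:0,6,49",
    "chr10\t300\t.\tG\tA,<NON_REF>\t500.6\t.\t.\tGT:AD:DP:GQ:PL:SB\t0/1:20,18,0:38:99:529,0,600,589,654,1243:10,10,9,9",
]

cmd = build_haplotypecaller_cmd("ref.fasta", "sample.bam", "hotspots.bed", "sample.g.vcf.gz")
print(" ".join(cmd))
for record in example_records:
    print(record.split("\t")[1], classify_site(record))
```

The command is only printed here, not executed. Suitable depth and GQ cutoffs depend on the assay and should be chosen for the data at hand.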
