Masking Polymorphic Regions Before Variant Calling

I notice that the Best Practices workflows treat all regions of the reference genome the same way. Some regions, such as the MHC region containing the HLA genes, are extremely polymorphic. There are thousands of known alleles in the IMGT/HLA database, and a recent article in PLoS Genetics estimates that there are 8 to 9 million HLA alleles in the human population. Would it be better if the SNP calling Best Practices did not output results for this region by default, and the guides explained why? A reviewer for Nature Communications recently asked for germline SNP calling to be done for the HLA alleles, which demonstrates a lack of understanding of when the reference genome is useful and when it is not. If tools like GATK did not output such misleading results by default, it would help change researchers' perceptions over time.

Answers

  • SkyWarrior (Turkey, Member ✭✭✭)

    Depending on your preferred approach to variant calling, this is perfectly doable. I would even say go for it. SNP calling for some of the genes in the genome is pretty much useless.

    I would also add to this recipe that reads derived from retrotransposed cDNAs should be identified and removed, since they tend to mess up variant calling in some important genes. Unfortunately, there are no tools to do this job.
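
As a rough illustration of the masking idea discussed in this thread, here is a minimal Python sketch that drops VCF records falling inside a masked region. The thread itself gives no coordinates: the interval below is only an approximate, assumed span for the MHC on a GRCh38-style `chr6`, and the names `MASKED_REGIONS`, `is_masked`, and `filter_vcf_lines` are hypothetical. Verify the coordinates against your own reference build before relying on them.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# ASSUMPTION: approximate MHC span on a GRCh38-style "chr6"; not taken from the thread.
# Intervals are BED-style: 0-based start, half-open end.
MASKED_REGIONS: Dict[str, List[Tuple[int, int]]] = {
    "chr6": [(28_500_000, 33_500_000)],
}


def is_masked(chrom: str, pos: int,
              regions: Dict[str, List[Tuple[int, int]]]) -> bool:
    """Return True if a 1-based VCF position lies inside any masked interval."""
    zero_based = pos - 1  # convert VCF POS (1-based) to BED-style coordinates
    return any(start <= zero_based < end for start, end in regions.get(chrom, []))


def filter_vcf_lines(lines: Iterable[str],
                     regions: Dict[str, List[Tuple[int, int]]] = MASKED_REGIONS
                     ) -> Iterator[str]:
    """Yield header lines unchanged and only those records outside the mask."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        if not is_masked(chrom, pos, regions):
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT",
        "chr6\t30000000\t.\tA\tG",   # inside the assumed mask, dropped
        "chr6\t1000\t.\tC\tT",       # outside the mask, kept
    ]
    for kept in filter_vcf_lines(example):
        print(kept)
```

This filters calls after the fact. GATK tools also accept an `--exclude-intervals` (`-XL`) argument, which is likely the more direct way to skip a region during calling itself.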
