How to ask GATK HaplotypeCaller to call the best variant from a bam file

nathajoli (quebec, Member)

Hi,

I am working on alignment files in which I have aligned raw Illumina reads to a reference genome. I would like to compare the consensus sequence of the reads with the reference genome and call the variants in order to see the differences between my two environments. So far, I have been able to create the BAM file, call the variants with HaplotypeCaller to produce the VCF file, and look at their effects using SnpEff. The problem appears when I visualize my alignment file in Geneious. When I look at the list of variants that have a HIGH effect on the final product, I can see some reads that show the variation but others that don't. So I can't really trust the list of variants called by HaplotypeCaller and annotated by SnpEff, because sometimes it predicts a variation such as stop_gained, but the variation is present in only some of the reads and is not confirmed by all of them. I would feel more confident if I could call a variant only when the majority of reads agree on the variation. I think that comparing a consensus sequence of the reads against the reference genome could be the solution? I am not interested in seeing the polymorphism among the reads; I just want to see the differences from the reference genome.

It would be great if you could let me know which GATK HaplotypeCaller parameters I should modify to obtain what I need!

Thanks a lot,

Nathalie.

Answers

  • Sheila (Broad Institute; Member, Broadie admin)

    @nathajoli
    Hi Nathalie,

    HaplotypeCaller is designed to be very sensitive so that it does not miss any potential variation. Have you filtered your variants? You can set a higher stringency level so that most of your remaining variants are true positives.

    It sounds like you want most or all of the reads to be the variant allele at a site. Those sites will be called 1/1. However, if a site is heterozygous, there will be a mix of reference and variant alleles at the site.

    So, do you only want to have sites that are 1/1? Are you not interested in heterozygous sites?

    There is no way to get only the sites that have a majority of alternate reads directly from GATK tools. However, you can use VariantsToTable to get the AD (allelic depths) for each site and write a script to determine whether the reads at that site mostly support variant alleles or the reference allele. https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToTable.php

    -Sheila
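
A minimal sketch of the kind of script suggested above, not an official GATK utility. It assumes the table was exported as tab-delimited text with the site fields CHROM, POS, REF and ALT plus the genotype field AD, so that the depth column is named after the sample (the name `sample1.AD`, the function names, and the 0.5 threshold are all illustrative assumptions). AD is read as comma-separated read counts with the reference allele first.

```python
import csv


def alt_fraction(ad_value):
    """Return the fraction of reads supporting non-reference alleles.

    ad_value is an AD string such as "2,18" (reference depth first).
    Returns None when AD is missing or no reads were counted.
    """
    try:
        depths = [int(d) for d in ad_value.split(",")]
    except ValueError:
        return None  # missing value such as "NA" or "."
    total = sum(depths)
    if len(depths) < 2 or total == 0:
        return None
    return (total - depths[0]) / total


def majority_alt_sites(lines, ad_column, min_fraction=0.5):
    """Keep sites where more than min_fraction of reads are non-reference.

    lines is any iterable of tab-delimited text lines with a header row,
    for example an open file handle on the exported table.
    """
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        frac = alt_fraction(row[ad_column])
        if frac is not None and frac > min_fraction:
            kept.append(
                (row["CHROM"], int(row["POS"]), row["REF"], row["ALT"], round(frac, 3))
            )
    return kept


if __name__ == "__main__":
    # Made-up example rows, only to show the expected table layout.
    example = (
        "CHROM\tPOS\tREF\tALT\tsample1.AD\n"
        "chr1\t100\tA\tG\t2,18\n"
        "chr1\t200\tC\tT\t15,5\n"
        "chr1\t300\tG\tA\tNA\n"
    ).splitlines()
    for site in majority_alt_sites(example, "sample1.AD"):
        print(*site, sep="\t")
```

To run it on a real table, pass an open file handle instead of the example rows, for instance `majority_alt_sites(open("variants.table"), "sample1.AD")`, where the file name is a placeholder. Raising `min_fraction` towards 1.0 keeps only sites where nearly all reads carry the variant, which is close to asking for 1/1 sites.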
