No mutations in BAM (IGV) but a mutation in final VCF?

mhmtgenc Turkey Member

I use GATK 4.0 for my variant calling pipeline. My steps are MarkDuplicates, BaseRecalibrator, ApplyBQSR and HaplotypeCaller. At one locus there is no mutation in the original BAM file when I view it in IGV, but there is a mutation in the final VCF, and the bamout from HaplotypeCaller also appears to show it. I then tried Sanger sequencing and saw that there is actually no mutation. So the original BAM file is right, and the mutation shown in the bamout is wrong.

So how can I overcome this problem? It is a serious issue and has occurred several times. Thanks in advance.

Answers

  • kelepiradam Turkey Member

    Can you post an image or a snippet of your data? What kind of data are you working on?

  • kelepiradam Turkey Member
    edited April 2018

    There are several possible reasons for this issue:
    1- The reference genome that you are using: If your reference is missing the unplaced or random contigs, reads that belong to those sequences can be forced onto the primary chromosomes, and your study will be prone to more false positive variants. You may want to switch to a genome used by 1000G or the Broad Institute Best Practices. That means you would need to remap and redo all the analyses.

    2- Your DNA amplification/capture technology may be causing false positive variants at certain positions. This happens a lot with different exome capture kits. The best way to overcome this problem is to use joint genotyping and variant recalibration to filter out as many false positives as possible. Once you have enough samples, you will have a confident list of false positives that you can exclude from your gold standard variants.

    3- The region that you are looking at may have related pseudogenes or other similar sequences that cannot be eliminated simply by PCR amplification or the capture technology. This is something you need to figure out yourself.
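For point 1, a quick way to see what your reference contains is to look at the contig names in its FASTA index (.fai). The sketch below is only an illustration: the function names are hypothetical, and the marker list is an assumption based on common hg38 and b37 naming conventions, so adjust it to your own reference.

```python
# Name fragments that usually indicate unplaced, random or decoy contigs.
# ASSUMPTION: based on common hg38 / b37 naming; not an exhaustive list.
EXTRA_CONTIG_MARKERS = ("_random", "chrUn_", "_decoy", "GL000", "hs37d5")


def contig_names_from_fai(lines):
    """Return contig names from the lines of a FASTA index (.fai) file.

    Each .fai line is tab-separated and starts with the contig name.
    """
    return [line.split("\t")[0] for line in lines if line.strip()]


def has_extra_contigs(names):
    """True if any contig name looks like an unplaced/random/decoy contig."""
    return any(marker in name for name in names for marker in EXTRA_CONTIG_MARKERS)


# Made-up index lines for illustration (name, length, offset, line bases, line width).
primary_only = contig_names_from_fai([
    "chr1\t248956422\t6\t60\t61\n",
    "chr2\t242193529\t253105708\t60\t61\n",
])
with_unplaced = primary_only + ["chrUn_KI270302v1", "chr1_KI270706v1_random"]

print(has_extra_contigs(primary_only))
print(has_extra_contigs(with_unplaced))
```

To run it on a real reference, pass the lines of the index file, for example `contig_names_from_fai(open("reference.fasta.fai"))`, where the file name is a placeholder.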

    False positives are a fact and there are many things that could be done to eliminate them computationally.
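As a rough illustration of the "confident list of false positives" idea in point 2, here is a minimal sketch that counts how often the same call shows up across samples. It is not joint genotyping or variant recalibration, and recurrence alone cannot tell a systematic artifact from a common real polymorphism, so treat the output as candidates to check (for example by Sanger). The function name, the `min_fraction` threshold and the sample data are all made up for the example.

```python
from collections import Counter


def recurrent_sites(calls_by_sample, min_fraction=0.5):
    """Return {site: sample_count} for sites called in at least
    min_fraction of the samples.

    calls_by_sample maps a sample name to an iterable of
    (chrom, pos, ref, alt) tuples. Hypothetical helper, not a GATK API.
    """
    counts = Counter()
    for variants in calls_by_sample.values():
        counts.update(set(variants))  # count each site once per sample
    n_samples = len(calls_by_sample)
    return {
        site: count
        for site, count in counts.items()
        if count / n_samples >= min_fraction
    }


# Made-up calls from three samples captured with the same kit.
calls = {
    "S1": {("chr1", 1000, "G", "C"), ("chr7", 5500, "A", "T")},
    "S2": {("chr1", 1000, "G", "C")},
    "S3": {("chr1", 1000, "G", "C"), ("chr2", 500, "T", "G")},
}

candidates = recurrent_sites(calls, min_fraction=0.9)
print(candidates)
```

Sites that recur in nearly every sample from one kit, and that validation shows to be absent, are the ones worth adding to an exclusion list.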

    If you can send us a snippet or image it will be much easier for us to pinpoint the exact problem.

    PS: Are you at Bilkent University? I am an alumnus as well (2004).
