
GATK different from Sanger results

geneontology, Chicago, Member
edited April 2015 in Ask the GATK team

Hi All,

Below are two SNPs I obtained for two samples from exome DNA-seq and from Sanger sequencing.
From the Sanger results, both SNPs appear to be heterozygous in both samples, while GATK (v3.2) calls one sample as homozygous at each SNP.
Both SNPs have VQSLOD > 6. Does this mean we still need to filter on sequencing depth, i.e. hard filtering? If so, why do we need the machine learning-based soft filtering?
If we do need to filter on sequencing depth, what threshold would you recommend?

Sample1:GATK  Sample2:GATK  Sanger for both samples  Sample1:IGV  Sample2:IGV
CT            T             C,T                      C:9,T:2      T:9
T             GT            G,T                      T:7          G:7,T:8

Thanks much!

Answers

  • Sheila, Broad Institute, Member, Broadie ✭✭✭✭✭

    @geneontology
    Hi,

    Just to clarify, the first variant is called C/T by Sanger, and GATK calls it C/T in sample 1 and T/T in sample 2? And, the second variant is called G/T by Sanger, but GATK calls it T/T in sample 1 and G/T in sample 2? Can you post the IGV screenshots of the bamout files for the two positions in the two samples? It does not look like there is any evidence for the variant in both examples you provided. All of our analyses are done on 30X data, so our tools perform best at that coverage. We are looking into what can be done to increase sensitivity at lower coverages. However, if there are no reads that support a variant at a position, there is nothing that can be done.

    Variant Recalibration is for determining high quality variant sites. If you are interested in genotypes, you can try our Genotype Refinement workflow. http://gatkforums.broadinstitute.org/discussion/4723/genotype-refinement-workflow

    -Sheila
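To put a rough number on the point about read evidence, here is a minimal sketch of how likely it is that every read at a truly heterozygous site shows the same allele purely by chance. It assumes an idealised model that is not part of the thread: reads are independent, each allele is sampled with probability 0.5, and there are no sequencing errors or capture/mapping bias. The function name `prob_only_one_allele` is made up for illustration; the depths 7 and 9 are the IGV read counts from the question, and 30 is the coverage mentioned above.

```python
def prob_only_one_allele(depth: int, allele_fraction: float = 0.5) -> float:
    """Probability that all `depth` reads at a true het site carry one given allele.

    Assumes independent reads and no sequencing error or allele-specific bias.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return allele_fraction ** depth


if __name__ == "__main__":
    # 7 and 9 are the read counts reported from IGV in the question;
    # 30 is the coverage the GATK team says their analyses are done on.
    for depth in (7, 9, 30):
        p = prob_only_one_allele(depth)
        print(f"depth {depth:>2}: P(all reads show one allele | true het) = {p:.2e}")
```

Under this simple model the chance is about 0.8% at depth 7 and about 0.2% at depth 9, so pure sampling alone rarely hides an allele at these depths. Real data can deviate from the 50/50 assumption, which is one reason the bamout screenshots and full records are more informative than the counts alone.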

  • geneontology, Chicago, Member
    edited April 2015

    Thanks Sheila.
    Just to clarify, the first variant is called C/T by Sanger, and GATK calls it C/T in sample 1 and T/T in sample 2?
    CORRECT.

    IGV picture:
    SNP1: https://dl.dropboxusercontent.com/u/62547840/SNP1.jpg
    SNP2: https://dl.dropboxusercontent.com/u/62547840/SNP2.jpg

    The average coverage for our exome sequencing is more than 50X, but the coverage is not uniformly distributed: some exons have much higher coverage than others, and the coverage within an exon is bell-shaped.
    More importantly, if VQSLOD (e.g. > 4) is used as the determinant, GATK will most likely pick SNPs with low sequencing depth for the resulting VCF file.

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    @geneontology I don't think I understand what you're asking here. Can you please clarify your question? Are you asking why this SNP is being filtered, or not filtered out? We may need more information, such as what commands were run on the data, and complete VCF records. A summary of allele depth is not sufficient to evaluate a call.
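For anyone gathering the complete VCF records requested above, the following is a small sketch of pulling the per-sample genotype fields out of one VCF data line using only the standard library. The record shown is hypothetical: the chromosome, position, QUAL, INFO values and the choice of C as the reference allele are placeholders, and only the allele counts echo the first row of the table in the question. `parse_sample_fields` is an illustrative helper, not a GATK function, and the sample keys are generated because a single data line does not carry the sample names from the header.

```python
def parse_sample_fields(vcf_line: str) -> dict:
    """Split one tab-delimited VCF data line into site info and per-sample FORMAT fields."""
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 10:
        raise ValueError("expected at least 10 columns (8 fixed, FORMAT, 1+ samples)")
    format_keys = cols[8].split(":")
    samples = {}
    for index, sample_col in enumerate(cols[9:], start=1):
        # Pair each FORMAT key (GT, AD, DP, ...) with this sample's value.
        samples[f"sample{index}"] = dict(zip(format_keys, sample_col.split(":")))
    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alt": cols[4],
        "filter": cols[6],
        "info": cols[7],
        "samples": samples,
    }


# Hypothetical record: placeholder site fields, allele depths loosely based on
# the first SNP in the question (sample 1: C:9,T:2; sample 2: T:9).
record = "\t".join([
    "chr1", "12345", ".", "C", "T", "100.0", "PASS",
    "DP=20;VQSLOD=6.1", "GT:AD:DP", "0/1:9,2:11", "1/1:0,9:9",
])

if __name__ == "__main__":
    parsed = parse_sample_fields(record)
    for name, fields in parsed["samples"].items():
        print(name, fields["GT"], "AD=" + fields["AD"], "DP=" + fields["DP"])
```

Posting the full line, including FILTER, INFO and every FORMAT field, lets others see the site-level VQSLOD and the per-sample genotype evidence side by side.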
