The current GATK version is 3.6-0

some genotype calls are wrong, why?

jimmikeself Member Posts: 10
edited October 2012 in Ask the GATK team

Hello team,

Some genotype calls seem to be wrong. I know the calls are model-based and that base quality, mapping quality, etc. are considered in the model.
I also read this link:
http://gatkforums.broadinstitute.org/discussion/1235/why-didnt-the-unified-genotyper-call-my-snp-i-can-see-it-right-there-in-igv#latest
But my cases are special.
The format is (ref allele count)/(alternative allele count), followed by the genotype call:
22/24 0/0
109/125 0/0
85/109 0/0
26/32 0/0
40/161 0/0
195/6 1/1
239/5 1/1
83/6 1/1
46/28 1/1

In one case, the two variants are adjacent to each other.
In some cases, they are one-base indels.

Thanks,

Jim
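
To make the mismatch in the list above explicit, here is a minimal sketch that compares each call with what the allele counts alone would suggest. The 0.15 and 0.85 cutoffs are illustrative assumptions, not anything the UnifiedGenotyper uses; the real caller works from genotype likelihoods, so this is only a rough sanity check.

```shell
# Counts and calls copied from the post: "<ref count>/<alt count> <called genotype>".
calls='22/24 0/0
109/125 0/0
85/109 0/0
26/32 0/0
40/161 0/0
195/6 1/1
239/5 1/1
83/6 1/1
46/28 1/1'

# Naive expectation from the alt-allele fraction alone.
# The 0.15 / 0.85 thresholds are assumptions chosen for illustration.
report=$(printf '%s\n' "$calls" | awk '{
    split($1, n, "/")                      # n[1] = ref count, n[2] = alt count
    frac = n[2] / (n[1] + n[2])            # alt-allele fraction
    naive = (frac < 0.15) ? "0/0" : ((frac > 0.85) ? "1/1" : "0/1")
    flag = (naive == $2) ? "ok" : "DISCORDANT"
    printf "%s\talt_frac=%.2f\tcalled=%s\tnaive=%s\t%s\n", $1, frac, $2, naive, flag
}')
printf '%s\n' "$report"
```

With these thresholds, every one of the nine rows is flagged, which matches the impression that the calls disagree with the raw counts.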


Comments

  • Geraldine_VdAuwera Administrator, Dev Posts: 10,690 admin

    Hi Jim,

    Can you give some more supporting info? For example, some IGV or pileup screenshots that show the reads and base calls at those loci? Without seeing more it's impossible to interpret the genotype numbers.

    Geraldine Van der Auwera, PhD

  • jimmikeself Member Posts: 10

    Hi Geraldine,
    Thanks. Here are the genotype calls; the IGV snapshot is attached.

    chr10:124329672 T T/T
    chr10:124329673 C C/C

    I can email you a small slice of the BAM file. It contains some meta info, so I would rather not post it here.

    Thank you,
    -Jim

    igv_wrongcall.png
    1680 x 885 - 27K
  • ebanks Broad Institute Member, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    What is your command-line for calling? Are you possibly running with the generalized ploidy model and asking to genotype a ploidy of 1?

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • jimmikeself Member Posts: 10

    Eric,
    Thank you. I am using genotype_likelihoods_model; here is the command:
    java -Xmx4g -jar GATK \
        -T UnifiedGenotyper -nt 8 -R HG19REF -I test.bam -o test.vcf \
        -metrics OUTPUT/test.txt -dcov 250 \
        -L chr10 -glm BOTH -stand_call_conf 30 -stand_emit_conf 30 \
        -baq CALCULATE_AS_NECESSARY \
        -G Standard -A AlleleBalance -A DepthOfCoverage \
        -A HomopolymerRun -A QualByDepth

    -Jim

  • jimmikeself Member Posts: 10

    The version is 1.6-11-g3b2fab9 on 64-bit Linux.

  • ebanks Broad Institute Member, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    Hi Jim,

    I would try updating to the latest version of the code and turning off BAQ (which has been known to kill multi-nucleotide polymorphisms). If that doesn't work, it would be good to see what the HaplotypeCaller calls at this position.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
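
The two suggestions above (rerun without BAQ, then check what HaplotypeCaller calls at the position) might look roughly like the following. This is a sketch under assumptions: the jar path, reference path, output file names, and the small window around chr10:124329672 are placeholders, and the commands are only printed, not executed. The `-baq OFF` value belongs to the same option used in the original command; omitting `-baq` altogether should have the same effect if OFF is the default in your version.

```shell
# Placeholders (assumptions): adjust the jar, reference, and window to your setup.
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}
REF=${REF:-hg19.fasta}
REGION=chr10:124329600-124329750   # small window around the two adjacent sites

# Same UnifiedGenotyper call as in the thread, restricted to the window, with BAQ disabled.
ug_cmd="java -Xmx4g -jar $GATK_JAR -T UnifiedGenotyper -R $REF -I test.bam \
-L $REGION -glm BOTH -stand_call_conf 30 -stand_emit_conf 30 \
-baq OFF -o test.nobaq.vcf"

# HaplotypeCaller on the same window, for comparison.
hc_cmd="java -Xmx4g -jar $GATK_JAR -T HaplotypeCaller -R $REF -I test.bam \
-L $REGION -o test.hc.vcf"

# Dry run: print both commands so they can be reviewed before running them.
echo "$ug_cmd"
echo "$hc_cmd"
```

Restricting `-L` to a small window keeps the rerun fast while checking whether the adjacent sites are now called.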

  • jimmikeself Member Posts: 10

    Eric,
    Thanks a lot! That's exactly the cause. Can you give me a clue about the effect on false positive calls if I turn it off? To my understanding (please correct me if I am wrong or incomplete), BAQ was introduced to reduce false positive calls around alignment-induced indels. I wonder how much benefit it provides.
    -Jim

  • ebanks Broad Institute Member, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    BAQ has tons of benefit for regular SNPs, but it tends to kill more complex events like MNPs (as you saw firsthand) and indels. This is precisely why we have developed the HaplotypeCaller as the successor to the UnifiedGenotyper.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
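
One rough way to see what BAQ changes on your own data is to call the same BAM with and without it and compare the resulting site lists. The sketch below assumes two VCFs produced that way (the file names in the usage line are hypothetical). It only counts differences; it cannot tell which of the differing calls are false positives.

```shell
# Print one sorted, de-duplicated "CHROM:POS:REF>ALT" key per VCF record.
site_keys() {
    grep -v '^#' "$1" | awk -F '\t' '{ print $1 ":" $2 ":" $4 ">" $5 }' | sort -u
}

# Count sites unique to each of two VCFs and sites present in both.
compare_calls() {
    a=$(mktemp)
    b=$(mktemp)
    site_keys "$1" > "$a"
    site_keys "$2" > "$b"
    echo "only in first: $(comm -23 "$a" "$b" | wc -l | tr -d ' ')"
    echo "only in second: $(comm -13 "$a" "$b" | wc -l | tr -d ' ')"
    echo "shared: $(comm -12 "$a" "$b" | wc -l | tr -d ' ')"
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   compare_calls test.baq.vcf test.nobaq.vcf
```

Sites that appear only in the no-BAQ call set are the ones worth inspecting in IGV, since they are either events BAQ suppressed or extra false positives.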

  • jimmikeself Member Posts: 10

    Eric,
    Thanks for the insights.
    So HaplotypeCaller keeps the benefits and eliminates the drawbacks for complex events? Is the 1000 Genomes Project using HaplotypeCaller?
    I'd like to know more about it. Could you point me to some docs or presentations about it, or to a comparison between it and UnifiedGenotyper?
    What is the major difference?
    Thanks,
    -Jim

  • jimmikeself Member Posts: 10

    thanks!
