The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

some genotype calls are wrong, why?

jimmikeselfjimmikeself Member Posts: 10
edited October 2012 in Ask the GATK team

Hello the team,

For some genotypes, it seems are wrong, I know it's model based, and base q, map q, etc are considered in the model.
I also read this link:
http://gatkforums.broadinstitute.org/discussion/1235/why-didnt-the-unified-genotyper-call-my-snp-i-can-see-it-right-there-in-igv#latest
But my case are special,
the format is (ref allele count)/(alternative allele count) genotype call:
22/24 0/0
109/125 0/0
85/109 0/0
26/32 0/0
40/161 0/0
195/6 1/1
239/5 1/1
83/6 1/1
46/28 1/1

In one case, the two variants are adjacent to each other.
In some case, they are one base indels.

Thanks,

Jim

Post edited by Geraldine_VdAuwera on

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,731 admin

    Hi Jim,

    Can you give some more supporting info? For example, some IGV or pileup screenshots that show the reads and base calls at those loci? Without seeing more it's impossible to interpret the genotype numbers.

    Geraldine Van der Auwera, PhD

  • jimmikeselfjimmikeself Member Posts: 10

    Hi Geraldine,
    Thanks. Here is the genotype call and attached is the IGV snapshot

    chr10:124329672 T T/T
    chr10:124329673 C C/C

    I can email you a small slice of the BAM file, since it has some meta info, I do not want to post it here.

    Thank you,
    -Jim

    igv_wrongcall.png
    1680 x 885 - 27K
  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 ✭✭✭

    What is your command-line for calling? Are you possibly running with the generalized ploidy model and asking to genotype a ploidy of 1?

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • jimmikeselfjimmikeself Member Posts: 10

    Eric,
    Thank you. I am using genotype_likelihoods_model, here is the cmd:
    java -Xmx4g -jar GATK
    -T UnifiedGenotyper -nt 8 -R HG19REF -I test.bam -o test.vcf
    -metrics OUTPUT/test.txt -dcov 250
    -L chr10 -glm BOTH -stand_call_conf 30 -stand_emit_conf 30
    -baq CALCULATE_AS_NECESSARY
    -G Standard -A AlleleBalance -A DepthOfCoverage
    -A HomopolymerRun -A QualByDepth

    -Jim

  • jimmikeselfjimmikeself Member Posts: 10

    The version is version 1.6-11-g3b2fab9 on Linux 64

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 ✭✭✭

    Hi Jim,

    I would try updating to the latest version of the code and turning off BAQ (which has been known to kill multi-nucleotide polymorphisms). If that doesn't work, it would be good to see what the HaplotypeCaller calls at this position.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • jimmikeselfjimmikeself Member Posts: 10

    Eric,
    Thanks a lot! That's exactly the cause. Can you give me a clue about the effect on false positive calls if I turn it off? To my understanding (please correct me if I am wrong or incomplete), BAQ was introduced to reduce false positive calls around alignment induced indels. I wonder how much benefits it has.
    -Jim

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 ✭✭✭

    BAQ has tons of benefit for regular SNPs, but it tends to kill more complex events like MNPs (as you saw firsthand) and indels. This is precisely why we have developed the HaplotypeCaller as the successor to the UnifiedGenotyper.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • jimmikeselfjimmikeself Member Posts: 10

    Eric,
    Thanks for the insights.
    So HaplotypeCaller kept the benefits and eliminated the drawbacks on complex events? Is 1000genome project using HaplotypeCaller?
    I'd like to know more about it. Could you point me to some docs or presentations about it, or the comparison of it and UnifiedGenotyper.
    What is the major difference?
    Thanks,
    -Jim

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,731 admin
  • jimmikeselfjimmikeself Member Posts: 10
Sign In or Register to comment.