
SNP genotyping algorithm constraints on +/- read proportion

gene
edited October 2012 in Ask the GATK team

Hi,

I am observing the following scenario at one particular SNP (C/G) using two different enrichment technologies:
(I am using IGV syntax: ALLELE | number of reads with the allele | % of total reads | +strand reads | -strand reads)

  • technology1:
    C: 15 47% 15+,0-
    G: 17 53% 17+,0-

  • technology2:
    C: 17 37% 13+,4-
    G: 29 63% 26+,3-

As you can see, both technologies have good coverage of the SNP and good representation of each allele, yet the SNP (C/G) is not called with technology1.
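
The allele fractions and strand counts above can be put into numbers with a short script. This is a minimal sketch (Python, standard library only) that computes each allele's fraction of reads and runs a two-sided Fisher's exact test on the allele-by-strand table, which is the idea behind GATK's FisherStrand (FS) strand-bias annotation. It is a simplified illustration, not GATK's implementation; the function names are hypothetical, and treating C as the first allele is an assumption.

```python
from math import comb, log10

# Counts transcribed from the IGV readout above: allele -> (+strand reads, -strand reads).
TECH1 = {"C": (15, 0), "G": (17, 0)}
TECH2 = {"C": (13, 4), "G": (26, 3)}


def allele_fractions(counts):
    """Fraction of all reads supporting each allele."""
    total = sum(p + m for p, m in counts.values())
    return {allele: (p + m) / total for allele, (p, m) in counts.items()}


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phred_strand_bias(counts, first="C", second="G"):
    """Phred-scaled Fisher p-value for allele vs strand (hypothetical helper)."""
    (a, b), (c, d) = counts[first], counts[second]
    p = fisher_two_sided(a, b, c, d)
    return 0.0 if p >= 1.0 else -10 * log10(p)


for name, counts in (("technology1", TECH1), ("technology2", TECH2)):
    fractions = {k: round(v, 2) for k, v in allele_fractions(counts).items()}
    print(name, fractions, "strand bias (phred):", round(phred_strand_bias(counts), 2))
```

The fractions reproduce the IGV percentages (47%/53% and 37%/63%). Note that a per-allele test like this one scores technology1 as 0: when both alleles are seen only on the + strand, the table shows no difference between the alleles, so this kind of test cannot flag that situation.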

My questions are:
1- Does the GATK algorithm have some sort of constraint on the proportion of reads coming from only one strand (as with technology1), in order to predict or discard duplicates?
2- I know that the base call at a particular position is bounded by the mapping quality of its read. If my --stand_call_conf is 30 and one of the bases at this SNP position has MQ < 30, does that prevent this position from being called? Or does the average MQ have to be > 30 (meaning that more than one read at this position is taken into account)?

Thanks for any clarification,
Gene


Answers

  • ebanks (Broad Institute)

    1) No, we don't mark duplicates (that's done in a previous data processing step as per our best practices).
    2) No, the calling confidence threshold is not the same thing as a base quality cutoff; it is a threshold on the QUAL value of the VCF record (see the documentation for how they are related).
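
The distinction in answer 2 can be sketched as follows, assuming QUAL is the phred-scaled probability that the site is not variant (as the VCF specification defines it): the threshold is compared against that single site-level number, not against any individual read's mapping quality. The function names and example probabilities are hypothetical, not GATK code.

```python
from math import log10


def qual_from_p_no_variant(p_no_variant: float) -> float:
    """VCF QUAL: phred-scaled probability that the site is not variant."""
    return -10 * log10(p_no_variant)


def is_called(p_no_variant: float, stand_call_conf: float = 30.0) -> bool:
    """Hypothetical helper: keep the call only if the site-level QUAL reaches the threshold.

    Per-read qualities are not compared to the threshold here; in a real caller they
    only influence the decision indirectly, through the likelihoods that produce QUAL.
    """
    return qual_from_p_no_variant(p_no_variant) >= stand_call_conf


print(is_called(1e-4))  # QUAL 40 vs threshold 30 -> True
print(is_called(1e-2))  # QUAL 20 vs threshold 30 -> False
```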

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • gene

    Hi,

    Thanks for your answer.

    I looked in the paper "A framework for variation discovery and genotyping using next-generation DNA sequencing data" but could not find the relationship between QUAL and the base calls at a specific SNP position. Could you point me to the documentation of this relationship?

    Thx,
    Gene

  • Geraldine_VdAuwera (Cambridge, MA)
    edited January 2013

    Hi Gene,

    The math is detailed in this document. Note that it covers only sites with a single alternate allele; for multi-allelic calls, see this slide.
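
As a rough illustration of how per-read base qualities turn into QUAL, here is a minimal sketch of a single-sample diploid model in the spirit of the framework paper mentioned above: a base's likelihood under a genotype is the average of its likelihoods under the genotype's two alleles, with error probability e = 10^(-Q/10) spread evenly over the three other bases. The heterozygosity prior (theta = 0.001, P(het) = theta, P(hom-alt) = theta/2), C as the reference allele, and uniform Q30 bases are all assumptions for this example; GATK's actual implementation differs in detail.

```python
from math import log10


def p_base_given_allele(base: str, allele: str, q: int) -> float:
    """Probability of observing `base` if the true allele is `allele`."""
    e = 10 ** (-q / 10)  # phred base quality -> error probability
    return 1 - e if base == allele else e / 3


def qual_for_site(bases, quals, ref, alt, theta=0.001):
    """Phred-scaled P(hom-ref | data) for one diploid sample (simplified model)."""
    genotypes = {"hom_ref": (ref, ref), "het": (ref, alt), "hom_alt": (alt, alt)}
    priors = {"hom_ref": 1 - 1.5 * theta, "het": theta, "hom_alt": theta / 2}
    log_post = {}
    for g, (a1, a2) in genotypes.items():
        # log10 P(data | G) = sum over reads of log10([P(b|a1) + P(b|a2)] / 2)
        lik = sum(
            log10((p_base_given_allele(b, a1, q) + p_base_given_allele(b, a2, q)) / 2)
            for b, q in zip(bases, quals)
        )
        log_post[g] = lik + log10(priors[g])  # unnormalized log10 posterior
    # Normalize in log space (log-sum-exp) to avoid underflow with many reads.
    m = max(log_post.values())
    log_norm = m + log10(sum(10 ** (v - m) for v in log_post.values()))
    return -10 * (log_post["hom_ref"] - log_norm)


# technology1 counts from the question: 15 C and 17 G reads, assumed all Q30.
bases = ["C"] * 15 + ["G"] * 17
print(round(qual_for_site(bases, [30] * len(bases), ref="C", alt="G"), 1))
```

Under these assumptions the technology1 counts give a QUAL well above 30, while a pile of reference-only bases gives a QUAL near 0.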


    Geraldine Van der Auwera, PhD

  • newbie16

    Hi, I also wanted to check out the documents you mentioned above, but the links don't work. Could you post new links?
    Thanks

  • Geraldine_VdAuwera (Cambridge, MA)

    I fixed the links in the above comment, thanks for reporting them.

    Geraldine Van der Auwera, PhD
