The current GATK version is 3.7-0

UnifiedGenotyper: different -glm value result in different sets of variants?

ihlee Member Posts: 2

Hi, I'm running UnifiedGenotyper with different -glm values (BOTH, INDEL, SNP).
I expected the set of variants from -glm BOTH to be the same as the union of the variants from -glm SNP and -glm INDEL, but it wasn't.
Although the difference was not big (fewer than 100 variants), I'm curious why there is a difference at all, and I'd like to know which is the better way to find variants (both SNPs and indels).
Thank you.
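
For anyone wanting to reproduce this comparison, here is a minimal sketch of one way to check whether the BOTH call set equals the union of the SNP and INDEL call sets. It assumes plain-text, tab-delimited VCF records and compares on (CHROM, POS, REF, ALT) only; the function names and the toy records are hypothetical and not part of GATK.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) for each record, skipping header lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or VCF header
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_call_sets(snp_lines, indel_lines, both_lines):
    """Report records found on only one side of (SNP union INDEL) vs BOTH."""
    union = variant_keys(snp_lines) | variant_keys(indel_lines)
    both = variant_keys(both_lines)
    return {"only_in_union": union - both, "only_in_both": both - union}


# Toy records standing in for three UnifiedGenotyper outputs (made-up data).
snp = ["##fileformat=VCFv4.1", "#CHROM\tPOS\tID\tREF\tALT",
       "1\t100\t.\tA\tG", "1\t300\t.\tC\tT"]
indel = ["1\t200\t.\tAT\tA"]
both = ["1\t100\t.\tA\tG", "1\t200\t.\tAT\tA"]

diff = compare_call_sets(snp, indel, both)
print(diff)
```

With real data, the three lists could be replaced by open file handles for the three output VCFs, since the functions only iterate over lines.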

Best Answer

  • ebanks Broad Institute Member, Broadie, Dev Posts: 692 admin
    Accepted Answer

    The GATK uses randomization for down-sampling high-coverage sites. It does, however, use the same initial seed for the randomization (which is why it is deterministic when running single-threaded). Because the genotyper triggers at a different set of sites in SNP mode than in BOTH mode, the random numbers used at any given position differ between the two runs.

    BOTH mode does not do joint SNP and indel calling; it is just a way for users to save IO by reading the data once instead of twice. If you want joint calling of all variant types, then you should check out the HaplotypeCaller, which does this.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
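
    The effect described in this answer can be illustrated with a toy simulation. This is only a sketch of the general idea (one seeded random stream shared across sites), not GATK's actual down-sampling code; the function name, depth, cap, seed and positions are all hypothetical.

```python
import random


def downsample_sites(triggered_sites, depth=100, cap=10, seed=42):
    """Return {position: indices of reads kept}, drawing from ONE shared RNG."""
    rng = random.Random(seed)  # same initial seed on every run
    kept = {}
    for pos in sorted(triggered_sites):
        # Every triggered site consumes numbers from the shared stream,
        # so later sites see a stream state that depends on earlier ones.
        kept[pos] = rng.sample(range(depth), cap)
    return kept


# Hypothetical SNP-only run: the genotyper triggers at positions 100 and 300.
snp_run = downsample_sites({100, 300})
# Hypothetical BOTH run: an extra indel site at position 200 also triggers.
both_run = downsample_sites({100, 200, 300})

print(snp_run[100] == both_run[100])  # same reads kept before the extra site?
print(snp_run[300] == both_run[300])  # same reads kept after the extra site?
```

    Repeating either run gives the same result every time, which matches the single-threaded determinism mentioned above. Between the two runs, the reads kept at position 100 match, while those at position 300 almost certainly do not, because the extra site at 200 has already advanced the random stream.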

Answers

  • vanheel123 Member Posts: 5

    I'd also be interested in the comments from GATK team on this, please.

    Also - apologies as I haven't tried it yet - but what would happen at a site with a typical REF/ALT biallelic SNP and also a single-base indel? Would this be output as a triallelic variant in the VCF with UnifiedGenotyper -glm BOTH?

    regards, david van heel

  • vanheel123 Member Posts: 5

    Dear Eric,

    How does HaplotypeCaller perform on single samples? I have very high depth data, and UnifiedGenotyper gives great SNP calling results on one sample at a time, so I have not even tried multi-sample calling, given issues of batches, memory, etc. (I have thousands of samples). As posted above, I am keen to have joint calling of both SNPs and indels, with output of reference homozygote, non-ref het and non-ref hom genotypes. I presume "HaplotypeCaller --output_mode EMIT_ALL_CONFIDENT_SITES" will do this.

    Sorry - I couldn't find much description of the algorithms behind HaplotypeCaller when searching. I may have missed it and would appreciate being pointed in the right direction if so!

    regards, david

  • vanheel123 Member Posts: 5

    Please ignore the above post - I think I have to use multi-sample calling to get what I want with indels (or feed a single-sample caller a list of positions/alleles to call).
    david
