Haploid genomes

gilgigilgi Member Posts: 19
edited July 2012 in Ask the GATK team

Dear GATK team,

I know that in the past GATK was not suitable for haploid genomes.
I wanted to ask if this possibly changed since then - and whether it is possible to use GATK for haploid genomes.

Thanks a lot,

Best Answer

  • Mark_DePristoMark_DePristo Broad InstituteMember Posts: 153 admin
    Accepted Answer

    The UnifiedGenotyper (UG) in GATK 2 can now call haploid sequence natively. You just set ploidy to 1.

    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard


  • gilgigilgi Member Posts: 19

    Thanks! This is great news! I'll try to work with it.

  • gilgigilgi Member Posts: 19

    We tried to download gatk2 from here:

    But after downloading, we saw that the version doesn't seem to be 2 but rather: version 1.6-596-g3b9929c

    Is this the correct version?


  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 admin

    The 2.0 release should be coming out later today.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • gilgigilgi Member Posts: 19

    Thanks a lot. I downloaded GATK 2.0.

    When trying to use the UnifiedGenotyper with --sample_ploidy 1 I get an error:
    MESSAGE: Incorrect genotype calculation model chosen. Only [POOLSNP|POOLINDEL|POOLBOTH] supported with this walker if sample ploidy != 2

    What does this mean? My data isn't pooled; I have individually sequenced (barcoded) haploid strains.
    I tried to add:
    --genotype_likelihoods_model POOLBOTH

    But then I get:
    MESSAGE: Incorrect AF Calculation model. Only POOL model supported if sample ploidy != 2

    I tried to look for the answer in the guide - without success.
    Can you help please?

  • gilgigilgi Member Posts: 19

    Thanks a lot!!!

  • hillihilli Member Posts: 1

    I've used the -pnrm POOL option but am still getting the same error as gilgi. So I have

    java -Xmx30g -jar /usr/local/gatk2/GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref -I bam -I bam -pnrm POOL -ploidy 1 -o vcf

    Any help?

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 admin

    We have changed the arguments so that they more accurately reflect what they are doing:
    So you'll want e.g. -pnrm GeneralPloidySNP

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • flowflow Member Posts: 14

    With v2.0-39 I had been using -ploidy 1 -pnrm POOL -glm POOLSNP. Can you confirm that, as of v2.1, the options for calling SNPs/indels in haploid genomes would now be -ploidy 1 -pnrm EXACT -glm GeneralPloidySNP? Is the -pnrm POOL option now defunct?

  • delangeldelangel Broad InstituteMember Posts: 71

    You just need -ploidy 1. "-pnrm EXACT" will work but there's no other option :). "-glm GeneralPloidySNP" will not work - you need either SNP, INDEL or BOTH.
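As a rough sketch of the advice above, a haploid UnifiedGenotyper invocation can be assembled as an argument list. The flags are the ones quoted in this thread (-T, -R, -I, -o, -ploidy, -glm). The helper name build_ug_command and the file names are hypothetical, the jar path is simply the one from hilli's post, and the command is only built, not run.

```python
def build_ug_command(reference, bams, out_vcf, ploidy=1, glm="BOTH",
                     jar="/usr/local/gatk2/GenomeAnalysisTK.jar"):
    """Build (but do not run) a UnifiedGenotyper command line.

    Hypothetical helper: only flags quoted in the thread are used.
    """
    # Per the reply above, -glm must be SNP, INDEL or BOTH (as of v2.1).
    if glm not in ("SNP", "INDEL", "BOTH"):
        raise ValueError("glm must be SNP, INDEL or BOTH")
    cmd = ["java", "-jar", jar, "-T", "UnifiedGenotyper", "-R", reference]
    # One -I per input BAM, as in the command posted earlier in the thread.
    for bam in bams:
        cmd += ["-I", bam]
    cmd += ["-ploidy", str(ploidy), "-glm", glm, "-o", out_vcf]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(build_ug_command("ref.fasta", ["a.bam", "b.bam"], "out.vcf")))
```

The resulting list could be passed to subprocess.run if you want to launch it from a script; whether these exact flags apply depends on your GATK version.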

  • flowflow Member Posts: 14

    Thanks for your reply. Under what circumstance should -glm GeneralPloidySNP/GeneralPloidyINDEL be used?

  • aunderwoaunderwo Member Posts: 3

    I am experiencing a problem with the ploidy 1 option.
    Having used the GATK2 UnifiedGenotyper with the params --sample_ploidy 1 --genotype_likelihoods_model BOTH -rf BadCigar,
    I get the following line in a VCF file (see the twelfth sample, shown in bold in the original post)

    Staphylococcus 1553115 . A G 24454.01 . AC=13;AF=0.813;AN=16;BaseQRankSum=1.072;DP=1040;Dels=0.00;FS=32.822;HaplotypeScore=3.3463;MLEAC=13;MLEAF=0.813;MQ=40.20;MQ0=47;MQRankSum=-10.543;QD=32.13;ReadPosRankSum=-1.148;SB=-9.076e+03 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,29:29:99:1:1.00:1015,0 1:0,62:62:99:1:1.00:2053,0 1:0,106:106:99:1:1.00:3210,0 1:0,102:102:99:1:1.00:3305,0 1:0,88:88:99:1:1.00:2750,0 1:0,41:41:99:1:1.00:1324,0 1:0,76:76:99:1:1.00:2448,0 1:0,39:39:99:1:1.00:1303,0 0:64,40:104:99:0:0.00:0,1334 1:0,41:41:99:1:1.00:1373,0 1:0,49:49:99:1:1.00:1668,0 0:72,50:122:99:0:0.00:0,1258 1:0,59:59:99:1:1.00:1852,0 1:0,38:38:99:1:1.00:1192,0 1:0,31:31:99:1:1.00:961,0 0:53,0:53:99:0:0.00:0,1633

    The twelfth sample (0:72,50:122:99:0:0.00:0,1258) is called as WT (genotype 0) with a high GQ despite there being 72 reads of genotype 0 and 50 of genotype 1. Examining the BAM file suggests that this is a mapping error in a repetitive phage region.

    If I set ploidy to be 2 the equivalent line in the resulting vcf file is

    Staphylococcus 1553115 . A G 24788.02 . AC=28;AF=0.875;AN=32;BaseQRankSum=0.947;DP=1040;Dels=0.00;FS=32.822;HaplotypeScore=3.3463;InbreedingCoeff=0.4286;MLEAC=28;MLEAF=0.875;MQ=40.20;MQ0=47;MQRankSum=-10.096;QD=25.11;ReadPosRankSum=-1.177;SB=-9.871e+03 GT:AD:DP:GQ:PL 1/1:0,29:29:81:986,81,0 1/1:0,62:62:99:1895,156,0 1/1:0,106:106:99:2992,247,0 1/1:0,102:102:99:3169,268,0 1/1:0,88:88:99:2452,193,0 1/1:0,41:41:99:1243,99,0 1/1:0,76:76:99:2283,193,0 1/1:0,39:39:99:1233,105,0 0/1:64,40:104:99:886,0,1706 1/1:0,41:41:99:1298,108,0 1/1:0,49:49:99:1581,129,0 0/1:72,50:122:99:1235,0,2126 1/1:0,59:59:99:1649,132,0 1/1:0,38:38:87:1065,87,0 1/1:0,31:31:69:821,69,0 0/0:53,0:53:99:0,138,1588

    The same sample (0/1:72,50:122:99:1235,0,2126) is now called as a heterozygote, which would be likely based on the number of reads mapping, except for the fact that this is a bacterial haploid genome. Previously I would have discarded this site, since the heterozygous call indicates mis-mapping, as the BAM file confirms. I had been hoping to use the sample_ploidy option set to 1 for bacterial genomes, but this result concerns me. I could obviously filter based on AD, but I wonder why the sample was given a high GQ when the ploidy is set to 1 and the AD suggests the genotype 0 call should be doubted.

    Any suggestions on what is going on here?? Many thanks


  • delangeldelangel Broad InstituteMember Posts: 71

    The code is actually doing what it's designed to do - when you're using -ploidy 1, there are only 2 possible genotype assignments, and the assignment "0" is by far the most likely one even if 40% of your reads have another base. In the default diploid case, the most likely genotype is the 0/1 one, which is exactly what you're getting.
    Even in the haploid case, there's considerable evidence that favors the "0" genotype (plus the population prior), so you'll get a high value of GQ anyway. PLs are Phred-scaled, so your PL values of 0,1258 indicate that, statistically, it's about 10^125 times likelier that your data came from a reference site than from an alt site, based on all the available reads.
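To make the reasoning above concrete, here is a toy model. It is not GATK's actual likelihood code: it assumes a single fixed per-base error rate (0.001, an assumption), treats reads as independent, and ignores base and mapping qualities and any priors. The function names are hypothetical. It shows why 72 ref / 50 alt reads give "ref" when only two genotypes are allowed and "het" when a diploid genotype is available, and how a PL difference maps to a likelihood ratio.

```python
import math


def log10_likelihoods(ref_reads, alt_reads, error_rate=0.001):
    """Toy log10 likelihoods for a biallelic site (assumed fixed error rate)."""
    p_match = 1.0 - error_rate          # read base agrees with the true allele
    p_mismatch = error_rate / 3.0       # read base is one specific wrong base
    p_het = 0.5 * p_match + 0.5 * p_mismatch  # each allele sampled half the time
    return {
        "ref": ref_reads * math.log10(p_match) + alt_reads * math.log10(p_mismatch),
        "het": (ref_reads + alt_reads) * math.log10(p_het),
        "alt": alt_reads * math.log10(p_match) + ref_reads * math.log10(p_mismatch),
    }


def most_likely_genotype(ref_reads, alt_reads, ploidy, error_rate=0.001):
    """Pick the genotype with the highest toy likelihood for ploidy 1 or 2."""
    lik = log10_likelihoods(ref_reads, alt_reads, error_rate)
    if ploidy == 1:
        del lik["het"]                  # a haploid sample cannot be heterozygous
    elif ploidy != 2:
        raise ValueError("this sketch only handles ploidy 1 or 2")
    return max(lik, key=lik.get)


def pl_to_log10_ratio(pl_best, pl_other):
    """PLs are -10*log10(likelihood), so a PL gap of 10 is a factor of 10."""
    return (pl_other - pl_best) / 10.0


if __name__ == "__main__":
    print(most_likely_genotype(72, 50, ploidy=1))
    print(most_likely_genotype(72, 50, ploidy=2))
    print(pl_to_log10_ratio(0, 1258))
```

Under this simplified model the haploid call can only choose between "ref" and "alt", so a site with mixed read support still gets a confident call for whichever allele has more reads. That is why checking AD separately remains useful for spotting mis-mapping.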

  • flowflow Member Posts: 14
    edited January 2013

    In the current documentation (v2.3-9) for the Unified Genotyper there is a caveat stating "We only handle diploid genotypes". Has something changed or can -ploidy still be safely set to 1?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,659 admin

    That's an old caveat, I'll remove it from the doc. You're good to go.

    Geraldine Van der Auwera, PhD

  • JeegarJeegar CanadaMember Posts: 16

    Hi @Sheila @Geraldine_VdAuwera @Mark_DePristo

    I am working on a haploid non-model organism. After following GATK's Best Practices for SNP calling, I ended up with 1.9k variants. I want to filter these further, keeping only homozygous non-reference calls where 0% of the sample's reads match the reference and 100% carry the variant. Is there any tool available for that? What criteria should I use to filter for true variants?

    I have attached an image below for your reference.


    Screen Shot 2015-09-18 at 6.00.32 PM.png
    734 x 317 - 114K
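One way to approach the read-support filter asked about above is to inspect the AD (allelic depth) field of each sample directly. The following is a minimal sketch, not a GATK tool: the function name is hypothetical, it assumes an uncompressed single VCF record with an AD entry in the FORMAT column, and it splits on whitespace because the records pasted in this thread have lost their tabs.

```python
def fully_alt_samples(vcf_line):
    """Return one bool per sample: True if AD shows 0 ref reads and >0 alt reads."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")      # column 9 is FORMAT
    ad_index = format_keys.index("AD")      # raises ValueError if AD is absent
    flags = []
    for sample in fields[9:]:
        parts = sample.split(":")
        # Treat missing or truncated AD values as "not fully alt".
        if ad_index >= len(parts) or "." in parts[ad_index].split(","):
            flags.append(False)
            continue
        depths = [int(d) for d in parts[ad_index].split(",")]
        ref_depth, alt_depth = depths[0], sum(depths[1:])
        flags.append(ref_depth == 0 and alt_depth > 0)
    return flags


if __name__ == "__main__":
    # Abridged from the record posted earlier in this thread (INFO shortened).
    line = ("Staphylococcus 1553115 . A G 24454.01 . DP=1040 "
            "GT:AD:DP:GQ:MLPSAC:MLPSAF:PL "
            "1:0,29:29:99:1:1.00:1015,0 "
            "0:72,50:122:99:0:0.00:0,1258 "
            "0:53,0:53:99:0:0.00:0,1633")
    print(fully_alt_samples(line))
```

A strict 0% threshold will also discard true variants with a single erroneous reference read, so a small tolerance may be worth considering depending on coverage.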
  • SheilaSheila Broad InstituteMember, Broadie, Moderator Posts: 4,709 admin