Regarding Ploidy Option of HaplotypeCaller

I recently tried the latest GATK version (3.3)
and found that you added a "ploidy" option to HaplotypeCaller.

We tested it and found that it only works with values below 20 (up to 19).
Is there a reason for this upper limit on the ploidy option?
Are there any plans to raise the limit to higher values (such as 50)?

Thank you all, as always!

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @for3guy

    Hi,

    What exactly happens when you set the ploidy to 20? Does it give you an error message or warning?

    -Sheila

  • for3guy (Member)

    This is what happens here.
    I'm sorry for the long lines=_=

    INFO 09:02:16,059 HelpFormatter - --------------------------------------------------------------------------------
    INFO 09:02:16,065 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
    INFO 09:02:16,065 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 09:02:16,065 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 09:02:16,070 HelpFormatter - Program Args: -T HaplotypeCaller -R /gmi-l1/_90.User_Data/ysk819/Tool/Reference/Human_B19.fa -I AlignedReads/10-213_1.sorted.dp.ir.recal.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --dbsnp /gmi-l1/_90.User_Data/ysk819/Tool/DataBase/SNP/All_with_chr.vcf -L Epilepsy_RefSeq.bed --sample_ploidy 20 -o VariantCalls/10-213_1.ploidy20.variants.raw.vcf
    INFO 09:02:16,076 HelpFormatter - Executing as [email protected] on Linux 2.6.18-194.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
    INFO 09:02:16,077 HelpFormatter - Date/Time: 2014/10/29 09:02:16
    INFO 09:02:16,077 HelpFormatter - --------------------------------------------------------------------------------
    INFO 09:02:16,077 HelpFormatter - --------------------------------------------------------------------------------
    INFO 09:02:17,095 GenomeAnalysisEngine - Strictness is SILENT
    INFO 09:02:17,299 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
    INFO 09:02:17,309 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 09:02:17,402 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
    INFO 09:02:17,432 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
    INFO 09:02:18,200 IntervalUtils - Processing 227903 bp from intervals
    INFO 09:02:18,818 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 09:02:18,951 GenomeAnalysisEngine - Done preparing for traversal
    INFO 09:02:18,952 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 09:02:18,952 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 09:02:18,953 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
    INFO 09:02:18,953 HaplotypeCaller - Currently, physical phasing is not available when ploidy is different than 2; therefore it won't be performed
    INFO 09:02:18,953 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
    INFO 09:02:18,954 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
    INFO 09:02:19,299 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace
    java.lang.ArrayIndexOutOfBoundsException: 20
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.ReferenceConfidenceModel.indelPLCache(ReferenceConfidenceModel.java:279)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.ReferenceConfidenceModel.getIndelPLs(ReferenceConfidenceModel.java:270)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.ReferenceConfidenceModel.calculateRefConfidence(ReferenceConfidenceModel.java:227)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.referenceModelForNoVariation(HaplotypeCaller.java:1209)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:983)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:221)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: 20
    ERROR ------------------------------------------------------------------------------------------

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @for3guy

    Hi,

    Can you please submit a snippet of the file that is causing the error? We would like to debug locally. Directions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila
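
A minimal sketch of how such a snippet could be cut from the BAM, assuming the GATK 3 `PrintReads` walker restricted to one interval with `-L`. The helper name `snippet_command`, the interval `chr1:1000-2000`, and the output name `snippet.bam` are hypothetical placeholders; the file names are taken from the command line in the log above. The code only builds and prints the command, it does not run GATK.

```python
import shlex

def snippet_command(reference, bam, interval, out_bam, jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3 PrintReads command that extracts reads from one interval."""
    return [
        "java", "-jar", jar,
        "-T", "PrintReads",
        "-R", reference,   # same reference used for the failing run
        "-I", bam,         # BAM that triggers the error
        "-L", interval,    # hypothetical interval around the failing region
        "-o", out_bam,     # small BAM to attach to the bug report
    ]

cmd = snippet_command(
    "Human_B19.fa",
    "10-213_1.sorted.dp.ir.recal.bam",
    "chr1:1000-2000",
    "snippet.bam",
)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Before submitting, it is worth rerunning the failing HaplotypeCaller command on the snippet to confirm that it still reproduces the error.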

  • Fer (Austria; Member)

    Hi, thanks for the ploidy option in HaplotypeCaller in GATK v3.3. It has potential, though I was a bit disappointed that anything above 19 gives an error, which makes it somewhat limited for pools.
    However, my question is about its compatibility with the other nice feature you introduced in the latest version: physical phasing. Is it possible to get PID (phase identifier) and PGT (phased genotype) when running HaplotypeCaller in ploidy mode? I only got the following in the FORMAT column: GT:AD:DP:GQ:SB
    Best regards,
    Alberto

  • valentin (Cambridge, MA; Member, Dev)

    Hi @Fer,

    From what I can see in the error message submitted by @for3guy, this is in fact a bug, and it may well be that you will be able to work with ploidy greater than 19 once it gets fixed. If it were an actual limitation of the tool, the error message would be a bit nicer and say so explicitly.

    As for physical phasing, it only works for diploid analysis, I am afraid. If ploidy is set to anything but 2, you should see an INFO message saying so, starting with "Currently, physical phasing is not available when ploidy is different than...". I am not aware of any plans to extend it to non-diploid analysis anytime soon.

    V.
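
A quick way to check whether a given record carries the phasing tags is to look at its FORMAT column. The sketch below assumes only that FORMAT keys are colon-separated, as in the `GT:AD:DP:GQ:SB` string quoted above. The function name `has_phasing_tags` and the second FORMAT string are hypothetical, made up for illustration.

```python
def has_phasing_tags(format_field: str) -> bool:
    """Return True if a VCF FORMAT string lists both PGT and PID."""
    keys = format_field.split(":")
    return "PGT" in keys and "PID" in keys

# FORMAT string reported above for a run in ploidy mode.
print(has_phasing_tags("GT:AD:DP:GQ:SB"))            # False
# Hypothetical FORMAT string that includes the phasing tags.
print(has_phasing_tags("GT:AD:DP:GQ:PGT:PID:SB"))    # True
```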

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    To add to Valentin's note, there is no set limitation for the ploidy option, and based on the stack trace above I think the ploidy value and the number 20 in the error are entirely coincidental. We're still waiting for a test case from @for3guy to figure out exactly what is causing the error.

  • Fer (Austria; Member)

    Thanks for the clarification on both of my concerns. Looking forward to playing a bit more with the ploidy option.

  • Fror (Poland; Member)

    @Geraldine_VdAuwera said:
    To add to Valentin's note, there is no set limitation for the ploidy option, and based on the stack trace above I think the ploidy value and the number 20 in the error are entirely coincidental. We're still waiting for a test case from for3guy to figure out exactly what is causing the error.

    Have you solved this problem? Do you have any suggestions based on the test case from for3guy's study?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Fror
    Hi,

    Unfortunately, we never got a snippet to debug. If you can upload snippets, that would be helpful.

    Thanks,
    Sheila

  • kvaneijk (Utrecht; Member)

    Dear all,

    We are wondering whether it is possible to generate gVCFs with a ploidy > 20.
    We are analyzing mitochondrial variants and use a ploidy of 50 (we have 40x coverage for autosomes and between 1000-4000x coverage for mitochondrial reads).
    However, we learned on this forum that it's not possible to generate gVCFs for variants with a ploidy > 20.
    We read about a nightly build that was available in January this year, but in addition, we read that nightly builds are not supported and there can be other issues with such a build.

    Are there plans to change the maximum ploidy with gVCFs? Or what would you recommend?

    Thank you!

    Best wishes,

    Kristel

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @kvaneijk
    Hi Kristel,

    A new stable release has come out since January. It can handle ploidies greater than 20, but there are known performance issues, which we are working on addressing in the near future.

    -Sheila

  • kvaneijk (Utrecht; Member)

    Hi Sheila,

    Thank you! We have tried it with GATK 3.3 and will try it again with GATK 3.4.
    In the meantime, we are looking forward to the next version that solves the performance issues.

    Best,
    Kristel

  • archana (India; Member)

    Hello,

    As far as I know, polyploidy comes in two types: allopolyploidy (for example an allotetraploid, which has two diploid sets of chromosomes, each derived from a different parental species) and autopolyploidy (for example an autotetraploid, an individual or strain whose chromosome complement consists of four copies of a single genome due to doubling of an ancestral chromosome complement). I would like to know whether this "ploidy" option covers autopolyploidy or allopolyploidy.

    Secondly, I would like to know the effect of the "ploidy" parameter on SNP calling. Where can I find this described in the documentation?

    Thank you

    Archana

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    For variant calling, it doesn't matter whether the tetraploidy results from allo- or autopolyploidy. The software simply adapts its expectation of how many haplotypes to model in total.
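
To give a rough feel for how the ploidy setting changes the size of the problem, the sketch below counts the distinct unordered genotypes for a given ploidy and number of alleles. This is plain combinatorics (multisets of size `ploidy` drawn from `n_alleles` alleles), not a description of GATK's internal code, and the function name `genotype_count` is hypothetical.

```python
from math import comb

def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Count unordered genotypes: multisets of `ploidy` copies over `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)

# Compare a biallelic and a triallelic site at several ploidies.
for ploidy in (2, 4, 20, 50):
    print(ploidy, genotype_count(ploidy, 2), genotype_count(ploidy, 3))
# 2 3 6
# 4 5 15
# 20 21 231
# 50 51 1326
```

A diploid biallelic site has 3 possible genotypes, while a ploidy of 50 with three alleles already has 1326, which is one plausible reason high-ploidy runs are slower.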
