Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

How to call GVCF with HaplotypeCaller using Queue

SunhyeSunhye KoreaMember
edited September 2015 in Ask the GATK team

Hi.
I want to call GVCF with HaplotypeCaller like example.

example )

        Single-sample all-sites calling on DNAseq (for `-ERC GVCF` cohort analysis workflow)
           java
             -jar GenomeAnalysisTK.jar
             -T HaplotypeCaller
             -R reference.fasta
             -I sample1.bam \
             --emitRefConfidence GVCF \
             --variant_index_type LINEAR \
             --variant_index_parameter 128000
             [--dbsnp dbSNP.vcf] \
             [-L targets.interval_list] \
             -o output.raw.snps.indels.g.vcf

But because HaplotypeCaller is slow, I attempt to use Queue.

I wrote HaplotypeCaller.scala with reference to the GATK's document,and to calling GVCF mode, added to argument.

    @Argument(doc="Mode for emitting reference confidence scores", shortName="ERC", required=false)
    var EmitRefConfidence: String = _

and

//Optional
 if (EmitRefConfidence != null) {
      haplotypeCaller.emitRefConfidence = EmitRefConfidence
  }

But I encountered error.

  INFO  17:15:45,314 QScriptManager - Compiling 1 QScript 
ERROR 17:15:47,345 QScriptManager - HaplotypeCaller.scala:70: type mismatch;
 found   : String
 required: org.broadinstitute.gatk.tools.walkers.haplotypecaller.ReferenceConfidenceMode 
ERROR 17:15:47,350 QScriptManager -             haplotypeCaller.emitRefConfidence = EmitRefConfidence 
ERROR 17:15:47,353 QScriptManager -                                                 ^ 
ERROR 17:15:47,949 QScriptManager - two errors found 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.gatk.queue.QException: Compile of /BiO/psh/BioTools/scala/GATK-QScripts/HaplotypeCaller.scala failed with 2 errors
    at org.broadinstitute.gatk.queue.QScriptManager.loadScripts(QScriptManager.scala:79)
    at org.broadinstitute.gatk.queue.QCommandLine.org$broadinstitute$gatk$queue$QCommandLine$$qScriptPluginManager$lzycompute(QCommandLine.scala:94)
    at org.broadinstitute.gatk.queue.QCommandLine.org$broadinstitute$gatk$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:92)
    at org.broadinstitute.gatk.queue.QCommandLine.getArgumentSources(QCommandLine.scala:229)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:205)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:61)
    at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Compile of /BiO/psh/BioTools/scala/GATK-QScripts/HaplotypeCaller.scala failed with 2 errors
##### ERROR ------------------------------------------------------------------------------------------
INFO  17:15:48,242 QCommandLine - Shutting down jobs. Please wait... 

How do I revise my script?

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Ah, this is a Java typing problem. You defined the ERC argument as a String, but in fact the code says it should be an enumerated type. You'll need to import the correct type object which is org.broadinstitute.gatk.tools.walkers.haplotypecaller.ReferenceConfidenceMode.GVCF and specify haplotypeCaller.emitRefConfidence = GVCF. I also recommend using var EmitRefConfidence: Boolean = false so you can just set it as a flag. Unless you want to be able to switch between GVCF and BP_RESOLUTION, in which case you need to add a bit more logic.

  • SunhyeSunhye KoreaMember

    Thanks Geraldine!
    I revised my script according to your advice.
    This script run but other errors occur.

    Here is error.

    ##### ERROR MESSAGE: GVCF output requires a specific indexing strategy.  Please re-run including the arguments -variant_index_type LINEAR -variant_index_parameter 128000.
    

    So I attmpt to add 2 argument to my script.

        @Argument(doc="Variant index type", shortName="variant_index_type", required=true)
            var variantIndexType: Boolean = false
    
        @Argument(doc="Variant index parameter", shortName="variant_index_parameter", required=true)
            var variantIndexPara: Int = _
    
     def script() {
            val haplotypeCaller = new HaplotypeCaller
              ...
            haplotypeCaller.variant_index_parameter = variantIndexPara
            haplotypeCaller.variant_index_type = LINEAR
              ...
      }
    

    But another errors occur.

    INFO  14:27:19,755 QScriptManager - Compiling 1 QScript 
    ERROR 14:27:21,719 QScriptManager - HaplotypeCaller.scala:69: not found: value LINEAR 
    ERROR 14:27:21,723 QScriptManager -         haplotypeCaller.variant_index_type = LINEAR 
    ERROR 14:27:21,726 QScriptManager -                                              ^ 
    ERROR 14:27:22,366 QScriptManager - two errors found 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    org.broadinstitute.gatk.queue.QException: Compile of /BiO/psh/BioTools/scala/GATK-QScripts/HaplotypeCaller.scala failed with 2 errors
        at org.broadinstitute.gatk.queue.QScriptManager.loadScripts(QScriptManager.scala:79)
        at org.broadinstitute.gatk.queue.QCommandLine.org$broadinstitute$gatk$queue$QCommandLine$$qScriptPluginManager$lzycompute(QCommandLine.scala:94)
        at org.broadinstitute.gatk.queue.QCommandLine.org$broadinstitute$gatk$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:92)
        at org.broadinstitute.gatk.queue.QCommandLine.getArgumentSources(QCommandLine.scala:229)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:205)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:61)
        at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Compile of /BiO/psh/BioTools/scala/GATK-QScripts/HaplotypeCaller.scala failed with 2 errors
    ##### ERROR ------------------------------------------------------------------------------------------
    INFO  14:27:22,679 QCommandLine - Shutting down jobs. Please wait... 
    

    I wonder how to add 2 Argument( variant_index_type , variant_index_parameter ) to queue script.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    If you name your output file with a ".g.vcf" suffix you don't need to specify those arguments.

  • SunhyeSunhye KoreaMember

    Thanks Geraldine!

  • bcosclolbcosclol Boston, MAMember

    Hi Geraldine,

    I tried naming my outputfile with ".g.vcf" and I get the error that I have to include variant_index_type LINEAR -variant_index_parameter 128000.

    For what its worth, I'm using GATK Queue 3.3 and I cannot update to 3.4 at this time.

    What are the variables I need to set in order for these parameters to get translated to the command line? i.e. haplotypeCaller.variant_index_parameter = 128000 and haplotypeCaller.variant_index_type = LINEAR. What should I change variant_index_parameter and variant_index_type to here?

    Thanks!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @bcosclol,

    You should be able to set haplotypeCaller.variant_index_parameter = 128000 in your qscript. For haplotypeCaller.variant_index_type it's a little trickier because it is an enumerated type (ENUM) so you will need to import the appropriate type. Let me know if this is something you need help with.

  • meharmehar Member ✭✭

    @Sunhye said:
    Thanks Geraldine!

    Dear Sunhye, Could you post the working version of the haplotype caller qscript as i am in similar situation and with a similar error. It would be helpful if you could post the script and the command used. Here is my post [http://gatkforums.broadinstitute.org/discussion/6519/haplotypecaller-gvcf-mode-qscript?new=1] .

  • meharmehar Member ✭✭

    @Geraldine_VdAuwera said:
    Hi @bcosclol,

    You should be able to set haplotypeCaller.variant_index_parameter = 128000 in your qscript. For haplotypeCaller.variant_index_type it's a little trickier because it is an enumerated type (ENUM) so you will need to import the appropriate type. Let me know if this is something you need help with.

    Dear Geraldine,

    I am following these posts to make the queue script for Haplotypecaller in GVCF mode. I met with similar error:

    ERROR MESSAGE: GVCF output requires a specific indexing strategy. Please re-run including the arguments -variant_index_type LINEAR -variant_index_parameter 128000.

    Could you let me know how to set variable type for haplotypeCaller.variant_index_type in the Qscript?

    Issue · Github
    by Sheila

    Issue Number
    392
    State
    closed
    Last Updated
    Milestone
    Array
    Closed By
    vdauwera
  • meharmehar Member ✭✭

    Dear Sheila,

    Have you made an edit or answer to the post? I could see that you have updated but could not figure out what has been updated.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @mehar

    Hi,

    I just posted a comment in Github so someone else from my team can help me answer your question :smile: I will get back to you when I have an answer from my team.

    -Sheila

  • rernstrernst UMC UtrechtMember

    You need to import: org.broadinstitute.gatk.utils.variant.GATKVCFIndexType, which allows you to set:
    haplotypeCaller.variant_index_type = GATKVCFIndexType.LINEAR
    haplotypeCaller.variant_index_parameter = 128000

    Here is an example of the qscript I use: https://github.com/CuppenResearch/IAP/blob/master/QScripts/HaplotypeCaller_gVCF.scala

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rernst
    Hi,

    Thank you for posting your solution!

    -Sheila

  • meharmehar Member ✭✭

    Hi,

    Thank you. The script appears to run without any errors. However, i would like to find documentation about maxMem and numScatters used in the script. I am not able to find the documentation for these arguments. Could you direct me to the source page.

Sign In or Register to comment.