HaplotypeCaller gVCF calling for WGS

Hi,

I am running gVCF calling on whole-genome samples, and the jobs for some samples fail at random genomic locations. When I resubmit a failed job, it either finishes successfully or fails again at a different genomic location (the location comes from the "ProgressMeter" lines in the logs).

  • I am running one gVCF job per WGS sample. Right now more than 70% of the jobs are failing. Should any of the parameters be changed?
  • Do you have something like an SOP or best practices for running HaplotypeCaller on WGS samples? I understand the process is very similar to gVCF calling for exome sequencing, but I see many more job failures with WGS samples.

I am using the following parameters for the gVCF call:

java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -I file.bam \
    -nct 8 \
    -R human_g1k_v37.fasta \
    -o /ttemp/file.g.vcf \
    -L b37_wgs.intervals \
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR --variant_index_parameter 128000 \
    -dcov 250 \
    -minPruning 3 \
    -stand_call_conf 30 \
    -stand_emit_conf 30 \
    -G Standard -A AlleleBalance -A Coverage \
    -A HomopolymerRun -A QualByDepth

Compute: one full node (256 GB RAM, 20 cores per node) per single-sample WGS gVCF job.
GATK version: 3.1

P.S. I am also testing the latest version of GATK (3.4) without the -dcov option to see if that resolves the issue.

Thanks,

Shalabh

Answers

  • Sheila (Broad Institute) · Member, Broadie, admin

    @shalabhsuman
    Hi Shalabh,

    I suspect that some regions with very high depth are causing the program to stall. -dcov does not work in HaplotypeCaller. Can you check whether the regions where the program stalls are indeed regions of high depth?

    -Sheila

  • shalabhsuman (NIH) · Member

    This is the error log:

    INFO  17:12:19,207 ProgressMeter -      1:16305426        4.54e+07   47.5 m       62.0 s      0.5%         6.0 d     6.0 d 
    INFO  17:13:19,208 ProgressMeter -      1:16505667        4.54e+07   48.5 m       64.0 s      0.6%         6.1 d     6.0 d 
    WARN  17:13:20,503 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1:16506095 has 8 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument 
    WARN  17:13:36,398 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1:16519293 has 13 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument 
    INFO  17:14:19,209 ProgressMeter -      1:16835298        4.54e+07   49.5 m       65.0 s      0.6%         6.1 d     6.0 d 
    INFO  17:15:03,681 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,682 HttpMethodDirector - Retrying request 
    INFO  17:15:03,684 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,684 HttpMethodDirector - Retrying request 
    INFO  17:15:03,686 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,686 HttpMethodDirector - Retrying request 
    INFO  17:15:03,688 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,688 HttpMethodDirector - Retrying request 
    INFO  17:15:03,690 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,690 HttpMethodDirector - Retrying request 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:443)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:417)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------
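
To compare failure locations across many jobs, the last ProgressMeter position and the first Java exception in each log can be extracted with a short script. The following is a minimal sketch in Python, assuming GATK 3.x logs shaped like the one above; the function name `summarize_failure` and the regular expressions are illustrative and not part of GATK.

```python
import re

# ProgressMeter lines look like:
#   INFO  17:14:19,209 ProgressMeter -      1:16835298        4.54e+07 ...
PROGRESS_RE = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")

# A stack-trace header line that is just a Java exception class name,
# e.g. "java.lang.NullPointerException" (anchored at the start of the line).
EXCEPTION_RE = re.compile(r"^\s*((?:[\w$]+\.)+[\w$]*(?:Exception|Error))\b")


def summarize_failure(log_text):
    """Return (contig, position, exception) for one log.

    contig/position come from the last ProgressMeter line seen;
    exception is the first exception class found. Missing parts are None.
    """
    contig = position = exception = None
    for line in log_text.splitlines():
        progress = PROGRESS_RE.search(line)
        if progress:
            contig, position = progress.group(1), int(progress.group(2))
            continue
        if exception is None:
            exc = EXCEPTION_RE.match(line)
            if exc:
                exception = exc.group(1)
    return contig, position, exception


# Sample lines copied from the log in this thread.
sample = """\
INFO  17:13:19,208 ProgressMeter -      1:16505667        4.54e+07   48.5 m       64.0 s      0.6%         6.1 d     6.0 d
INFO  17:14:19,209 ProgressMeter -      1:16835298        4.54e+07   49.5 m       65.0 s      0.6%         6.1 d     6.0 d
##### ERROR stack trace
java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
"""
print(summarize_failure(sample))
```

The reported position is only the last progress checkpoint before the crash, so the failing active region lies somewhere after it. If the positions differ from run to run while the exception stays the same, that points away from a single problematic region.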
    
    
  • Kurt · Member ✭✭✭

    BTW, since I took out -nct, I have never had this issue again (and I use scatter/gather for whole genomes).

  • shalabhsuman (NIH) · Member

    Thanks, Kurt. I am trying that (WGS gVCF calling without -nct) right now!
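
One way to read the scatter/gather suggestion is to run one single-threaded HaplotypeCaller job per contig instead of one multi-threaded job per sample. The sketch below builds such command lines in Python. It is an assumption about how the scatter could be set up, not a description of Kurt's pipeline. The per-contig split, the `-Xmx16g` heap size, the output naming, and the helper name `scatter_commands` are all hypothetical. The GATK arguments themselves are taken from the command in the original post, with -nct and -dcov left out, and the gather step that merges the per-contig gVCFs is not shown.

```python
import shlex


def scatter_commands(bam, reference, contigs, out_prefix,
                     jar="GenomeAnalysisTK.jar", xmx="16g"):
    """Build one GATK 3.x HaplotypeCaller gVCF command per contig.

    No -nct is passed, so each job runs single-threaded. The heap size
    and output file naming are illustrative assumptions.
    """
    commands = []
    for contig in contigs:
        args = [
            "java", f"-Xmx{xmx}", "-jar", jar,
            "-T", "HaplotypeCaller",
            "-I", bam,
            "-R", reference,
            "-L", contig,  # restrict this job to a single contig
            "--emitRefConfidence", "GVCF",
            "--variant_index_type", "LINEAR",
            "--variant_index_parameter", "128000",
            "-o", f"{out_prefix}.{contig}.g.vcf",
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# b37-style contig names for the autosomes plus X and Y
# (MT and any decoy contigs are left out of this sketch).
contigs = [str(i) for i in range(1, 23)] + ["X", "Y"]
for cmd in scatter_commands("file.bam", "human_g1k_v37.fasta", contigs, "file"):
    print(cmd)
```

Passing a contig name to -L replaces the b37_wgs.intervals list from the original command, so any regions that list excluded would be processed again in this sketch.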
