HaplotypeCaller gVCF calling for WGS

Hi,

I am running gVCF calling on whole-genome samples and have noticed that the jobs for some samples fail at random genomic locations. If I resubmit the failed jobs, they either finish successfully or fail again at a different genomic location (the location comes from the "ProgressMeter" lines in the logs).

  • I am running one gVCF job per WGS sample. Right now more than 70% of the jobs are failing. Should any of the parameters be changed?
  • Do you have an SOP or best-practices guide for running HaplotypeCaller on WGS samples? I understand the process is very similar to gVCF calling for exome sequencing, but I see many more job failures with WGS samples.

I am using the following parameters for gVCF calling:

java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -I file.bam \
    -nct 8 \
    -R human_g1k_v37.fasta \
    -o /ttemp/file.g.vcf \
    -L b37_wgs.intervals \
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR --variant_index_parameter 128000 \
    -dcov 250 \
    -minPruning 3 \
    -stand_call_conf 30 \
    -stand_emit_conf 30 \
    -G Standard -A AlleleBalance -A Coverage \
    -A HomopolymerRun -A QualByDepth

Compute: one full node (256 GB RAM, 20 cores per node) per single-sample WGS gVCF job.
GATK version: 3.1

P.S. I am also testing the latest version of GATK (3.4) without the -dcov option to see whether that resolves the issue.

Thanks,

Shalabh

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @shalabhsuman
    Hi Shalabh,

    I suspect some regions have very high depth, which is causing the program to stall. -dcov does not work in HaplotypeCaller. Can you check the regions where the program stalls and see whether they are indeed regions of high depth?

    -Sheila
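
One way to run the depth check suggested above is to summarise per-base depth over a suspect region. The sketch below assumes the depth values come from `samtools depth`, which prints three tab-separated columns (contig, position, depth). The helper name `depth_summary` and the example region are illustrative assumptions, not part of samtools or GATK.

```shell
# Summarise per-base depth read from stdin in `samtools depth` format
# (three tab-separated columns: contig, position, depth).
# depth_summary is a hypothetical helper name used only for this sketch.
depth_summary() {
    awk '
        { sum += $3; if ($3 > max) max = $3 }
        END {
            if (NR == 0) { print "no positions"; exit }
            printf "positions=%d mean=%.1f max=%d\n", NR, sum / NR, max
        }'
}

# Example (assumes samtools is installed and file.bam is indexed;
# the region is only an illustration):
#   samtools depth -r 1:16505667-16835298 file.bam | depth_summary
```

A maximum far above the mean would be consistent with the high-depth explanation. A flat profile would suggest looking elsewhere.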

  • This is the error log:

    INFO  17:12:19,207 ProgressMeter -      1:16305426        4.54e+07   47.5 m       62.0 s      0.5%         6.0 d     6.0 d 
    INFO  17:13:19,208 ProgressMeter -      1:16505667        4.54e+07   48.5 m       64.0 s      0.6%         6.1 d     6.0 d 
    WARN  17:13:20,503 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1:16506095 has 8 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument 
    WARN  17:13:36,398 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1:16519293 has 13 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument 
    INFO  17:14:19,209 ProgressMeter -      1:16835298        4.54e+07   49.5 m       65.0 s      0.6%         6.1 d     6.0 d 
    INFO  17:15:03,681 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,682 HttpMethodDirector - Retrying request 
    INFO  17:15:03,684 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,684 HttpMethodDirector - Retrying request 
    INFO  17:15:03,686 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,686 HttpMethodDirector - Retrying request 
    INFO  17:15:03,688 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,688 HttpMethodDirector - Retrying request 
    INFO  17:15:03,690 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,690 HttpMethodDirector - Retrying request 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:443)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:417)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------
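
To compare where different jobs stopped, the last position reported by ProgressMeter can be pulled from each log. This is a minimal sketch that assumes log lines shaped like the ones above, with the location in the fifth whitespace-separated field. The function name `last_progress_location` is hypothetical.

```shell
# Print the last contig:position reported by ProgressMeter in a GATK log.
# Assumes lines of the form:
#   INFO  17:14:19,209 ProgressMeter -      1:16835298   4.54e+07 ...
# last_progress_location is a hypothetical helper name.
last_progress_location() {
    # $1: path to the log file
    awk '$3 == "ProgressMeter" && $5 ~ /^[^:]+:[0-9]+$/ { loc = $5 }
         END { if (loc != "") print loc }' "$1"
}

# Example: last_progress_location sample1.log
```

Header and status lines from ProgressMeter are skipped because their fifth field does not look like contig:position.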
    
    
  • Kurt (Member)

    BTW, since I took out -nct, I have never had this issue again (and I use scatter/gather for whole genomes).
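
The post does not say how the scatter is set up, so the following is only a sketch of one possibility: one single-threaded HaplotypeCaller job per contig, with no -nct. The function name `print_scatter_cmds`, the `-Xmx16g` heap size, and the per-contig output names are assumptions. The remaining arguments are copied from the command in the original question. The function only prints the commands so they can be reviewed or handed to a scheduler. Gathering the per-contig outputs is not shown.

```shell
# Print one single-threaded HaplotypeCaller command per contig (no -nct).
# print_scatter_cmds, the -Xmx16g heap size and the output file names are
# assumptions for illustration; the other arguments come from the
# command in the original question.
print_scatter_cmds() {
    bam=$1
    ref=$2
    shift 2
    for contig in "$@"; do
        echo "java -Xmx16g -jar GenomeAnalysisTK.jar -T HaplotypeCaller" \
             "-I $bam -R $ref -L $contig" \
             "--emitRefConfidence GVCF" \
             "--variant_index_type LINEAR --variant_index_parameter 128000" \
             "-o ${contig}.g.vcf"
    done
}

# Example: print commands for three contigs of the b37 reference.
#   print_scatter_cmds file.bam human_g1k_v37.fasta 1 2 X
```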

  • Thanks, Kurt. I am trying that (WGS gVCF calling without -nct) right now!
