HaplotypeCaller gVCF calling for WGS

Hi,

I am making gVCF calls for whole-genome samples and have noticed that the gVCF-calling jobs for some samples fail at random genomic locations. If I resubmit the failed jobs, they either finish successfully or fail again at a different genomic location (the location is taken from the last "ProgressMeter" line in the logs).

  • I am running one gVCF job per WGS sample. Right now more than 70% of the jobs are failing. Is there anything in the parameters that should be changed?
  • Do you have something like an SOP for best practices on HaplotypeCaller calling for WGS samples? I understand the process is very similar to gVCF calling for exome sequencing, but I see many more job failures with gVCF calling on WGS samples.

I am using the following parameters for the gVCF call:

java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -I file.bam \
    -nct 8 \
    -R human_g1k_v37.fasta \
    -o /ttemp/file.g.vcf \
    -L b37_wgs.intervals \
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR --variant_index_parameter 128000 \
    -dcov 250 \
    -minPruning 3 \
    -stand_call_conf 30 \
    -stand_emit_conf 30 \
    -G Standard -A AlleleBalance -A Coverage \
    -A HomopolymerRun -A QualByDepth

Compute: one full node (256 GB RAM, 20 cores per node) per single-sample WGS gVCF job.
The GATK version being used is 3.1.

P.S. I am also testing the latest version of GATK (3.4) without the -dcov option to see if that resolves the issue.

Thanks,

Shalabh

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @shalabhsuman
    Hi Shalabh,

    I suspect that some regions have very high depth, which is causing the program to stall. -dcov does not work in HaplotypeCaller. Can you check the regions where the program is stalling and see whether they are indeed regions of high depth?

    -Sheila
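
A minimal sketch of how the last reported location could be pulled out of each failed job's log, so that the positions can be compared against depth. It assumes the GATK 3.x ProgressMeter line layout shown in this thread. The function name `last_progress_location` is hypothetical and not part of GATK.

```python
import re

# Matches GATK 3.x progress lines such as:
#   INFO  17:14:19,209 ProgressMeter -      1:16835298        4.54e+07 ...
# Header and summary ProgressMeter lines have no "contig:position" field
# and are therefore skipped.
PROGRESS_RE = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_progress_location(log_text):
    """Return (contig, position) from the last ProgressMeter line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last
```

The returned contig and position can then be used to inspect coverage around that site, for example with `samtools depth -r` over a window of your choosing. The ProgressMeter position only approximates where the job stopped, since it is logged about once a minute.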

  • shalabhsuman, NIH (Member)

    This is the error log:

    INFO  17:12:19,207 ProgressMeter -      1:16305426        4.54e+07   47.5 m       62.0 s      0.5%         6.0 d     6.0 d 
    INFO  17:13:19,208 ProgressMeter -      1:16505667        4.54e+07   48.5 m       64.0 s      0.6%         6.1 d     6.0 d 
    WARN  17:13:20,503 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1:16506095 has 8 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument 
    WARN  17:13:36,398 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1:16519293 has 13 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument 
    INFO  17:14:19,209 ProgressMeter -      1:16835298        4.54e+07   49.5 m       65.0 s      0.6%         6.1 d     6.0 d 
    INFO  17:15:03,681 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,682 HttpMethodDirector - Retrying request 
    INFO  17:15:03,684 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,684 HttpMethodDirector - Retrying request 
    INFO  17:15:03,686 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,686 HttpMethodDirector - Retrying request 
    INFO  17:15:03,688 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,688 HttpMethodDirector - Retrying request 
    INFO  17:15:03,690 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused 
    INFO  17:15:03,690 HttpMethodDirector - Retrying request 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:443)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:417)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
        at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------
    
    
  • Kurt (Member ✭✭✭)

    BTW, since I took out -nct, I have never had this issue again (and I use scatter/gather for whole genomes).
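
Kurt does not describe his scatter/gather setup, so the following is only a sketch under assumptions: it scatters by contig, reuses the gVCF arguments from the original command, drops -nct, and leaves the gather step out. The helper name `scatter_commands`, the per-job heap size, and the output naming are hypothetical.

```python
# Primary b37 contigs (autosomes, X, Y, MT). Assumption: the contents of
# b37_wgs.intervals in the original command are not shown, so these stand in.
B37_CONTIGS = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def scatter_commands(bam, reference, contigs, out_prefix,
                     jar="GenomeAnalysisTK.jar", xmx="16g"):
    """Build one single-threaded HaplotypeCaller command per contig.

    xmx is a placeholder heap size, not a recommendation from the thread.
    """
    commands = []
    for contig in contigs:
        commands.append([
            "java", f"-Xmx{xmx}", "-jar", jar,
            "-T", "HaplotypeCaller",
            "-I", bam,
            "-R", reference,
            "-L", contig,  # restrict this job to a single contig
            "--emitRefConfidence", "GVCF",
            "--variant_index_type", "LINEAR",
            "--variant_index_parameter", "128000",
            "-o", f"{out_prefix}.{contig}.g.vcf",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed
    # or handed to a cluster scheduler.
    for cmd in scatter_commands("file.bam", "human_g1k_v37.fasta",
                                B37_CONTIGS, "file"):
        print(" ".join(cmd))
```

Each command list can be passed to a scheduler or to `subprocess.run`. The per-contig gVCFs still need to be combined into one file per sample afterwards.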

  • shalabhsuman, NIH (Member)

    Thanks, Kurt. I am trying that (WGS gVCF calling without -nct) right now!