HaplotypeCaller gVCF calling for WGS
I am doing gVCF calls for whole-genome samples, and I have noticed that the gVCF-calling jobs for some samples fail at random genomic locations. If I resubmit the failed jobs, they either finish successfully or fail again at a different genomic location. I take the 'genomic location' from the "ProgressMeter" lines in the logs.
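To compare where failed runs and their resubmissions stopped, it helps to pull the last "ProgressMeter" line out of each job log. The following is a minimal sketch. It assumes each job writes its own log file into a single directory with a ".log" suffix. The directory layout and the function names `last_progress_line` and `summarize_logs` are hypothetical. The sketch does not parse the location. It only returns the whole last line containing the literal text "ProgressMeter", so it makes no assumptions about the exact GATK log format.

```python
import sys
from pathlib import Path
from typing import Dict, Optional


def last_progress_line(log_path: str) -> Optional[str]:
    """Return the last line of a log that mentions ProgressMeter, or None.

    Only the literal text "ProgressMeter" is matched; the rest of the line
    (including the genomic location) is returned unparsed.
    """
    last = None
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if "ProgressMeter" in line:
                last = line.rstrip("\n")
    return last


def summarize_logs(log_dir: str, pattern: str = "*.log") -> Dict[str, Optional[str]]:
    """Map each log file name in log_dir (hypothetical layout) to its last ProgressMeter line."""
    return {
        path.name: last_progress_line(str(path))
        for path in sorted(Path(log_dir).glob(pattern))
    }


if __name__ == "__main__":
    # Usage: python last_progress.py /path/to/job_logs
    for name, line in summarize_logs(sys.argv[1]).items():
        print(f"{name}\t{line if line is not None else 'no ProgressMeter line'}")
```

Running this before and after a resubmission shows whether a sample stopped at the same place twice or, as described above, at a different location each time.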
- I am running one gVCF job per WGS sample. Right now more than 70% of the jobs are failing. Should any of the parameters be changed?
- Do you have something like an SOP for best practices on HaplotypeCaller calling for WGS samples? I understand the process is very similar to gVCF calling for exome sequencing, but I see many more job failures with gVCF calling on WGS samples.
I am using the following parameters for gVCF call:
java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I file.bam -nct 8 -R human_g1k_v37.fasta -o /ttemp/file.g.vcf -L b37_wgs.intervals --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -dcov 250 -minPruning 3 -stand_call_conf 30 -stand_emit_conf 30 -G Standard -A AlleleBalance -A Coverage -A HomopolymerRun -A QualByDepth
Compute: one full node (256GB RAM, 20 cores) per single-sample WGS gVCF job.
The GATK version being used is 3.1.
P.S. I am also testing the latest version of GATK (3.4) without the "-dcov" option to see whether that resolves the issue.