Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
HaploytpeCaller gVCF calling for WGS
I am doing gVCF calls for whole genome samples and I would notice that the gvcf-calling jobs for some of the samples would fail at random genomic locations and if I resubmit those failed jobs, they would either finish successfully or fail again at a different genomic location ('genomic location' info from "ProgressMeter" line inside logs).
- I am doing one gVCF job per WGS sample. Right now there are more than 70% of jobs that are failing. Is there anything that should be changed on the parameters?
- Do you have something like a SOP for best practises on doing HaplotypeCaller calling for WGS samples? I understand the process is very similar to exome sequencing gVCF calling but somehow I see many more job failures with gVCF calling on WGS samples.
I am using the following parameters for gVCF call:
java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I file.bam -nct 8 -R human_g1k_v37.fasta -o /ttemp/file.g.vcf -L b37_wgs.intervals —emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -dcov 250 -minPruning 3 -stand_call_conf 30 -stand_emit_conf 30 -G Standard -A AlleleBalance -A Coverage -A HomopolymerRun -A QualByDepth
Compute: One full node (“256GB RAM, 20 cores” per node) per single sample WGS gvcf job.
GATK version being used is "3.1”
P.S. I am also testing out the latest version of GATK (3.4) without “-dcov” option to see if that resolves the issue.