
Will there be a working version of the complete GATK Best Practices in Spark soon?

I am trying to use the Spark tools in the GATK 4 release to run the GATK Best Practices with Spark, and everything appears to work up to BQSRPipelineSpark. (I am using gatk_4/build/libs/gatk-package-4.beta.5-67-ge919f85-SNAPSHOT-local.jar with openjdk version "1.8.0_131" on a single node with 27.5 GB of RAM and 4 execution threads.) However, when I try to run HaplotypeCallerSpark, I run into what look like unresolved issues, which have also been reported in other questions on the GATK forums.
For example, I tried to run

./gatk-launch ReadsPipelineSpark --input ERR000589_aligned.bam --reference ucsc.hg19.2bit --disableSequenceDictionaryValidation true --knownSites dbsnp_138.hg19.vcf --knownSites Mills_and_1000G_gold_standard.indels.hg19.vcf --knownSites 1000G_phase1.indels.hg19.sites.vcf --emitRefConfidence GVCF --output ERR000589_raw_variants.g.vcf

which, according to the output of gatk-launch --list

ReadsPipelineSpark                           (BETA Tool) Takes aligned reads (likely from BWA) and runs MarkDuplicates, BQSR, and HaplotypeCaller. The final result is analysis-ready variants

contains HaplotypeCaller. The run never seems to finish (or it takes far too long): I left the command running all night and it still had not completed. I attached part of the command output. I also searched online for some of the WARN lines, but I was not able to interpret them.
I saw the same behavior when I tried to run HaplotypeCallerSpark.

So is the problem on my side, in how I am running the command? Or is it a problem with HaplotypeCallerSpark, which is not yet mature? If it is the latter, I would like to know whether the GATK staff is working on it, and whether a working version will be available soon or we will have to wait a long time.
If the wait will be long, I may consider looking at other "Sparkified" solutions for the later steps of the GATK Best Practices (HaplotypeCaller, GenotypeGVCFs, variant annotation...).

Thanks for your time,

