GATK 3.8 vs GATK 4 vs GATK 4 Spark: the newer, the slower!!

I used GATK 3.8, GATK 4.0.0, and the GATK 4 Spark tools to test my data and got a surprising result: GATK 4 is slower than GATK 3.8, and the Spark version is slower than both. The run times are 17.3 vs 19.2 vs 24 minutes. The commands are basic and as follows:

gatk3.8

java -jar /GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -T HaplotypeCaller -R cr.fa -I 10_dedup_reads.bam -o testgatk3.raw.variants.vcf

gatk4.0.0

/gatk-4.0.0.0/gatk HaplotypeCaller -R cr.fa -I 10_dedup_reads.bam -O 10.g.vcf.gz

gatkspark

/gatk-4.0.0.0/gatk HaplotypeCallerSpark -R cr.2bit -I 10_dedup_reads.bam -O 10.g.vcf.gz

And I am sure that the I/O, the CPUs, and the memory did not reach their limits, so did I do something wrong? Thanks a lot for reading and replying to my question!

Issue · GitHub (filed by Sheila)

Issue Number: 2880
State: closed
Closed By: vdauwera

Answers

  • SkyWarrior (Turkey) Member ✭✭✭

    The default native thread count for the pairHMM library is 4 in GATK 4.0 but 1 in 3.8. Can you check the speed by setting the native thread count for GATK4 to 1 and trying again? At best the difference will be marginal, but also take heed that some of the optimizations made for 3.8 are no longer there in GATK4, for a good reason I believe. Still, I am holding onto my legacy 3.8 scripts just to be sure that what I do is consistent.
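
    For reference, a minimal sketch of how that can be done, assuming the --native-pair-hmm-threads argument of GATK 4.0's HaplotypeCaller (default 4); the inputs are the ones from the original post:

        # Hypothetical illustration: ask the native pairHMM library to use a
        # single thread, matching the GATK 3.8 default.
        /gatk-4.0.0.0/gatk HaplotypeCaller \
                -R cr.fa \
                -I 10_dedup_reads.bam \
                -O 10.g.vcf.gz \
                --native-pair-hmm-threads 1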

  • sacuba Member

    @SkyWarrior said:
    The default native thread count for the pairHMM library is 4 in GATK 4.0 but 1 in 3.8. Can you check the speed by setting the native thread count for GATK4 to 1 and trying again? At best the difference will be marginal, but also take heed that some of the optimizations made for 3.8 are no longer there in GATK4, for a good reason I believe. Still, I am holding onto my legacy 3.8 scripts just to be sure that what I do is consistent.

    Thank you for your reply. I did some tests after changing the pairHMM thread setting, but the difference is marginal, as you said. Since GATK4 has been released, I think it should be correct, stable, and faster, as advertised.

  • Sheila (Broad Institute) Member, Broadie admin

    @sacuba
    Hi,

    I will have someone else from the team get back to you.

    -Sheila

  • Hi Sheila,

    Could you please point me to this team member? I am interested in running the GATK 4.0 Spark tools and, like @sacuba, I have noticed that the Spark version is slower.

    Thanks

  • Sheila (Broad Institute) Member, Broadie admin

    @SergioBenvenuti
    Hi Giuseppe,

    Geraldine @Geraldine_VdAuwera should get back to you all soon.

    -Sheila

  • Hi Sheila,

    thanks a lot! Looking forward to it!

    Giuseppe

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi @sacuba and @SergioBenvenuti , sorry for the delayed response.

    For the *Spark tools, be aware that you need to specify Spark arguments in order to get the Spark parallelism to kick in; see this doc for more info, and the sketch at the end of this reply.

    For the rest, between 3.8 and 4.0, we're getting scattered reports from people who are not seeing much difference between them, with quite a bit of variation depending on the hardware they use. It seems there's some variability from run to run as well -- did you average over multiple runs or do single runs only?
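
    A minimal sketch, not from the original posts, of how Spark arguments can be attached to the HaplotypeCallerSpark command quoted above (the LOCAL runner and the core count of 8 are illustrative placeholders):

        # Hypothetical example: run on the local machine with 8 worker threads.
        # Everything after the standalone "--" is passed to the Spark runner.
        /gatk-4.0.0.0/gatk HaplotypeCallerSpark \
                -R cr.2bit \
                -I 10_dedup_reads.bam \
                -O 10.g.vcf.gz \
                -- \
                --spark-runner LOCAL \
                --spark-master 'local[8]'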

  • Hi @Geraldine_VdAuwera,

    thank you for the answer.
    You are right; in fact, when I run a GATK Spark tool I pay attention to specifying the Spark arguments, using, for example, a command line similar to the following:

    ./gatk BaseRecalibratorSpark \
            --input  inputfile.bam \
            --reference referencefile.2bit \
            --known-sites file.vcf \
            --intervals intervalfile.bed \
            --output recalibration_spark.table \
            -- --spark-runner SPARK --spark-master spark://${MASTER} \
            --driver-memory 80g \
            --num-executors 16 \
            --executor-memory 20g
    

    Please also see my other post: GATK 4.0.0.0 [BaseRecalibratorSpark low performance].

    Finally, my performance results are based on a series of runs made with the aforementioned Spark tool, all of them consistently giving long computation times.

    Kind regards,
    Giuseppe

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Ah, got it. It looks like some of what you're observing may be a known bug that is being fixed as we speak. In general, the Spark tools are still extremely new, so you can expect some instability there; as always, feedback like yours is very important, so please do continue to let us know how they behave in your hands. Thanks!

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    I should add -- we have some important updates coming soon to a subset of the Spark tools, and as part of that we're going to make some benchmarks available to help set expectations more clearly, since that has been a sticking point of late.

  • Dear Geraldine,

    thank you for the answer and my apologies for the very late reply.
    Looking forward to seeing your benchmarks!

    Best,
    Giuseppe
