Impact of the Meltdown and Spectre vulnerabilities on GATK performance?

Dear GATK Team,

I developed an exome pipeline following the standard GATK Best Practices guidelines (BWA-MEM alignment + realignment + BQSR + HaplotypeCaller). Our pipeline uses GATK 3.7 and runs on a local Linux server (HP) running Red Hat Enterprise Linux 7.3.

Our organization's IT staff recently informed me about the hardware vulnerabilities known as "Meltdown" and "Spectre", which affect servers with Intel microprocessors. Red Hat has identified this as an industry-wide issue, has rated its impact as "Important", and strongly recommends applying the released BIOS and OS patches.

Red Hat has indicated that these patches may degrade performance (by up to 30%), so I am wondering whether Broad is encountering this issue. Have you evaluated how much the patches affect GATK performance?

Our IT team wants to apply the following patch

https://access.redhat.com/security/vulnerabilities/speculativeexecution?sc_cid=701f2000000tsLNAAY

The 3 vulnerabilities are : CVE-2017-5754, CVE-2017-5753, & CVE-2017-5715
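For anyone who wants to confirm whether the patches are active before and after timing a pipeline, here is a minimal sketch. It assumes a Linux kernel that exposes mitigation status under `/sys/devices/system/cpu/vulnerabilities` (newer and backported kernels do; an older, unpatched kernel may not have this directory at all). The function name `mitigation_status` is only illustrative.

```python
from pathlib import Path

# sysfs file names and the CVEs they correspond to.
CVE_BY_FILE = {
    "meltdown": "CVE-2017-5754",
    "spectre_v1": "CVE-2017-5753",
    "spectre_v2": "CVE-2017-5715",
}


def mitigation_status(sysfs_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Return {CVE: status string reported by the kernel}.

    If the kernel does not expose a file for a CVE (assumption: this means
    the kernel predates the reporting interface), the status is
    'not reported'.
    """
    base = Path(sysfs_dir)
    status = {}
    for name, cve in CVE_BY_FILE.items():
        path = base / name
        if path.is_file():
            status[cve] = path.read_text().strip()
        else:
            status[cve] = "not reported"
    return status
```

Calling `mitigation_status()` with no arguments reads the live system; passing a different directory is only useful for testing.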

More information about these vulnerabilities is available here:

https://www.wired.com/story/meltdown-and-spectre-patches-take-toll/

https://www.networkworld.com/article/3245750/linux/red-hat-responds-to-the-intel-processor-flaw.html

Thanks,
mglclinical

Best Answer

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Accepted Answer

    Hi @mglclinical,

    We are aware of this issue, though we have not yet fully evaluated its impact on the runtime of our pipelines. Our IT staff are following security recommendations to patch all affected machines in our datacenter, and I believe Google has been rolling out patches to their cloud datacenters (which we use for production processing). So we should get a clearer picture in the near future. I'll see if we can share our findings at that time.

    Others in this space have shared their own preliminary evaluations; this DNAnexus blogpost suggests that the impact on GATK pipelines is on the lower end of the spectrum in terms of performance degradation. So that's encouraging, though it's difficult to tell how closely this transfers to other platforms since different pipeline implementations and hardware configurations may be affected to different degrees.

Answers

  • SkyWarrior, Turkey (Member)
    edited January 15

    The impact here is about 5 to 10 percent, which is not much for a single sample. It grows (up to 30%) if you process multiple samples simultaneously. The major impact comes from I/O operations: the more you use non-volatile storage for intermediate tasks, the harder you are hit. Read throughput on my NVMe disk, where I keep the known VCFs and reference files, is affected the most. Spinning disks don't see much of an impact. Still, I suggest you apply the patches unless the server is totally isolated from the rest of the world (i.e., offline).
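For comparing your own runs, the percentages above can be computed from wall-clock times measured before and after patching. This is only a sketch; the function name `percent_slowdown` is made up for illustration, and the timings are whatever you measure on your own hardware.

```python
def percent_slowdown(before_seconds, after_seconds):
    """Relative runtime increase, in percent, of the patched run
    compared with the unpatched run. Negative means the run got faster."""
    if before_seconds <= 0:
        raise ValueError("before_seconds must be positive")
    return 100.0 * (after_seconds - before_seconds) / before_seconds
```

Timing the same sample several times in each state and comparing medians would give a steadier number than a single pair of runs.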

  • @SkyWarrior, thank you for sharing your observations. I process one sample at a time, but I use all the cores on my server simultaneously at the HaplotypeCaller step, i.e., by splitting the BAM file into chromosome-specific BAM files and then running multiple threads.

    I am not sure how much non-volatile storage/memory our server has; I need to find that out from our IT team.
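A per-chromosome scatter like the one described above could be sketched roughly as follows. This is not the poster's actual script: it assumes GATK 3.x command-line syntax and restricts each job with `-L <contig>` rather than physically splitting the BAM, and the helper names (`build_hc_command`, `run_scatter`) and all paths are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_hc_command(gatk_jar, reference, bam, contig, out_dir):
    """Build one GATK 3.x HaplotypeCaller command limited to a single contig."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", contig,                      # restrict calling to this contig
        "-o", f"{out_dir}/{contig}.vcf",   # one output VCF per contig
    ]


def run_scatter(commands, max_workers):
    """Run the commands concurrently and return their exit codes in order.

    Threads are enough here because each thread only waits on an
    external java process.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

For example, one command per contig name could be built in a list comprehension and passed to `run_scatter` with `max_workers` set to the number of cores you want to use. Because every job reads the same reference and known-sites files, this kind of run is where a shared-storage I/O slowdown would be expected to show up.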

  • mglclinical, USA (Member)
    edited January 16

    I am eager to hear the Broad Institute's opinion/observations on this issue.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @mglclinical
    Hi mglclinical,

    I am checking with the team and will get back to you.

    -Sheila

  • SkyWarrior, Turkey (Member)

    Wow, very reproducible. I have seen the same impact here with the GATK3 HC workflow.

  • @Geraldine_VdAuwera,

    Thank you for the reference to the DNAnexus blogpost, and thank you for your answer.