
HaplotypeCaller generates diff results on different CPUs

ericm Member
edited August 2016 in Ask the GATK team

I encountered an interesting problem: running HaplotypeCaller on different machines produces different results at 2 sites. The machines use the same input BAM file, the same reference, the same Java (jdk1.8.0_73), and the same GATK jar (3.5.0).

The reference VCF was generated on a server with a Xeon E5-2650 v3 CPU (result A). On another server with a Xeon E5-2670 CPU (result B), the VCF differs at 2 sites:

diff B A

< chr6  84563147        .       G       A,<NON_REF>     410.77  .       BaseQRankSum=-0.220;ClippingRankSum=0.641;DP=52;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.488;RAW_MQ=187200.00;ReadPosRankSum=0.603       GT:AD:DP:GQ:PL:SB       0/1:30,21,0:51:99:439,0,701,529,764,1293:15,15,11,10
> chr6  84563147        .       G       A,<NON_REF>     409.77  .       BaseQRankSum=-0.220;ClippingRankSum=0.641;DP=52;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.488;RAW_MQ=187200.00;ReadPosRankSum=0.603       GT:AD:DP:GQ:PL:SB       0/1:30,21,0:51:99:438,0,701,529,764,1293:15,15,11,10
< chr13 113496566       .       C       G,<NON_REF>     808.77  .       BaseQRankSum=0.483;ClippingRankSum=0.332;DP=88;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.041;RAW_MQ=316800.00;ReadPosRankSum=1.290       GT:AD:DP:GQ:PL:SB       0/1:48,37,0:85:99:837,0,1129,981,1240,2220:28,20,24,13
> chr13 113496566       .       C       G,<NON_REF>     808.77  .       BaseQRankSum=0.483;ClippingRankSum=0.332;DP=88;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.041;RAW_MQ=316800.00;ReadPosRankSum=1.290       GT:AD:DP:GQ:PL:SB       0/1:48,37,0:85:99:837,0,1129,981,1239,2220:28,20,24,13

Re-running the same task on the E5-2670 server generated the same results.

I double-checked this by running on other servers:

Xeon E5-2670 -> same as B.
Xeon E5-2683 v3 -> same as A.
Core i3-4160 -> same as A.

Why is the E5-2670 so special here? Does GATK use CPU-specific features that lead to these differences?

Thanks in advance.

Best Answer


  • ericm Member
    edited August 2016

    Update: I tried another additional server

    Intel(R) Xeon(R) CPU X5560 -> same as B

    I am guessing that differences in the AVX instruction set across CPU architectures might lead to these differences, since GATK uses AVX via the vectorized implementation of PairHMM. But it's just a guess and I am not sure. Any comments or similar experience will be welcome.
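    One quick way to check which AVX variants each machine advertises is to read the CPU flags. This is a sketch for Linux only (/proc/cpuinfo does not exist elsewhere); the function name is just illustrative:

```python
# Sketch: report which AVX variants the CPU advertises (Linux only).
# On systems without /proc/cpuinfo this simply returns an empty set.
def avx_flags(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return set()
    for line in text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return {f for f in flags if f.startswith("avx")}
    return set()

print(avx_flags())  # e.g. {'avx', 'avx2'} on Haswell, {'avx'} on Sandy Bridge
```

    Comparing the output across the servers above would show whether the A/B split lines up with the available vector extensions.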

  • What kind of differences do you see? I only see some rounding errors in the likelihood values (PL). I don't think this will have any effect unless you have very, very little data. You can also see this in the GQ field, which does not differ between the different CPUs.

    In general, computers cannot represent an infinite amount of precision for floats, so it is very common to see minor differences in floating point calculations between different computers.
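    A tiny illustration of this point (plain Python, nothing GATK-specific): floating-point addition is not associative, so any change in how operations are grouped or ordered, for example between a scalar and a vectorized (AVX) code path, can shift the last digits of a result:

```python
# Floating-point addition is not associative: regrouping the same three
# numbers changes the last bits of the sum, which is all it takes to
# nudge a rounded likelihood (PL) value by 1.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)  # 0.6000000000000001 0.6 False
```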

  • ericmericm Member

    Thanks Redmar and Geraldine for the comments. Yes, these diffs are tiny and likely rounding errors. And it is true they are only a very small portion. I am just trying to understand the differences and their sources a little better.

    The hardware-independent "zero acceleration" mode sounds like a good way to verify it. I will find time to give it a try.

  • LeonorPalmeira Liege Member

    Thanks for pointing this out! I experience this exact same behaviour and it's a problem for me, as I am setting up a pipeline where reproducibility is paramount.
    I am running on a cluster with very similar nodes, but indeed different versions of the same CPU:

    Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
    Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz

    Where can I find the "zero acceleration" option? I suspect it's a HaplotypeCaller option for the pairHMM step, but I cannot find it anywhere.

  • valentin Cambridge, MA Member, Dev ✭✭
    edited March 2017

    @LeonorPalmeira "-pairHMM LOGLESS_CACHING -noFpga" should do.
    Also notice that you might well get into reproducibility problems if you use single-process multi-thread parallelism (i.e. -nt N where N > 1), regardless of which PairHMM implementation you are using.
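    For anyone wiring this into a pipeline, a full GATK 3.x invocation with these flags might look like the sketch below. The file paths and the -ERC mode are placeholders for your own setup; only the -pairHMM and -noFpga flags come from this thread:

```python
# Sketch: assemble a GATK 3.x HaplotypeCaller command line that forces
# the hardware-independent PairHMM. All paths are placeholders.
def haplotypecaller_cmd(ref, bam, out_gvcf, gatk_jar="GenomeAnalysisTK.jar"):
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-o", out_gvcf,
        "-ERC", "GVCF",                  # placeholder: emit a gVCF as in this thread
        "-pairHMM", "LOGLESS_CACHING",   # disable the vectorized (AVX) PairHMM
        "-noFpga",                       # disable FPGA acceleration, if any
    ]

cmd = haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.g.vcf")
print(" ".join(cmd))
```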

  • LeonorPalmeira Liege Member

    @valentin That is perfect! I don't use multi-thread parallelism, and my results are now perfectly reproducible across CPU architectures.

  • ericm Member

    Hi @LeonorPalmeira, great to know. I haven't had a chance to do this verification yet. Thanks for sharing.

    BTW: did you measure the performance/speed? I am curious how much performance difference the PairHMM implementation differences cause.

  • LeonorPalmeira Liege Member
    edited March 2017

    Hi @ericm, yes, I do have some measurements of speed and will share them here.

    • the AVX implementation takes a bit less than 30 minutes
    • the noAVX implementation takes 40 minutes

    Please note that I have parallelized everything by chromosome: feeding a per-chromosome .bed file (these are exomes) to HaplotypeCaller and then merging all the per-chromosome .gvcf files back into one.
    In my parallelized setting, I don't see much difference between the two implementations, so I will go for reproducibility and keep the noAVX implementation.

  • ericm Member

    @LeonorPalmeira said:
    Hi @ericm, yes, I do have some measurements of speed and will share them here.

    Thanks for sharing. The performance difference is much smaller than I expected.

    One note about your split->merge approach: the result will differ slightly from calling HC on all chromosomes together. The difference is likely caused by the central random number generator used by HC in GATK. I saw a similar problem in http://gatkforums.broadinstitute.org/gatk/discussion/comment/32906/#Comment_32906 .
