Performance: Difference between revisions

</pre>
</pre>


As an alternative to <tt>numactl</tt>, one may use <tt>taskset</tt> or <tt>KMP_AFFINITY</tt>.
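A minimal sketch of the <tt>taskset</tt> variant follows. The helper <tt>cpu_list_for_job</tt> is hypothetical (it is not part of the XDS scripts), and it assumes that each job gets a block of consecutive logical CPU numbers. Whether consecutive numbers actually belong to the same socket depends on the machine, so check with <tt>lscpu</tt> or <tt>numactl --hardware</tt> first.

```shell
# Hypothetical helper, not part of the original scripts.
# Prints the CPU list for the job with 0-based index $1,
# assuming each job gets $2 consecutive logical CPUs.
cpu_list_for_job() {
    local job=$1 cpus_per_job=$2
    local first=$(( job * cpus_per_job ))
    local last=$(( first + cpus_per_job - 1 ))
    echo "${first}-${last}"
}

# Possible use inside a fork script (left commented out; how the
# program is invoked there is an assumption):
#   taskset -c "$(cpu_list_for_job 1 8)" mcolspot_par
#
# With the Intel OpenMP runtime, thread placement can instead be
# requested through the environment, e.g.:
#   export KMP_AFFINITY=compact
```

With 8 CPUs per job, job 0 would be pinned to CPUs 0-7 and job 1 to CPUs 8-15.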
 
Using <tt>[https://github.com/RRZE-HPC/likwid/wiki likwid]</tt> instead of <tt>numactl</tt> would give much finer control over affinity groups.
 
In my tests on a 4-socket machine, the NUMA-aware scripts reduced wallclock time by about 8% compared to the original scripts. On a 2-socket machine, the effect was <1%. This will depend very much on the specific hardware.
 
 
== Multi-socket machines in a cluster ==
 
In that case, I'd suggest modifying e.g. <tt>forkcolspot_cluster</tt> so that it does not run <tt>mcolspot_par</tt> directly on the remote machine, but instead runs a script on that machine which checks the number of NUMA nodes and runs <tt>mcolspot_par</tt> on the right node.
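
Such a remote-side script could look roughly like the sketch below. All three function names are hypothetical, and the round-robin mapping of task number to node is only one possible choice. It assumes a Linux machine whose NUMA nodes are numbered consecutively from 0, and that <tt>numactl</tt> is installed there.

```shell
# Count the NUMA nodes listed in Linux sysfs; fall back to 1 if
# none are found (e.g. on a non-NUMA or non-Linux machine).
count_numa_nodes() {
    local n
    n=$(( $(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l) ))
    [ "$n" -gt 0 ] || n=1
    echo "$n"
}

# Map a 1-based task number ($1) to a node index, round-robin
# over $2 nodes. Assumes nodes are numbered 0 .. $2-1.
node_for_task() {
    echo $(( ($1 - 1) % $2 ))
}

# Run a command with CPUs and memory bound to the node chosen
# for the given task number.
run_on_node() {
    local task=$1 node
    shift
    node=$(node_for_task "$task" "$(count_numa_nodes)")
    numactl --cpunodebind="$node" --membind="$node" "$@"
}

# Possible use on the remote machine (left commented out; the
# variable name and the invocation are assumptions):
#   run_on_node "$itask" mcolspot_par
```

On a 4-node machine, tasks 1 to 4 would land on nodes 0 to 3, and task 5 would wrap around to node 0.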