Performance: Difference between revisions

@@ Line 169: / Line 169: @@
 </pre>
-The scripts could be modified to use <tt>[https://github.com/RRZE-HPC/likwid/wiki likwid]</tt> instead of <tt>numactl</tt> which would allow for better control of affinity groups. Alternatively, one may use <tt>taskset</tt> or <tt>KMP_AFFINITY</tt>.
+As an alternative to <tt>numactl</tt>, one may use <tt>taskset</tt> or <tt>KMP_AFFINITY</tt>.
+Using <tt>[https://github.com/RRZE-HPC/likwid/wiki likwid]</tt> instead of <tt>numactl</tt> would give much better control of affinity groups.
+In my tests on a 4-socket machine, the NUMA-aware scripts reduced wallclock time by about 8% compared with the original scripts. On a 2-socket machine, I saw an effect of less than 1%. These numbers will depend very much on the specific hardware.
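A minimal sketch of how the alternatives mentioned above could be invoked. The helper name <tt>pin_prefix</tt>, the job name <tt>./job</tt> and the CPU list <tt>0-7</tt> are assumptions made for illustration only; the real core numbering is machine-specific and can be checked with <tt>numactl --hardware</tt> or <tt>lscpu</tt>. Note that <tt>taskset</tt> and <tt>KMP_AFFINITY</tt> only control where threads run, whereas <tt>numactl</tt> can also bind memory allocation.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): print the command
# prefix that pins a job to a CPU list.
# $1 = tool (numactl | taskset | likwid), $2 = CPU list such as 0-7
pin_prefix() {
    case "$1" in
        numactl) echo "numactl --physcpubind=$2" ;;
        taskset) echo "taskset -c $2" ;;
        likwid)  echo "likwid-pin -c $2" ;;
        *)       echo "unknown tool: $1" >&2; return 1 ;;
    esac
}

# Usage with an assumed job name:
#   $(pin_prefix taskset 0-7) ./job
# With the Intel OpenMP runtime, thread placement can instead be requested
# through the environment, for example:
#   KMP_AFFINITY=granularity=fine,compact ./job
```

Which variant works best will again depend on the hardware, so it is probably worth timing each one.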
+== Multi-socket machines in a cluster ==
+In that case, I'd suggest modifying e.g. <tt>forkcolspot_cluster</tt> so that it does not run <tt>mcolspot_par</tt> directly on the remote machine, but instead runs a script on that machine that checks the number of NUMA nodes and runs <tt>mcolspot_par</tt> on the right node.
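The remote-side script suggested above might look roughly like the following sketch. It is untested against the real <tt>forkcolspot_cluster</tt>: the function names are hypothetical, and it assumes that the job number is available as the first argument, that NUMA nodes are numbered contiguously from 0, and that jobs should be distributed round-robin over the nodes.

```shell
#!/bin/bash
# Hypothetical wrapper to be started on the remote machine in place of
# mcolspot_par. ASSUMPTION: the job number is passed as the first argument.

# Count NUMA nodes via sysfs (Linux); fall back to 1 if none are listed.
count_numa_nodes() {
    n_nodes=0
    for d in /sys/devices/system/node/node[0-9]*; do
        if [ -d "$d" ]; then
            n_nodes=$((n_nodes + 1))
        fi
    done
    if [ "$n_nodes" -eq 0 ]; then
        n_nodes=1
    fi
    echo "$n_nodes"
}

# Map a 1-based job number onto a node index, round-robin.
# $1 = job number, $2 = number of NUMA nodes
pick_node() {
    echo $(( ($1 - 1) % $2 ))
}

# Run mcolspot_par with CPUs and memory bound to the chosen node.
run_on_node() {
    node=$(pick_node "$1" "$(count_numa_nodes)")
    numactl --cpunodebind="$node" --membind="$node" mcolspot_par
}

# Assumed invocation from the modified forkcolspot_cluster:
#   run_on_node "$1"
```

Binding both CPUs and memory to the same node keeps each job's data local to the socket it runs on, which is the point of the NUMA-aware scripts.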