(→Xeon Phi (Knights Landing, KNL): updates) |
Line 48: | Line 48: | ||
=== Xeon Phi (Knights Landing, KNL) === | === Xeon Phi (Knights Landing, KNL) === | ||
The benchmark was run on a single KNL7210 processor (64 cores, 256 hardware threads) set to quadrant mode and using the MCDRAM as cache. XDS was compiled with the -xMIC-AVX512 option of ifort. This gives | The benchmark was run on a single KNL7210 processor (64 cores, 256 hardware threads) set to quadrant mode and using the MCDRAM as cache. The environment variable OMP_PROC_BIND was set to false (if this is not done, the scheduler seems to put all threads on one core). XDS was compiled with the -xMIC-AVX512 option of ifort. This gives | ||
COLSPOT: elapsed wall-clock time 48.3 sec | COLSPOT: elapsed wall-clock time 48.3 sec | ||
INTEGRATE: total elapsed wall-clock time 61.2 sec | INTEGRATE: total elapsed wall-clock time 61.2 sec | ||
Line 65: | Line 65: | ||
COLSPOT: elapsed wall-clock time 40.0 sec | COLSPOT: elapsed wall-clock time 40.0 sec | ||
INTEGRATE: total elapsed wall-clock time 51.3 sec | INTEGRATE: total elapsed wall-clock time 51.3 sec | ||
This was run with an 8 GB/8 GB MCDRAM split. The same run, but with 8 | This was run with an 8 GB/8 GB MCDRAM split. The same run, but with 8 JOBS and 32 PROCESSORS, takes | ||
INIT.LP: elapsed wall-clock time 25.3 sec | INIT.LP: elapsed wall-clock time 25.3 sec | ||
COLSPOT: elapsed wall-clock time 40.1 sec | COLSPOT: elapsed wall-clock time 40.1 sec | ||
INTEGRATE: total elapsed wall-clock time 53.1 sec | INTEGRATE: total elapsed wall-clock time 53.1 sec | ||
Conclusion: since INIT benefits from more PROCESSORS, one could run XDS twice for the fastest turnaround: the first run with JOB=XYCORR INIT and a high number of processors (99 is the maximum), and the second run with JOB=COLSPOT IDXREF DEFPIX INTEGRATE CORRECT and an optimized JOBS/PROCESSORS combination. | |||
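The two-run idea from the conclusion could be scripted roughly as follows. This is only a sketch under several assumptions: each keyword (JOB, MAXIMUM_NUMBER_OF_JOBS, MAXIMUM_NUMBER_OF_PROCESSORS) sits on its own line in XDS.INP, the parallel binary is called xds_par and is on the PATH, and the first pass uses 1 job with 99 processors while the second uses the 8 JOBS / 32 PROCESSORS combination quoted above. The helper names (set_keyword, make_input, run_two_passes) are made up for illustration and are not part of XDS.

```python
import re
import subprocess
from pathlib import Path

# (JOB steps, MAXIMUM_NUMBER_OF_JOBS, MAXIMUM_NUMBER_OF_PROCESSORS) per pass.
# The numbers are assumptions taken from the benchmark text, not tuned values.
PASSES = [
    ("XYCORR INIT", 1, 99),
    ("COLSPOT IDXREF DEFPIX INTEGRATE CORRECT", 8, 32),
]


def set_keyword(text, key, value):
    """Replace each uncommented line starting with `KEY=`, or append one."""
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}= {value}"
    if pattern.search(text):
        return pattern.sub(lambda _match: line, text)
    return text.rstrip("\n") + "\n" + line + "\n"


def make_input(text, job, jobs, processors):
    """Return XDS.INP text adjusted for one pass."""
    text = set_keyword(text, "JOB", job)
    text = set_keyword(text, "MAXIMUM_NUMBER_OF_JOBS", jobs)
    text = set_keyword(text, "MAXIMUM_NUMBER_OF_PROCESSORS", processors)
    return text


def run_two_passes(inp=Path("XDS.INP"), xds="xds_par"):
    """Run XDS twice, rewriting XDS.INP before each pass."""
    original = inp.read_text()
    for job, jobs, processors in PASSES:
        inp.write_text(make_input(original, job, jobs, processors))
        subprocess.run([xds], check=True, cwd=inp.parent)
```

Calling run_two_passes() in the directory that holds XDS.INP would then perform both passes in sequence; the file is left with the settings of the second pass.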
== Troubleshooting == | == Troubleshooting == |