Performance

Revision as of 13:21, 19 April 2011 by Kay

Considerations

In the order of effect:

  1. XDS scales well (i.e. the wallclock time for data processing goes down when the number of available cores is increased) in the COLSPOT, IDXREF, INTEGRATE and CORRECT steps when using the MAXIMUM_NUMBER_OF_PROCESSORS keyword. This triggers program-level parallelization, using OpenMP threads.
  2. The program scales very well in the COLSPOT and INTEGRATE steps when using the MAXIMUM_NUMBER_OF_JOBS keyword. This triggers shell-level parallelization.
  3. Combining both keywords gives the highest performance in my experience (see [1] for an example). As a rough guide, I'd choose the two values to be approximately equal; choose an even number for MAXIMUM_NUMBER_OF_PROCESSORS because that fits typical hardware better.
  4. Some overcommitting of resources (i.e. MAXIMUM_NUMBER_OF_PROCESSORS * MAXIMUM_NUMBER_OF_JOBS > number of cores) is beneficial; you'll have to experiment with these two parameters.
  5. The next thing to consider is DELPHI together with OSCILLATION_RANGE: ideally DELPHI is an integer multiple of MAXIMUM_NUMBER_OF_PROCESSORS * OSCILLATION_RANGE, because that balances the load across the threads. For this purpose, you may want to raise the value of DELPHI (default is 5 degrees). If you are fine-slicing, thread imbalance is not an issue, but users who collect 1° frames (which I think is not the best way nowadays ...) should take it into account.
  6. Performance-wise, I/O also plays a role: as soon as you run 24 or so processes, a single Gigabit Ethernet connection may become limiting. On the other hand, shell-level parallelization smooths the load.
  7. XDS with the MAXIMUM_NUMBER_OF_JOBS keyword can use several machines. This requires some setup as explained at the bottom of [2].
  8. Hyperthreading (SMT), if available on Intel CPUs, is beneficial. A "virtual" core delivers only about 20% of the performance of a "physical" core, but it comes at no cost: you just have to switch it on in the BIOS of the machine.
  9. The 64-bit binaries are generally a bit faster than the 32-bit binaries (but that's not specific to XDS).
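
The rules of thumb in points 3 to 5 can be turned into a small calculation. The Python sketch below is only an illustration and not part of XDS: the helper names split_cores and balanced_delphi, the overcommit factor, and the square-root heuristic for "approximately equal" are assumptions made here. It picks roughly equal values for MAXIMUM_NUMBER_OF_JOBS and MAXIMUM_NUMBER_OF_PROCESSORS (the latter even), then raises DELPHI from its 5 degree default to the next integer multiple of MAXIMUM_NUMBER_OF_PROCESSORS * OSCILLATION_RANGE.

```python
import math
from fractions import Fraction


def split_cores(n_cores, overcommit=1.0):
    """Suggest (jobs, processors) with roughly equal values.

    Hypothetical helper: 'overcommit' > 1 asks for more threads than
    cores, as suggested in point 4. The processor count is kept even.
    """
    target = n_cores * overcommit
    procs = max(2, round(math.sqrt(target)))
    if procs % 2:          # prefer an even number of processors
        procs += 1
    jobs = max(1, math.ceil(target / procs))
    return jobs, procs


def balanced_delphi(oscillation_range, n_procs, minimum=5.0):
    """Smallest multiple of n_procs * oscillation_range that is >= minimum.

    'minimum' defaults to 5 degrees, the DELPHI default quoted above.
    Fractions avoid rounding trouble with values such as 0.1 degrees.
    """
    step = Fraction(str(oscillation_range)) * n_procs
    k = max(1, math.ceil(Fraction(str(minimum)) / step))
    return float(k * step)


if __name__ == "__main__":
    # Example values only: a 16-core machine, 0.5 degree frames,
    # and 25% overcommitting. Adjust to your own hardware and data.
    jobs, procs = split_cores(16, overcommit=1.25)
    delphi = balanced_delphi(0.5, procs)
    print(f"MAXIMUM_NUMBER_OF_JOBS={jobs}")
    print(f"MAXIMUM_NUMBER_OF_PROCESSORS={procs}")
    print(f"DELPHI={delphi}")
```

The printed lines use the XDS.INP keyword names from the list above, so they can be compared with an existing input file. Treat the numbers as a starting point for the experimentation recommended in point 4, not as an optimum.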