Best Practices for I/O

Efficient I/O is critical to the performance of data-intensive applications, because parallel file systems are often a substantial bottleneck on HPC systems.

Here are some simple guidelines that can be used for almost any type of I/O on KSL HPC systems:

  • Avoid frequently opening and closing the same file during code execution.
  • Avoid creating directory hierarchies that contain thousands of files, as this causes significant overhead.
  • Aggregate small amounts of data into larger reads and writes.
  • Avoid ASCII representations of your data: they usually take much more space to store and must be converted to and from binary on every read and write.
  • Don’t reinvent the wheel. Parallel I/O libraries such as MPI-IO, HDF5 and netCDF can help you parallelise, aggregate and efficiently manage I/O operations. HDF5 and netCDF use binary file formats that support complex data models and are portable across systems.
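
Several of the guidelines above (open the file once, aggregate small writes, prefer binary over ASCII) can be illustrated with a minimal serial sketch in Python using only the standard library. The function names, file names, and chunk size below are hypothetical choices made for illustration, not part of any KSL tooling, and the sketch does not use a parallel I/O library. In a real application, HDF5 or netCDF would normally replace the hand-rolled binary format.

```python
import os
import struct
import tempfile

def write_binary_aggregated(path, values, chunk_size=4096):
    """Write floats as raw 8-byte doubles, opening the file only once and
    issuing one large write per chunk instead of one write per value."""
    with open(path, "wb") as f:  # single open/close for the whole run
        for start in range(0, len(values), chunk_size):
            chunk = values[start:start + chunk_size]
            # Pack the whole chunk into one bytes object, then write it in one call.
            f.write(struct.pack(f"<{len(chunk)}d", *chunk))

def write_ascii(path, values):
    """Write the same floats as text, one value per line, for size comparison."""
    with open(path, "w") as f:
        f.write("".join(f"{v!r}\n" for v in values))

def read_binary(path):
    """Read all doubles back with a single read call."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 8}d", data))

# Hypothetical sample data and scratch location (assumptions for this sketch).
values = [i / 7.0 for i in range(100_000)]
workdir = tempfile.mkdtemp()
bin_path = os.path.join(workdir, "data.bin")
txt_path = os.path.join(workdir, "data.txt")

write_binary_aggregated(bin_path, values)
write_ascii(txt_path, values)

bin_size = os.path.getsize(bin_path)
txt_size = os.path.getsize(txt_path)
print(f"binary: {bin_size} bytes, ASCII: {txt_size} bytes")
```

The binary file stores exactly 8 bytes per value and reads back bit-for-bit identical, whereas the text file is larger and every value must be parsed again on input. Python already buffers writes internally, so the explicit chunking here mainly serves to make the aggregation pattern visible; the effect is likely to matter more in compiled codes or when many processes write to a shared parallel file system.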