Best Practices for I/O
Efficient I/O is critical to the performance of data-intensive applications, because the parallel file system is often a substantial bottleneck on HPC systems.
The following simple guidelines apply to almost any type of I/O on KSL HPC systems:
- Avoid frequently opening and closing the same file during code execution.
- Avoid creating directory hierarchies that contain thousands of files, since this causes significant metadata overhead on the file system.
- Aggregate small amounts of data into larger reads and writes.
- Avoid ASCII representations of your data: they usually take much more space to store and must be converted to and from binary on every read and write.
- Don’t reinvent the wheel. Parallel I/O libraries such as MPI-IO, HDF5 and netCDF can help you parallelise, aggregate and efficiently manage I/O operations. HDF5 and netCDF use binary file formats that support complex data models and are portable across systems.
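
As a rough illustration of several of the guidelines above (open the file once, aggregate small writes into large ones, prefer binary over ASCII), the following sketch writes the same values in both styles and compares the file sizes. It uses only the Python standard library. The file names, the helper functions and the `chunk_len` value are hypothetical choices made for this example, not recommendations from KSL. Raw binary written this way is not portable across machines with different byte order, which is one reason a library such as HDF5 or netCDF is usually the better choice for real data.

```python
import os
import tempfile
from array import array


def write_ascii(path, values):
    """Write one formatted text line per value (the discouraged pattern)."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v!r}\n")


def write_binary_aggregated(path, values, chunk_len=65536):
    """Open the file once and write raw doubles in large chunks.

    chunk_len is an arbitrary illustrative value, not a tuned setting.
    """
    with open(path, "wb") as f:
        for start in range(0, len(values), chunk_len):
            # Each tofile() call writes up to chunk_len values at once.
            array("d", values[start:start + chunk_len]).tofile(f)


def read_binary(path):
    """Read the whole file back as native-endian doubles."""
    count = os.path.getsize(path) // array("d").itemsize
    out = array("d")
    with open(path, "rb") as f:
        out.fromfile(f, count)
    return out.tolist()


# Sample data and scratch files (hypothetical names).
values = [i / 7.0 for i in range(100_000)]
tmpdir = tempfile.mkdtemp()
ascii_path = os.path.join(tmpdir, "data.txt")
binary_path = os.path.join(tmpdir, "data.bin")

write_ascii(ascii_path, values)
write_binary_aggregated(binary_path, values)

ascii_size = os.path.getsize(ascii_path)
binary_size = os.path.getsize(binary_path)
restored = read_binary(binary_path)

print(f"ASCII file:  {ascii_size} bytes")
print(f"Binary file: {binary_size} bytes")
print(f"Round trip exact: {restored == values}")
```

On a parallel file system the difference in the number of write calls often matters more than the difference in size: the ASCII version issues one small write per value, while the binary version hands the file system a few large contiguous buffers.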