How to estimate your memory usage per node
The Unix command time measures how long a given command takes to execute. For example:
@cdl1:~> /usr/bin/time ls
real 0m0.017s
user 0m0.004s
sys 0m0.004s
When called with the --verbose option, /usr/bin/time also reports additional useful information, such as memory consumption, file system input and output, and socket traffic. The peak memory used by the command appears as "Maximum resident set size". For example:
@cdl1:~> /usr/bin/time --verbose ls -r
<output of 'ls' command>
Command being timed: "ls -r"
User time (seconds): 0.01
System time (seconds): 0.01
Percent of CPU this job got: 61%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4912
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 375
Voluntary context switches: 1
Involuntary context switches: 1150
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
You may want to apply this command to a job running in parallel on Shaheen. Because time writes its report to the standard error stream, you just need to tell srun to create one error file per task in order to distinguish the timings observed on each task. For example:
srun --error=job.%t.err -n 4 /usr/bin/time --verbose my_program
This produces 4 error files (job.0.err, job.1.err, job.2.err, and job.3.err), each containing the output of the time command for one of the 4 parallel tasks. In the file name pattern, %t is replaced by the task rank.
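To turn these per-task files into a memory estimate, the "Maximum resident set size" line can be extracted from each one. The following is a minimal sketch, not an official Shaheen tool: the function names max_rss_kb and collect_max_rss are hypothetical, and the default pattern job.*.err simply assumes the file names used in the srun example above. It also assumes that each file contains the report of exactly one task.

```python
import glob
import re

# Matches the peak-memory line printed by /usr/bin/time --verbose.
_MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def max_rss_kb(text):
    """Return the peak resident set size in kbytes found in text, or None."""
    match = _MAX_RSS.search(text)
    return int(match.group(1)) if match else None


def collect_max_rss(pattern="job.*.err"):
    """Map each error file matching pattern to its peak RSS in kbytes.

    Files without a "Maximum resident set size" line are skipped.
    """
    results = {}
    for path in sorted(glob.glob(pattern)):
        # The program itself may also write to stderr, so tolerate odd bytes.
        with open(path, errors="replace") as handle:
            rss = max_rss_kb(handle.read())
        if rss is not None:
            results[path] = rss
    return results


if __name__ == "__main__":
    per_task = collect_max_rss()
    for path, rss in per_task.items():
        print(f"{path}: {rss} kbytes ({rss / 1024:.1f} MiB)")
    if per_task:
        total = sum(per_task.values())
        print(f"sum over {len(per_task)} tasks: {total / 1024:.1f} MiB")
```

If all tasks ran on the same node, the sum gives a rough estimate of the memory needed on that node. It tends to overestimate, since pages shared between tasks are counted once per task, and the tasks may not all reach their peak at the same moment. If the tasks were spread over several nodes, only the tasks placed on a given node should be summed.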