When you are facing memory issues, the workaround is to reduce the number of tasks per node. The trade-off is that you will need more nodes for the same total number of tasks.
You gain not only more memory per task but also more memory bandwidth per task, so your application may run somewhat faster. If your application uses OpenMP threads, you can run fewer tasks per node: set the options --ntasks-per-node and --ntasks-per-socket accordingly in your batch script and srun command, and set OMP_NUM_THREADS to match, so that tasks and threads are distributed evenly across the node. Perform some small runs to check the performance gain.
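The even distribution described above comes down to simple arithmetic: split the tasks equally over the sockets, then give each task's OpenMP threads the cores left on its socket. The Python sketch below assumes, for illustration only, a node with 2 sockets of 16 cores each (check the real layout of your target system). The function names and the executable ./my_app are hypothetical, and --cpus-per-task is a standard Slurm option that this entry does not mention but that is commonly set alongside OMP_NUM_THREADS.

```python
def hybrid_layout(tasks_per_node: int, sockets: int = 2, cores_per_socket: int = 16) -> dict:
    """Spread tasks evenly over sockets; each task's OpenMP threads get the remaining cores.

    The defaults (2 sockets x 16 cores) are an illustrative assumption, not a system spec.
    """
    if tasks_per_node % sockets != 0:
        raise ValueError("tasks_per_node must be a multiple of the number of sockets")
    tasks_per_socket = tasks_per_node // sockets
    if tasks_per_socket == 0 or cores_per_socket % tasks_per_socket != 0:
        raise ValueError("the cores of each socket must divide evenly among its tasks")
    return {
        "ntasks_per_node": tasks_per_node,
        "ntasks_per_socket": tasks_per_socket,
        "omp_num_threads": cores_per_socket // tasks_per_socket,
    }


def srun_line(nodes: int, layout: dict, executable: str = "./my_app") -> str:
    """Build the OMP_NUM_THREADS export and srun command for a layout (hypothetical helper)."""
    return (
        f"export OMP_NUM_THREADS={layout['omp_num_threads']}\n"
        f"srun --nodes={nodes} --ntasks={nodes * layout['ntasks_per_node']} "
        f"--ntasks-per-node={layout['ntasks_per_node']} "
        f"--ntasks-per-socket={layout['ntasks_per_socket']} "
        f"--cpus-per-task={layout['omp_num_threads']} {executable}"
    )


if __name__ == "__main__":
    # 8 tasks per node -> 4 tasks per socket, 4 OpenMP threads per task
    print(srun_line(2, hybrid_layout(8)))
```

Halving the tasks per node in this sketch doubles the threads per task, so every core stays busy while each task gets twice the memory.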
On Shaheen, 6168 nodes have 128 GB of memory each, while only 4 nodes have 256 GB. Neser has 2 nodes with 768 GB, 5 nodes with 256 GB and 12 nodes with 192 GB.
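To estimate how many tasks fit on a node and how many nodes the job then needs, divide the node memory by the memory each task requires and cap the result at the core count. The Python sketch below defaults to the 128 GB node size quoted for Shaheen; the 32 cores per node and the reserve_gb headroom (some memory is always taken by the operating system) are illustrative assumptions, and plan_nodes is a hypothetical helper name.

```python
import math


def plan_nodes(total_tasks: int, mem_per_task_gb: float, node_mem_gb: float = 128,
               cores_per_node: int = 32, reserve_gb: float = 0) -> tuple[int, int]:
    """Return (tasks_per_node, nodes) so that each task gets mem_per_task_gb of memory.

    cores_per_node=32 and reserve_gb are assumptions for illustration only.
    """
    usable_gb = node_mem_gb - reserve_gb
    # Memory limits how many tasks fit; cores give a hard upper bound.
    tasks_per_node = min(cores_per_node, math.floor(usable_gb / mem_per_task_gb))
    if tasks_per_node < 1:
        raise ValueError("a single task needs more memory than one node provides")
    # Fewer tasks per node means more nodes for the same total task count.
    nodes = math.ceil(total_tasks / tasks_per_node)
    return tasks_per_node, nodes


if __name__ == "__main__":
    print(plan_nodes(256, 4))                 # fits at full core count
    print(plan_nodes(256, 6, reserve_gb=8))   # memory-bound: fewer tasks, more nodes
```

For example, with 8 GB kept in reserve and 6 GB per task, only 20 tasks fit on a 128 GB node, so 256 tasks need 13 nodes instead of the 8 nodes that would suffice at 4 GB per task.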