Why is my job not running?

When no estimated start time is available for your pending job, you can look up the reason it is not running:

Type squeue -j <jobid> -l to get the following output. The last column, NODELIST(REASON), shows the reason your job is not running.

           JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           110000   workq 8-tuned_   user1  PENDING       0:00 3-00:00:00      1 (AssocGrpCPUMinutesLimit)
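If you want only the job ID and the reason, the output above can be filtered. The following is a minimal sketch, not a site-provided tool: the function name pending_reasons is made up for illustration, and it assumes the column order shown above (STATE in the fifth column, with no spaces in the job name) and a reason enclosed in parentheses at the end of the line.

```shell
# Hypothetical helper: read `squeue -l` output on stdin and print
# "<jobid> <reason>" for every pending job.
# Assumes STATE is the 5th whitespace-separated column, as in the sample
# output, and that the reason is the parenthesised text ending the line.
pending_reasons() {
    awk '$5 == "PENDING" && match($0, /\([^()]*\)[ \t]*$/) {
        # Drop the opening parenthesis, then the closing one and any
        # trailing blanks.
        reason = substr($0, RSTART + 1, RLENGTH - 1)
        sub(/\)[ \t]*$/, "", reason)
        print $1, reason
    }'
}
```

Usage would look like: squeue -j <jobid> -l | pending_reasons. squeue can also print these two fields directly with a custom format, for example squeue -h -j <jobid> -o "%i %r", which avoids parsing the long listing.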

 

The most common reason codes are listed below. Each code identifies why a job is waiting for execution. A job may be waiting for more than one reason, in which case only one of them is displayed.

 

AssocGrpCPUMinutesLimit 

The job's association (typically your account or project) has reached its group limit on CPU minutes.

Cleaning

The job is being requeued and still cleaning up from its previous execution.        

Dependency

This job is waiting for a dependent job to complete.

JobHeldAdmin

The job is held by a system administrator.

JobHeldUser

The job is held by the user.

NodeDown

A node required by the job is down.

Priority

One or more jobs with a higher priority than yours are queued for this partition or advanced reservation.

QOSGrpNodeLimit

The job's QOS has reached its maximum number of nodes; all nodes allowed under that QOS are in use.

QOSUsageThreshold

The required QOS threshold has been breached.

ReqNodeNotAvail

No nodes can be found that satisfy your job's limits, for instance because maintenance is scheduled and the job cannot finish before it starts.

Reservation

The job is waiting for its advanced reservation to become available.

Resources

The job is waiting for resources (nodes) to become available and will run when Slurm finds enough free nodes.

SystemFailure

Failure of the Slurm system, a file system, the network, etc.
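
When several of your jobs are pending, it can help to see how many are waiting for each reason. The sketch below is one possible way to do that, under stated assumptions: count_reasons is a hypothetical helper name, and it expects one reason code per line on standard input.

```shell
# Hypothetical helper: count how often each reason code occurs.
# Reads one reason per line on stdin and prints "<count> <reason>",
# most frequent first. Blank lines are ignored.
count_reasons() {
    awk 'NF { count[$0]++ } END { for (r in count) print count[r], r }' | sort -rn
}
```

It could be fed from squeue like this: squeue -h -u "$USER" -t PENDING -o "%r" | count_reasons. Here -h suppresses the header, -t PENDING restricts the listing to pending jobs, and the %r format field prints only the reason.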
