
Running Interactive Jobs on Eagle

Find instructions and examples on how to use Slurm commands to request interactive jobs. Interactive jobs provide a shell prompt, which allows users to execute commands and scripts as they would on the login nodes. 

Login nodes are primarily intended to be used for logging in, editing scripts, and submitting batch jobs. Interactive work that involves substantial resources—either memory, CPU cycles, or file system I/O—should be performed on the compute nodes rather than on login nodes.

Interactive jobs may be submitted to any partition and are subject to the same time and node limits as non-interactive jobs. 
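
To see which partitions exist and their current time limits and node counts, you can query Slurm before submitting; for example (a minimal sketch; output varies with system state):

$ sinfo -o "%P %l %D"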

Requesting Interactive Access

The srun command is used to start an interactive session on one or more compute nodes. When resources become available, interactive access is provided by a shell prompt. The user may then work interactively on the node for the time specified.

The job is held until the scheduler can allocate a node to you. Once the allocation is granted, you will see a series of messages such as:

$ srun --time=30 --account=<handle> --ntasks=40 --pty $SHELL
salloc: Granted job allocation 512998
srun: Step created for job 512998
[user@r2i2n5 ~]$

This indicates that node r2i2n5 was allocated to the user.

You can see which nodes are assigned to your interactive jobs using one of these methods:

$ echo $SLURM_NODELIST
r2i2n[5-6]
$ scontrol show hostname
r2i2n5
r2i2n6

When requesting multiple nodes with srun, specify the number of tasks needed with --ntasks (or -n) rather than the number of nodes with --nodes (or -N), to avoid MPI mapping issues.
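
For example, Eagle nodes have 36 cores, so a request for 72 tasks spans two nodes; a minimal sketch (adjust the task count to your application):

$ srun --time=30 --account=<handle> --ntasks=72 --pty $SHELL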

You do NOT need to ssh to the node after it is assigned, but if you requested more than one node, you may ssh to any of the nodes assigned to your job.
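
For example, using the node names from the allocation shown above:

[user@r2i2n5 ~]$ ssh r2i2n6     (the second node listed in $SLURM_NODELIST)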

You may load modules, run applications, start GUIs, etc., and the commands will execute on that node instead of on the login node.
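
For example (module and application names here are illustrative, not specific to Eagle):

[user@r2i2n5 ~]$ module avail                  (list modules available on the node)
[user@r2i2n5 ~]$ module load gcc               (illustrative module name)
[user@r2i2n5 ~]$ ./my_app                      (hypothetical application binary)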

Type exit when finished using the node.

Interactive jobs are useful for many tasks. For example, to debug a job script, users may submit a request to get a set of nodes for interactive use. When the job starts, the user "lands" on a compute node, with a shell prompt. Users may then run the script to be debugged many times without having to wait in the queue multiple times.
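
For example, with a hypothetical script named my_job_script.sh:

[user@r2i2n5 ~]$ bash my_job_script.sh         (edit, rerun, and repeat without requeueing)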

A debug reservation makes a single node available with shorter wait times when the system is heavily utilized. To use it, limit the job allocation to 36 tasks (one node) and specify --partition=debug. For example:

[user@el1 ~]$ srun --time=30 --account=<handle> --partition=debug --ntasks=36 --pty $SHELL

A debug node will only be available for a maximum wall time of 24 hours.
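
You can confirm the debug partition's current time limit, node count, and availability with sinfo (output varies with system state):

$ sinfo -p debug -o "%P %l %D %a"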

Sample Interactive Job Commands

The following command requests interactive access to one node with at least 150 GB RAM for 20 minutes:

$ srun --time=20 --account=<handle> --ntasks=36 --mem=150G --pty $SHELL
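
Once the session starts, you can confirm that the memory request was applied; for example (SLURM_JOB_ID is set automatically inside the allocation):

[user@r2i2n5 ~]$ scontrol show job $SLURM_JOB_ID | grep -i mem     (shows the 150G memory request)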

If your single-node job needs a GUI that uses X-windows:

$ ssh -Y eagle.hpc.nrel.gov
...
$ srun --time=20 --account=<handle> --ntasks=36 --pty --x11 $SHELL

If your multi-node job needs a GUI that uses X-windows, the least fragile mechanism is to acquire nodes as above, then in a separate session set up X11 forwarding:

$ srun --time=20 --account=<handle> --ntasks=<# tasks, more than 36> --pty $SHELL
...
[user@r3i5n13 ~]$ (your compute node r3i5n13)

Then from your local workstation:

$ ssh -Y eagle.hpc.nrel.gov
...
[user@el1 ~]$ ssh -Y r3i5n13 (a login node)
...
[user@r3i5n13 ~]$ (your compute node r3i5n13, now X11-capable)
[user@r3i5n13 ~]$ xterm (or another X11 GUI application)

Requesting Interactive GPU Nodes

The following salloc command requests interactive access to a GPU node:

[hpc_user@el2 ~] $ salloc -A hpc_handle -t 5 --gres=gpu:2
salloc: Pending job allocation 1568945
salloc: job 1568945 queued and waiting for resources
salloc: job 1568945 has been allocated resources
salloc: Granted job allocation 1568945
srun: Step created for job 1568945


Then, inside the interactive session, a second srun command gives you access to the GPU devices:

[hpc_user@r104u33 ~] $ srun -A hpc_handle -t 5 --gres=gpu:2 nvidia-smi
Mon Oct 21 09:03:29 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   41C    P0    38W / 250W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   40C    P0    36W / 250W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
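
Any GPU-enabled application can be launched the same way from within the allocation; for example, with a hypothetical binary my_gpu_app:

[hpc_user@r104u33 ~] $ srun -A hpc_handle --gres=gpu:2 ./my_gpu_app

Type exit to end the session and release the allocation.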