Running Interactive Jobs on Eagle
Find instructions and examples on how to use Slurm commands to request interactive jobs. Interactive jobs provide a shell prompt, which allows users to execute commands and scripts as they would on the login nodes.
Login nodes are primarily intended to be used for logging in, editing scripts, and submitting batch jobs. Interactive work that involves substantial resources—either memory, CPU cycles, or file system I/O—should be performed on the compute nodes rather than on login nodes.
Interactive jobs may be submitted to any partition and are subject to the same time and node limits as non-interactive jobs.
Requesting Interactive Access
The srun command is used to start an interactive session on one or more compute nodes. When resources become available, you are given a shell prompt on an allocated compute node, where you may work interactively for the time requested.
The job is held until the scheduler can allocate a node to you. You will then see a series of messages such as:
$ srun --time=30 --account=<handle> --ntasks=36 --pty $SHELL
salloc: Granted job allocation 512998
srun: Step created for job 512998
[user@r2i2n5 ~]$
This indicates that node r2i2n5 was allocated to the user.
You can see which nodes are assigned to your interactive jobs using one of these methods:
$ echo $SLURM_NODELIST
$ scontrol show hostnames
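Slurm stores multi-node allocations in a compressed hostlist format, which `scontrol show hostnames` expands to one hostname per line. The following is a minimal sketch of that expansion in plain shell, using an assumed example value (inside a real job, `$SLURM_NODELIST` is set by Slurm automatically):

```shell
# Assumed example of a compressed two-node hostlist; in a real job this
# comes from $SLURM_NODELIST, and `scontrol show hostnames` does the work.
nodelist='r2i2n[5-6]'
prefix=${nodelist%%\[*}                   # "r2i2n"
range=${nodelist#*\[}; range=${range%]}   # "5-6"
lo=${range%-*}; hi=${range#*-}
for i in $(seq "$lo" "$hi"); do
    echo "${prefix}${i}"                  # prints r2i2n5, then r2i2n6
done
```

This sketch only handles a single simple bracket range; real hostlists can contain comma-separated ranges, which is why `scontrol show hostnames` is the reliable tool.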
When requesting multiple nodes with srun, specify the number of tasks needed with --ntasks (or -n) rather than a number of nodes with --nodes (or -N), to avoid MPI task-mapping issues.
You do NOT need to ssh to the node after it is assigned, but if you requested more than one node, you may ssh to any of the nodes assigned to your job.
You may load modules, run applications, start GUIs, etc., and the commands will execute on that node instead of on the login node.
Type exit when finished using the node.
Interactive jobs are useful for many tasks. For example, to debug a job script, users may submit a request to get a set of nodes for interactive use. When the job starts, the user "lands" on a compute node, with a shell prompt. Users may then run the script to be debugged many times without having to wait in the queue multiple times.
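The edit-and-rerun workflow described above can be sketched as follows; `myjob.sh` is a hypothetical stand-in for the script being debugged:

```shell
# Create a stand-in for the job script being debugged (hypothetical example;
# substitute your real script).
cat > myjob.sh <<'EOF'
#!/bin/bash
echo "job step ran on $(hostname)"
EOF
chmod +x myjob.sh

# On the interactive compute node, edit and rerun as often as needed,
# without waiting in the queue for each attempt:
./myjob.sh && echo "run succeeded"
```

Each rerun executes immediately on the node already allocated to you, which is the whole point of debugging interactively rather than through repeated batch submissions.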
The debug partition makes a single node available with shorter wait times when the system is heavily utilized. Request it by limiting the job to 36 tasks (one node) and specifying --partition=debug. For example:
[user@el1 ~]$ srun --time=30 --account=<handle> --partition=debug --ntasks=36 --pty $SHELL
A debug node will only be available for a maximum wall time of 24 hours.
Sample Interactive Job Commands
The following command requests interactive access to one node with at least 150 GB RAM for 20 minutes:
$ srun --time=20 --account=<handle> --ntasks=36 --mem=150G --pty $SHELL
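Once the session starts, you can confirm what the allocation actually gave you. These are standard Linux utilities, not Slurm-specific commands:

```shell
# Confirm the node's resources from the interactive shell:
free -h    # human-readable memory summary (check the "total" column)
nproc      # number of CPUs visible to this shell
```

If the reported memory is less than you requested, check that the --mem flag was accepted rather than silently adjusted by the scheduler.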
If your single-node job needs a GUI that uses X-windows:
$ ssh -Y eagle.hpc.nrel.gov
$ srun --time=20 --account=<handle> --ntasks=36 --pty --x11 $SHELL
If your multi-node job needs a GUI that uses X-windows, the least fragile mechanism is to acquire nodes as above, then in a separate session set up X11 forwarding:
$ srun --time=20 --account=<handle> --ntasks=<# tasks, more than 36> --pty $SHELL
[user@r3i5n13 ~]$ (your compute node r3i5n13)
Then from your local workstation:
$ ssh -Y eagle.hpc.nrel.gov
[user@el1 ~]$ (a login node) ssh -Y r3i5n13
[user@r3i5n13 ~]$ (your compute node r3i5n13, now X11-capable)
[user@r3i5n13 ~]$ xterm (or another X11 GUI application)
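Before launching a GUI, a quick sanity check that X11 forwarding is actually in place can save a confusing error message. This is a generic shell check, not Eagle-specific:

```shell
# If ssh -Y forwarding worked, DISPLAY points at the forwarded X server
# (typically a value like "localhost:10.0").
if [ -n "$DISPLAY" ]; then
    echo "X11 display available: $DISPLAY"
else
    echo "no DISPLAY set; reconnect with ssh -Y and try again"
fi
```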