Using Anaconda Python on the Eagle System

Anaconda Python is our actively supported distribution on the Eagle system.

To use Anaconda, run module load conda. The base package is built on Python 2.7, but users can create custom environments with other versions, including Python 3. In all cases, the numerically intensive routines in numpy and scipy are MKL-enabled for speed.
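
To confirm the MKL linkage in a given environment, you can print numpy's build configuration (numpy.show_config() is a standard numpy utility; MKL-backed builds list the mkl libraries):

[user@HOSTNAME ~]$ python -c "import numpy; numpy.show_config()"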

Custom Environments

Users can install libraries into custom environments. Create environments in a directory where you have write privileges using the -p PATH option. Do not use the --clone=root option when creating custom environments; it was not designed for use in a shared computing environment. The following example creates a new Python 2 environment with numpy, scipy, and pandas:

[user@HOSTNAME ~]$ module purge; module load conda
[user@HOSTNAME ~]$ conda create -n exampleenv python=2 numpy scipy pandas
Fetching package metadata .......
Solving package specifications: ..........
<SNIP>
Extracting packages ... [ COMPLETE ]|###################################| 100%
Linking packages ... [ COMPLETE ]|###################################| 100%
#
# To activate this environment, use:
# $ source activate exampleenv
#
# To deactivate this environment, use:
# $ source deactivate
#
[user@HOSTNAME ~]$ source activate exampleenv
(exampleenv) [user@HOSTNAME ~]$ type python
python is /home/user/.conda-envs/exampleenv/bin/python
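
To place an environment on /scratch instead of your home directory, use the -p option described above; for example (the path here is illustrative):

[user@HOSTNAME ~]$ conda create -p /scratch/$USER/conda/exampleenv python=2 numpy scipy pandas
[user@HOSTNAME ~]$ source activate /scratch/$USER/conda/exampleenv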

Some Common Commands and Options

  • conda create --name <envname> python=3 numpy : Create a custom Python 3 environment named <envname> and install numpy and its dependencies
  • . activate <envname> : Activate environment <envname>. "." is just Bash shorthand for "source".
  • conda search <package> : Look for a package in the conda repo that isn't installed yet.
  • conda install --name <envname> <package> : Install package <package> into custom environment <envname>. The --name <envname> option isn't required if that environment is already your active one.
  • conda list [<package>] : List installed packages, or a particular package that's installed.
  • conda [install | search] -c <channel> <package> : Specify an alternative channel. For example,
    • conda search -c conda-forge qutip (package not in the base repo)
    • conda search -c r r-bivgeo (specialized repository for lesser-known R packages)
  • conda info --envs : See a list of your accessible environments
  • conda remove --name <envname> --all : Delete custom environment <envname>
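
A short session combining several of these commands might look like the following (the environment name py3env and the packages chosen are illustrative):

[user@HOSTNAME ~]$ conda create --name py3env python=3 numpy
[user@HOSTNAME ~]$ . activate py3env
(py3env) [user@HOSTNAME ~]$ conda install pandas
(py3env) [user@HOSTNAME ~]$ conda list numpy
(py3env) [user@HOSTNAME ~]$ source deactivate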

Using environment.yml Files

Anaconda supports creating environments from a YAML-formatted file, conventionally named environment.yml. This allows the same environment to be reproduced wherever the dependent code is run: for example, an environment.yml file created on a developer's laptop can also be used on the HPC system to build the environment the code needs. The conda module has its own environment.yml that is used to create the root environment. This file can be downloaded and modified to produce the environment needed for custom code.
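
For example, an existing environment can be exported to a file on one machine and recreated from it on another (conda env export and conda env create are standard conda subcommands; environment.yml is the conventional filename):

(exampleenv) [user@HOSTNAME ~]$ conda env export > environment.yml
[user@HOSTNAME ~]$ conda env create -f environment.yml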

If you require something not available in the default environment and are unable to create a custom environment, contact us to determine if the default environment can be adapted to your needs.

Example Environment Using High-Performance openmpi

In this example, a new conda environment is created containing numpy, mpi4py, and a high-performance openmpi built with InfiniBand support. The environment is created twice to demonstrate both methods: first with a single conda create command, then from an environment.yml file with conda env create. $ represents the user's shell prompt, <snip> marks removed unimportant output, and everything else is output from the commands:

$ conda create -p /scratch/$USER/conda/myhpompi -c local python=2 mpi4py numpy
Solving environment: done

## Package Plan ##

  environment location: /scratch/hsorense/conda/myhpompi

  added / updated specs:
    - mpi4py
    - numpy
    - python=2


The following packages will be downloaded:
<snip>

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate /scratch/$USER/conda/myhpompi
#
# To deactivate an active environment, use
#
# $ conda deactivate

environment.yml:

channels:
- local
- defaults
dependencies:
- numpy
- openmpi
- mpi4py
- git

Creating from environment.yml:

$ conda env create -p /scratch/$USER/conda/myhpompi
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate /scratch/$USER/conda/myhpompi
#
# To deactivate an active environment, use
#
# $ conda deactivate

Testing high-performance openmpi:

$ . activate /scratch/$USER/conda/myhpompi
(/scratch/$USER/conda/myhpompi) $ git clone git@github.nrel.gov:hsorense/mpi-hpc.git
Cloning into 'mpi-hpc'...
remote: Counting objects: 47, done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 47 (delta 12), reused 24 (delta 5), pack-reused 0
Receiving objects: 100% (47/47), 6.70 KiB | 3.35 MiB/s, done.
Resolving deltas: 100% (12/12), done.
(/scratch/$USER/conda/myhpompi) $ cd mpi-hpc/benchmarks
(/scratch/$USER/conda/myhpompi) $ salloc -N 2 -t 30 mpirun -np 2 -npernode 1 --report-bindings python sendrecv.py
salloc: Granted job allocation 527422
[r1i0n14:250615] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././.][./././././././././././././././././.]
[r1i0n15:250507] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././.][./././././././././././././././././.]
#---------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         2.14         3.28         2.53         0.00
            1         1000         2.15         2.16         2.15         0.93
            2         1000         2.13         2.15         2.14         1.87
            4         1000         2.13         2.14         2.14         3.74
            8         1000         2.23         2.25         2.24         7.15
           16         1000         2.25         2.25         2.25        14.23
           32         1000         2.24         2.26         2.25        28.49
           64         1000         2.38         2.39         2.39        53.64
          128         1000         2.90         2.91         2.91        87.94
          256         1000         2.99         3.02         3.00       170.48
          512         1000         3.12         3.14         3.13       327.29
         1024         1000         3.39         3.41         3.40       601.99
         2048         1000         3.83         3.84         3.83      1068.68
         4096         1000         4.87         4.89         4.88      1679.05
         8192         1000         6.35         6.36         6.35      2578.23
        16384         1000         9.42         9.48         9.44      3470.23
        32768         1000        11.67        11.95        11.80      5552.93
        65536          640        15.16        15.30        15.24      8598.36
       131072          320        23.76        23.94        23.83     10999.60
       262144          160        38.01        38.96        38.33     13677.31
       524288           80        64.46        65.67        65.01     16130.67
      1048576           40       111.10       120.06       115.10     18220.05
      2097152           20       210.58       243.21       227.87     18406.46
      4194304           10       473.91       505.02       493.01     17015.03
salloc: Relinquishing job allocation 527422
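
The sendrecv.py benchmark script is not reproduced here. As a minimal sketch of the same pattern (illustrative only, not the benchmark code from the mpi-hpc repository), a two-rank mpi4py exchange could look like:

# sendrecv_minimal.py -- minimal two-rank mpi4py exchange (illustrative sketch;
# not the benchmark script from the mpi-hpc repository)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
assert comm.Get_size() == 2, "run with exactly 2 MPI ranks"

partner = 1 - rank                     # rank 0 pairs with rank 1
sendbuf = np.full(4, rank, dtype='i')  # small numpy buffer to exchange
recvbuf = np.empty(4, dtype='i')

# Send to and receive from the partner rank in one call.
comm.Sendrecv(sendbuf, dest=partner, recvbuf=recvbuf, source=partner)
print("rank %d received %s" % (rank, recvbuf))

Such a script would be launched the same way as the benchmark, e.g. mpirun -np 2 python sendrecv_minimal.py inside the activated environment.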

Python Interactive Shell / Jupyter

Note: Since the CentOS 7 transition, it has become necessary to unset a Jupyter-related variable by hand in order for the workflow below to succeed. We are working to resolve this, but until we do, please add the following instruction to any job scripts that use Jupyter, or run it in interactive workflows before launching Jupyter:

[user@HOSTNAME ~]$ unset XDG_RUNTIME_DIR

Jupyter is supported in the default conda environment. However, the HPC firewall prohibits direct connections to the notebook server for security reasons, so an SSH tunnel is required to access it.

  1. Request a compute node as an interactive shell and start the notebook server:
[user@HOSTNAME ~]$ salloc -N 1 -t 60
[user@NODENAME ~]$ module purge; module load conda
[user@NODENAME ~]$ jupyter notebook --no-browser --ip=* (may require --ip=0.0.0.0)
[I 10:00:14.868 NotebookApp] Serving notebooks from local directory: /home/user
[I 10:00:14.868 NotebookApp] 0 active kernels
[I 10:00:14.869 NotebookApp] The Jupyter Notebook is running at: http://127.0.0.1:8888/?token=somerandomstring
[I 10:00:14.869 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 10:00:14.870 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://127.0.0.1:8888/?token=somerandomstring
  2. On your local machine, start an SSH tunnel:
laptop:~ user$ ssh -L 8888:NODENAME:8888 HOSTNAME
[user@HOSTNAME ~]$

* Replace HOSTNAME with the login node and NODENAME with the compute node,
   e.g., HOSTNAME=ed1.hpc.nrel.gov and NODENAME=r4i2n27

  3. On your local machine, start a web browser and go to http://127.0.0.1:8888/?token=somerandomstring, using the token shown in the notebook startup output.

See the quick reference sheet for IPython.

Python Debugging

Knowing how to use the Python pdb debugging module can speed development.
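
As a quick illustration (the script and function here are made up for this example), pdb.set_trace() drops execution into the interactive debugger at a chosen point:

# mean_example.py -- toy script to illustrate pdb (hypothetical example)
import pdb

def mean(values):
    total = 0
    for v in values:
        total += v
    pdb.set_trace()  # execution pauses here; inspect total, values, etc.
    return total / len(values)

print(mean([2, 4, 6]))

Alternatively, run an entire script under the debugger with python -m pdb mean_example.py. Useful commands at the (Pdb) prompt include n (next), s (step), p <expression> (print), and c (continue).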

See the quick reference sheet for pdb.