High Performance Computing

High performance computing (HPC) can provide a powerful solution when working with very large datasets, allowing you and your collaborators to run scripts and programs over those datasets without being limited by your own hardware or network speeds. UBC's Advanced Research Computing (ARC) provides access to high performance computing via the Sockeye compute cluster, along with secure and redundant digital object storage via Chinook. Additionally, the Digital Research Alliance of Canada provides compute clusters across the country, including one located at Simon Fraser University.

UBC Advanced Research Computing Sockeye

Documentation: https://confluence.it.ubc.ca/display/UARC/Using+Sockeye

Supported Software: https://confluence.it.ubc.ca/display/UARC/Software

Note

Additional Linux (Rocky Linux) software can be installed, along with conda packages or any software available within an Apptainer container.

Digital Research Alliance Clusters

Documentation: https://docs.alliancecan.ca/wiki/Getting_started

Supported Software: https://docs.alliancecan.ca/wiki/Available_software

Note

Additionally, any software that can be installed within an Apptainer container can be run on the clusters.

QGIS

QGIS can run on either UBC Sockeye or Alliance clusters. To interact with QGIS through a graphical user interface, you’ll need to run it within an interactive job with X11 forwarding enabled. You can find more information on creating graphical interactive jobs on UBC Sockeye here, while QGIS-specific documentation for Alliance clusters is listed below.
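As a rough sketch, requesting a graphical interactive session on an Alliance cluster might look like the following. The module name and resource values here are assumptions; check module spider qgis on your cluster, and note that you must connect with X11 forwarding (ssh -Y) for the GUI to display locally.

# Connect to the cluster with X11 forwarding enabled
$ ssh -Y <username>@<cluster>.alliancecan.ca
# Request a short interactive job with X11 forwarding
$ salloc --time=1:00:00 --mem=8G --account=def-someuser --x11
# Load a QGIS module (name and version are assumptions; confirm with `module spider qgis`)
$ module load qgis
# Launch the QGIS graphical interface, forwarded to your local display
$ qgis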

Jupyter

A handful of Alliance clusters provide access to hardware via a JupyterHub instance. While hardware resources are significantly restricted, using Jupyter can provide an easy method for familiarizing yourself with the software and capabilities of an HPC environment. JupyterHub provides an easy-to-follow form for setting up the interactive job from which your Jupyter server will run.

Note

Other kernels can be supported using Apptainer containers. See Running a Jupyter Kernel from a Container for more details.

  • Server Resource Options:

    • Time (Session Length): 30 minutes - 5 hours

    • Number of (CPU) Cores: 1 - 8

    • Memory (Total Session Limit): 1000MB - 63000MB

    • (Optional) GPU Configuration: none - 4 x V100L

Note

You can only install packages from Alliance's maintained set of Python wheels. To run JupyterLab with conda or external libraries hosted on PyPI or CRAN, you'll need to run it using a pre-built Apptainer container as documented in the alternative approaches below.

Running a Jupyter Kernel from a Container

Containers enable you to pre-build customized environments from which you can run software that is not supported on the Alliance clusters, and they can be leveraged to extend the Alliance's JupyterHub clusters by running them as custom Jupyter kernels. This can be especially useful given that conda is not supported on the clusters.

Before using this approach, double-check that the software you need is not already available on the cluster as either a module or a Python wheel. Software installed within a container will generally perform less efficiently than the equivalent Alliance modules, which have been optimized for HPC.
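For example, you can search for modules and pre-built wheels from a login node before deciding to build a container. module spider is a standard Lmod command, and avail_wheels is the Alliance's wheel search tool (you may need to load a python module first for it to be available); gdal and geopandas below are just example package names.

# Search the module tree for a package
$ module spider gdal
# Search the Alliance's pre-built Python wheels
$ avail_wheels geopandas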

Creating the container environment with Micromamba (Conda)

While this can be done on an Alliance login node, it is highly recommended that you build containers in another Linux environment and copy them onto the cluster. The build will run faster, and you won't be competing with other researchers for resources on the login node.

Apptainer only runs on Linux, so Windows and macOS users will need to set up a Linux virtual machine, via Windows Subsystem for Linux (WSL) or Lima respectively, to create and run Apptainer containers locally. Apptainer includes instructions in their documentation at this link.
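As a minimal sketch, setting up a local Linux environment might look like the following; exact steps depend on your Windows or macOS version, so follow the linked Apptainer documentation if these commands don't match your system.

# Windows (run in PowerShell as Administrator): install WSL with an Ubuntu distribution
wsl --install -d Ubuntu
# macOS (assuming Homebrew is installed): install Lima, start the default Linux VM, and open a shell in it
brew install lima
limactl start default
limactl shell default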

Note

Whenever you start a new Jupyter server on the cluster, you’ll need to load the apptainer module again in order for the kernel to run.

  1. Ensure Apptainer is installed in the Linux environment

    $ apptainer --version

    If Apptainer is not installed, you can use the following steps on Debian or Ubuntu

    $ sudo apt update
    $ sudo apt install -y software-properties-common
    $ sudo add-apt-repository -y ppa:apptainer/ppa
    $ sudo apt update
    $ sudo apt install -y apptainer
  2. Pull the mambaorg/micromamba Docker container from Docker Hub and build a sandbox container from it. This will enable you to use Micromamba, an alternative to Conda and Mamba that runs more efficiently in environments where resources are limited.

    $ apptainer build --sandbox <container_name>/ docker://mambaorg/micromamba:bookworm-slim
  3. Start a Bash shell within the sandbox container with write privileges.

    $ apptainer shell -C --shell /bin/bash --writable <container_name>
  4. Update Micromamba.

    Apptainer> micromamba self-update
  5. Use Micromamba to install the kernel package needed for your preferred programming language (Python: ipykernel; R: r-irkernel) and any other packages you may need.

    Apptainer> micromamba install -y -q -n base -c conda-forge <kernel_package> <other_packages>
  6. Clean up any extra files that are no longer needed

    Apptainer> micromamba clean --all -y
  7. Exit the sandbox container shell

    Apptainer> exit
  8. Build your Apptainer container from the sandbox container. Including a timestamp in the container file name is recommended, as you may want to rebuild the container with updated packages in the future.

    $ apptainer build <container_name>_<timestamp>.sif <container_name>

If you need the flexibility to add and remove packages later, you can recreate a sandbox from your uploaded container on a login node, modify it, and rebuild the SIF file, as sketched below.
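For example, recreating and modifying a sandbox from an uploaded SIF file might look like the following sketch; substitute your own container name and timestamps.

# Load Apptainer on the login node
$ module load apptainer
# Recreate a writable sandbox from the uploaded SIF container
$ apptainer build --sandbox <container_name>/ <container_name>_<timestamp>.sif
# Open a writable shell and add or remove packages with Micromamba
$ apptainer shell -C --shell /bin/bash --writable <container_name>
Apptainer> micromamba install -y -n base -c conda-forge <new_package>
Apptainer> exit
# Rebuild the SIF container with a new timestamp
$ apptainer build <container_name>_<new_timestamp>.sif <container_name>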

Installing a Micromamba Python-Based Container as a Kernel

  1. Start a server on the JupyterHub cluster.

  2. Start a new terminal.

  3. Load the apptainer module.

    $ module load apptainer
  4. Close the terminal and start a new one.

  5. Install the container as a Python kernel

    $ python -m ipykernel install --user --name <container_name> --display-name="Python (<container_name>)"
  6. The previous step provided an initial configuration for the kernel, but we will need to slightly modify it to ensure the kernel executes from the container.

    $ nano /home/<username>/.local/share/jupyter/kernels/<container_name>/kernel.json

    Modify kernel.json file to match the following:

    kernel.json
    {
      "argv": [
        "apptainer",
        "exec",
        "/home/<username>/<container_name>.sif",
        "micromamba",
        "run",
        "python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
      ],
      "display_name": "Python (<container_name>)",
      "language": "python",
      "metadata": {
        "debugger": true
      }
    }
  7. Close the terminal and wait a few seconds for the kernel to register in the JupyterLab launcher.
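To confirm the kernel was registered, you can list the installed kernel specifications from a terminal; the container's name should appear in the output. The same check works for the R-based kernel described in the next section.

$ jupyter kernelspec list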

Installing an R-Based Container as a Kernel

  1. Start a server on the JupyterHub cluster.

  2. Start a new terminal.

  3. Load the apptainer module.

    $ module load apptainer
  4. Close the terminal and start a new one.

  5. Create a new directory to hold the custom Jupyter kernel configuration, then use the nano text editor to create a kernel.json file within it.

    $ mkdir -p <container_name>
    $ nano <container_name>/kernel.json

    Within the kernel.json file, enter the following:

    kernel.json
    {
      "argv": [
        "apptainer",
        "exec",
        "/home/<username>/<container_name>.sif",
        "micromamba",
        "run",
        "R",
        "--slave",
        "-e",
        "IRkernel::main()",
        "--args",
        "{connection_file}"
      ],
      "display_name": "R (<container_name>)",
      "language": "R",
      "metadata": {
        "debugger": true
      }
    }
  6. Install the container as a custom kernel using the kernel.json file.

    $ jupyter kernelspec install <container_name> --user
  7. Close the terminal and wait a few seconds for the kernel to register in the JupyterLab launcher.

RStudio Server

via JupyterHub

The easiest way to start an RStudio session in an HPC environment is to launch it within a JupyterLab session. See the entry above for session limits and documentation on starting a JupyterLab session from a JupyterHub cluster. Once your JupyterLab session is running, select the Software tab in the sidebar and find and load the rstudio-server module. You can then click the RStudio launcher to open a new RStudio session. While you can install any R library from CRAN, you can't use package managers like conda and can only use software loaded as modules from the Alliance. If you need more flexibility for your RStudio session, set up and run your environment within an Apptainer container as documented below.

via Container

Note

The linked instructions above cover running RStudio Server on the UBC ARC Sockeye cluster using Apptainer, but the instructions should be very similar for running on an Alliance cluster.

  • Instructions for creating an Apptainer container with pre-installed dependencies:

    • You can either build your container on a local Linux system and copy the container to the cluster or build it directly on a login node.

    • Example of building an RStudio container on a local Linux (Ubuntu) system

      # Install Apptainer if not already done so
      $ sudo apt update
      $ sudo apt install -y software-properties-common
      $ sudo add-apt-repository -y ppa:apptainer/ppa
      $ sudo apt update
      $ sudo apt install -y apptainer
      # Build a new sandbox container from one of the Rocker Project images
      # The geospatial image includes a range of geospatial packages
      $ apptainer build --sandbox rstudio/ docker://rocker/geospatial
      # Run a shell within the sandbox container with write and sudo privileges
      $ apptainer shell --writable --fakeroot rstudio/
      # Start R and install any additional packages that may be needed
      Apptainer> R
      R version 4.4.1 (2024-06-14) -- "Race for Your Life"
      ...
      > install.packages('climatol')
      ...
      > q()
      Apptainer> exit
      # Convert the sandbox container into a SIF container to be transferred to the HPC cluster
      $ apptainer build rstudio.sif rstudio

Python

  • Alliance Documentation - Python

  • Supported Versions: 2.7, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12

  • Supported Virtual Environment Managers:

    • UBC ARC Sockeye:

    • Alliance: virtualenv or venv. conda is not supported, but it can be run using an Apptainer container. In general, Alliance recommends avoiding conda-forge and Anaconda packages if possible. Go here for more details. See instructions below for creating and running micromamba, a lightweight, drop-in replacement for conda, from an Apptainer container:

      • Creating an Apptainer container with Micromamba and conda-forge packages:

        # Pull the micromamba Docker container and create a new sandbox container from it
        $ apptainer build --sandbox container/ docker://mambaorg/micromamba:bookworm-slim
        # Run a shell with write access in the sandbox container
        $ apptainer shell --writable -C --shell /bin/bash container/
        # Update micromamba
        Apptainer> micromamba self-update
        # Install some packages from conda-forge
        Apptainer> micromamba install -n base python=3.11 <package_names>
        Apptainer> exit
        # Convert the sandbox container into a SIF container
        $ apptainer build <container_name>_<timestamp>.sif container/
      • SLURM job script for running a Python script within the container created above:

        #!/bin/bash
        # Include the shebang at the top of the file, to clearly identify that the following script is meant for the Bash Shell
        # Provide flags as comments in the script. These will be read by SLURM to set option flags
        # Identify your user account
        #SBATCH --account=def-someuser
        # Identify the amount of memory to use per CPU
        #SBATCH --mem-per-cpu=1.5G  # In this case the job will only use one CPU with 1.5 GB of memory
        #SBATCH --time=1:00:00 # And I expect that the job will take a little less than an hour
        # At the beginning of your job load the Apptainer module
        module load apptainer
        # Run a Python script using an environment within your conda container
        apptainer run -C -B <directory holding Python script> -W $SLURM_TMPDIR <container_name>_<timestamp>.sif micromamba run python <script>
  • Supported Packages (Alliance): https://docs.alliancecan.ca/wiki/Available_Python_wheels

    • Note: include the --no-index flag with pip to install only Alliance wheels. The Alliance Python wheels have been specifically compiled and optimized to run as effectively as possible on HPC clusters.
  • Example Job Script:

    #!/bin/bash
    # Include the shebang at the top of the file, to clearly identify that the following script is meant for the Bash Shell
    # Provide flags as comments in the script. These will be read by SLURM to set option flags
    # Identify your user account
    #SBATCH --account=def-someuser
    # Identify the amount of memory to use per CPU
    #SBATCH --mem-per-cpu=1.5G  # In this case the job will only use one CPU with 1.5 GB of memory
    #SBATCH --time=1:00:00 # And I expect that the job will take a little less than an hour
    # At the beginning of your job load any software dependencies needed for your job
    module load python/3.10
    # If running Python, create and activate a virtual environment
    virtualenv --no-download $SLURM_TMPDIR/venv
    source $SLURM_TMPDIR/venv/bin/activate
    # Upgrade pip and install Python dependencies using Alliance wheels listed as dependencies in requirements.txt
    python -m pip install --no-index --upgrade pip
    python -m pip install --no-index -r requirements.txt
    # Run the script
    python script.py
    # Deactivate the virtual environment
    deactivate
  • Interactive Sessions:

    • Sockeye

    • Alliance

      • Use the Slurm allocation (salloc) command

      • Example command for starting a single core IPython session on Alliance:

        $ salloc --time=00:15:00
        salloc: Pending job allocation 1234567
        ...
        salloc: Nodes cdr<###> are ready for your job
        $ module load gcc/9.3.0 python/3.10
        $ virtualenv --no-download venv
        $ source venv/bin/activate
        (venv) $ python -m pip install --no-index ipython
        (venv) $ ipython
        Python 3.10.2 ...
        ...
        In [1]: <python code>
        ...
        In [2]: exit
        (venv) $ deactivate
        $ exit
        salloc: Relinquishing job allocation 1234567
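To submit one of the job scripts above instead of working interactively, you can use the standard SLURM commands; job_script.sh is a placeholder file name, and the output file follows SLURM's default slurm-<jobid>.out naming pattern.

# Submit the job script to the scheduler
$ sbatch job_script.sh
Submitted batch job 1234567
# Check the status of your queued and running jobs
$ squeue -u $USER
# Review the job's output once it has finished
$ less slurm-1234567.out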

Other Resources

R

  • Alliance Documentation - R

  • Supported Versions: 4.0, 4.1, 4.2, 4.3, and 4.4

  • Example Job:

    #!/bin/bash
    #SBATCH --account=def-someuser
    #SBATCH --mem-per-cpu=1.5G
    #SBATCH --time=1:00:00
    ####################################################
    module load gcc/12.3 r/4.4.0
    Rscript script.r
  • Interactive Sessions:

    • Sockeye

    • Alliance

      • Use the Slurm allocation (salloc) command

      • Example command for starting session:

        $ salloc --time=00:15:00
        salloc: Pending job allocation 1234567
        ...
        salloc: Nodes cdr<###> are ready for your job
        $ module load gcc/12.3 r/4.4.0
        $ R
        R version 4.4.0 (2024-04-24) -- "Puppy Cup"
        ...
        > <r code here>
        > q()
        Save workspace image? [y/n/c]:
        $ exit
        salloc: Relinquishing job allocation 1234567

Other Resources:

Other Supported Languages

  • C, C++, Objective-C, and other GCC supported languages

  • Elixir (Alliance)

  • Go (Alliance)

  • Java

  • JavaScript/TypeScript via node.js

  • Julia

  • MATLAB

  • Octave

  • Perl

  • Ruby (Alliance)

  • Rust (Alliance)

Containers on HPC

  • Use Cases:

    • Running software that is not already included as an HPC module

    • Building runtime environments from existing projects and easily reproducing research

  • Documentation

  • Documentation for HPC Container Platform - Apptainer
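As a quick sketch, pulling a public Docker image and converting it to a SIF container looks like the following; ubuntu:22.04 is just an example image.

# Load the Apptainer module on the cluster (or install Apptainer locally)
$ module load apptainer
# Pull a Docker image from Docker Hub and convert it to a SIF container
$ apptainer pull ubuntu_22.04.sif docker://ubuntu:22.04
# Run a command inside the container
$ apptainer exec ubuntu_22.04.sif cat /etc/os-release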

Photogrammetry

Photogrammetry software can be incredibly resource intensive, so running it on an HPC cluster can save you a lot of time and resources when working with extraordinarily large datasets.

For a variety of reasons, most HPC clusters only install free and open-source software. This includes UBC ARC's Sockeye cluster and the Digital Research Alliance's clusters. This means that software like Agisoft Metashape, ArcGIS Drone2Map, and Pix4D is unfortunately not available for use on these HPC clusters.

To leverage HPC clusters for processing drone images or video into orthophotos, 3D models, point clouds, or elevation models, OpenDroneMap (ODM) will likely be your best option.

ODM is most often distributed as a Docker container, and it can just as easily be run with alternative container engines like Apptainer or Podman. Prior to running ODM on an HPC cluster, it is highly recommended that you do a few test runs on a local machine with a subset of your data. This will give you an opportunity to explore and optimize option flags to produce the best results for your dataset. You can find information on installing Apptainer here.

To test ODM on a local machine that has Apptainer installed, make a directory that will be bound to the container for ODM's input and output, and within it create a directory named 'images' holding a subset of your data. The example below uses a directory named 'odm_test'.

$ apptainer run -B odm_test:/project/code docker://opendronemap/odm:latest --project-path /project

After processing your dataset, ODM creates a report at ‘odm_test/odm_report/report.pdf’, which can be used to quickly analyze your results and start calculating a total job time estimate for your HPC job script.
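To see which option flags are available before committing to a full run, you can print ODM's help text and re-run the test with additional flags. The example below uses the --dsm flag to also generate a digital surface model; check the help output for the full set of options and choose what suits your dataset.

# List all available ODM option flags
$ apptainer run docker://opendronemap/odm:latest --help
# Example: re-run the test while also generating a digital surface model
$ apptainer run -B odm_test:/project/code docker://opendronemap/odm:latest --project-path /project --dsm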

Once you’ve identified any flags that you want to add to the command, you will need to convert the ODM Docker container into an Apptainer SIF container with the following command on your local machine:

$ apptainer build odm_latest.sif docker://opendronemap/odm:latest
# For a container with GPU support, run:
# apptainer build odm_gpu.sif docker://opendronemap/odm:gpu

Then copy the Apptainer SIF file from your local machine to your HPC home directory and ensure your dataset is stored in a project directory.
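A minimal sketch of that transfer, assuming an Alliance-style layout; substitute your own cluster hostname and a project path that matches the directory you bind in the job script below.

# Copy the container to your home directory on the cluster
$ scp odm_latest.sif <username>@<cluster>.alliancecan.ca:~/
# Copy the image dataset into your project space (rsync can resume interrupted transfers)
$ rsync -avP images/ <username>@<cluster>.alliancecan.ca:/project/<group>/<username>/images/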

You can then use the following SLURM job script on the HPC cluster:

#!/bin/bash
# Include the shebang at the top of the file, to clearly identify that the following script is meant for the Bash Shell
# Provide flags as comments in the script. These will be read by SLURM to set option flags
# Identify your user account
#SBATCH --account=def-someuser
# Identify the number of CPU cores and the amount of memory to allocate to the job
#SBATCH --cpus-per-task=20
#SBATCH --mem=64G
#SBATCH --time=5:00:00 # You can use the test runs on your local machine to help you estimate this
# At the beginning of your job load the Apptainer module
module load apptainer
# CPU Only
apptainer run -C -W $SLURM_TMPDIR -B /project/images:/project/code/images,/scratch:/project odm_latest.sif --project-path /project
# For a container with GPU support, add the following directive to the top of your job script and use the command below instead of the CPU-only command
#SBATCH --gpus-per-node=1
# apptainer run -C -W $SLURM_TMPDIR -B /project/images:/project/code/images,/scratch:/project --nv odm_gpu.sif --project-path /project

The command in the above job script may need to be adjusted slightly depending on the structure of the HPC cluster you are using, and the job script parameters should be modified to account for the overall size of your dataset. In general, the above parameters should work well for a dataset of approximately 1000 images [1]. You can also review the following table to help estimate the amount of memory to allocate to your job based on the size of your dataset.

Number of Images    RAM or RAM + Swap (GB)
40                  4
250                 16
500                 32
1500                64
2500                128
3500                192
5000                256

Source: https://docs.opendronemap.org/installation/#hardware-recommendations

Based on best practices, you will want to store your dataset within your project folder and bind the directory holding it to the path '/project/code/images' inside the container. You will also want ODM to write its output to your scratch directory: bind scratch to the container's project directory and pass that bound directory to ODM as the project path.

Once the job has completed, review ODM's output in the scratch directory and transfer any relevant files back into your project directory before cleaning up the scratch directory.
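A sketch of that cleanup, assuming hypothetical scratch and project paths; odm_orthophoto and odm_report are standard ODM output directories, but adjust the paths to match your cluster and bindings.

# Create a results directory in your project space
$ mkdir -p /project/<group>/<username>/odm_results
# Keep the outputs you need, such as the orthophoto and the processing report
$ cp -r /scratch/<username>/odm_orthophoto /scratch/<username>/odm_report /project/<group>/<username>/odm_results/
# Remove the remaining intermediate files from scratch
$ rm -r /scratch/<username>/odm_*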

Footnotes

  1. Gbagir, A. G., Ek, K., & Colpaert, A. (2023). OpenDroneMap: Multi-platform performance analysis. Geographies, 3(3), 446-458. https://doi.org/10.3390/geographies3030023