
User Guide for Skyway - RCC Cloud Solution

What is Skyway?

Skyway is the name of (1) a software package that supports dispatching computing tasks to cloud resources from the SLURM scheduler, and (2) a computer cluster with multiple partitions built from cloud virtual machines and managed by SLURM.

Skyway is an integrated solution that lets RCC users burst computing tasks from the on-premise cluster (i.e., Midway) to remote virtual computational resources (e.g., Amazon AWS, Microsoft Azure, Google GCP). The Skyway platform enables computing tasks to run in the cloud from Midway in a seamless manner, so that users do not need to set up or manage cloud resources themselves in order to take advantage of them.

Skyway has a SLURM cluster setup that's almost the same as the RCC on-premise cluster, Midway, with very similar job scheduler configurations, software modules, and file storage systems.

Requirements for using Skyway

  • Have an active RCC user account
  • Experience using the Midway cluster
  • Experience using the SLURM resource scheduler

Login to Skyway

First, log in to the Midway cluster:

ssh [CNetID]@midway2.rcc.uchicago.edu

Then, log in to Skyway from Midway:

ssh skyway.rcc.uchicago.edu

File Systems

1. /home/[CNetID]

This is the temporary home directory (no backup) for users on Skyway. Note that this is NOT the home file system on Midway, so you will not see any contents from your home directory on Midway. Please do NOT store any sizable or important data here.

TO DO: Add note here about changing $HOME environment variable to /cloud/rcc-aws/[CNetID].
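
A minimal sketch of what such a note might look like, assuming the intent is simply to point $HOME at the cloud scratch space (this is an interpretation of the TO DO above, not official guidance):

export HOME=/cloud/rcc-aws/[CNetID]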

2. /project and /project2

These are the RCC high-performance capacity storage file systems from Midway, mounted on Skyway with the same quotas and usage as on Midway. Just as with running jobs on Midway, /project and /project2 should be treated as the location where users store the data they intend to keep. This also makes data accessible from both Skyway and Midway, since the /project and /project2 file systems are mounted on both systems.

TO DO: /project does not exist as of 11-02-2021.

Run cd /project/<labshare> or cd /project2/<labshare>, where <labshare> is the name of the lab account, to access the files belonging to your lab or group. This will work even if the lab share directory does not appear in a file listing, e.g., ls /project.

3. /cloud/[account]/[CNetID]

Options for [account]: rcc-aws or other cloud accounts

This is the cloud scratch folder (no backup), intended for read/write by cloud compute jobs. For example, with Amazon cloud resources (AWS), the remote AWS S3 bucket storage is mounted on Skyway at this path. Before submitting jobs to the cloud compute resources, users must first stage the data, scripts, and executables their cloud job will use into the /cloud/rcc-aws/[CNetID] folder. After a cloud compute job finishes, users should copy the data they wish to keep from the /cloud/rcc-aws/[CNetID] folder back to their project folder. Similarly, if users are using Google Cloud Platform (GCP), the scratch folder /cloud/rcc-gcp/[CNetID] should be used.
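
For example, a typical staging workflow from the Skyway login node might look like the following (here <labshare>, input.dat, output.dat, and run.sbatch are placeholder names, not actual files):

# Stage input data and the job script into the cloud scratch space
cp /project2/<labshare>/input.dat /cloud/rcc-aws/$USER/
cp run.sbatch /cloud/rcc-aws/$USER/
# Submit the job from the cloud scratch directory
cd /cloud/rcc-aws/$USER
sbatch run.sbatch
# After the job completes, copy results you want to keep back to the project space
cp /cloud/rcc-aws/$USER/output.dat /project2/<labshare>/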

Software Modules

Skyway uses the same module system as the Midway cluster to manage software packages, but the set of installed modules is not the same. To check the available software modules on Skyway, issue the command "module avail". For more information on using the module commands, see the module user manual. If a particular software package your workflow requires is missing, please write to [email protected] to request that it be added to Skyway.
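
For example:

module avail                # list all modules available on Skyway
module load anaconda3       # load the Python3 Anaconda distribution
module list                 # show currently loaded modules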

The current list of software modules installed on Skyway includes the following:

  • anaconda3 -- Python3 Anaconda distribution
  • cmake

How to prepare executable binaries?

It is not recommended to compile or install software packages directly on Skyway. Users should compile and install their own code on the Midway2 cluster. Midway2 and Skyway have the same system architecture, so any code compiled on Midway2 will likely run on Skyway without recompilation.

Note that the /project and /project2 folders are only visible from the Skyway login node. They are not visible from the cloud compute nodes, which is why users must copy the executables and other data required by their job into the scratch space (/cloud/rcc-aws) for them to be accessible from the cloud compute nodes, as in the sketch below.
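
For example, assuming a hypothetical C source file mycode.c kept in your lab's project space (<labshare> is a placeholder):

# On Midway2: compile into the project space so the binary is visible from Skyway
gcc -O2 -o /project2/<labshare>/mycode mycode.c
# On the Skyway login node: copy the binary into the cloud scratch space
cp /project2/<labshare>/mycode /cloud/rcc-aws/$USER/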

Submit and Manage Jobs via SLURM

Skyway uses SLURM to submit jobs in the same way as the Midway cluster. Some commonly used commands are listed below (a brief usage sketch follows the list):

  • sinfo - Show compute nodes status
  • sbatch - Submit computing jobs
  • scancel - Cancel submitted jobs
  • sacct - Check logs of recent jobs
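
For example (12345 is a placeholder job ID):

sinfo -p rcc-aws     # show the status of the rcc-aws compute nodes
sacct -j 12345       # check the accounting log of job 12345
scancel 12345        # cancel job 12345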

When submitting jobs, include the following two options in the job script:

  • --partition=rcc-aws
  • --account=rcc-aws

Specify the cloud compute resource:

To submit jobs to the cloud, you must specify a type of virtual machine (VM) with the option --constraint=[VM Type]. The VM types currently supported through Skyway are listed in the table below. You can also get an up-to-date listing of the machine types by running the command sinfo-node-types on a Skyway login node.

VM Type         Description                                       Instance Type
rcc-aws c1      1 core, 4 GB memory (for serial jobs)             AWS c5.large
rcc-aws c36     36 cores, 144 GB memory (for large-memory jobs)   AWS c5.18xlarge
rcc-aws g1      2x V100 GPU                                       AWS p3.2xlarge
msca-gcp c1     1 core, 1.35 GB memory (for serial jobs)          GCP
msca-gcp c30p   30 cores                                          GCP
msca-gcp v1     6x V100 GPUs                                      GCP
msca-gcp a1     6x V100 GPUs                                      GCP

TO DO: the above table is outdated as of 11-02-2021; the c1 type does not exist on the AWS EC2 website. How can one find out what machines these are?

To see more information about these types, please visit the AWS EC2 website and the GCP machine types page. Please note that we are currently using the C5 compute-optimized instances for Skyway, and the number of cores listed for each type is half (physical cores) of the number listed as vCPUs (hyper-threaded cores) on the website.

To submit a SLURM job from Skyway, you must specify --time=XX:XX:XX and --account=XXX; otherwise, your job may not run.

A sample job script: sample.sbatch

#!/bin/sh

#SBATCH --job-name=TEST
#SBATCH --partition=rcc-aws
#SBATCH --account=rcc-aws
#SBATCH --exclusive
#SBATCH --ntasks=1
#SBATCH --constraint=c1 # Requests a c1 VM type (AWS c5.large, per the table above)
#SBATCH --time=00:15:00

cd $SLURM_SUBMIT_DIR

hostname
lscpu
lscpu --extended
free -h
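
To submit and monitor this script (squeue is the standard SLURM command for listing your queued and running jobs):

sbatch sample.sbatch
squeue -u $USER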

Interactive Jobs

Example of requesting a test node:

sinteractive --account=rcc-aws --partition=rcc-aws --constraint=c1 --ntasks=1 --time=00:15:00

Example of a GPU job:

sinteractive --account=rcc-aws --partition=rcc-aws --constraint=g1 --ntasks=1 --gres=gpu:1 --time=00:15:00
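
Once the interactive GPU session starts, you can verify that the GPU is visible with the standard NVIDIA utility (assuming the NVIDIA driver is present on the cloud compute image):

nvidia-smi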

User Packages for R and Python

The popular scripting languages Python and R manage their own package libraries. Because the system locations for these installations are typically read-only, regular users usually install local packages into their home folders (i.e., /home/[username]) by default. However, this is not recommended on Skyway, because all user content is expected to be stored in the cloud scratch space at /cloud/rcc-aws/[username]. Therefore, a few extra steps are needed to change the default package paths for Python and R.

Setting user local packages path for R

TO DO: as of 11-02-2021, R module is removed, but why?

You can run the following commands before running R, or put them into ~/.bashrc:

export R_LIBS_USER=/cloud/rcc-aws/${USER}/pkgs-R
if [ ! -d "${R_LIBS_USER}" ]; then mkdir ${R_LIBS_USER}; fi

After launching R, you can check that the default (first) path for packages is correct. Example:

[yuxing@rcc-aws-t2-micro-001 ~]$ module load R
[yuxing@rcc-aws-t2-micro-001 ~]$ R
...
> .libPaths()
[1] "/cloud/rcc-aws/yuxing/pkgs-R"         "/software/r-3.5/lib64/R/library"

Setting user local packages path for IPython

The pip tool is used to install and manage Python packages. To install packages in a different location, you need to specify the prefix option. Example:

pip install --install-option="--prefix=${PKGS_PYTHON}" package_name

${PKGS_PYTHON} should point to the path for the package installations, and you also need to add it to PYTHONPATH before running IPython in order to load the installed packages successfully. Example:

The following commands can be added to ~/.bashrc:

export PKGS_PYTHON=/cloud/rcc-aws/${USER}/pkgs-python
if [ ! -d "${PKGS_PYTHON}" ]; then mkdir ${PKGS_PYTHON}; fi
export PYTHONPATH=${PKGS_PYTHON}:${PYTHONPATH}
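
Note that, depending on the Python version, a prefix-style installation may place packages under a subdirectory such as ${PKGS_PYTHON}/lib/python3.x/site-packages rather than directly in ${PKGS_PYTHON}. If imports fail, add that subdirectory to PYTHONPATH instead, for example (adjust the version to match the Python module you are using):

export PYTHONPATH=${PKGS_PYTHON}/lib/python3.7/site-packages:${PYTHONPATH}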

Setting user local packages path for Conda

The best way to manage your local Python packages with Conda (Anaconda or Miniconda) is to use a virtual environment. You can create your own environment under the cloud scratch space. Example:

conda create --prefix=/cloud/rcc-aws/${USER}/conda

You should see a result like:

Solving environment: done

## Package Plan ##

  environment location: /cloud/rcc-aws/yuxing/conda

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /cloud/rcc-aws/yuxing/conda
#
# To deactivate an active environment, use:
# > source deactivate
#
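
After creating the environment, you can activate it and install packages into it (numpy here is only an illustrative package name):

source activate /cloud/rcc-aws/${USER}/conda
conda install numpy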