Skyway is the name of both (1) a software package that supports dispatching computing tasks from the SLURM scheduler to cloud resources, and (2) a computer cluster whose partitions are built from cloud virtual machines managed by SLURM.
Skyway is an integrated solution that lets RCC users burst computing tasks from the on-premise cluster (i.e., Midway) to remote virtual computational resources (e.g., Amazon AWS, Microsoft Azure, and Google GCP). The Skyway platform enables running computing tasks in the cloud from Midway in a seamless manner, so users do not need to set up or manage cloud resources themselves in order to take advantage of them.
Skyway runs a SLURM cluster configured much like the RCC on-premise cluster, Midway, with very similar job scheduler settings, software modules, and file storage systems.
- Have an active RCC user account
- Experience using the Midway cluster
- Experience using the SLURM resource scheduler
First, log in to the Midway cluster:
ssh [CNetID]@midway2.rcc.uchicago.edu
Then, log in to Skyway from Midway:
ssh skyway.rcc.uchicago.edu
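If you log in often, the two hops can be combined with an SSH configuration on your local machine. This is an optional convenience, not official RCC guidance; the `midway2` and `skyway` host aliases below are illustrative names, and `your-cnetid` is a placeholder.

```
# ~/.ssh/config on your local machine (aliases are illustrative)
Host midway2
    HostName midway2.rcc.uchicago.edu
    User your-cnetid            # replace with your CNetID

Host skyway
    HostName skyway.rcc.uchicago.edu
    User your-cnetid            # replace with your CNetID
    ProxyJump midway2           # hop through Midway2 first
```

With this in place, `ssh skyway` from your local machine performs both hops.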
This is the temporary home directory (no backup) for users on Skyway. Note that this is NOT the home file system on Midway, so you will not see any contents from your home directory on Midway. Please do NOT store any sizable or important data here.
TO DO: Add note here about changing the $HOME environment variable to /cloud/rcc-aws/[CNetID].
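In the meantime, one possible approach (an unofficial sketch, not RCC guidance; the rcc-aws account name is taken from the examples later in this page) is to switch into the cloud scratch space on login by adding a few lines to ~/.bashrc on Skyway:

```shell
# Unofficial sketch: work out of the cloud scratch space instead of the
# temporary Skyway home directory. Assumes the rcc-aws cloud account.
export CLOUD_HOME=/cloud/rcc-aws/${USER}
mkdir -p "${CLOUD_HOME}"        # create the folder if it does not exist yet
cd "${CLOUD_HOME}"
```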
These are the RCC high-performance capacity storage file systems from Midway, mounted on Skyway with the same quotas and usage as on Midway. Just as with running jobs on Midway, /project and /project2 should be treated as the location for users to store the data they intend to keep. They also make data accessible between Skyway and Midway, as the /project and /project2 filesystems are mounted on both systems.
TO DO: /project does not exist as of 11-02-2021.
Run cd /project/<labshare> or cd /project2/<labshare>, where <labshare> is the name of the lab account, to access the files owned by your lab or group. This works even if the lab share directory does not appear in a file listing, e.g., ls /project.
Options for [account]: rcc-aws or other cloud accounts.
This is the cloud scratch folder (no backup), which is intended for read/write access by cloud compute jobs. For example, with Amazon cloud resources (AWS), the remote AWS S3 bucket storage is mounted on Skyway at this path. Before submitting jobs to the cloud compute resources, users must first stage the data, scripts, and executables their cloud job will use into the /cloud/rcc-aws/[CNetID] folder. After running their cloud compute job, users should then copy the data they wish to keep from the /cloud/rcc-aws/[CNetID] folder back to their project folder. Similarly, users running on Google Cloud Platform (GCP) should use the scratch folder /cloud/rcc-gcp/[CNetID].
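In practice, the staging workflow described above might look like the following sketch. The lab share, run name, and file names are hypothetical placeholders, not real paths:

```shell
# Hypothetical staging workflow for an AWS cloud job.
# The /project2/mylab paths and run name are illustrative only.
STAGE=/cloud/rcc-aws/${USER}/run01
mkdir -p "${STAGE}"

# 1. Stage inputs, scripts, and executables into cloud scratch.
cp -r /project2/mylab/inputs "${STAGE}/"
cp my_job.sbatch my_app "${STAGE}/"

# 2. Submit the job from the staged folder (see the SLURM instructions
#    later on this page).

# 3. After the job finishes, copy results you want to keep back:
cp -r "${STAGE}/results" /project2/mylab/run01-results
```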
Skyway uses the same module system as the Midway cluster to manage software packages, but the available software modules are not the same. To check the available software modules on Skyway, issue the command module avail. For more information on using the module commands, see the module user manual. If a particular software package your workflow requires is missing, please write to [email protected] to request that it be added to Skyway.
Current list of software modules installed on Skyway includes the following:
- anaconda3 -- Python3 Anaconda distribution
- cmake
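A typical session using the modules listed above would look like the following. The commands mirror Midway's module usage; the anaconda3 module name is taken from the list above, but exact versions on Skyway may differ:

```shell
module avail                 # list all modules available on Skyway
module load anaconda3        # load the Python3 Anaconda distribution
module list                  # confirm which modules are loaded
python --version             # the Anaconda python should now be first in PATH
```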
Compiling or installing software packages directly on Skyway is not recommended. Users should compile and install their own codes on the Midway2 cluster. Midway2 and Skyway share the same system architecture, so codes compiled on Midway2 will likely run on Skyway without recompilation.
Note that the /project and /project2 folders are only visible from the Skyway login node. They are not visible from the cloud compute nodes, which is why users must copy the executables and other data required by their job into the scratch space (/cloud/rcc-aws) for it to be accessible from the cloud compute nodes.
Skyway uses SLURM to submit jobs, just as on the Midway cluster. Some commonly used commands are:
- sinfo - Show compute nodes status
- sbatch - Submit computing jobs
- scancel - Cancel submitted jobs
- sacct - Check logs of recent jobs
When submitting jobs, include the following two options in the job script:
- --partition=rcc-aws
- --account=rcc-aws
To submit jobs to the cloud, you must specify a type of virtual machine (VM) with the option --constraint=[VM Type]. The VM types currently supported through Skyway are listed in the table below. You can also get an up-to-date listing of the machine types by running the command sinfo-node-types on a Skyway login node.
| VM Type | Description | Instance Type |
|---|---|---|
| rcc-aws c1 | 1 core, 4G Mem (for serial jobs) | AWS c5.large |
| rcc-aws c36 | 36 cores, 144G Mem (for large memory jobs) | AWS c5.18xlarge |
| rcc-aws g1 | 2x V100 GPU | AWS p3.2xlarge |
| msca-gcp c1 | 1 core, 1.35G Mem (for serial jobs) | GCP |
| msca-gcp c30p | 30 cores, | GCP |
| msca-gcp v1 | 6x V100 GPUs | GCP |
| msca-gcp a1 | 6x V100 GPUs | GCP |
TO DO: the above table is outdated as of 11-02-2021; the c1 type does not exist on the AWS EC2 website. How can one find out what machines these are?
To see more information about these types, please visit the AWS EC2 website and the GCP machine types page. Please note that Skyway currently uses the C5 compute-optimized instances, and the core count listed for each type is half the vCPU number shown on the website (physical cores versus hyper-threaded cores).
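As a concrete example of the vCPU-to-core conversion, using the c5.18xlarge size from the table above:

```shell
# AWS lists c5.18xlarge with 72 vCPUs (hyper-threads); Skyway counts
# physical cores, i.e. vCPUs divided by 2.
vcpus=72
echo "$(( vcpus / 2 )) physical cores"   # -> 36 physical cores (the c36 type)
```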
To submit a Slurm job from Skyway, users must specify --time=XX:XX:XX and --account=XXX; otherwise, the job may not run.
#!/bin/sh
#SBATCH --job-name=TEST
#SBATCH --partition=rcc-aws
#SBATCH --account=rcc-aws
#SBATCH --exclusive
#SBATCH --ntasks=1
#SBATCH --constraint=c1       # Request the c1 VM type (AWS c5.large)
#SBATCH --time=00:15:00
cd $SLURM_SUBMIT_DIR
hostname
lscpu
lscpu --extended
free -h
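Assuming the script above is saved as test.sbatch (the filename is arbitrary), it can be submitted and monitored with the standard SLURM commands already listed:

```shell
sbatch test.sbatch    # submit the job; prints the job ID
squeue -u ${USER}     # check the job's state in the queue
sacct -j <jobid>      # after completion, check the job's accounting log
```

The <jobid> placeholder is the ID printed by sbatch.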
Example of an interactive test job:
sinteractive --account=rcc-aws --partition=rcc-aws --constraint=c1 --ntasks=1 --time=00:15:00
Example of a GPU job:
sinteractive --account=rcc-aws --partition=rcc-aws --constraint=g1 --ntasks=1 --gres=gpu:1 --time=00:15:00
The popular scripting languages Python and R manage their own package/module libraries. Because the system locations for these libraries are usually read-only, regular users typically install local packages on their own in their home folders (i.e., /home/[username]) by default. However, this is not recommended on Skyway, where all user content is expected to be stored in the cloud scratch space at /cloud/rcc-aws/[username]. Therefore, a few extra steps are needed to change the default package paths for Python and R.
TO DO: as of 11-02-2021, R module is removed, but why?
You can run the following commands before running R, or put them into ~/.bashrc:
export R_LIBS_USER=/cloud/rcc-aws/${USER}/pkgs-R
if [ ! -d "${R_LIBS_USER}" ]; then mkdir ${R_LIBS_USER}; fi
After launching R, you can check that the default (first) path for packages is correct. Example:
[yuxing@rcc-aws-t2-micro-001 ~]$ module load R
[yuxing@rcc-aws-t2-micro-001 ~]$ R
...
> .libPaths()
[1] "/cloud/rcc-aws/yuxing/pkgs-R" "/software/r-3.5/lib64/R/library"
The pip tool is used to install and manage Python packages. To install packages into a different location, you need to specify the prefix option. Example:
pip install --install-option="--prefix=${PKGS_PYTHON}" package_name
${PKGS_PYTHON} should point to the path for the package installations, and you also need to add it to PYTHONPATH before running Python in order to load the modules successfully. Example:
The following commands can be added to ~/.bashrc
export PKGS_PYTHON=/cloud/rcc-aws/${USER}/pkgs-python
if [ ! -d "${PKGS_PYTHON}" ]; then mkdir ${PKGS_PYTHON}; fi
export PYTHONPATH=${PKGS_PYTHON}:${PYTHONPATH}
The best way to manage local Python packages with Conda software (Anaconda or Miniconda) is to use a virtual environment. You can create your own environment under the cloud scratch space. Example:
conda create --prefix=/cloud/rcc-aws/${USER}/conda
You should see a result like:
Solving environment: done
## Package Plan ##
environment location: /cloud/rcc-aws/yuxing/conda
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /cloud/rcc-aws/yuxing/conda
#
# To deactivate an active environment, use:
# > source deactivate
#