---
title: "Using R in HPC"
author: "Pedro Ojeda"
date: "Feb., 2021"
output:
  ioslides_presentation:
    widescreen: true
    css: custom.css
    logo: Images/logo.png
---
## Desktop PC vs. HPC architectures
![](Images/kebne_arch.png){width=700px}
## Types of programs
Programs can be of different types:

* Compute bound, if they mainly use CPU power (more cores can help)
* Memory bound, if the bottlenecks are allocating memory and copying/duplicating
objects (the large-memory nodes on Kebnekaise can help)
## Parallelization levels
* Implicit parallelism is built into some packages; one only needs to set
the number of workers (threads)
* Explicit parallelism requires the user to write the
parallelization instructions explicitly (**Rmpi**, for instance)
## Data dependency
This loop can be easily parallelized:
```{r eval=FALSE}
a <- b <- numeric(100)   # pre-allocate the vectors
for(i in 1:100){
  b[i] = 4
  a[i] = 2*b[i] + 1
}
```
but this one cannot, because iteration *i* reads **b[i-1]**, which is written in the previous iteration (a loop-carried dependency):
```{r eval=FALSE}
a <- b <- numeric(100)   # pre-allocate the vectors
for(i in 2:100){         # start at 2 so that b[i-1] exists
  b[i] = 4
  a[i] = 2*b[i-1] + 1    # reads the value written in the previous iteration
}
```
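Because its iterations are independent, the first loop can also be written in vectorized form (a minimal sketch):
```{r eval=FALSE}
b <- rep(4, 100)   # every iteration writes the same value
a <- 2 * b + 1     # element-wise, no loop needed
```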
## Using R in HPC
There are several versions of R installed on Kebnekaise
```{r eval=FALSE}
ml spider R
# Versions:
# R/3.3.1
# R/3.4.4-X11-20180131
# R/3.5.1-Python-2.7.15
# R/3.5.1
# R/3.6.0
# R/3.6.2
# R/4.0.0
ml spider R/3.6.0 #search for the modules needed by R
# You will need to load all module(s) on any one of the lines below before the "R/3.6.0"
# GCC/8.2.0-2.31.1 OpenMPI/3.1.3
```
## Using R in HPC
```{r eval=FALSE}
R --help
#Usage: R [options] [< infile] [> outfile]
# or: R CMD command [arguments]
#Start R, a system for statistical computation and graphics, with the
#specified options, or invoke an R tool via the 'R CMD' interface.
#Options:
# -h, --help Print short help message and exit
# --version Print version info and exit
# --encoding=ENC Specify encoding to be used for stdin
# --encoding ENC
# RHOME Print path to R home directory and exit
# --save Do save workspace at the end of the session
# --no-save Don't save it
# --no-environ Don't read the site and user environment files
```
## Adding your own packages in R
* https://www.hpc2n.umu.se/resources/software/user_installed/r
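A minimal sketch of installing a package into a personal library (the path *~/R/my_pkgs* and the package name are only examples; see the page above for the recommended HPC2N setup):
```{r eval=FALSE}
# create a personal library directory (the path is only an example)
dir.create("~/R/my_pkgs", recursive = TRUE, showWarnings = FALSE)
# make R look there first, install into it, then load from it
.libPaths(c("~/R/my_pkgs", .libPaths()))
install.packages("doParallel", lib = "~/R/my_pkgs",
                 repos = "https://cran.r-project.org")
library(doParallel, lib.loc = "~/R/my_pkgs")
```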
## SLURM workload manager
![](Images/slurm.png){width=660px}
## Running serial jobs
![](Images/using-r-hpc-serial.png){width=800px}
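As a minimal sketch (the project ID is a placeholder and the module versions are those used elsewhere in this talk), a serial batch script could look like:
```{r eval=FALSE}
#!/bin/bash
#SBATCH -A Project_ID      # your project ID (placeholder)
#SBATCH -n 1               # a single task: serial job
#SBATCH -t 00:10:00        # requested wall time
ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
ml R/3.6.0
R --no-save --no-restore -f script.R
```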
## Running your script
* Transfer your files to Kebnekaise
* Submit your job with: **sbatch job.sh**
* In case sbatch complains about the DOS format use the command:
**dos2unix job.sh**
before submitting your job.
* More information: https://www.hpc2n.umu.se/resources/software/r
## Running several independent jobs
One can use the **job arrays** option in SLURM to run independent instances of a program:
```{r eval=FALSE}
#!/bin/bash
#SBATCH -A Project_ID
#Asking for 12 min.
#SBATCH -t 00:12:00
#SBATCH --array=1-28
##Writing the output and error files
#SBATCH --output=Array_test.%A_%a.out
#SBATCH --error=Array_test.%A_%a.error
ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
ml R/3.6.0
R --no-save --no-restore -f script.R
```
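Each array task receives its own index in **$SLURM_ARRAY_TASK_ID**; a sketch of passing it on to R (it is assumed here that *script.R* reads a command-line argument):
```{r eval=FALSE}
# variation on the R call above:
R --no-save --no-restore -f script.R --args $SLURM_ARRAY_TASK_ID
# inside script.R (hypothetical): task_id <- as.integer(commandArgs(trailingOnly = TRUE)[1])
```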
## Running R in parallel mode
![](Images/using-r-hpc-parallel.png){width=800px}
## Monitoring your jobs
* **squeue -a -u username** lists your jobs in the queue
* **projinfo** displays the project's usage
## Parallel packages
Some libraries, like BLAS/LAPACK, have an implicit parallelization layer that can be activated by setting the number of threads.
On Kebnekaise the OpenBLAS libraries are available and provide this implicit parallelism:
```{r eval=FALSE}
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS/LAPACK: /cvmfs/.../8.2.0-2.31.1/OpenBLAS/0.3.5/lib/libopenblas_haswellp-r0.3.5.so
```
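The number of BLAS threads can also be set before R starts through environment variables (a sketch; which variable is honored depends on how the BLAS library was built):
```{r eval=FALSE}
export OMP_NUM_THREADS=4        # honored by OpenBLAS builds that use OpenMP
export OPENBLAS_NUM_THREADS=4   # OpenBLAS-specific variable
R --no-save -f my_script.R
```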
## Parallel packages
The number of BLAS threads can be controlled from within R with the *RhpcBLASctl* package:
```{r eval=FALSE}
library(RhpcBLASctl)
n <- 5000; nsim <- 3 #matrix size nxn; nr. independent simulations
set.seed(123); summa <- 0; x <- 0; blas_set_num_threads(1) #set the number of threads
for (i in 1:nsim) {
  m <- matrix(rnorm(n^2), n); a <- crossprod(m) #random matrix and symmetrize it
  timing <- system.time({
    x <- eigen(a, symmetric=TRUE, only.values=TRUE)$values[1] #compute eigenvalues
  })[3] ; summa <- summa + timing
} ; times <- summa/nsim
cat(c("Computation of eig. random matrix 5000x5000 (sec): ", times, "\n"))
```
## Parallel packages
Other packages (doParallel, parallel, doMC, doMPI, doFuture) follow a common pattern: create a cluster, register it as the parallel backend, run the code, and stop the cluster:
```{r eval=FALSE}
library("package-name")
cl <- makeCluster(NumberofCores)
register_cluster(cl)   # placeholder: the register function depends on the package, e.g. registerDoParallel(cl)
... #code to be run in parallel mode
stopCluster(cl)
```
## Parallel packages: examples
The **foreach** package is used for executing loops; with the **%do%** operator the iterations run sequentially:
```{r eval=FALSE}
library(foreach)
library(iterators)   # provides icount()
# assumed example data: 100 rows, predictor in column 1, 0/1 response in column 2
x <- data.frame(pred = rnorm(100), resp = rbinom(100, 1, 0.5))
trials <- 1000       # assumed number of bootstrap replicates
r <- foreach(icount(trials), .combine=cbind) %do% {
  ind <- sample(100,100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
```
## Parallel packages: examples
**doParallel** is a parallel backend package; once registered, it tells **foreach** where to execute the code in parallel:
```{r eval=TRUE}
library(doParallel) #using "doParallel" package
cl <- makeCluster(2)
registerDoParallel(cl)
getDoParWorkers() #this line tells the nr. of workers
getDoParName() #this line tells the name of the registered backend
stopCluster(cl)
```
## Parallel packages: examples
**doParallel** can be used to execute **foreach** loops in parallel:
```{r eval=FALSE}
library(doParallel) #using "doParallel" package
cl <- makeCluster(2)
registerDoParallel(cl)
# x and trials as defined in the previous foreach example
r <- foreach(icount(trials), .combine=cbind) %dopar% {
  ind <- sample(100,100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
stopCluster(cl)
```
## Parallel packages: examples
Slices of data can be sent to the workers with the **parallel** package:
```{r eval=FALSE}
library(parallel) #using "parallel" package
detectCores()
P <- detectCores(logical = FALSE) #only physical cores
# mydata: a data frame with columns one, two, three; N: its number of rows
# (both are created on the memory-profiling slide further below)
myfunc <- function(id) { #function to sum by rows
  arguments <- mydata[id, ]
  arguments$one + arguments$two + arguments$three }
cl <- makeCluster(P) #distribute the work across cores
clusterExport(cl, "mydata")
res <- clusterApply(cl, 1:N, fun = myfunc)
stopCluster(cl)
```
## Parallel packages: examples
The **doParallel** package is used in the following example to compute the
eigenvalues of matrices growing in size:
```{r eval=FALSE}
library(doParallel)
my_eigen <- function(x) {
  n <- x*800
  m <- matrix(runif(n^2),n,n)
  m[lower.tri(m)] = t(m)[lower.tri(m)]
  d <- diag(eigen(m)$values)
}
cl <- makeCluster(4)
registerDoParallel(cl)
system.time( res1 <- foreach(n = 1:6) %dopar% my_eigen(n) )[3]
stopCluster(cl)
#Elapsed
#211.25
```
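For a rough speed-up estimate, the same loop can be timed serially (a sketch reusing *my_eigen* from above):
```{r eval=FALSE}
system.time( res0 <- lapply(1:6, my_eigen) )[3]   # serial baseline
```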
## Parallel packages: examples
We can also use the **future** package, which runs on several operating systems and supports
asynchronous calculations:
```{r eval=FALSE}
library(future)
plan(multisession, workers = 4, gc = TRUE)   # 4 background R sessions; gc = TRUE cleans up the workers
# my_eigen() as defined on the previous slide
par_future <- function(x) {
  #creating futures
  ft <- lapply( x, function(x) future(my_eigen(x)) )
  #get futures
  get_ft <- lapply(ft, value)
}
x <- 1:6
system.time( res2 <- par_future(x) )[3]
#Elapsed
#210.17
```
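When the computation is done, the background workers can be released by switching back to sequential evaluation:
```{r eval=FALSE}
plan(sequential)   # shuts down the multisession workers
```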
## Random Numbers in parallel simulations
The following simulations *res1* and *res2* do not give reproducible results:
```{r eval=TRUE}
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
set.seed(1)
res1 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
set.seed(1)
res2 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
stopCluster(cl)
identical(res1,res2)
```
## Random Numbers in parallel simulations
For reproducible parallel simulations, an RNG package such as **doRNG** is recommended:
```{r eval=TRUE}
library(doRNG)
cl <- makeCluster(2)
registerDoParallel(cl)
registerDoRNG(1)
res3 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
set.seed(1)
res4 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
stopCluster(cl)
identical(res3,res4)
```
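Alternatively, **doRNG** provides the **%dorng%** operator, which takes the seed as a loop option (a minimal sketch):
```{r eval=FALSE}
library(doRNG)
res5 <- foreach(n = rep(2, 3), .combine = rbind,
                .options.RNG = 1) %dorng% rnorm(n)
```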
## Profiling Memory: gc (Parallel)
Memory profiling is crucial when using parallel packages. Suppose we have a data frame *mydata* that will be processed with the *clusterApply* function:
```{r eval=FALSE}
gcinfo(TRUE) #activate gc
N <- 5000000
mydata <- data.frame(one=1.0*seq(N),two=2.0*seq(N),three = 3.0*seq(N))
#...
#Garbage collection 23 = 14+2+7 (level 0) ...
#43.5 Mbytes of cons cells used (66%)
#130.4 Mbytes of vectors used (65%)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 572516 30.6 1233268 65.9 1233268 65.9
#Vcells 16492769 125.9 26338917 201.0 19085502 145.7
```
## Profiling Memory: gc (Parallel)
Then we define a function that processes a slice of the data frame (the distribution across cores happens in the next step):
```{r eval=FALSE}
library(parallel) #using parallel package
detectCores()
P <- detectCores(logical = FALSE) #only physical cores
myfunc <- function(id) { #function to sum by rows
  arguments <- mydata[id, ]
  arguments$one + arguments$two + arguments$three
}
```
## Profiling Memory: gc (Parallel)
```{r eval=FALSE}
cl <- makeCluster(P) #distribute the work across cores
clusterExport(cl, "mydata")
res <- clusterApply(cl, 1:N, fun = myfunc)
stopCluster(cl)
#...
#Garbage collection 1196 = 1128+50+18 (level 0) ...
#312.5 Mbytes of cons cells used (60%)
#206.5 Mbytes of vectors used (59%)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 5850436 312.5 9776540 522.2 9776540 522.2
#Vcells 27062930 206.5 45804848 349.5 42982557 328.0
```
Memory usage grows substantially because **mydata** is copied to every worker, and the time to execute **myfunc** in parallel mode increases drastically: the work is split into one tiny task per row, so communication dominates.
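A common remedy (a minimal sketch reusing *mydata*, *N*, *P*, and *myfunc* from above) is to send one block of row indices to each worker instead of one task per row:
```{r eval=FALSE}
library(parallel)
chunks <- splitIndices(N, P)                  # P blocks of row indices
cl <- makeCluster(P)
clusterExport(cl, "mydata")
res <- clusterApply(cl, chunks, fun = myfunc) # one call per worker
stopCluster(cl)
```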
## Good practices
* Use the login nodes for lightweight tasks
* Profile your code
* Monitor your job on the fly:
If you run your script on multiple cores, you can monitor the CPU and memory usage in real time with the following command in the terminal:
**job-usage "job_ID"**
Then copy and paste the URL it prints into your local web browser.
## Good practices
![](Images/job-usage.png){width=750px}
## Good practices
* If you have any issue when using *R* (or any other software), report it by
creating a support ticket at *[email protected]* (HPC2N users).
  - It helps if you can provide a folder with the smallest example that
reproduces the reported issue
## Summary
* Login nodes and RStudio should be used only for lightweight tasks; for anything heavier,
use the SLURM batch system (**sbatch job.sh**)
* Know whether your program is compute bound or memory bound
* Some packages, for instance the linear algebra ones, already include implicit
parallelism
* It is good practice to run a profiling analysis (time vs. nr. of cores) so that you
request the optimal nr. of cores in your batch script
* Monitor the behavior of your batch job with the **job-usage** tool
## References
* https://swcarpentry.github.io/r-novice-inflammation/
* https://www.tutorialspoint.com/r/index.htm
* R High Performance Programming. Aloysius Lim and William Tjhi. Packt Publishing, 2015.
* http://adv-r.had.co.nz/memory.html
* https://blogs.oracle.com/r/managing-memory-limits-and-configuring-exadata-for-embedded-r-execution
* https://rawgit.com/PPgp/useR2017public/master/tutorial.html
* https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html
[Return to Index](index.html)