---
title: "Using R in HPC"
author: "Pedro Ojeda"
date: "Feb., 2021"
output:
  ioslides_presentation:
    widescreen: true
    css: custom.css
    logo: Images/logo.png
---
## Desktop PC vs. HPC architectures
![](Images/kebne_arch.png){width=700px}
## Types of programs
Programs can be of different types:

* Compute bound, if they mainly use CPU power (more cores can help)
* Memory bound, if the bottlenecks are allocating memory and copying/duplicating
objects (the large-memory nodes on Kebnekaise can help)
## Parallelization levels
* Implicit parallelism is built into some packages; one only needs to set
the number of workers (threads)
* Explicit parallelism requires the user to write the
parallelization instructions explicitly (**Rmpi**, for instance)
## Data dependency
This loop can be easily parallelized:
```{r eval=FALSE}
a <- b <- numeric(100)   # pre-allocate the vectors
for(i in 1:100){
  b[i] = 4
  a[i] = 2*b[i] + 1
}
```
but this one cannot, because iteration *i* reads **b[i-1]**, which is written in the previous iteration (a loop-carried dependency):
```{r eval=FALSE}
a <- b <- numeric(100)   # pre-allocate the vectors
for(i in 2:100){         # start at 2 so that b[i-1] exists
  b[i] = 4
  a[i] = 2*b[i-1] + 1    # reads the value written in the previous iteration
}
```
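Because its iterations are independent, the first loop can also be written in vectorized form (a minimal sketch):
```{r eval=FALSE}
b <- rep(4, 100)   # every iteration writes the same value
a <- 2 * b + 1     # element-wise, no loop needed
```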
## Using R in HPC
There are several versions of R installed on Kebnekaise
```{r eval=FALSE}
ml spider R
# Versions:
# R/3.3.1
# R/3.4.4-X11-20180131
# R/3.5.1-Python-2.7.15
# R/3.5.1
# R/3.6.0
# R/3.6.2
# R/4.0.0
ml spider R/3.6.0 #search for the modules needed by R
# You will need to load all module(s) on any one of the lines below before the "R/3.6.0"
# GCC/8.2.0-2.31.1 OpenMPI/3.1.3
```
## Using R in HPC
```{r eval=FALSE}
R --help
#Usage: R [options] [< infile] [> outfile]
# or: R CMD command [arguments]
#Start R, a system for statistical computation and graphics, with the
#specified options, or invoke an R tool via the 'R CMD' interface.
#Options:
# -h, --help Print short help message and exit
# --version Print version info and exit
# --encoding=ENC Specify encoding to be used for stdin
# --encoding ENC
# RHOME Print path to R home directory and exit
# --save Do save workspace at the end of the session
# --no-save Don't save it
# --no-environ Don't read the site and user environment files
```
## Adding your own packages in R
* https://www.hpc2n.umu.se/resources/software/user_installed/r
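A minimal sketch of installing a package into a personal library (the path *~/R/my_pkgs* and the package name are only examples; see the page above for the recommended HPC2N setup):
```{r eval=FALSE}
# create a personal library directory (the path is only an example)
dir.create("~/R/my_pkgs", recursive = TRUE, showWarnings = FALSE)
# make R look there first, install into it, then load from it
.libPaths(c("~/R/my_pkgs", .libPaths()))
install.packages("doParallel", lib = "~/R/my_pkgs",
                 repos = "https://cran.r-project.org")
library(doParallel, lib.loc = "~/R/my_pkgs")
```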
## SLURM workload manager
![](Images/slurm.png){width=660px}
## Running serial jobs
![](Images/using-r-hpc-serial.png){width=800px}
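As a minimal sketch (the project ID is a placeholder and the module versions are those used elsewhere in this talk), a serial batch script could look like:
```{r eval=FALSE}
#!/bin/bash
#SBATCH -A Project_ID      # your project ID (placeholder)
#SBATCH -n 1               # a single task: serial job
#SBATCH -t 00:10:00        # requested wall time
ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
ml R/3.6.0
R --no-save --no-restore -f script.R
```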
## Running your script
* Transfer your files to Kebnekaise
* Submit your job with: **sbatch job.sh**
* In case sbatch complains about the DOS format use the command:
**dos2unix job.sh**
before submitting your job.
* More information: https://www.hpc2n.umu.se/resources/software/r
## Running several independent jobs
One can use the **job arrays** option in SLURM to run independent instances of a program:
```{r eval=FALSE}
#!/bin/bash
#SBATCH -A Project_ID
#Asking for 12 min.
#SBATCH -t 00:12:00
#SBATCH --array=1-28
##Writing the output and error files
#SBATCH --output=Array_test.%A_%a.out
#SBATCH --error=Array_test.%A_%a.error
ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
ml R/3.6.0
R --no-save --no-restore -f script.R
```
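Each array task receives its own index in **$SLURM_ARRAY_TASK_ID**; a sketch of passing it on to R (it is assumed here that *script.R* reads a command-line argument):
```{r eval=FALSE}
# variation on the R call above:
R --no-save --no-restore -f script.R --args $SLURM_ARRAY_TASK_ID
# inside script.R (hypothetical): task_id <- as.integer(commandArgs(trailingOnly = TRUE)[1])
```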
## Running R in parallel mode
![](Images/using-r-hpc-parallel.png){width=800px}
## Monitoring your jobs
* **squeue -a -u username** lists your jobs in the queue
* **projinfo** displays the project's usage
## Parallel packages
Some libraries, like BLAS/LAPACK, have an implicit parallelization layer that can be activated by setting the number of threads.
On Kebnekaise the OpenBLAS libraries are available and provide this implicit parallelism:
```{r eval=FALSE}
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS/LAPACK: /cvmfs/.../8.2.0-2.31.1/OpenBLAS/0.3.5/lib/libopenblas_haswellp-r0.3.5.so
```
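The number of BLAS threads can also be set before R starts through environment variables (a sketch; which variable is honored depends on how the BLAS library was built):
```{r eval=FALSE}
export OMP_NUM_THREADS=4        # honored by OpenBLAS builds that use OpenMP
export OPENBLAS_NUM_THREADS=4   # OpenBLAS-specific variable
R --no-save -f my_script.R
```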
## Parallel packages
The number of BLAS threads can be controlled from within R with the *RhpcBLASctl* package:
```{r eval=FALSE}
library(RhpcBLASctl)
n <- 5000; nsim <- 3 #matrix size nxn; nr. independent simulations
set.seed(123); summa <- 0; x <- 0; blas_set_num_threads(1) #set the number of threads
for (i in 1:nsim) {
  m <- matrix(rnorm(n^2), n); a <- crossprod(m) #random matrix and symmetrize it
  timing <- system.time({
    x <- eigen(a, symmetric=TRUE, only.values=TRUE)$values[1] #compute eigenvalues
  })[3] ; summa <- summa + timing
} ; times <- summa/nsim
cat(c("Computation of eig. random matrix 5000x5000 (sec): ", times, "\n"))
```
## Parallel packages
Other packages (doParallel, parallel, doMC, doMPI, doFuture) follow a common pattern: create a cluster, register it as the parallel backend, run the code, and stop the cluster:
```{r eval=FALSE}
library("package-name")
cl <- makeCluster(NumberofCores)
register_cluster(cl)   # placeholder: the register function depends on the package, e.g. registerDoParallel(cl)
... #code to be run in parallel mode
stopCluster(cl)
```
## Parallel packages: examples
The **foreach** package is used for executing loops; with the **%do%** operator the iterations run sequentially:
```{r eval=FALSE}
library(foreach)
library(iterators)   # provides icount()
# assumed example data: 100 rows, predictor in column 1, 0/1 response in column 2
x <- data.frame(pred = rnorm(100), resp = rbinom(100, 1, 0.5))
trials <- 1000       # assumed number of bootstrap replicates
r <- foreach(icount(trials), .combine=cbind) %do% {
  ind <- sample(100,100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
```
## Parallel packages: examples
**doParallel** is a parallel backend package; once registered, it tells **foreach** where to execute the code in parallel:
```{r eval=TRUE}
library(doParallel) #using "doParallel" package
cl <- makeCluster(2)
registerDoParallel(cl)
getDoParWorkers() #this line tells the nr. of workers
getDoParName() #this line tells the name of the registered backend
stopCluster(cl)
```
## Parallel packages: examples
**doParallel** can be used to execute **foreach** loops in parallel:
```{r eval=FALSE}
library(doParallel) #using "doParallel" package
cl <- makeCluster(2)
registerDoParallel(cl)
# x and trials as defined in the previous foreach example
r <- foreach(icount(trials), .combine=cbind) %dopar% {
  ind <- sample(100,100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
stopCluster(cl)
```
## Parallel packages: examples
Slices of data can be sent to the workers with the **parallel** package:
```{r eval=FALSE}
library(parallel) #using "parallel" package
detectCores()
P <- detectCores(logical = FALSE) #only physical cores
# mydata: a data frame with columns one, two, three; N: its number of rows
# (both are created on the memory-profiling slide further below)
myfunc <- function(id) { #function to sum by rows
  arguments <- mydata[id, ]
  arguments$one + arguments$two + arguments$three }
cl <- makeCluster(P) #distribute the work across cores
clusterExport(cl, "mydata")
res <- clusterApply(cl, 1:N, fun = myfunc)
stopCluster(cl)
```
## Parallel packages: examples
The **doParallel** package is used in the following example to compute the
eigenvalues of matrices growing in size:
```{r eval=FALSE}
library(doParallel)
my_eigen <- function(x) {
  n <- x*800
  m <- matrix(runif(n^2),n,n)
  m[lower.tri(m)] = t(m)[lower.tri(m)]
  d <- diag(eigen(m)$values)
}
cl <- makeCluster(4)
registerDoParallel(cl)
system.time( res1 <- foreach(n = 1:6) %dopar% my_eigen(n) )[3]
stopCluster(cl)
#Elapsed
#211.25
```
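For a rough speed-up estimate, the same loop can be timed serially (a sketch reusing *my_eigen* from above):
```{r eval=FALSE}
system.time( res0 <- lapply(1:6, my_eigen) )[3]   # serial baseline
```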
## Parallel packages: examples
We can also use the **future** package, which runs on several operating systems and supports
asynchronous calculations:
```{r eval=FALSE}
library(future)
plan(multisession, workers = 4, gc = TRUE)   # 4 background R sessions; gc = TRUE cleans up the workers
# my_eigen() as defined on the previous slide
par_future <- function(x) {
  #creating futures
  ft <- lapply( x, function(x) future(my_eigen(x)) )
  #get futures
  get_ft <- lapply(ft, value)
}
x <- 1:6
system.time( res2 <- par_future(x) )[3]
#Elapsed
#210.17
```
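When the computation is done, the background workers can be released by switching back to sequential evaluation:
```{r eval=FALSE}
plan(sequential)   # shuts down the multisession workers
```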
## Random Numbers in parallel simulations
The following simulations *res1* and *res2* do not give reproducible results:
```{r eval=TRUE}
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
set.seed(1)
res1 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
set.seed(1)
res2 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
stopCluster(cl)
identical(res1,res2)
```
## Random Numbers in parallel simulations
For reproducible parallel simulations, an RNG package such as **doRNG** is recommended:
```{r eval=TRUE}
library(doRNG)
cl <- makeCluster(2)
registerDoParallel(cl)
registerDoRNG(1)
res3 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
set.seed(1)
res4 <- foreach(n = rep(2, 3), .combine=rbind) %dopar% rnorm(n)
stopCluster(cl)
identical(res3,res4)
```
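Alternatively, **doRNG** provides the **%dorng%** operator, which takes the seed as a loop option (a minimal sketch):
```{r eval=FALSE}
library(doRNG)
res5 <- foreach(n = rep(2, 3), .combine = rbind,
                .options.RNG = 1) %dorng% rnorm(n)
```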
## Profiling Memory: gc (Parallel)
Memory profiling is crucial when using parallel packages. Suppose we have a data frame *mydata* that will be processed with the *clusterApply* function:
```{r eval=FALSE}
gcinfo(TRUE) #activate gc
N <- 5000000
mydata <- data.frame(one=1.0*seq(N),two=2.0*seq(N),three = 3.0*seq(N))
#...
#Garbage collection 23 = 14+2+7 (level 0) ...
#43.5 Mbytes of cons cells used (66%)
#130.4 Mbytes of vectors used (65%)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 572516 30.6 1233268 65.9 1233268 65.9
#Vcells 16492769 125.9 26338917 201.0 19085502 145.7
```
## Profiling Memory: gc (Parallel)
Then we define a function that processes a slice of the data frame (the distribution across cores happens in the next step):
```{r eval=FALSE}
library(parallel) #using parallel package
detectCores()
P <- detectCores(logical = FALSE) #only physical cores
myfunc <- function(id) { #function to sum by rows
  arguments <- mydata[id, ]
  arguments$one + arguments$two + arguments$three
}
```
## Profiling Memory: gc (Parallel)
```{r eval=FALSE}
cl <- makeCluster(P) #distribute the work across cores
clusterExport(cl, "mydata")
res <- clusterApply(cl, 1:N, fun = myfunc)
stopCluster(cl)
#...
#Garbage collection 1196 = 1128+50+18 (level 0) ...
#312.5 Mbytes of cons cells used (60%)
#206.5 Mbytes of vectors used (59%)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 5850436 312.5 9776540 522.2 9776540 522.2
#Vcells 27062930 206.5 45804848 349.5 42982557 328.0
```
Memory usage grows substantially because **mydata** is copied to every worker, and the time to execute **myfunc** in parallel mode increases drastically: the work is split into one tiny task per row, so communication dominates.
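A common remedy (a minimal sketch reusing *mydata*, *N*, *P*, and *myfunc* from above) is to send one block of row indices to each worker instead of one task per row:
```{r eval=FALSE}
library(parallel)
chunks <- splitIndices(N, P)                  # P blocks of row indices
cl <- makeCluster(P)
clusterExport(cl, "mydata")
res <- clusterApply(cl, chunks, fun = myfunc) # one call per worker
stopCluster(cl)
```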
## Good practices
* Use the login nodes for lightweight tasks
* Profile your code
* Monitor your job on the fly:
If you run your script on multiple cores, you can monitor the CPU and memory usage in real time with the following command in the terminal:
**job-usage "job_ID"**
Then copy and paste the URL it prints into your local web browser.
## Good practices
![](Images/job-usage.png){width=750px}
## Good practices
* If you have any issue when using *R* (or any other software), report it by
creating a support ticket at *[email protected]* (HPC2N users).
  - It helps if you can provide a folder with the smallest example that
reproduces the reported issue
## Summary
* Login nodes and RStudio should be used only for lightweight tasks; for anything heavier,
use the SLURM batch system (**sbatch job.sh**)
* Know whether your program is compute bound or memory bound
* Some packages, for instance the linear algebra ones, already include implicit
parallelism
* It is good practice to run a profiling analysis (time vs. nr. of cores) so that you
request the optimal nr. of cores in your batch script
* Monitor the behavior of your batch job with the **job-usage** tool
## References
* https://swcarpentry.github.io/r-novice-inflammation/
* https://www.tutorialspoint.com/r/index.htm
* R High Performance Programming. Aloysius Lim and William Tjhi. Packt Publishing, 2015.
* http://adv-r.had.co.nz/memory.html
* https://blogs.oracle.com/r/managing-memory-limits-and-configuring-exadata-for-embedded-r-execution
* https://rawgit.com/PPgp/useR2017public/master/tutorial.html
* https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html
[Return to Index](index.html)