Optimizing.Rmd

---
title: "Optimizing your code"
author: "Pedro Ojeda"
date: "Feb., 2021"
output: 
    ioslides_presentation:
       widescreen: true
       css: custom.css
       logo: Images/logo.png
---

## Baseline 

There are some tricks you can follow in order to get a faster code, we will use our Pi function from the previous topic
as our baseline:

```{r}
sim <- function(l) {
  c <- rep(0,l); hits <- 0
  pow2 <- function(x) { x2 <- sqrt( x[1]*x[1]+x[2]*x[2] ); return(x2) }
  for(i in 1:l){
             x = runif(2,-1,1)
     if( pow2(x) <=1 ){
                  hits <- hits + 1
        }
        dens <- hits/i; pi_partial = dens*4; c[i] = pi_partial
    }
   return(c)
}
```

## Baseline

the execution time of this function for 100,000 iterations is

```{r}
size <- 100000
system.time(
   res <- sim(size)
)
```

## Vectorization

If we vectorize the code we obtain a better performance:

```{r}
simv <- function(l) {
  set.seed(0.234234)
  x=runif(l); y=runif(l)
  z=sqrt(x^2 + y^2)
  resl <- length(which(z<=1))*4/length(z)
  return(resl)
}
```

## Vectorization
```{r}
size <- 100000
system.time(
  res <- simv(size)
)
```

a message from this example is that loops are expensive in R and vectorization can help to improve the performance of the code.

Common vector operations: $+$, $-$, $/$, $*$, $\% * \%$.

## Memory pre-allocation

```{r}
N <- 1E5
data1 <- 1 
system.time({
    for (j in 2:N) {
      data1 <- c(data1, data1[j-1] + sample(-5:5, size=1))
    }
  })
```

## Memory pre-allocation

```{r}
data2 <- numeric(N)
data2[1] <- 1
system.time({
  for (j in 2:N) {
    data2[j] <- data2[j-1] + sample(-5:5, size=1)
  }
})
```
This example shows that pre-allocating memory reduces the execution time.

## Using Data Frames with caution

```{r}
data1 <- rnorm(1E4*1000)
dim(data1) <- c(1E4,1000)
dataf <- data.frame(data1)
system.time(data2 <- rowSums(dataf))
```

## Using Data Frames with caution

```{r}
data1 <- rnorm(1E4*1000)
dim(data1) <- c(1E4,1000)
system.time(data1 <- rowSums(data1))
```

Then, it is more efficient to use matrices upon doing numerical calculations rather than Data Frames. 


## Different implementations of functions

Principal components analysis

![J. Chem. Inf. Mod., 57, 826-834 (2017)](Images/pca_analysis.png){width=500px}

## Different implementations of functions

```{r}
data <- rnorm(1E5*100)
dim(data) <- c(1E5,100)
system.time(prcomp_data <- prcomp(data))
system.time(princomp_data <- princomp(data))
```

## Compiling your functions

```{r eval=TRUE}
library(microbenchmark)
library(compiler)

sim <- function(l) {
  c <- rep(0,l); hits <- 0
  pow2 <- function(x) { x2 <- sqrt( x[1]*x[1]+x[2]*x[2] ); return(x2) }
  for(i in 1:l){
             x = runif(2,-1,1)
     if( pow2(x) <=1 ){
                  hits <- hits + 1
        }
        dens <- hits/i; pi_partial = dens*4; c[i] = pi_partial
         }
   
   return(c)
}
```

## Compiling your functions

```{r eval=TRUE}
sim.comp0 <- cmpfun(sim, options=list(optimize=0))
sim.comp1 <- cmpfun(sim, options=list(optimize=1))
sim.comp2 <- cmpfun(sim, options=list(optimize=2))
sim.comp3 <- cmpfun(sim, options=list(optimize=3))

size <- 100000
bench <- microbenchmark(sim(size), sim.comp0(size), sim.comp1(size), sim.comp2(size), 
                        sim.comp3(size))
```

## Compiling your functions

```{r eval=TRUE}
bench
```

## Compiling your functions

visualize the results:

```{r eval=FALSE}
library(ggplot2)
autoplot(bench)
```

![Violin Plot](Images/violin.png){width=450px}

## Just in time compilation

```{r eval=TRUE}
library(compiler)
enableJIT(level=3)

bench <- microbenchmark(sim(size))
bench
```


## Rcpp package

**Rcpp** package allows you to write your code in C++ that could be called within a R script:

```{r eval=TRUE, warning=FALSE, message=FALSE}
library(Rcpp)
```

```{r eval=TRUE}
cppFunction('int mul(int a, int b, int c) {
  int mul = a * b * c;
  return mul;
}')
mul
mul(2,4,6)
```

## Following cases work on Kebnekaise only: 

### Calling external functions (Fortran)

```{r eval=FALSE}
subroutine pifunc(n)
implicit none
integer, parameter :: seed = 86456
integer     :: i,n,hits
real        :: x,y,r,pival
call srand(seed)
hits = 0
do i=1,n
   x = rand()
   y = rand()
   r = sqrt(x*x + y*y)
   if(r <= 1) then 
       hits = hits + 1
   endif
enddo
pival = 4.0d0*hits/(1.0*n)
end subroutine pifunc
```


## Following cases work on Kebnekaise only: 

### Calling external functions

One compiles the function using standard compilers (Linux, Kebnekaise):

```{r eval=FALSE}
gfortran -shared -fPIC -o picalc pi.f90
```

```{r eval=FALSE, echo=TRUE}
size <- 100000

dyn.load("picalc")
is.loaded("pifunc")
.Fortran("pifunc", n = as.integer(size))
```


## Following cases that run on Kebnekaise:

### Calling external functions

now we can benchmark our functions:

```{r eval=FALSE}
library(microbenchmark)
bench <- microbenchmark(sim(size), .Fortran("pifunc", n = as.integer(size)))
bench

#Unit: milliseconds
#                        expr        min         lq       mean     median         uq 
#          sim(size) 229.596323 234.312380 240.501156 236.034249 238.871773 316.289453
# .Fortran("pifunc")   4.136534   4.155587   4.239279   4.188102   4.261413   5.747752
```

Vectorized code performance was ~10 ms.

## Following cases that run on Kebnekaise:

### Calling Julia functions

```{r eval=FALSE}
ml GCC/8.2.0-2.31.1  OpenMPI/3.1.3; ml R/3.6.0; ml julia/1.1.0
library(JuliaCall) #install.packages("JuliaCall")
julia_setup()
julia_command("
function sim(l)
  hits = 0
  for i = 1:l
    x = rand()*2 - 1
    y = rand()*2 - 1
    r = x*x + y*y
    if r < 1.0 
      hits += 1
    end
  end
  pi_partial = (hits/l)*4
end")
invisible(julia_call("sim", size))
```

## Following cases that run on Kebnekaise:

### Calling Julia functions

```{r eval=FALSE}
library(microbenchmark)
bench <- microbenchmark(sim(size), julia_call("sim", size))
bench

#Unit: microseconds
#                    expr        min         lq        mean     median         uq  
#               sim(size) 230284.183 237486.927 246621.8263 244825.215 250872.803
# julia_call("sim", size)    400.051    426.745    490.8316    496.087    542.283
```


## Summary

* Profile/benchmark your initial code to have a baseline

* Some techniques that can help you to get a faster code are
   - vectorization
   - pre-allocating objects
   - use matrices instead of Data Frames (Data Tables?)
   - check different functions/package implementations
   - use byte compiled code
   - if extra improvement is needed, it is time to consider *Rccp*
   or external function calls
   - calling *Julia* functions can also be helpful


## References
* R High Performance Programming. Aloysius, Lim; William, Tjhi. Packt Publishing, 2015.
* http://adv-r.had.co.nz/Profiling.html#vectorise
* http://adv-r.had.co.nz/Functionals.html#functionals
* [Pi vectorization](https://helloacm.com/r-programming-tutorial-how-to-compute-pi-using-monte-carlo-in-r/)
* Advanced R, Hadley Wickham, Taylor & Francis Group

[Return to Index](index.html)