Potential optimizers

Inline Expansion

Idea

Replacing a function call with the body of the called function is called “inline expansion”. It removes the overhead of calling the function and of returning from it. In R, each call to a closure also matches its arguments and creates a new evaluation environment. Inlining a small function that is called inside a loop therefore avoids that work on every iteration.

Code Examples

Unoptimized Code

cubed <- function(x) {
  x * x * x
}

inline <- function(n) {
  to_cubes <- 0
  for (i in seq_len(n)) {
    to_cubes <- to_cubes + cubed(i)  # one R function call per iteration
  }
  to_cubes  # return the sum of cubes
}

Proposed Optimized Code

inline_opt <- function(n) {
  to_cubes <- 0
  for (i in seq_len(n)) {
    to_cubes <- to_cubes + (i * i * i) # function inlined
  }
  to_cubes  # return the sum of cubes
}

Benchmark

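library(microbenchmark)  # microbenchmark(), plus its autoplot() method
library(ggplot2)         # provides the autoplot() generic used below
n <- 1000
autoplot(microbenchmark(inline(n), inline_opt(n)))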
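An optimizer can automate the manual rewrite from inline to inline_opt. The sketch below is a minimal, hypothetical illustration (inline_calls is not part of any package). It walks an R expression and replaces every call fun_name(arg) with the body of fun, substituting the actual argument for the formal one. It assumes fun has exactly one argument and a single-expression body, and that the expression contains no empty arguments such as x[, 1].

```r
# Hypothetical helper: inline every call `fun_name(arg)` found in `expr`,
# assuming `fun` has one formal argument and a single-expression body.
inline_calls <- function(expr, fun_name, fun) {
  if (!is.call(expr)) {
    return(expr)  # symbols and constants are left unchanged
  }
  # Recurse into the function position and every argument first
  expr <- as.call(lapply(as.list(expr), inline_calls,
                         fun_name = fun_name, fun = fun))
  if (identical(expr[[1]], as.name(fun_name)) && length(expr) == 2L) {
    b <- body(fun)
    # Unwrap a body of the form { single_expression }
    if (is.call(b) && identical(b[[1]], as.name("{")) && length(b) == 2L) {
      b <- b[[2]]
    }
    # Map the formal argument name (e.g. x) to the actual argument (e.g. i)
    env <- stats::setNames(list(expr[[2]]), names(formals(fun)))
    # Substitute, then wrap in parentheses to preserve operator precedence
    expr <- call("(", do.call(substitute, list(b, env)))
  }
  expr
}

cubed <- function(x) {
  x * x * x
}
inline_calls(quote(to_cubes <- to_cubes + cubed(i)), "cubed", cubed)
# to_cubes <- to_cubes + (i * i * i)
```

Because the argument is substituted textually, cubed(g(i)) would become (g(i) * g(i) * g(i)) and evaluate g(i) three times. This sketch is therefore only safe for cheap, side-effect-free arguments. A real inliner would also have to handle evaluation order and name clashes.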

Memory Pre-Allocation

Idea

As a general rule of thumb, in any programming language, we should manage memory deliberately rather than leave it to the runtime. When a vector is grown inside a loop, R may have to request a larger block of memory each time the vector runs out of room. It then copies the existing elements into that block before the loop can continue. Repeated over many iterations, this allocation and copying adds up. Pre-allocating the vector at its final length avoids these delays.

Code Examples

Unoptimized Code

mem_alloc <- function(n) {
  vec <- NULL
  for (i in seq_len(n)) {
    vec[i] <- i  # vec grows by one element per iteration
  }
  vec
}

Proposed Optimized Code

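mem_alloc_opt <- function(n) {
  # Allocate all n slots up front; integer mode matches the values of i,
  # so no logical-to-integer conversion copy is needed on the first assignment
  vec <- vector(mode = "integer", length = n)
  for (i in seq_len(n)) {
    vec[i] <- i
  }
  vec
}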

Benchmark

n <- 100000
autoplot(microbenchmark(mem_alloc(n), mem_alloc_opt(n)))
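Pre-allocation is straightforward when the final length n is known in advance, as in mem_alloc_opt. When it is not, a common workaround is to pre-allocate a buffer and double its capacity whenever it fills up. The number of reallocations then grows only logarithmically with the number of elements instead of once per element. This doubling strategy is a swapped-in illustration, not something the text above prescribes. The collect_while helper is hypothetical and assumes numeric results.

```r
# Hypothetical helper: collect f(i) for i = 1, 2, ... while keep_going(i)
# is TRUE, when the number of results is not known in advance.
collect_while <- function(f, keep_going, initial_capacity = 16L) {
  out <- vector(mode = "numeric", length = initial_capacity)  # pre-allocated buffer
  count <- 0L                                                 # slots filled so far
  i <- 1L
  while (keep_going(i)) {
    if (count == length(out)) {
      # Buffer is full: double its capacity in one step instead of growing by one
      out <- c(out, vector(mode = "numeric", length = length(out)))
    }
    count <- count + 1L
    out[count] <- f(i)
    i <- i + 1L
  }
  out[seq_len(count)]  # drop the unused slots at the end
}

squares <- collect_while(function(i) i^2, function(i) i <= 100)
```

With the default initial capacity of 16, collecting 100 values needs only three reallocations (to 32, 64 and 128 slots) instead of one per element.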

Vectorization

Idea

A golden rule in R programming is to access the underlying C/Fortran routines as much as possible; the fewer R function calls required to achieve this, the better. Many R functions are therefore vectorized, that is, the function’s inputs and/or outputs naturally work with vectors, reducing the number of function calls required.

Code Examples

Unoptimized Code

non_vectorized <- function(n) {
  v1 <- seq_len(n)
  v2 <- seq.int(n + 2, by = 2, length.out = n)  # same length as v1
  res <- vector(mode = "numeric", length = n)
  for (i in seq_len(n)) {
    res[i] <- v1[i] + v2[i]  # one `+` call per element
  }
  res
}

Proposed Optimized Code

vectorized <- function(n) {
  v1 <- seq_len(n)
  v2 <- seq.int(n + 2, by = 2, length.out = n)  # same length as v1
  res <- v1 + v2  # a single `+` call adds all n pairs in compiled code
  res
}

Benchmark

n <- 10000
autoplot(microbenchmark(non_vectorized(n), vectorized(n)))
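Vectorization also applies to loops with a branch in the body. In the hypothetical clip_loop below, every element costs an R-level comparison and assignment. clip_vec does the same work with a single call to base R's pmin(), and clip_idx does it with logical indexing. The function names are illustrative only.

```r
# Loop version: one R-level comparison and assignment per element
clip_loop <- function(x, upper) {
  res <- vector(mode = "numeric", length = length(x))
  for (i in seq_along(x)) {
    res[i] <- if (x[i] > upper) upper else x[i]
  }
  res
}

# Vectorized version: pmin() compares the whole vector in one call
clip_vec <- function(x, upper) {
  pmin(x, upper)
}

# Vectorized version using logical indexing
clip_idx <- function(x, upper) {
  x[x > upper] <- upper
  x
}

x <- c(1, 5, 10, 3)
clip_vec(x, 4)  # 1 4 4 3
```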

Efficient Column Extraction

Idea

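The idea would be to replace the different one-column extraction alternatives by the much faster .subset2 call. .subset2(x, j) behaves essentially like x[[j]], except that no S3 method dispatch takes place on it. Skipping dispatch is the main reason it is faster.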

Benchmark

autoplot(microbenchmark(
  mtcars[, 11],
  mtcars$carb,
  mtcars[[c(11)]],
  mtcars[[11]],
  .subset2(mtcars, 11)
))

Drawbacks

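  1. For some R classes, the replaced alternatives and .subset2 work differently. They are equivalent for a data.frame, where .subset2(df, 11) returns the 11th column just as df[, 11] does. They are not equivalent for a matrix, where .subset2(m, 11) returns the 11th element in column-major order rather than the 11th column. Likewise, any class that defines its own [[ method loses that method's behaviour when .subset2 is called instead.

  2. Moreover, both [[ ]] and .subset2 are functions, and in R any function can be redefined. The above optimization can therefore be broken simply by redefining, say, the .subset2 function.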
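Both drawbacks can be partly mitigated with a guard. The sketch below uses a hypothetical helper, get_col (not from any package), that takes the fast path only for a plain data.frame, where .subset2 and [, j] agree. Calling base::.subset2 ignores a user-defined .subset2 in the global environment, although the :: lookup adds some overhead of its own. The sketch assumes the input is either a plain data.frame or a matrix.

```r
# Hypothetical helper: fast path only for a plain data.frame
get_col <- function(x, j) {
  if (identical(class(x), "data.frame")) {
    base::.subset2(x, j)  # base:: bypasses any user-defined .subset2
  } else {
    x[, j]                # e.g. a matrix: `[` extracts the whole column
  }
}

df <- data.frame(a = 1:3, b = 4:6)
m <- as.matrix(df)
identical(df[, 2], .subset2(df, 2))  # TRUE
identical(m[, 2], .subset2(m, 2))    # FALSE: .subset2(m, 2) is the element m[2]
get_col(m, 2)                        # 4 5 6
```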

Efficient Value Extraction

Idea

The idea would be to replace the different one-value extraction alternatives by the much faster .subset2(x, j)[i] call. This call extracts column j without S3 method dispatch and then indexes its element i.

Benchmark

autoplot(microbenchmark(
  mtcars[32, 11],
  mtcars$carb[32],
  mtcars[[c(11, 32)]],
  mtcars[[11]][32],
  .subset2(mtcars, 11)[32],
  times = 1000L
))

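Drawbacks

  1. For some R classes, the replaced alternatives and .subset2 work differently. On a data.frame, .subset2(df, 11)[32] matches df[32, 11]. On a matrix, however, .subset2(m, 11) is the single element m[11], so indexing it with [32] returns NA. Likewise, any class that defines its own [[ method loses that method's behaviour when .subset2 is called instead.

  2. Moreover, both [[ ]] and .subset2 are functions, and in R any function can be redefined. The above optimization can therefore be broken simply by redefining, say, the .subset2 function.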
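The check below runs the five alternatives from the benchmark on mtcars. It confirms that they return the same value for a data.frame but not for a matrix. Note that mtcars[[c(11, 32)]] uses recursive indexing and is equivalent to mtcars[[11]][[32]].

```r
# The five alternatives from the benchmark, applied to a data.frame
forms <- list(
  mtcars[32, 11],
  mtcars$carb[32],
  mtcars[[c(11, 32)]],  # recursive indexing: mtcars[[11]][[32]]
  mtcars[[11]][32],
  .subset2(mtcars, 11)[32]
)
# On a data.frame they all return the same single value
all(vapply(forms, identical, logical(1), y = forms[[1]]))  # TRUE

# On a matrix, .subset2(m, 11) is the single element m[11], not column 11
m <- as.matrix(mtcars)
.subset2(m, 11)[32]  # NA
```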