Memory Allocation Optimizer

Background

As a general rule of thumb, in any programming language we should undertake the memory management as much as possible. When we grow a vector inside a loop, the vector asks the processor for extra space in between the running program and then proceeds once it gets the required memory. This process is repeated for every iteration of the loop resulting in massive delays. Thus we should pre-allocate the required memory to a vector to avoid such delays.

This Memory Allocation Optimizer checks the code for vectors that have been initialized without proper memory allocation and whenever possible, changes that allocation to an allocation with a fixed size, taken from the the loops where the vector is being called.

Example

Consider the following example:

code <- paste(
  "square_vec <- NULL",
  "mul_vec <- NULL",
  "for(i in 1:100) {",
  "  square_vec[i] <- i^2",
  "}",
  "for(i in 1:100) {",
  "  mul_vec[i] <- i * i",
  "}",
  "identical(square_vec, mul_vec)",
  sep = "\n"
)
cat(code)
## square_vec <- NULL
## mul_vec <- NULL
## for(i in 1:100) {
##   square_vec[i] <- i^2
## }
## for(i in 1:100) {
##   mul_vec[i] <- i * i
## }
## identical(square_vec, mul_vec)

Then, the automatically optimized code would be:

opt_code <- opt_memory_alloc(list(code))
cat(opt_code$codes[[1]])
## square_vec <- vector(length = 100)
## mul_vec <- vector(length = 100)
## for(i in 1:100) {
##   square_vec[i] <- i^2
## }
## for(i in 1:100) {
##   mul_vec[i] <- i * i
## }
## identical(square_vec, mul_vec)

And if we measure the execution time of each one, and the speed-up:

bmark_res <- microbenchmark({
  eval(parse(text = code))
}, {
  eval(parse(text = opt_code))
})
autoplot(bmark_res)

speed_up(bmark_res)
##            Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## Expr_2 50.85415 48.00791 50.40635 50.52503 47.73553 136.3603

Implementation

The memory-allocation-optimizer looks for vector assignments in the entire code snippet, that have been assigned without proper allocation of memory. For instance, consider a vector vec, now this vector can be initialized without proper memory allocation in the following ways:

  • vec <- NULL
  • vec = c()
  • vec <- NA
  • logical() -> vec()
  • and so on….

When we spot these improper initializations of vectors, we change the intialization to include a suitable memory allocation in the following way, if possible:

  • vec <- vector(length = 10)

Limitations of the Optimizer

While the highly flexible and accomodating nature of the R Language is a boon for programmers, that very nature has limited the scope of this optimizer to a certain extent. In this optimizer the following points have been kept in mind:

  • Only FOR loops have been considered, owing to the fact that their sizes, most of the times, can be predicted accurately which is not the case with while and repeat loops.
  • Since most of the in-built functions of base-R can be overwritten, we had to ignore the cases where any function was used in either the body or conidition of the FOR loop.
  • Also to be sure of the size of the vector, we have only proceeded with the cases where the index was explicitly mentioned with the vectors inside the loop.