Introduction

In this document, we use four different methods: loop, reduce, recursion, and memorization, to compute ‘factorial’ first. In the end, we use microbenchmark package, which is a standard metric in R, to evaluate their performances in time.

Including Necessary Packages

library(purrr)
library(microbenchmark)

Note that purrr package will be used in reduce method.

Loop:

We use a ‘for loop’ to compute the result directly:

factorial_loop <- function(n){
        if (n ==0){
            1
        }else{
            y <- 1
            for(i in 1:n){
                y <-y*i
            }
            y
        }
       
    }

Reduce:

‘Reduce’ is a method that can propagate the result from right to left.

factorial_reduce <- function(n){
    if(n ==0){
        1
    }else{
        reduce(c(1:n), function(x, y){
            #message("x is ", x)
            #message("y is ", y)
            #message("")
            x * y
        })
    }
}

Recursion:

‘Recusrion’ is a simple way to solve the problem by calling its sub-problem repeatly until the initial condition is matched.

factorial_func <- function(n){
    if (n == 0){
        1
    }else{
        n * factorial_func(n-1)
    }
}

Memorization:

Different from Recursion, memorization will keep the results by extra memory space.

fac_tbl <- c (1, rep(NA, 24))

factorial_mem <- function(n){
    stopifnot(n >=0)
    
    if (n == 0){
        1
    }else{
        if(!is.na(fac_tbl[n])){
            fac_tbl[n]
        } else {
            fac_tbl[n-1] <<- factorial_mem(n-1)
            n * fac_tbl[n-1]
        }
    }
}

Evaluation

We use microbenchmark package to evaluate these four methods’ performance in time. Note that the unit is in micro-second.

evaluate_by_microbenchmark <- function(n){
    record_temp_perf <- summary(microbenchmark(factorial_loop(n), 
                                       factorial_reduce(n),
                                       factorial_func(n),
                                       factorial_mem(n)))
    record_temp_perf.df <- data.frame(record_temp_perf)
    print(record_temp_perf.df)
}

For example, we can use four methods to compute factorial number and evaluate them:

    evaluate_by_microbenchmark(3)
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  1.550  1.9355  2.73572  2.1720  2.3580  45.946
## 2 factorial_reduce(n) 11.631 12.8825 17.27129 14.1715 15.2665 264.492
## 3   factorial_func(n)  2.655  3.0415  4.16900  3.2505  3.5475  24.488
## 4    factorial_mem(n) 18.091 19.3935 21.95915 20.4660 21.8890  63.460
##   neval
## 1   100
## 2   100
## 3   100
## 4   100

Note that the output gives summary statistics (min, lq, mean, median, uq, and max) describing the time it took to run the two function over the 100 iterations of each function call so ‘neval’ is equal to 100. Moreover, ‘lq’ and ‘uq’ means 25% and 75% of the distribution, respectively.

We then compute factorial number from 1 to 12 and evaluate them respectively. Note that in my computer, factorial number 13 will cause overflow.

for (i in 1:12){
    evaluate_by_microbenchmark(i)
}
##                  expr   min     lq     mean  median      uq     max neval
## 1   factorial_loop(n) 1.149 1.5170  2.24940  1.7920  2.1430  10.808   100
## 2 factorial_reduce(n) 8.451 9.4375 19.87388 10.2415 12.3555 620.166   100
## 3   factorial_func(n) 1.209 1.5475  2.21252  1.9060  2.3080  10.048   100
## 4    factorial_mem(n) 8.307 9.6540 16.12930 10.6590 19.1825  77.068   100
##                  expr    min      lq     mean  median      uq    max neval
## 1   factorial_loop(n)  1.423  1.8275  2.61825  2.1400  2.9350  9.890   100
## 2 factorial_reduce(n) 10.874 11.8085 16.65233 12.8725 14.9400 90.638   100
## 3   factorial_func(n)  2.003  2.3915  3.25551  2.5715  2.8195 29.530   100
## 4    factorial_mem(n)  7.955  9.5520 13.31883 10.1725 10.9125 48.513   100
##                  expr    min      lq     mean  median     uq    max neval
## 1   factorial_loop(n)  1.575  2.0110  2.19815  2.1905  2.312  6.990   100
## 2 factorial_reduce(n) 11.791 12.9190 14.31068 14.1230 14.659 59.775   100
## 3   factorial_func(n)  2.642  2.9770  3.20049  3.1700  3.364  5.897   100
## 4    factorial_mem(n) 18.426 19.3735 20.77403 20.2045 20.951 43.628   100
##                  expr    min      lq     mean  median      uq    max neval
## 1   factorial_loop(n)  1.774  2.2095  2.51121  2.4080  2.5565 16.580   100
## 2 factorial_reduce(n) 13.340 14.2705 15.52745 15.3300 15.8095 48.341   100
## 3   factorial_func(n)  3.476  3.8380  4.08287  4.0180  4.2390  7.209   100
## 4    factorial_mem(n) 18.191 19.1935 20.77293 20.1255 21.0680 52.603   100
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  2.043  2.5640  3.75356  2.9550  4.3925  20.916
## 2 factorial_reduce(n) 14.796 16.3675 22.99093 17.7460 20.7055 130.597
## 3   factorial_func(n)  4.165  4.7680  6.02899  5.1355  5.6030  11.485
## 4    factorial_mem(n) 18.782 20.5885 28.41639 21.8765 41.7235  87.003
##   neval
## 1   100
## 2   100
## 3   100
## 4   100
##                  expr    min      lq     mean  median      uq    max neval
## 1   factorial_loop(n)  2.261  2.7120  3.04454  2.9620  3.1965  8.318   100
## 2 factorial_reduce(n) 15.951 17.1070 18.98521 18.2090 18.8830 55.206   100
## 3   factorial_func(n)  4.869  5.3215  5.69003  5.6230  5.8260 14.508   100
## 4    factorial_mem(n) 18.333 19.6220 22.15418 20.5635 21.1710 93.079   100
##                  expr    min      lq     mean  median      uq    max neval
## 1   factorial_loop(n)  2.469  2.9495  3.29283  3.1395  3.3620 16.507   100
## 2 factorial_reduce(n) 17.176 18.6690 19.93469 19.5535 20.2340 51.081   100
## 3   factorial_func(n)  5.745  6.0945  6.48984  6.4150  6.7040 10.069   100
## 4    factorial_mem(n) 17.961 19.3035 20.85047 20.5600 21.1975 51.354   100
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  2.745  3.2870  4.02985  3.5050  3.7765  19.735
## 2 factorial_reduce(n) 18.754 20.6295 23.32073 21.5390 22.4615  80.784
## 3   factorial_func(n)  6.408  6.8370  8.06142  7.2595  7.7770  28.331
## 4    factorial_mem(n) 18.647 19.9920 22.63953 20.9055 21.6000 104.610
##   neval
## 1   100
## 2   100
## 3   100
## 4   100
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  3.020  3.6195  4.92024  3.9350  6.5775   9.732
## 2 factorial_reduce(n) 20.427 22.2935 30.16569 23.3285 34.0840  99.081
## 3   factorial_func(n)  7.134  7.9330 10.87285  8.5605 14.0570  26.381
## 4    factorial_mem(n) 18.517 20.8025 32.16427 22.8065 42.4115 159.118
##   neval
## 1   100
## 2   100
## 3   100
## 4   100
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  3.217  3.8530  4.62227  4.2020  4.4800  17.430
## 2 factorial_reduce(n) 22.560 24.2090 30.21288 25.0180 26.7560 167.815
## 3   factorial_func(n)  7.996  8.7460 10.00581  9.0820  9.3840  34.991
## 4    factorial_mem(n) 18.424 20.8415 26.19596 21.7095 22.7685 180.932
##   neval
## 1   100
## 2   100
## 3   100
## 4   100
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  3.332  3.9100  5.06758  4.3540  4.8475  33.651
## 2 factorial_reduce(n) 22.085 25.3845 30.59537 26.6275 28.2010  84.275
## 3   factorial_func(n)  8.609  9.6160 11.14002 10.0200 10.5135  24.144
## 4    factorial_mem(n) 18.510 20.6015 26.57481 21.7240 23.7470 115.039
##   neval
## 1   100
## 2   100
## 3   100
## 4   100
##                  expr    min      lq     mean  median      uq     max
## 1   factorial_loop(n)  3.645  4.1595  4.52455  4.4165  4.7395   8.207
## 2 factorial_reduce(n) 23.597 26.4670 30.32473 27.5030 28.9965 131.069
## 3   factorial_func(n)  9.234 10.1055 11.20232 10.5680 10.9850  31.583
## 4    factorial_mem(n) 18.211 20.1095 22.68465 21.0175 22.3090  88.959
##   neval
## 1   100
## 2   100
## 3   100
## 4   100

With these data, we finally draw the trends by their means

time_data <- data.frame(factorial_loop=c(2.1395,1.97,2.37,3.215,4.03,3.11,3.74,4.49,4.35,5.25,4.40,4.70),
                        factorial_reduce=c(11.38,12.38,17.51,21.186,25.29,20.62,23.40,26.49,28.44,28.88,26.36,30.48),
                        factorial_func=c(2.414,2.48,3.79,5.41,6.82,6.06,6.89,9.65,9.67,9.85,10.03,11.26),
                        factorial_mem=c(11.2,9.69,23.96,28.39,31.47,22.24,24.87,27.715,25.39,24.73,21.44,23.23),
                        number=c(1,2,3,4,5,6,7,8,9,10,11,12))

and then draw the line graph.

library(ggplot2)
library(reshape2)
data_long <- melt(time_data, id="number")
ggplot(data=data_long,
       aes(x=number, y=value, colour=variable)) +
       geom_line()

Summary

According to the graph shown above, Loop generally has the best performance and its increase is close to constant. The main reason is that it just scans the input one time and then obtains the answer.

Recursion is the next but its increase is dramatic sine it will call its sub-problem and save no intermediate result.

Memorization has the worst performance in the beginning since it is very close to what Recursion does but still has extra I/O time to keep intermediate results. However, its benefit will be gained when n is larger since what it has to do is to obtain the saved result and operate a multiply operation.

Reduce’s spirit are accumulation and propagation. It has to scan 1 to n and accumulates all intermediates. Since it has most operations and costs most space, it will have worst performance when n is large.

The main drawback is that my computer can not compute the result when n is greater than 13. If this problem can be overcome, I think the trend will become obvious.