Week13Quiz

Timing a function for calculating the median in R

For this part of the Week 13 Quiz, we will write a function that calculates the median of a vector of values and time it, so that the result can be compared with the timings of the equivalent functions in Julia, R, and Python. First, we generate a set of 100,000 random integers between 1 and 100 (inclusive).

set.seed(42)
values <- sample(1:100, 100000, replace = TRUE)

Now we define a function to find the median. This function is based on the function we created in the Week 3 assignment.

numMedian <- function(vec){
    # Returns the median of a numeric vector.
    # Missing values (NA) are dropped before the median is computed.
    # Args:
    #   vec: a numeric vector to be evaluated
    #
    # Return:
    #   The value of the median.
    # Error check: signal an error rather than returning a message string
    if(!is.numeric(vec)){
        stop("The input vector is not numeric.")
    }
    vec.sort <- sort(vec)  # sort() drops NA values by default
    vec.length <- length(vec.sort)
    if(vec.length %% 2 == 0){
        # Even length: average the two middle values
        med <- (vec.sort[vec.length / 2] + vec.sort[vec.length / 2 + 1]) / 2
    } else {
        # Odd length: take the single middle value
        med <- vec.sort[vec.length %/% 2 + 1]
    }
    return(med)
}

median(values)

## [1] 51

numMedian(values)

## [1] 51
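
Agreement on one large sample is encouraging, but it does not exercise the odd-length and even-length branches separately. A minimal sketch of a check on small inputs whose medians are easy to verify by hand follows; the helper name checkMedian and the test cases are assumptions made for illustration, not part of the original quiz.

```r
# Hypothetical helper: compare a candidate median function with
# stats::median() on a few small inputs.
checkMedian <- function(candidate) {
    cases <- list(
        odd      = c(5, 1, 3),      # median is 3
        even     = c(4, 1, 3, 2),   # median is 2.5
        single   = 7,               # median is 7
        repeated = c(2, 2, 2, 9)    # median is 2
    )
    # One TRUE/FALSE per case: does the candidate agree with the built-in?
    sapply(cases, function(v) {
        isTRUE(all.equal(candidate(v), stats::median(v)))
    })
}
```

Calling checkMedian(numMedian) returns a named logical vector, with one entry per case that is TRUE when the two functions agree. The cases deliberately contain no NA values, because median() returns NA for such input unless na.rm = TRUE is set, whereas sort() silently drops them.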

Now we will use the microbenchmark package to benchmark the function we have written against the built-in function.

library(microbenchmark)

## Warning: package 'microbenchmark' was built under R version 3.1.2

results <- microbenchmark(numMedian(values), median(values), times=1000)

print(results)

## Unit: milliseconds
##               expr    min     lq   mean median     uq   max neval
##  numMedian(values) 14.717 14.868 16.311 15.015 16.831 58.96  1000
##     median(values)  3.329  3.366  4.227  3.384  4.934 46.74  1000
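
The median column of the table can be turned into a single relative figure. The short sketch below only repeats the arithmetic on the two values printed above; the variable names are made up for illustration.

```r
# Median timings in milliseconds, copied from the microbenchmark table
numMedian.ms <- 15.015   # numMedian(values)
builtin.ms   <- 3.384    # median(values)

# How many times longer the hand-written function takes
ratio <- numMedian.ms / builtin.ms
round(ratio, 1)   # about 4.4
```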

library(ggplot2) #nice log plot of the output
qplot(y=time, data=results, colour=expr) + scale_y_log10()

[Figure: run times of numMedian(values) and median(values), plotted on a log10 y-axis and coloured by expression]

From the above results we can see that the function we created is considerably slower than the built-in function: the built-in function has a median run time of about 0.003 seconds (3.384 ms), while our function has a median run time of about 0.015 seconds (15.015 ms), roughly 4.4 times as long.
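
One likely contributor to the gap, which this quiz did not measure, is that numMedian() sorts the whole vector, while the default median() method in base R uses a partial sort that only places the middle element(s) in their final positions. The sketch below shows the same idea applied to a hand-written function. The name numMedianPartial is hypothetical, the function assumes a numeric vector with at least one non-missing value, and no timing is claimed for it here.

```r
# Hypothetical variant of the hand-written median using a partial sort.
# Assumes a numeric vector with at least one non-missing value.
numMedianPartial <- function(vec) {
    vec <- vec[!is.na(vec)]      # drop missing values up front
    n <- length(vec)
    half <- (n + 1L) %/% 2L      # index of the (lower) middle element
    if (n %% 2L == 1L) {
        # Odd length: only position `half` must be in its sorted place
        sort(vec, partial = half)[half]
    } else {
        # Even length: positions `half` and `half + 1` are both needed
        idx <- c(half, half + 1L)
        mean(sort(vec, partial = idx)[idx])
    }
}
```

It could be added as a third expression in the microbenchmark() call above to see whether the partial sort accounts for the difference on a given machine.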

Week13Quiz_R

Erik Nylander

Sunday, November 23, 2014
