For this part of the Week 13 Quiz we will write a function that calculates the median of a vector of values, then time it so that it can be compared with the timings of the same function in Julia, R, and Python. The first step is to generate a set of 100,000 random integers between 1 and 100, sampled with replacement.
set.seed(42)
values <- sample(1:100, 100000, replace = TRUE)
Now we write a function to find the median. This function is based on the one we created in the week 3 assignment.
numMedian <- function(vec){
  # Returns the median of a vector
  # Helper function for numSum()
  #
  # Args:
  #   vec: a numeric vector to be evaluated
  #
  # Return:
  #   The value of the median.

  # Error Check
  if(!is.numeric(vec)){
    return("The input vector is not numeric.")
  }

  vec.sort <- sort(vec)
  vec.length <- length(na.omit(vec.sort))

  if(vec.length %% 2 == 0){
    median <- (vec.sort[vec.length / 2] + vec.sort[vec.length / 2 + 1]) / 2
  } else {
    median <- vec.sort[vec.length %/% 2 + 1]
  }
  return(median)
}
median(values)
## [1] 51
numMedian(values)
## [1] 51
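The index arithmetic inside numMedian() is easier to follow on small inputs. The sketch below applies the same two branches by hand to one odd-length and one even-length vector; the vectors `odd` and `even` are made up for illustration and are not part of the quiz data.

```r
# Odd length: the median is the single middle element of the sorted vector.
odd <- c(7, 1, 5)
odd.sort <- sort(odd)                          # 1 5 7
odd.median <- odd.sort[length(odd.sort) %/% 2 + 1]

# Even length: the median is the mean of the two middle elements.
even <- c(7, 1, 5, 3)
even.sort <- sort(even)                        # 1 3 5 7
n <- length(even.sort)
even.median <- (even.sort[n / 2] + even.sort[n / 2 + 1]) / 2

# Both hand calculations should agree with the built-in median().
stopifnot(odd.median == median(odd), even.median == median(even))
```

For the odd case the index is 3 %/% 2 + 1 = 2, and for the even case the indices are 2 and 3, which are the same positions numMedian() would select.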
Now we will use the microbenchmark package to benchmark the function that we have written against the built-in function.
library(microbenchmark)
## Warning: package 'microbenchmark' was built under R version 3.1.2
results <- microbenchmark(numMedian(values), median(values), times=1000)
print(results)
## Unit: milliseconds
## expr min lq mean median uq max neval
## numMedian(values) 14.717 14.868 16.311 15.015 16.831 58.96 1000
## median(values) 3.329 3.366 4.227 3.384 4.934 46.74 1000
library(ggplot2) #nice log plot of the output
qplot(y=time, data=results, colour=expr) + scale_y_log10()
From the above results we can see that the function we created is considerably slower than the built-in function: the built-in function has a median run time of about 0.003 seconds (3.384 ms), while our function has a median run time of about 0.015 seconds (15.015 ms), more than four times as long.
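One likely contributor to the gap is that numMedian() sorts the entire vector, whereas base R's median() only partially sorts it so that the middle positions are in place. The sketch below is a hypothetical variant, named numMedianPartial() here for illustration, that uses the `partial` argument of sort() in the same way. It was not benchmarked as part of this quiz, so no timing is claimed for it, and it raises an error with stop() for non-numeric input rather than returning a message string.

```r
# Hypothetical variant of numMedian() that uses a partial sort.
# Assumption: only the middle position(s) need to be in sorted order.
numMedianPartial <- function(vec) {
  if (!is.numeric(vec)) {
    stop("The input vector is not numeric.")
  }
  vec <- vec[!is.na(vec)]      # drop missing values before indexing
  n <- length(vec)
  if (n == 0) {
    return(NA_real_)           # no values left, so no median
  }
  half <- (n + 1) %/% 2
  if (n %% 2 == 1) {
    # Odd length: only position `half` must be in its sorted place.
    sort(vec, partial = half)[half]
  } else {
    # Even length: positions `half` and `half + 1` must both be in place.
    mid <- c(half, half + 1)
    mean(sort(vec, partial = mid)[mid])
  }
}

# Check against the built-in function on odd- and even-length samples.
set.seed(1)
x.odd <- sample(1:100, 1001, replace = TRUE)
x.even <- sample(1:100, 1000, replace = TRUE)
stopifnot(numMedianPartial(x.odd) == median(x.odd),
          numMedianPartial(x.even) == median(x.even))
```

Adding numMedianPartial(values) as a third expression in the microbenchmark() call would show how much of the difference the full sort accounts for on a given machine.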