Big-O notation simulation

Rooholamin Rasooli

April 4, 2019

Time Complexity of Algorithms

Let's assume we all know what the subject of this presentation is. In what follows we are going to see how the choice of algorithm makes a difference in run time. If you don't know the main idea, I suggest you just google the title of this page.

First, let's look at a complexity chart.

I wanted to hide the code for creating this plot, but sometimes I don't feel like being humble, so:

library(ggplot2)
library(RColorBrewer)

# growth-rate functions used throughout this document
n     <- function(x) x
logn  <- function(x) log(x)
nlogn <- function(x) x * log(x)
n2    <- function(x) x^2
pow2n <- function(x) 2^x
factn <- function(x) factorial(x)

xd <- data.frame(x = c(0.01, 15))
cl <- brewer.pal(n = 6, name = 'Set2')

g <- ggplot(data = xd, aes(x = x))
g <- g + stat_function(fun = n, size = 1.5, aes(color = cl[1])) +
      stat_function(fun = logn, size = 1.5, aes(color = cl[2])) +
      stat_function(fun = nlogn, size = 1.5, aes(color = cl[3])) +
      stat_function(fun = n2, size = 1.5, aes(color = cl[4])) +
      stat_function(fun = pow2n, size = 1.5, aes(color = cl[5])) +
      stat_function(fun = factn, size = 1.5, aes(color = cl[6]))
g <- g + scale_color_identity(
      name = 'Complexity chart',
      breaks = cl,
      labels = c('n', 'log(n)', 'n*log(n)', 'n^2', '2^n', 'n!'),
      guide = 'legend'
)
g <- g + ylim(c(-10, 200))
g <- g + labs(x = 'Elements', y = 'Operations')
g


Search algorithms

Our first two algorithms to investigate are binary search and linear search. Binary search runs in O(\(\log n\)), and linear search, as its name suggests, has a big-O of \(n\).
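The search code itself isn't reproduced in this section, so here is a minimal sketch of what the two algorithms could look like (the function names linsearch and binsearch are just illustrative, not from any package):

linsearch <- function(arr, target) {
      # scan elements one by one: O(n) comparisons
      for (i in seq_along(arr)) {
            if (arr[i] == target) return(i)
      }
      NA  # not found
}

binsearch <- function(arr, target) {
      # halve the sorted search range at every step: O(log(n)) comparisons
      lo <- 1; hi <- length(arr)
      while (lo <= hi) {
            mid <- (lo + hi) %/% 2
            if (arr[mid] == target) return(mid)
            if (arr[mid] < target) lo <- mid + 1 else hi <- mid - 1
      }
      NA  # not found
}

binsearch(1:100, 42)
## [1] 42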

Sort algorithms

  • merge sort

  • bubble sort

O(\(n \log n\)) | merge sort algorithm

Merge sort is a sorting technique based on the divide-and-conquer paradigm. With a worst-case time complexity of O(\(n \log n\)), it is one of the most respected algorithms.

Merge sort first divides the array into equal halves and then combines them in a sorted manner. For example, [4, 11, 3, 0] splits into [4, 11] and [3, 0]; each half sorts to [4, 11] and [0, 3], and merging them gives [0, 3, 4, 11].

Let's see how it works visually:

Now for the code and the simulation. The code consists of two main functions: mmerge, which merges the two sorted halves passed to it, and mmergesort, a recursive function that takes an array and splits it into halves until single elements remain.

mmerge <- function(a, b) {
      r <- numeric(length(a) + length(b))
      ai <- 1; bi <- 1
      for (j in 1:length(r)) {
            # take the next element from a when b is exhausted, or when a
            # still has elements and its current element is the smaller one;
            # checking b's exhaustion first avoids indexing past its end
            if (bi > length(b) || (ai <= length(a) && a[ai] < b[bi])) {
                  r[j] <- a[ai]
                  ai <- ai + 1
            } else {
                  r[j] <- b[bi]
                  bi <- bi + 1
            }
      }
      r
}
mmergesort <- function(A) {
      if (length(A) > 1) {
            q <- ceiling(length(A)/2)            # split point
            a <- mmergesort(A[1:q])              # recursively sort left half
            b <- mmergesort(A[(q+1):length(A)])  # recursively sort right half
            mmerge(a, b)                         # merge the two sorted halves
      } else {
            A  # a single element is already sorted
      }
}

As we did in previous steps, we test the code by passing it an unordered array:

test_arr <- c(4,11,3,0,6,4,3,8,6) 
mmergesort(test_arr)
## [1]  0  3  3  4  4  6  6  8 11

In this step we use R's microbenchmark package to benchmark our algorithm (together with dplyr for summarising the results).

library(microbenchmark)
library(dplyr)

dfmerget <- microbenchmark(
      n25  = mmergesort(runif(n = 25, min = 1, max = 25)),
      n50  = mmergesort(runif(n = 50, min = 1, max = 50)),
      n75  = mmergesort(runif(n = 75, min = 1, max = 75)),
      n100 = mmergesort(runif(n = 100, min = 1, max = 100)),
      n200 = mmergesort(runif(n = 200, min = 1, max = 200))
)

dfmerget
## Unit: microseconds
##  expr      min        lq      mean    median        uq      max neval
##   n25  126.113  130.4550  159.9289  134.4195  170.8565  334.916   100
##   n50  265.818  278.6560  316.7536  283.9420  304.3310  596.202   100
##   n75  417.983  434.4075  531.6464  444.7915  510.1135 2185.441   100
##  n100  575.057  590.9155  678.1216  603.1865  632.4490 2728.025   100
##  n200 1226.007 1267.9180 1458.0309 1290.7620 1379.4940 3239.271   100
Mdfmerge <- dfmerget %>% group_by(expr) %>% summarise(median(time/1000))  # ns -> microseconds
Mdfmerge['ops'] <- c(25, 50, 75, 100, 200)
names(Mdfmerge)[2] <- 'median'
ggplot(Mdfmerge, aes(ops, median)) +
      geom_point(aes(colour = 'data curve'), size = 1.1) +
      geom_line(aes(colour = 'data curve')) +
      stat_function(fun = nlogn, size = 1.08, aes(colour = 'n*logn curve')) +
      labs(x = 'length of array', y = 'median of computing time [microseconds]')

We can see that the measured curve is approximately a good fit to its big-O curve.
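As an extra sanity check (my own addition, not part of the original analysis): if the running time really grows like \(c \cdot n \log n\) for some constant \(c\), then dividing the measured medians by \(n \log n\) should give a roughly constant ratio across array lengths:

Mdfmerge$ratio <- Mdfmerge$median / nlogn(Mdfmerge$ops)
Mdfmerge[, c('ops', 'median', 'ratio')]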


O(\(n^2\)) | bubble sort

Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list, compares adjacent pairs, and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted. It consists of two nested loops, which makes its big O equal to O(\(n^2\)).

The code part is as fast-forward as the gif above.

bubblesort <- function(arr){
      n <- length(arr)
      if (n < 2) return(arr)  # nothing to sort; also keeps 1:(n-1) from counting down
      for (i in 1:n) {
            # each pass bubbles the largest remaining element to the end
            for (j in 1:(n-1)) {
                  if (arr[j] > arr[j + 1]) {
                        arr[c(j, j+1)] <- arr[c(j+1, j)]  # swap the adjacent pair
                  }
            }
      }
      return(arr)
}

Again we test the function to check whether it works:

bubblesort(test_arr)
## [1]  0  3  3  4  4  6  6  8 11

The last step in this section is the simulation itself. Before getting into it, we have to remind ourselves that \(n^2\) is an extreme growth rate which can give us a blue screen if we don't pay attention to the simulation size and the array lengths. Unlike the search algorithms, where we tried arrays of length \(10^7\) and it was still a small workload, here we can't go much beyond 1000!

bst <- microbenchmark(
      n25  = bubblesort(runif(n = 25, min = 1, max = 25)),
      n50  = bubblesort(runif(n = 50, min = 1, max = 50)),
      n75  = bubblesort(runif(n = 75, min = 1, max = 75)),
      n100 = bubblesort(runif(n = 100, min = 1, max = 100)),
      n200 = bubblesort(runif(n = 200, min = 1, max = 200))
)
bst
## Unit: microseconds
##  expr      min         lq       mean     median         uq       max neval
##   n25  126.112   157.4520   173.0273   169.9125   183.8825   248.449   100
##   n50  528.992   627.3525   719.1307   660.7680   707.3990  2629.477   100
##   n75 1176.166  1413.4765  1560.3743  1477.4765  1544.4970  3232.852   100
##  n100 2333.076  2529.2290  2786.0411  2652.1315  2808.4510  5098.858   100
##  n200 9650.979 10379.5225 11133.9604 10783.9125 11589.6705 16988.895   100
Mdbst <- bst %>% group_by(expr) %>% summarise(median(time/1000))  # ns -> microseconds
Mdbst['ops'] <- c(25, 50, 75, 100, 200)
names(Mdbst)[2] <- 'median'
ggplot(Mdbst, aes(ops, median)) +
      geom_point(aes(colour = 'data curve'), size = 1.1) +
      geom_line(aes(colour = 'data curve')) +
      stat_function(fun = n2, size = 1.08, aes(colour = 'n^2')) +
      stat_function(fun = n, aes(colour = 'n')) +
      labs(x = 'length of array', y = 'median of computing time [microseconds]')

It is obvious that our samples are not worst-case inputs, so they won't fit the \(n^2\) curve. Normally they will land somewhere between the best case, a sorted input array (which, with an early-exit check added to the loop, would fit \(n\)), and the worst case, \(n^2\).
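To see this effect directly, we can benchmark sorted, random, and reverse-sorted inputs of the same length (my own addition; in this implementation every pass performs all the comparisons, so the timing differences come from the number of swaps):

# same length, different input orders: sorted does no swaps,
# reversed swaps on every comparison
microbenchmark(
      sorted   = bubblesort(1:200),
      random   = bubblesort(sample(200)),
      reversed = bubblesort(200:1)
)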

Comparing merge and bubble sort

Finally, we compare the two sorting algorithms. In the plot below we can see how the growth rates make a difference: \(n^2\) is an immense load on the system once \(n\) gets bigger than 1000, while \(n \log n\) stays efficient even at \(10^5\) elements and beyond.

ggplot(Mdbst, aes(ops, median)) +
      geom_point(aes(colour = 'bubble'), size = 1.1) +
      geom_line(aes(colour = 'bubble')) +
      stat_function(fun = n2, size = 1.08, aes(colour = 'n^2')) +
      stat_function(fun = nlogn, size = 1.07, aes(colour = 'nlogn')) +
      geom_line(data = Mdfmerge, aes(x = ops, y = median, colour = 'merge'), size = 1.08) +
      geom_point(data = Mdfmerge, aes(x = ops, y = median, colour = 'merge')) +
      labs(x = 'length of array', y = 'median of computing time [microseconds]')
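As a closing illustration (again my own addition, not a result from the runs above), a direct head-to-head benchmark at a single larger size makes the gap concrete:

# merge vs. bubble on 1000-element random inputs
microbenchmark(
      merge  = mmergesort(runif(1000)),
      bubble = bubblesort(runif(1000)),
      times  = 10
)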