Sorting Algorithm Runtime Analysis

What is Runtime Complexity?

When we analyze sorting algorithms, we care about how their execution time scales with input size \(n\).

We express this using Big-O notation, which describes the worst-case growth rate:

\(O(n^2)\) — quadratic (Bubble Sort, Insertion Sort)
\(O(n \log n)\) — log-linear (Merge Sort, Quick Sort)
\(O(n)\) — linear (Counting Sort, special cases)

The goal: find algorithms that stay fast as \(n\) grows large.

The Math Behind the Runtimes

Bubble Sort performs nested comparisons:

\[T(n) = \sum_{i=1}^{n-1} \sum_{j=1}^{n-i} 1 = \frac{n(n-1)}{2} \approx O(n^2)\]

Merge Sort uses divide-and-conquer recursion:

\[T(n) = 2T\!\left(\frac{n}{2}\right) + O(n) \implies T(n) = O(n \log n)\]

By the Master Theorem with \(a=2\), \(b=2\), \(f(n)=n\): \[\log_b a = \log_2 2 = 1 \implies f(n) = \Theta(n^1) \implies T(n) = \Theta(n \log n)\]

Empirical Timing Data

ggplot: Empirical Runtime Comparison

ggplot: Theoretical Growth Curves

3D Surface: O(n²) vs O(n log n)

R Code: Timing Bubble Sort

bubble_sort <- function(x) {
  n <- length(x)
  for (i in 1:(n - 1)) {
    for (j in 1:(n - i)) {
      if (x[j] > x[j + 1]) {
        tmp <- x[j]; x[j] <- x[j + 1]; x[j + 1] <- tmp
      }
    }
  }
  x
}

# Time it on a random vector of size 500
data_vec <- sample(1:10000, 500)
system.time(bubble_sort(data_vec))

Statistical Regression on Runtime

We can model runtime as a function of \(n\) using linear regression on transformed data.

For Bubble Sort we expect: \(\log T \approx 2 \log n + c\)

Summary

Algorithm	Best Case	Average Case	Worst Case	Space
Bubble Sort	\(O(n)\)	\(O(n^2)\)	\(O(n^2)\)	\(O(1)\)
Insertion Sort	\(O(n)\)	\(O(n^2)\)	\(O(n^2)\)	\(O(1)\)
Merge Sort	\(O(n \log n)\)	\(O(n \log n)\)	\(O(n \log n)\)	\(O(n)\)
Quick Sort	\(O(n \log n)\)	\(O(n \log n)\)	\(O(n^2)\)	\(O(\log n)\)

Key takeaway: For large \(n\), the difference between \(O(n^2)\) and \(O(n \log n)\) is enormous — and both theory and empirical data confirm it.