Chapter 11: Searching, Sorting, and Complexity Analysis

Mark Ira Galang

Objectives (1 of 2)

  • 11.1 Measure the performance of an algorithm by obtaining running times and instruction counts with different data sets
  • 11.2 Analyze an algorithm’s performance by determining its order of complexity, using big-O notation
  • 11.3 Distinguish the common orders of complexity and the algorithmic patterns that exhibit them
  • 11.4 Distinguish between the improvements obtained by tweaking an algorithm and reducing its order of complexity
  • 11.5 Design, implement, and analyze search and sort algorithms

Measuring the Efficiency of Algorithms

  • When choosing algorithms, we often have to settle for a space/time tradeoff
  • An algorithm can be designed to gain faster run times at the cost of using extra space (memory), or the other way around
  • Memory is now quite inexpensive for desktop and laptop computers, but not yet for miniature devices

Measuring the Run Time of an Algorithm (1 of 5)

One way to measure the time cost of an algorithm is to use computer’s clock to obtain actual run time:

  • Benchmarking or profiling
  • Can use time() in time module
  • Returns number of seconds that have elapsed between current time on the computer’s clock and January 1, 1970

Measuring the Run Time of an Algorithm (2 of 5)

"""
File: timing1.py
Prints the running times for problem sizes that double,
using a single loop.
"""
import time

problemSize = 10000000
print("%12s%16s" % ("Problem Size", "Seconds"))

for count in range(5):
    start = time.time()
    # The start of the algorithm
    work = 1
    for x in range(problemSize):
        work += 1
        work -= 1
    # The end of the algorithm
    elapsed = time.time() - start
    print("%12d%16.3f" % (problemSize, elapsed))
    problemSize *= 2

Measuring the Run Time of an Algorithm (3 of 5)

Sample output for timing1.py:

Problem Size Seconds
10000000 0.357
20000000 0.716
40000000 1.434
80000000 2.871
160000000 5.748

Note: Actual times will vary based on hardware

Measuring the Run Time of an Algorithm (4 of 5)

for j in range(problemSize):
    for k in range(problemSize):
        work += 1
        work -= 1

Measuring the Run Time of an Algorithm (5 of 5)

Problems: - Different hardware platforms have different processing speeds, so the running times of an algorithm differ from machine to machine - Running time varies with OS and programming language too - It is impractical to determine the running time for some algorithms with very large data sets

Counting Instructions (1 of 5)

Another technique is to count the instructions executed with different problem sizes: - We count the instructions in the high-level code in which the algorithm is written, not instructions in the executable machine language program - Distinguish between: - Instructions that execute the same number of times regardless of problem size - Instructions whose execution count varies with problem size

Counting Instructions (2 of 5)

"""
File: counting.py
Prints the number of iterations for problem sizes
that double, using a nested loop.
"""
problemSize = 1000
print("%12s%15s" % ("Problem Size", "Iterations"))

for count in range(5):
    number = 0
    # The start of the algorithm
    work = 1
    for j in range(problemSize):
        for k in range(problemSize):
            number += 1
            work += 1
            work -= 1
    # The end of the algorithm
    print("%12d%15d" % (problemSize, number))
    problemSize *= 2

Counting Instructions (3 of 5)

Sample output for counting.py:

Problem Size Iterations
1000 1000000
2000 4000000
4000 16000000
8000 64000000
16000 256000000

Note: Iterations = (problem size)²

Counting Instructions (4 of 5)

"""
File: countfib.py
Prints the number of calls of a recursive Fibonacci
function with problem sizes that double.
"""
from counter import Counter

def fib(n, counter = None):
    """Count the number of calls of the Fibonacci function."""
    if counter: 
        counter.increment()
    if n < 3:
        return 1
    else:
        return fib(n - 1, counter) + fib(n - 2, counter)

problemSize = 2
print("%12s%15s" % ("Problem Size", "Calls"))

for count in range(5):
    counter = Counter()
    # The start of the algorithm
    fib(problemSize, counter)
    # The end of the algorithm
    print("%12d%15s" % (problemSize, counter))
    problemSize *= 2

Counting Instructions (5 of 5)

Sample output for countfib.py:

Problem Size Calls
2 1
4 5
8 67
16 1973
32 4356617

Note: Recursive Fibonacci has exponential growth

Complexity Analysis

Complexity Analysis - Reading the algorithm and using pencil and paper to work out some simple algorithms

Orders of Complexity (1 of 4)

  • The performances of these algorithms differ by what we call an order of complexity
  • The performance of the first algorithm is linear in that its work grows in direct proportion to the size of the problem
  • The behavior of the second algorithm is quadratic in that its work grows as a function of the square of the problem size

Orders of Complexity (2 of 4)

  • An algorithm has constant performance if it requires the same number of operations for any problem size
  • List indexing is a good example of a constant-time algorithm
  • Another order of complexity that is better than linear but worse than constant is called logarithmic
  • The amount of work of a logarithmic algorithm is proportional to the log₂ of the problem size
  • The work of a polynomial time algorithm grows at a rate of nᵏ where k is a constant greater than 1
  • An order of complexity that is worse than polynomial is called exponential

Orders of Complexity (3 of 4)

Growth Rates Chart:

Operations
    ↑
    |                                    exponential (2ⁿ)
    |                                        ↗
    |                                   ↗  quadratic (n²)
    |                              ↗
    |                         ↗  n log n
    |                    ↗
    |               ↗  linear (n)
    |          ↗
    |     ↗  logarithmic (log n)
    |  ↗
    |→  constant (1)
    +--------------------------------→ Problem Size (n)

Orders of Complexity (4 of 4)

Common Orders of Complexity Table:

Order of Complexity Name Example Algorithm
O(1) Constant List indexing
O(log n) Logarithmic Binary search
O(n) Linear Sequential search
O(n log n) Linearithmic Quicksort, Merge sort
O(n²) Quadratic Selection sort, Bubble sort
O(2ⁿ) Exponential Recursive Fibonacci

Big-O Notation

  • The amount of work in an algorithm typically is the sum of several terms in a polynomial
  • We focus on one term as dominant
  • As n becomes large, the dominant term becomes so large that the amount of work represented by the other terms can be ignored
  • Asymptotic analysis
  • Big-O notation: used to express the efficiency or computational complexity of an algorithm

The Role of the Constant of Proportionality

The constant of proportionality involves the terms and coefficients that are usually ignored during big-O analysis.

work = 1
for x in range(problemSize):
    work += 1
    work -= 1

Measuring the Memory Used by an Algorithm

  • A complete analysis of the resources used by an algorithm includes the amount of memory required
  • We focus on rates of potential growth
  • Some algorithms require the same amount of memory to solve any problem
  • Other algorithms require more memory as the problem size gets larger

Search Algorithms

  • We now present several algorithms that can be used for searching and sorting lists
  • We first discuss the design of an algorithm
  • Then show its implementation as a Python function
  • Finally, provide an analysis of the algorithm’s computational complexity
  • To keep things simple, each function processes a list of integers

Search for a Minimum

Python’s min function returns the minimum or smallest item in a list.

def ourMin(lyst):
    """Returns the position of the minimum item."""
    minpos = 0
    current = 1
    while current < len(lyst):
        if lyst[current] < lyst[minpos]:
            minpos = current
        current += 1
    return minpos
  • n – 1 comparisons for a list of size n
  • O(n)

Sequential Search of a List

Python’s in operator is implemented as a method named __contains__ in the list class. Uses a sequential search or a linear search.

def sequentialSearch(target, lyst):
    """Returns the position of the target item if found,
    or -1 otherwise."""
    position = 0
    while position < len(lyst):
        if target == lyst[position]:
            return position
        position += 1
    return -1

Best-Case, Worst-Case, and Average-Case Performance

Analysis of a linear search considers three cases:

  • Worst case: The target item is at the end of the list or not in the list at all → O(n)
  • Best case: The algorithm finds the target at the first position, after making one iteration → O(1)
  • Average case: Add number of iterations required to find target at each possible position; divide sum by n → O(n)

Binary Search of a List (1 of 2)

A linear search is necessary for data that are not arranged in any particular order. When searching sorted data, use a binary search.

def binarySearch(target, lyst):
    """Returns the position of the target item if found,
    or -1 otherwise."""
    left = 0
    right = len(lyst) - 1
    while left <= right:
        midpoint = (left + right) // 2
        if target == lyst[midpoint]:
            return midpoint
        elif target < lyst[midpoint]:
            right = midpoint - 1  # Search to left
        else:
            left = midpoint + 1   # Search to right
    return -1

Binary Search of a List (2 of 2)

Binary Search Visualization:

Sorted list: [2, 5, 8, 12, 16, 23, 38, 45, 56, 72]
Searching for target = 23

Step 1: [2, 5, 8, 12, 16, 23, 38, 45, 56, 72]
                           ↑
                        midpoint = 16 (too low)

Step 2: [23, 38, 45, 56, 72]
              ↑
         midpoint = 38 (too high)

Step 3: [23]
          ↑
      midpoint = 23 (found!)

## Basic Sort Algorithms

The sort functions that we develop here operate on a list of integers and uses a **swap** function to exchange the positions of two items in the list.

```python
def swap(lyst, i, j):
    """Exchanges the items at positions i and j."""
    # You could say lyst[i], lyst[j] = lyst[j], lyst[i]
    # but the following code shows what is really going on
    temp = lyst[i]
    lyst[i] = lyst[j]
    lyst[j] = temp

Selection Sort (1 of 2)

Selection Sort Visualization:

Initial: [64, 25, 12, 22, 11]

Step 1 - Find minimum (11): [11, 25, 12, 22, 64]
Step 2 - Find minimum (12): [11, 12, 25, 22, 64]
Step 3 - Find minimum (22): [11, 12, 22, 25, 64]
Step 4 - Find minimum (25): [11, 12, 22, 25, 64]

Sorted! ✓

Perhaps the simplest strategy is to search the entire list for the position of the smallest item. If that position does not equal the first position, the algorithm swaps the items at those positions.

Selection Sort (2 of 2)

def selectionSort(lyst):
    """Sorts the items in lyst in ascending order."""
    i = 0
    while i < len(lyst) - 1:  # Do n - 1 searches
        minIndex = i           # for the smallest item
        j = i + 1
        while j < len(lyst):   # Start a search
            if lyst[j] < lyst[minIndex]:
                minIndex = j
            j += 1
        if minIndex != i:      # Swap if necessary
            swap(lyst, minIndex, i)
        i += 1

Bubble Sort (1 of 2)

Bubble Sort Visualization:

Initial: [64, 34, 25, 12, 22, 11, 90]

Pass 1: [34, 25, 12, 22, 11, 64, 90]  ← 90 bubbles up
Pass 2: [25, 12, 22, 11, 34, 64, 90]  ← 64 bubbles up
Pass 3: [12, 22, 11, 25, 34, 64, 90]  ← 34 bubbles up
Pass 4: [12, 11, 22, 25, 34, 64, 90]  ← 25 bubbles up
Pass 5: [11, 12, 22, 25, 34, 64, 90]  ← 22 bubbles up
Pass 6: [11, 12, 22, 25, 34, 64, 90]  ← No swaps, done!

Starts at beginning of list and compares pairs of data items as it moves down to the end. When items in pair are out of order, swap them.

Bubble Sort (2 of 2)

def bubbleSort(lyst):
    """Sorts the items in lyst in ascending order."""
    n = len(lyst)
    while n > 1:  # Do n - 1 bubbles
        i = 1     # Start each bubble
        while i < n:
            if lyst[i] < lyst[i - 1]:  # Exchange if needed
                swap(lyst, i, i - 1)
            i += 1
        n -= 1

Insertion Sort (1 of 2)

def insertionSort(lyst):
    """Sorts the items in lyst in ascending order."""
    i = 1
    while i < len(lyst):
        itemToInsert = lyst[i]
        j = i - 1
        while j >= 0:
            if itemToInsert < lyst[j]:
                lyst[j + 1] = lyst[j]
                j -= 1
            else:
                break
        lyst[j + 1] = itemToInsert
        i += 1

Insertion Sort (2 of 2)

  • The more items in the list that are in order, the better insertion sort gets until, in the best case of a sorted list, the sort’s behavior is linear
  • In the average case, insertion sort is still quadratic

Best-Case, Worst-Case, and Average-Case Performance Revisited

Thorough analysis of an algorithm’s complexity divides its behavior into three types of cases: - Best case - Worst case - Average case

There are algorithms whose best-case and average-case performances are similar, but whose performance can degrade to a worst case. When choosing/developing an algorithm, it is important to be aware of these distinctions.

Faster Sorting

More efficient sorting algorithms achieve O(n log n) performance: - Quicksort - Merge sort - Heapsort (not covered in this chapter)

Quicksort (1 of 4)

An outline of the strategy used in the quicksort algorithm:

  1. Begin by selecting the item at the list’s midpoint. We call this item the pivot.
  2. Partition items in the list so that all items less than the pivot are moved to the left of the pivot, and the rest are moved to its right.
  3. Divide and conquer. Reapply the process recursively to the sublists formed by splitting the list at the pivot.
  4. The process terminates each time it encounters a sublist with fewer than two items.

Quicksort (2 of 4)

Partitioning 1. Swap the pivot with the last item in the sublist. 2. Establish a boundary between the items known to be less than the pivot and the rest of the items 3. Starting with the first item in the sublist, scan across the sublist. Every time an item less than the pivot is encountered, swap it with the first item after the boundary and advance the boundary. 4. Finish by swapping the pivot with the first item after the boundary.

Quicksort (3 of 4)

Complexity Analysis of Quicksort

Case Complexity
Best case O(n log n)
Average case O(n log n)
Worst case O(n²)

Worst case occurs when the pivot is always the smallest or largest item (e.g., already sorted list with poor pivot selection)

Quicksort (4 of 4)

Implementation of Quicksort

The quicksort algorithm is most easily coded using a recursive approach.

def quicksort(lyst, left, right):
    """Sorts the items in lyst using quicksort."""
    if left < right:
        pivotPosition = partition(lyst, left, right)
        quicksort(lyst, left, pivotPosition - 1)
        quicksort(lyst, pivotPosition + 1, right)

def partition(lyst, left, right):
    """Partitions the list and returns the pivot position."""
    pivot = lyst[left]
    boundary = left
    for index in range(left + 1, right + 1):
        if lyst[index] < pivot:
            boundary += 1
            swap(lyst, boundary, index)
    swap(lyst, left, boundary)
    return boundary

Merge Sort (1 of 5)

Informal summary of the algorithm: - Compute the middle position of a list and recursively sort its left and right sublists (divide and conquer) - Merge the two sorted sublists back into a single sorted list - Stop the process when sublists can no longer be subdivided

Three Python functions collaborate: - mergeSort — The function called by users - mergeSortHelper — A helper function that hides the extra parameters required by recursive calls - merge — A function that implements the merging process

Merge Sort (2 of 5)

Implementing the Merging Process

The merging process uses a temporary list of the same size as the list being sorted. We call this list the copyBuffer.

To avoid the overhead of allocating and deallocating the copyBuffer each time merge is called, the buffer is allocated once in mergeSort and passed as an argument to mergeSortHelper and merge.

Merge Sort (3 of 5)

Merge Sort Visualization:

Initial: [38, 27, 43, 3, 9, 82, 10]

Split:   [38, 27, 43, 3]    [9, 82, 10]
Split:   [38, 27] [43, 3]   [9, 82] [10]
Split:   [38] [27] [43] [3] [9] [82] [10]

Merge:   [27, 38] [3, 43]   [9, 82] [10]
Merge:   [3, 27, 38, 43]    [9, 10, 82]
Merge:   [3, 9, 10, 27, 38, 43, 82]

Sorted! ✓

Merge Sort (4 of 5)

Implementation of Merge Sort:

def mergeSort(lyst):
    """Sorts the items in lyst using merge sort."""
    copyBuffer = [None] * len(lyst)
    mergeSortHelper(lyst, copyBuffer, 0, len(lyst) - 1)

def mergeSortHelper(lyst, copyBuffer, low, high):
    """Recursively splits and merges the list."""
    if low < high:
        middle = (low + high) // 2
        mergeSortHelper(lyst, copyBuffer, low, middle)
        mergeSortHelper(lyst, copyBuffer, middle + 1, high)
        merge(lyst, copyBuffer, low, middle, high)

def merge(lyst, copyBuffer, low, middle, high):
    """Merges two sorted sublists into one."""
    i = low
    j = middle + 1
    for k in range(low, high + 1):
        if i > middle:
            copyBuffer[k] = lyst[j]
            j += 1
        elif j > high:
            copyBuffer[k] = lyst[i]
            i += 1
        elif lyst[i] < lyst[j]:
            copyBuffer[k] = lyst[i]
            i += 1
        else:
            copyBuffer[k] = lyst[j]
            j += 1
    for k in range(low, high + 1):
        lyst[k] = copyBuffer[k]

Merge Sort (5 of 5)

Complexity Analysis of Merge Sort

  • The running time of the merge function is dominated by the two for statements
  • Each of which loops (high - low + 1) times
  • The function’s running time is O(high - low), and all the merges at a single level take O(n) time
  • Because mergeSortHelper splits sublists as evenly as possible at each level, the number of levels is O(log n)
  • The running time for this function is O(n log n) in all cases

An Exponential Algorithm: Recursive Fibonacci (1 of 2)

def fib(n):
    """Returns the nth Fibonacci number."""
    if n < 3:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

An Exponential Algorithm: Recursive Fibonacci (2 of 2)

Memoization

Exponential algorithms are generally impractical to run with any but very small problem sizes. Recursive functions that are called repeatedly with same arguments can be made more efficient by a technique called memoization.

The program maintains a table of the values for each argument used with the function. Before the function recursively computes a value for a given argument, it checks the table to see if that argument already has a value.

def fibMemo(n, table=None):
    """Returns the nth Fibonacci number using memoization."""
    if table is None:
        table = {1: 1, 2: 1}
    if n not in table:
        table[n] = fibMemo(n - 1, table) + fibMemo(n - 2, table)
    return table[n]

Converting Fibonacci to a Linear Algorithm (1 of 3)

Pseudocode:

Set sum to 1
Set first to 1
Set second to 1
Set count to 3
While count <= N
    Set sum to first + second
    Set first to second
    Set second to sum
    Increment count

Converting Fibonacci to a Linear Algorithm (2 of 3)

def fib(n, counter = None):
    """Count the number of iterations in the Fibonacci function."""
    theSum = 1
    first = 1
    second = 1
    count = 3
    while count <= n:
        if counter: 
            counter.increment()
        theSum = first + second
        first = second
        second = theSum
        count += 1
    return theSum

Converting Fibonacci to a Linear Algorithm (3 of 3)

Comparison of Fibonacci Implementations:

n (Problem Size) Recursive Calls (Exponential) Linear Iterations
10 109 8
20 13,529 18
30 1,664,079 28
40 204,668,309 38

Linear algorithm is dramatically faster for large values of n!

Chapter Summary (1 of 2)

  • Thread synchronization problems can occur when two or more threads share data
  • Each computer on a network has a unique IP address that allows other computers to locate it
  • Servers and clients communicate on a network by sending bytes through their socket connections
  • A server can handle several clients concurrently by assigning each client request to a separate handler thread

Chapter Summary (2 of 2)

  • A class variable is a name for a value that all instances of a class share in common
  • Pickling is the process of converting an object to a form that can be saved to permanent file storage
  • try-except statement is used to catch and handle exceptions
  • Most important features of OO programming: encapsulation, inheritance, and polymorphism
  • Encapsulation restricts access to an object’s data to users of the methods of its class
  • Inheritance allows one class to pick up the attributes and behavior of another class for free
  • Polymorphism allows methods in several different classes to have the same headers
  • A data model is a set of classes that are responsible for managing the data of a program