One way to measure the time cost of an algorithm is to use computer’s clock to obtain actual run time:
time() in time module"""
File: timing1.py
Prints the running times for problem sizes that double,
using a single loop.
"""
import time
problemSize = 10000000
print("%12s%16s" % ("Problem Size", "Seconds"))
for count in range(5):
start = time.time()
# The start of the algorithm
work = 1
for x in range(problemSize):
work += 1
work -= 1
# The end of the algorithm
elapsed = time.time() - start
print("%12d%16.3f" % (problemSize, elapsed))
problemSize *= 2Sample output for timing1.py:
| Problem Size | Seconds |
|---|---|
| 10000000 | 0.357 |
| 20000000 | 0.716 |
| 40000000 | 1.434 |
| 80000000 | 2.871 |
| 160000000 | 5.748 |
Note: Actual times will vary based on hardware
Problems: - Different hardware platforms have different processing speeds, so the running times of an algorithm differ from machine to machine - Running time varies with OS and programming language too - It is impractical to determine the running time for some algorithms with very large data sets
Another technique is to count the instructions executed with different problem sizes: - We count the instructions in the high-level code in which the algorithm is written, not instructions in the executable machine language program - Distinguish between: - Instructions that execute the same number of times regardless of problem size - Instructions whose execution count varies with problem size
"""
File: counting.py
Prints the number of iterations for problem sizes
that double, using a nested loop.
"""
problemSize = 1000
print("%12s%15s" % ("Problem Size", "Iterations"))
for count in range(5):
number = 0
# The start of the algorithm
work = 1
for j in range(problemSize):
for k in range(problemSize):
number += 1
work += 1
work -= 1
# The end of the algorithm
print("%12d%15d" % (problemSize, number))
problemSize *= 2Sample output for counting.py:
| Problem Size | Iterations |
|---|---|
| 1000 | 1000000 |
| 2000 | 4000000 |
| 4000 | 16000000 |
| 8000 | 64000000 |
| 16000 | 256000000 |
Note: Iterations = (problem size)²
"""
File: countfib.py
Prints the number of calls of a recursive Fibonacci
function with problem sizes that double.
"""
from counter import Counter
def fib(n, counter = None):
"""Count the number of calls of the Fibonacci function."""
if counter:
counter.increment()
if n < 3:
return 1
else:
return fib(n - 1, counter) + fib(n - 2, counter)
problemSize = 2
print("%12s%15s" % ("Problem Size", "Calls"))
for count in range(5):
counter = Counter()
# The start of the algorithm
fib(problemSize, counter)
# The end of the algorithm
print("%12d%15s" % (problemSize, counter))
problemSize *= 2Sample output for countfib.py:
| Problem Size | Calls |
|---|---|
| 2 | 1 |
| 4 | 5 |
| 8 | 67 |
| 16 | 1973 |
| 32 | 4356617 |
Note: Recursive Fibonacci has exponential growth
Complexity Analysis - Reading the algorithm and using pencil and paper to work out some simple algorithms
Growth Rates Chart:
Operations
↑
| exponential (2ⁿ)
| ↗
| ↗ quadratic (n²)
| ↗
| ↗ n log n
| ↗
| ↗ linear (n)
| ↗
| ↗ logarithmic (log n)
| ↗
|→ constant (1)
+--------------------------------→ Problem Size (n)
Common Orders of Complexity Table:
| Order of Complexity | Name | Example Algorithm |
|---|---|---|
| O(1) | Constant | List indexing |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Sequential search |
| O(n log n) | Linearithmic | Quicksort, Merge sort |
| O(n²) | Quadratic | Selection sort, Bubble sort |
| O(2ⁿ) | Exponential | Recursive Fibonacci |
The constant of proportionality involves the terms and coefficients that are usually ignored during big-O analysis.
Python’s min function returns the minimum or smallest item in a list.
Python’s in operator is implemented as a method named __contains__ in the list class. Uses a sequential search or a linear search.
Analysis of a linear search considers three cases:
A linear search is necessary for data that are not arranged in any particular order. When searching sorted data, use a binary search.
def binarySearch(target, lyst):
"""Returns the position of the target item if found,
or -1 otherwise."""
left = 0
right = len(lyst) - 1
while left <= right:
midpoint = (left + right) // 2
if target == lyst[midpoint]:
return midpoint
elif target < lyst[midpoint]:
right = midpoint - 1 # Search to left
else:
left = midpoint + 1 # Search to right
return -1Binary Search Visualization:
Sorted list: [2, 5, 8, 12, 16, 23, 38, 45, 56, 72]
Searching for target = 23
Step 1: [2, 5, 8, 12, 16, 23, 38, 45, 56, 72]
↑
midpoint = 16 (too low)
Step 2: [23, 38, 45, 56, 72]
↑
midpoint = 38 (too high)
Step 3: [23]
↑
midpoint = 23 (found!)
## Basic Sort Algorithms
The sort functions that we develop here operate on a list of integers and uses a **swap** function to exchange the positions of two items in the list.
```python
def swap(lyst, i, j):
"""Exchanges the items at positions i and j."""
# You could say lyst[i], lyst[j] = lyst[j], lyst[i]
# but the following code shows what is really going on
temp = lyst[i]
lyst[i] = lyst[j]
lyst[j] = temp
Selection Sort Visualization:
Initial: [64, 25, 12, 22, 11]
Step 1 - Find minimum (11): [11, 25, 12, 22, 64]
Step 2 - Find minimum (12): [11, 12, 25, 22, 64]
Step 3 - Find minimum (22): [11, 12, 22, 25, 64]
Step 4 - Find minimum (25): [11, 12, 22, 25, 64]
Sorted! ✓
Perhaps the simplest strategy is to search the entire list for the position of the smallest item. If that position does not equal the first position, the algorithm swaps the items at those positions.
def selectionSort(lyst):
"""Sorts the items in lyst in ascending order."""
i = 0
while i < len(lyst) - 1: # Do n - 1 searches
minIndex = i # for the smallest item
j = i + 1
while j < len(lyst): # Start a search
if lyst[j] < lyst[minIndex]:
minIndex = j
j += 1
if minIndex != i: # Swap if necessary
swap(lyst, minIndex, i)
i += 1Bubble Sort Visualization:
Initial: [64, 34, 25, 12, 22, 11, 90]
Pass 1: [34, 25, 12, 22, 11, 64, 90] ← 90 bubbles up
Pass 2: [25, 12, 22, 11, 34, 64, 90] ← 64 bubbles up
Pass 3: [12, 22, 11, 25, 34, 64, 90] ← 34 bubbles up
Pass 4: [12, 11, 22, 25, 34, 64, 90] ← 25 bubbles up
Pass 5: [11, 12, 22, 25, 34, 64, 90] ← 22 bubbles up
Pass 6: [11, 12, 22, 25, 34, 64, 90] ← No swaps, done!
Starts at beginning of list and compares pairs of data items as it moves down to the end. When items in pair are out of order, swap them.
Thorough analysis of an algorithm’s complexity divides its behavior into three types of cases: - Best case - Worst case - Average case
There are algorithms whose best-case and average-case performances are similar, but whose performance can degrade to a worst case. When choosing/developing an algorithm, it is important to be aware of these distinctions.
More efficient sorting algorithms achieve O(n log n) performance: - Quicksort - Merge sort - Heapsort (not covered in this chapter)
An outline of the strategy used in the quicksort algorithm:
Partitioning 1. Swap the pivot with the last item in the sublist. 2. Establish a boundary between the items known to be less than the pivot and the rest of the items 3. Starting with the first item in the sublist, scan across the sublist. Every time an item less than the pivot is encountered, swap it with the first item after the boundary and advance the boundary. 4. Finish by swapping the pivot with the first item after the boundary.
Complexity Analysis of Quicksort
| Case | Complexity |
|---|---|
| Best case | O(n log n) |
| Average case | O(n log n) |
| Worst case | O(n²) |
Worst case occurs when the pivot is always the smallest or largest item (e.g., already sorted list with poor pivot selection)
Implementation of Quicksort
The quicksort algorithm is most easily coded using a recursive approach.
def quicksort(lyst, left, right):
"""Sorts the items in lyst using quicksort."""
if left < right:
pivotPosition = partition(lyst, left, right)
quicksort(lyst, left, pivotPosition - 1)
quicksort(lyst, pivotPosition + 1, right)
def partition(lyst, left, right):
"""Partitions the list and returns the pivot position."""
pivot = lyst[left]
boundary = left
for index in range(left + 1, right + 1):
if lyst[index] < pivot:
boundary += 1
swap(lyst, boundary, index)
swap(lyst, left, boundary)
return boundaryInformal summary of the algorithm: - Compute the middle position of a list and recursively sort its left and right sublists (divide and conquer) - Merge the two sorted sublists back into a single sorted list - Stop the process when sublists can no longer be subdivided
Three Python functions collaborate: - mergeSort — The function called by users - mergeSortHelper — A helper function that hides the extra parameters required by recursive calls - merge — A function that implements the merging process
Implementing the Merging Process
The merging process uses a temporary list of the same size as the list being sorted. We call this list the copyBuffer.
To avoid the overhead of allocating and deallocating the copyBuffer each time merge is called, the buffer is allocated once in mergeSort and passed as an argument to mergeSortHelper and merge.
Merge Sort Visualization:
Initial: [38, 27, 43, 3, 9, 82, 10]
Split: [38, 27, 43, 3] [9, 82, 10]
Split: [38, 27] [43, 3] [9, 82] [10]
Split: [38] [27] [43] [3] [9] [82] [10]
Merge: [27, 38] [3, 43] [9, 82] [10]
Merge: [3, 27, 38, 43] [9, 10, 82]
Merge: [3, 9, 10, 27, 38, 43, 82]
Sorted! ✓
Implementation of Merge Sort:
def mergeSort(lyst):
"""Sorts the items in lyst using merge sort."""
copyBuffer = [None] * len(lyst)
mergeSortHelper(lyst, copyBuffer, 0, len(lyst) - 1)
def mergeSortHelper(lyst, copyBuffer, low, high):
"""Recursively splits and merges the list."""
if low < high:
middle = (low + high) // 2
mergeSortHelper(lyst, copyBuffer, low, middle)
mergeSortHelper(lyst, copyBuffer, middle + 1, high)
merge(lyst, copyBuffer, low, middle, high)
def merge(lyst, copyBuffer, low, middle, high):
"""Merges two sorted sublists into one."""
i = low
j = middle + 1
for k in range(low, high + 1):
if i > middle:
copyBuffer[k] = lyst[j]
j += 1
elif j > high:
copyBuffer[k] = lyst[i]
i += 1
elif lyst[i] < lyst[j]:
copyBuffer[k] = lyst[i]
i += 1
else:
copyBuffer[k] = lyst[j]
j += 1
for k in range(low, high + 1):
lyst[k] = copyBuffer[k]Complexity Analysis of Merge Sort
for statements(high - low + 1) timeshigh - low), and all the merges at a single level take O(n) timemergeSortHelper splits sublists as evenly as possible at each level, the number of levels is O(log n)Memoization
Exponential algorithms are generally impractical to run with any but very small problem sizes. Recursive functions that are called repeatedly with same arguments can be made more efficient by a technique called memoization.
The program maintains a table of the values for each argument used with the function. Before the function recursively computes a value for a given argument, it checks the table to see if that argument already has a value.
Pseudocode:
Set sum to 1
Set first to 1
Set second to 1
Set count to 3
While count <= N
Set sum to first + second
Set first to second
Set second to sum
Increment count
Comparison of Fibonacci Implementations:
| n (Problem Size) | Recursive Calls (Exponential) | Linear Iterations |
|---|---|---|
| 10 | 109 | 8 |
| 20 | 13,529 | 18 |
| 30 | 1,664,079 | 28 |
| 40 | 204,668,309 | 38 |
Linear algorithm is dramatically faster for large values of n!
try-except statement is used to catch and handle exceptions