We demonstrate two search algorithms (e.g. finding the position of an element from a data structure - in this case a list), the Simple Search and the Binary Search. The relationship between their time complexity and the size of the data structure, is also examined. Through simulations, we see that on average, the Simple Search runs in linear time (\(O(n)\)) and that the Binary Search runs in log time (\(O(\log(n))\)).

The search algorithms are written in Python and then passed to R for plotting.

Helper functions

import random
def sorted_list(n):
    """ Returns a sorted list of n unique numbers """
    list_out = [_i for _i in range(n)]
    return list_out
def random_list(n):
    """ Returns a randomised list of n unique numbers """
    list_out = sorted_list(n)
    random.shuffle(list_out)
    return list_out

Simple Search

The Simple Search goes through each element in the data structure one by one until the element being searched for is found. The average search time is \(O(n)\), which we show later with simulations.

def simple_search(list_to_search, search_item):
    """ Returns the index and number of steps of the search_item in list_to_search using Simple Search """
    # search across the list one by one, until the search_item is found
    search = 0
    steps = 1
    
    while list_to_search[search] != search_item and search < len(list_to_search) - 1:
        search += 1
        steps += 1
    
    if list_to_search[search] == search_item:
        return search, steps
    else:
        return "search_item not found."

Testing the Simple Search:

test_list = random_list(10)
test_search_item = random.randrange(1, 10)
print("test_list is", test_list)

## ('test_list is', [2, 4, 3, 1, 0, 8, 7, 5, 6, 9])

print("test_search_item is", test_search_item)

## ('test_search_item is', 5)

print(simple_search(test_list, test_search_item))

## (7, 8)

print(simple_search(test_list, 100))

## search_item not found.

Simulating Simple Searches for various list lengths:

# find out how long on average it takes to perform a simple search for various
# list lengths
avg_searches_simple = []
for j in range(1, 1001):
    searches_n = []
    for i in range(100):
        #simulate search
        list_to_search = random_list(j)
        search_item = random.randrange(j)
        steps = simple_search(list_to_search, search_item)[1]
        
        searches_n.append(float(steps))
    
    avg_searches = sum(searches_n) / len(searches_n)
    avg_searches_simple.append(avg_searches)
    
    if (j + 1) % 50 == 0:
        print("For n = " + str(j + 1) + ", the average search steps is ", avg_searches)
# we can clearly see the number of steps increases in O(n) time

## ('For n = 50, the average search steps is ', 27.8)
## ('For n = 100, the average search steps is ', 48.79)
## ('For n = 150, the average search steps is ', 71.1)
## ('For n = 200, the average search steps is ', 98.54)
## ('For n = 250, the average search steps is ', 125.14)
## ('For n = 300, the average search steps is ', 124.34)
## ('For n = 350, the average search steps is ', 161.39)
## ('For n = 400, the average search steps is ', 202.5)
## ('For n = 450, the average search steps is ', 240.3)
## ('For n = 500, the average search steps is ', 254.02)
## ('For n = 550, the average search steps is ', 283.21)
## ('For n = 600, the average search steps is ', 296.46)
## ('For n = 650, the average search steps is ', 349.53)
## ('For n = 700, the average search steps is ', 319.77)
## ('For n = 750, the average search steps is ', 344.99)
## ('For n = 800, the average search steps is ', 406.62)
## ('For n = 850, the average search steps is ', 413.04)
## ('For n = 900, the average search steps is ', 459.51)
## ('For n = 950, the average search steps is ', 512.38)
## ('For n = 1000, the average search steps is ', 524.37)

Binary Search

The Binary Search works on ordered (e.g. sorted) data structures. The Binary Search algorithm iteratively checks the midpoint of what is the known upper and lower bounds of the item being searched for, updating the upper and lower points and taking a new midpoint as the next search.

Binary Search algorithm:

While the search floor is less than or equal to the search ceiling:

Take the midpoint of the lower and upper bounds (the lower and upper bounds start off as the smallest and largest elements of the data structure).
If the element at the midpoint is lower (higher) than what you are searching for, update the lower (upper) bound to the midpoint plus (less) one. If the element is what you are searching for, we are done - return the index.

def binary_search(list_to_search, search_item):
    """ returns index of search_item in list_to_search. list_to_search must be sorted """
    # initial search is the rounded down midpoint
    search_floor = 0
    search_ceil = len(list_to_search) - 1
    search = (search_ceil + search_floor) // 2
    steps = 1
    # search across the list until the search_item is found
    while search_floor <= search_ceil:
        search = (search_ceil + search_floor) // 2
        steps += 1
        # print("")
        # print("floor is", search_floor)
        # print("ceil is", search_ceil)
        # print("search is", search)
        # if search search_item is too high, raise the search floor
        if list_to_search[search] < search_item:
            # print("raise floor")
            search_floor = search + 1
        # if search search_item is too low, lower the search ceiling
        elif list_to_search[search] > search_item:
            # print("lower ceil")
            search_ceil = search - 1
        else:
            # print("search item found!")
            return search, steps
    return "search_item not found."

Testing the Binary Search:

test_list = sorted_list(6)
test_search_item = random.randrange(1, 6)
print("test_list is", test_list)

## ('test_list is', [0, 1, 2, 3, 4, 5])

print("test_search_item is", test_search_item)

## ('test_search_item is', 3)

print(binary_search(test_list, test_search_item))

## (3, 4)

print("NO SUCH SEARCH ITEM", 100)

## ('NO SUCH SEARCH ITEM', 100)

print(binary_search(test_list, 100))

## search_item not found.

Simulating Binary Searches for various list lengths:

# find out how long on average it takes to perform a binary search for various
# list lengths
avg_searches_binary = []
for j in range(1, 1001):
    searches_n = []
    for i in range(1000):
        #simulate search
        list_to_search = sorted_list(j)
        search_item = random.randrange(j)
        steps = binary_search(list_to_search, search_item)[1]
        
        searches_n.append(float(steps))
    
    avg_searches = sum(searches_n) / len(searches_n)
    avg_searches_binary.append(avg_searches)
    
    if (j + 1) % 64 == 0:
        print("For n = " + str(j + 1) + ", the average search steps is ", avg_searches)
# we can clearly see the number of steps increases in O(n) time

## ('For n = 64, the average search steps is ', 6.091)
## ('For n = 128, the average search steps is ', 7.155)
## ('For n = 192, the average search steps is ', 7.823)
## ('For n = 256, the average search steps is ', 8.0)
## ('For n = 320, the average search steps is ', 8.437)
## ('For n = 384, the average search steps is ', 8.726)
## ('For n = 448, the average search steps is ', 8.867)
## ('For n = 512, the average search steps is ', 9.058)
## ('For n = 576, the average search steps is ', 9.267)
## ('For n = 640, the average search steps is ', 9.443)
## ('For n = 704, the average search steps is ', 9.577)
## ('For n = 768, the average search steps is ', 9.602)
## ('For n = 832, the average search steps is ', 9.791)
## ('For n = 896, the average search steps is ', 9.896)
## ('For n = 960, the average search steps is ', 9.995)

Comparing run times

We can see by comparing the average simulated run times of the simple and binary search, that they run in linear and log time, respectively.

# passing list from python to R
avg_searches_simple <- py$avg_searches_simple
avg_searches_binary <- py$avg_searches_binary


n = 1:(length(avg_searches_simple))
linear <- n / 2
log_2 <- log(x = n, base = 2)

par(mfrow = c(1,2))
plot(n, avg_searches_simple, main = "Simple Search", ylab = "Average search steps", xlab = "Number of elements")
lines(linear, col = "red")
plot(n, avg_searches_binary, main = "Binary Search", ylab = "Average search steps", xlab = "Number of elements")
lines(log_2, col = "red")

Simple vs binary searches

Andrew Lau

05/01/2019

Helper functions

Simple Search

Binary Search

Comparing run times