We demonstrate two search algorithms (e.g. finding the position of an element from a data structure - in this case a list), the Simple Search and the Binary Search. The relationship between their time complexity and the size of the data structure, is also examined. Through simulations, we see that on average, the Simple Search runs in linear time (\(O(n)\)) and that the Binary Search runs in log time (\(O(\log(n))\)).
The search algorithms are written in Python and then passed to R for plotting.
import random
def sorted_list(n):
""" Returns a sorted list of n unique numbers """
list_out = [_i for _i in range(n)]
return list_out
def random_list(n):
""" Returns a randomised list of n unique numbers """
list_out = sorted_list(n)
random.shuffle(list_out)
return list_out
The Simple Search goes through each element in the data structure one by one until the element being searched for is found. The average search time is \(O(n)\), which we show later with simulations.
def simple_search(list_to_search, search_item):
""" Returns the index and number of steps of the search_item in list_to_search using Simple Search """
# search across the list one by one, until the search_item is found
search = 0
steps = 1
while list_to_search[search] != search_item and search < len(list_to_search) - 1:
search += 1
steps += 1
if list_to_search[search] == search_item:
return search, steps
else:
return "search_item not found."
Testing the Simple Search:
test_list = random_list(10)
test_search_item = random.randrange(1, 10)
print("test_list is", test_list)
## ('test_list is', [2, 4, 3, 1, 0, 8, 7, 5, 6, 9])
print("test_search_item is", test_search_item)
## ('test_search_item is', 5)
print(simple_search(test_list, test_search_item))
## (7, 8)
print(simple_search(test_list, 100))
## search_item not found.
Simulating Simple Searches for various list lengths:
# find out how long on average it takes to perform a simple search for various
# list lengths
avg_searches_simple = []
for j in range(1, 1001):
searches_n = []
for i in range(100):
#simulate search
list_to_search = random_list(j)
search_item = random.randrange(j)
steps = simple_search(list_to_search, search_item)[1]
searches_n.append(float(steps))
avg_searches = sum(searches_n) / len(searches_n)
avg_searches_simple.append(avg_searches)
if (j + 1) % 50 == 0:
print("For n = " + str(j + 1) + ", the average search steps is ", avg_searches)
# we can clearly see the number of steps increases in O(n) time
## ('For n = 50, the average search steps is ', 27.8)
## ('For n = 100, the average search steps is ', 48.79)
## ('For n = 150, the average search steps is ', 71.1)
## ('For n = 200, the average search steps is ', 98.54)
## ('For n = 250, the average search steps is ', 125.14)
## ('For n = 300, the average search steps is ', 124.34)
## ('For n = 350, the average search steps is ', 161.39)
## ('For n = 400, the average search steps is ', 202.5)
## ('For n = 450, the average search steps is ', 240.3)
## ('For n = 500, the average search steps is ', 254.02)
## ('For n = 550, the average search steps is ', 283.21)
## ('For n = 600, the average search steps is ', 296.46)
## ('For n = 650, the average search steps is ', 349.53)
## ('For n = 700, the average search steps is ', 319.77)
## ('For n = 750, the average search steps is ', 344.99)
## ('For n = 800, the average search steps is ', 406.62)
## ('For n = 850, the average search steps is ', 413.04)
## ('For n = 900, the average search steps is ', 459.51)
## ('For n = 950, the average search steps is ', 512.38)
## ('For n = 1000, the average search steps is ', 524.37)
The Binary Search works on ordered (e.g. sorted) data structures. The Binary Search algorithm iteratively checks the midpoint of what is the known upper and lower bounds of the item being searched for, updating the upper and lower points and taking a new midpoint as the next search.
Binary Search algorithm:
While the search floor is less than or equal to the search ceiling:
def binary_search(list_to_search, search_item):
""" returns index of search_item in list_to_search. list_to_search must be sorted """
# initial search is the rounded down midpoint
search_floor = 0
search_ceil = len(list_to_search) - 1
search = (search_ceil + search_floor) // 2
steps = 1
# search across the list until the search_item is found
while search_floor <= search_ceil:
search = (search_ceil + search_floor) // 2
steps += 1
# print("")
# print("floor is", search_floor)
# print("ceil is", search_ceil)
# print("search is", search)
# if search search_item is too high, raise the search floor
if list_to_search[search] < search_item:
# print("raise floor")
search_floor = search + 1
# if search search_item is too low, lower the search ceiling
elif list_to_search[search] > search_item:
# print("lower ceil")
search_ceil = search - 1
else:
# print("search item found!")
return search, steps
return "search_item not found."
Testing the Binary Search:
test_list = sorted_list(6)
test_search_item = random.randrange(1, 6)
print("test_list is", test_list)
## ('test_list is', [0, 1, 2, 3, 4, 5])
print("test_search_item is", test_search_item)
## ('test_search_item is', 3)
print(binary_search(test_list, test_search_item))
## (3, 4)
print("NO SUCH SEARCH ITEM", 100)
## ('NO SUCH SEARCH ITEM', 100)
print(binary_search(test_list, 100))
## search_item not found.
Simulating Binary Searches for various list lengths:
# find out how long on average it takes to perform a binary search for various
# list lengths
avg_searches_binary = []
for j in range(1, 1001):
searches_n = []
for i in range(1000):
#simulate search
list_to_search = sorted_list(j)
search_item = random.randrange(j)
steps = binary_search(list_to_search, search_item)[1]
searches_n.append(float(steps))
avg_searches = sum(searches_n) / len(searches_n)
avg_searches_binary.append(avg_searches)
if (j + 1) % 64 == 0:
print("For n = " + str(j + 1) + ", the average search steps is ", avg_searches)
# we can clearly see the number of steps increases in O(n) time
## ('For n = 64, the average search steps is ', 6.091)
## ('For n = 128, the average search steps is ', 7.155)
## ('For n = 192, the average search steps is ', 7.823)
## ('For n = 256, the average search steps is ', 8.0)
## ('For n = 320, the average search steps is ', 8.437)
## ('For n = 384, the average search steps is ', 8.726)
## ('For n = 448, the average search steps is ', 8.867)
## ('For n = 512, the average search steps is ', 9.058)
## ('For n = 576, the average search steps is ', 9.267)
## ('For n = 640, the average search steps is ', 9.443)
## ('For n = 704, the average search steps is ', 9.577)
## ('For n = 768, the average search steps is ', 9.602)
## ('For n = 832, the average search steps is ', 9.791)
## ('For n = 896, the average search steps is ', 9.896)
## ('For n = 960, the average search steps is ', 9.995)
We can see by comparing the average simulated run times of the simple and binary search, that they run in linear and log time, respectively.
# passing list from python to R
avg_searches_simple <- py$avg_searches_simple
avg_searches_binary <- py$avg_searches_binary
n = 1:(length(avg_searches_simple))
linear <- n / 2
log_2 <- log(x = n, base = 2)
par(mfrow = c(1,2))
plot(n, avg_searches_simple, main = "Simple Search", ylab = "Average search steps", xlab = "Number of elements")
lines(linear, col = "red")
plot(n, avg_searches_binary, main = "Binary Search", ylab = "Average search steps", xlab = "Number of elements")
lines(log_2, col = "red")