Initialization and Data Prep

We first begin by loading the tidyverse library so we can use it later

Requirement #1

Lets get some stats for a categorical and numeric columns

## # A tibble: 9 × 2
##   algorithm      Total_Tests_Ran
##   <chr>                    <int>
## 1 Bubble Sort                101
## 2 Bucket Sort                100
## 3 Heap Sort                  100
## 4 Insertion Sort             101
## 5 Merge Sort                 100
## 6 Odd-Even Sort              100
## 7 Quick Sort                 100
## 8 Selection Sort             101
## 9 Shell Sort                 100

##    Min      Max     Mean Median   StdDev
## 1 8.61 2009.701 658.0955 36.341 773.1327

Insights: The summary dumps allow us to gain a quick insight on the overall structure of the dataset and allows us to start orienting ourselves for what we may want to further analyze. For example we notice that the bubble sort algorithm has the highest average compute time while bucket sort has the smallest compute time. We CANNOT draw conclusions from this though! There are other factors that must be taken into account such as the size of array that is computed on the nature of the algorithm itself. That being said we can still use this data to get some idea of what other data we may want to extract and look at.

Requirement #2

Here we can ask some questions to gain insight about the data. Let us consider the following questions: 1) If we increase the size of an array (say we 2x it) which algorithm is hit the hardest in terms of increased execution time? 2) At what array size does the performance of bubble sort become consistently larger then quick sort? 3) Is there any algorithm that maintains a time complexity of sub 10ms execution time assuming the average case?

We will select the first question to explore

## # A tibble: 9 × 2
##   algorithm      average_time
##   <chr>                 <dbl>
## 1 Bucket Sort            9.28
## 2 Quick Sort            14.6 
## 3 Merge Sort            15.9 
## 4 Heap Sort             25.1 
## 5 Shell Sort            36.5 
## 6 Selection Sort       971.  
## 7 Insertion Sort      1023.  
## 8 Odd-Even Sort       1891.  
## 9 Bubble Sort         1917.

Insights: We can see from the stats that the average time for quick sort compared to something like insertion sort or bubble sort is significantly shorter. From this we cannot extrapolate what the behavior will be for very large inputs. We would have to mathematically construct a proof that analyzes the asymptotic behavior of each algorithm as for significantly large n (I am taking an algorithm class along side with this class and so it would be interesting to carry out these calculations but it would be out of the scope for this lab). That being said, we can say generally that it might be better to use something like quick sort or merge sort instead of using insertion sort or bubble sort to sort large arrays.

Requirement #3

In this section we output 2 graphs each shows some insight on the behavior of the algorithms and how they can be compared relative to time and array size.

Insights: We can observe both of the graphs that are shown. The array size to impact algorithms graph could be better. Currently it just stacks all of the algorithm colors on top of each so it is a bit hard to see. That being said however we can still see that length of time increases a lot for algorithms like bubble sort at 1000 elements. This is expected and reflects the short summaries that we generated earlier.

Week2_Data_Dive

2026-01-26

Initialization and Data Prep

Requirement #1

Requirement #2

Requirement #3