We first begin by loading the tidyverse library so we can use it later
Lets get some stats for a categorical and numeric columns
## # A tibble: 9 × 2
## algorithm Total_Tests_Ran
## <chr> <int>
## 1 Bubble Sort 101
## 2 Bucket Sort 100
## 3 Heap Sort 100
## 4 Insertion Sort 101
## 5 Merge Sort 100
## 6 Odd-Even Sort 100
## 7 Quick Sort 100
## 8 Selection Sort 101
## 9 Shell Sort 100
## Min Max Mean Median StdDev
## 1 8.61 2009.701 658.0955 36.341 773.1327
Insights: The summary dumps allow us to gain a quick insight on the overall structure of the dataset and allows us to start orienting ourselves for what we may want to further analyze. For example we notice that the bubble sort algorithm has the highest average compute time while bucket sort has the smallest compute time. We CANNOT draw conclusions from this though! There are other factors that must be taken into account such as the size of array that is computed on the nature of the algorithm itself. That being said we can still use this data to get some idea of what other data we may want to extract and look at.
Here we can ask some questions to gain insight about the data. Let us consider the following questions: 1) If we increase the size of an array (say we 2x it) which algorithm is hit the hardest in terms of increased execution time? 2) At what array size does the performance of bubble sort become consistently larger then quick sort? 3) Is there any algorithm that maintains a time complexity of sub 10ms execution time assuming the average case?
We will select the first question to explore
## # A tibble: 9 × 2
## algorithm average_time
## <chr> <dbl>
## 1 Bucket Sort 9.28
## 2 Quick Sort 14.6
## 3 Merge Sort 15.9
## 4 Heap Sort 25.1
## 5 Shell Sort 36.5
## 6 Selection Sort 971.
## 7 Insertion Sort 1023.
## 8 Odd-Even Sort 1891.
## 9 Bubble Sort 1917.
Insights: We can see from the stats that the average time for quick sort compared to something like insertion sort or bubble sort is significantly shorter. From this we cannot extrapolate what the behavior will be for very large inputs. We would have to mathematically construct a proof that analyzes the asymptotic behavior of each algorithm as for significantly large n (I am taking an algorithm class along side with this class and so it would be interesting to carry out these calculations but it would be out of the scope for this lab). That being said, we can say generally that it might be better to use something like quick sort or merge sort instead of using insertion sort or bubble sort to sort large arrays.
In this section we output 2 graphs each shows some insight on the behavior of the algorithms and how they can be compared relative to time and array size.
Insights: We can observe both of the graphs that are shown. The array size to impact algorithms graph could be better. Currently it just stacks all of the algorithm colors on top of each so it is a bit hard to see. That being said however we can still see that length of time increases a lot for algorithms like bubble sort at 1000 elements. This is expected and reflects the short summaries that we generated earlier.