R packages used in the production of this assignment:

library(ggplot2)
library(tvthemes)
library(dplyr)

Introduction

The dataset being studied is the marbles dataset, which records stats from all participants of the most recent season of Marbula One. Marbula One is a series of videos on YouTube, produced by user Jelle’s Marble Runs, that follows different events where various marbles are placed onto tracks and raced much in the style of the show’s namesake, Formula One. During each event, marbles take part in a time trial qualifying round where they each race one lap and are placed into pole positions, based on best lap time, for the main event the following day. On race day, the same marbles compete on a circuit-style race course and teams are awarded points based on how well their representative marble performed. Just from watching at a surface level, the whole process seems very random, but perhaps there is more than meets the cat’s eye in this series of racing events. The purpose of this analysis is to determine whether or not some marbles have an innate advantage over others. That is to say, are some simply built better for racing, and if so, how does this affect their success in competition?

Link to data

marbles <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-02/marbles.csv')

Meet the Marbles

There are thirty-two participants in the Marbula One season, two from each of sixteen different teams. They are as follows:

Variables

To achieve this goal of determining which marble is fastest, there are a few different variables that need to be calculated or considered. First of all, the overall average speed of each marble must be calculated. Velocity = distance/time, so if I take the average distance traveled per lap and divide it by the average time each marble takes to complete each lap, I should get a reliable answer to the question of average overall velocity. The information provided by the marbles dataset includes each marble, the team they represent, what races they participated in, how many points they scored, average length of each lap, average time for each lap, and a few other variables that aren’t as relevant to the topic at hand.

The following code was used to calculate the average lap time of each marble overall, in racing events, and in qualifying events, along with its average track length. These are the inputs to the velocity calculation. Note: Momo serves as an example; the same code was used for every other marble.

# Keep only the rows for one marble
Momo <- filter(marbles, marble_name == "Momo")
mean(Momo$avg_time_lap)      # average lap time across all events (seconds)
mean(Momo$track_length_m)    # average track length (meters)
# Race events are coded S1R1 through S1R8
Momo_race <- filter(Momo, race %in% paste0("S1R", 1:8))
mean(Momo_race$avg_time_lap)
# Qualifying events are coded S1Q1 through S1Q8
Momo_qualify <- filter(Momo, race %in% paste0("S1Q", 1:8))
mean(Momo_qualify$avg_time_lap)
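
Repeating the Momo code thirty-two times is easy to get wrong, so the same averages could also be computed for every marble in one grouped summary. The following is only a sketch, not the code used for this assignment. The helper name summarise_marble_times and the toy data frame are hypothetical, and it assumes that qualifying events are exactly the rows whose race code starts with "S1Q". It also drops missing lap times with na.rm = TRUE, which the code above does not do.

```r
library(dplyr)

# Toy rows with the same column names as the marbles dataset.
# The values are invented purely for illustration.
toy <- data.frame(
  marble_name    = c("A", "A", "B", "B"),
  race           = c("S1Q1", "S1R1", "S1Q1", "S1R1"),
  avg_time_lap   = c(20, 30, 22, 34),
  track_length_m = c(12, 12, 12, 12)
)

# Hypothetical helper: one row per marble with the four averages used later.
summarise_marble_times <- function(df) {
  df %>%
    # Assumption: codes starting with "S1Q" are qualifiers, the rest are races
    mutate(event_type = if_else(grepl("^S1Q", race), "qualifier", "race")) %>%
    group_by(marble_name) %>%
    summarise(
      Time_Overall     = mean(avg_time_lap, na.rm = TRUE),
      Time_Races       = mean(avg_time_lap[event_type == "race"], na.rm = TRUE),
      Time_Qualifiers  = mean(avg_time_lap[event_type == "qualifier"], na.rm = TRUE),
      Avg_Track_Length = mean(track_length_m, na.rm = TRUE),
      .groups = "drop"
    )
}

times <- summarise_marble_times(toy)
times
```

Calling summarise_marble_times(marbles) on the full dataset would give one row per marble, which could then replace the hand-typed vectors.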

Individual Marble Speeds

After calculating the average speed for each marble, it would be useful to compare them in a bar graph to show which marbles were the fastest, and which were the slowest.

The following information was used to create a data frame that will be utilized in plotting the results.

Racer <- c("Clementin", "Orangin", "Starry", "Pulsar", "Momo", "Mimo",
            "Yellow", "Yellup", "Snowy", "Snowflake", "Razzy", "Rezzy", 
            "Prim", "Mary", "Vespa", "Hive", "Hazy", "Smoggy", "Mallard",
            "Billy", "Wispy", "Wospy", "RojoUno", "RojoDos", "Shock",
            "Bolt", "Sublime", "Limelime", "Clutter", "Anarchy", "Speedy",
            "Rapidly")
Team_Name <- c("O'rangers", "O'rangers", "Team Galactic", "Team Galactic", 
               "Team Momo", "Team Momo", "Mellow Yellow", "Mellow Yellow", 
               "Snowballs", "Snowballs", "Raspberry Racers", "Raspberry Racers", 
               "Team Primary", "Team Primary", "Hornets", "Hornets", 
               "Hazers", "Hazers", "Green Ducks", "Green Ducks", 
               "Midnight Wisps", "Midnight Wisps", "Rojo Rollers", "Rojo Rollers", 
               "Thunderbolts", "Thunderbolts", "Limers", "Limers", 
               "Balls of Chaos", "Balls of Chaos", "Savage Speeders", "Savage Speeders")
Time_Races <- c(28.318, 35.188, 32.493, 31.105, 32.803, 31.115, 
                 29.465, 34.29, 28.06, 35.593, 29.413, 34.373, 
                 28.065, 37.8, 29.6, 34.6, 32.365, 30.773, 32.49, 
                 31.035, 29.39, 33.74, 32.73, 31.233, 29.528, 
                 34.393, 30.497, 35.388, 32.6, 31.33, 28.115, 
                 35.253)

Time_Qualifiers <- c(23.888, 30.193, 27.953, 27.633, 27.59, 27.368, 
                      24.553, 29.173, 24.3, 30.79, 25.613, 29.54, 
                      23.273, 33.448, 25.38, 30.35, 27.798, 25.87, 27.93, 
                      26.938, 25.298, 29.015, 28.313, 27.783, 25.138, 
                      29.715, 24.258, 30.903, 28.415, 27.903, 23.75, 
                      30.628)

Time_Overall <- c(26.103, 32.69, 30.225, 29.369, 30.196, 29.241,
                   27.009, 31.731, 26.18, 33.191, 27.513, 31.956,
                   25.669, 35.313, 27.49, 32.475, 30.081, 28.321, 30.21,
                   28.986, 27.344, 31.037, 30.521, 29.508, 27.333,
                   32.054, 26.931, 33.145, 30.508, 29.616, 25.933, 
                   32.94)
Avg_Track_Length <- c(12.4, 14.045, 13.063, 13.383, 13.063, 13.383, 
                      12.785, 13.66, 12.4, 14.045, 12.785, 13.66, 
                      12.4, 14.166, 12.785, 13.66, 13.063, 13.383, 13.063, 
                      13.383, 12.875, 13.604, 13.0625, 13.383, 12.785, 
                      13.66, 12.471, 14.045, 13.063, 13.383, 12.4, 
                      14.045)
MarbleSpeeds <- data.frame(Racer, Team_Name, Time_Races, Time_Qualifiers, Time_Overall, Avg_Track_Length)

Now that all of the data has been finalized, here is the graph depicting each marble’s average velocity throughout the Marbula One season, measured in centimeters per second.

MarbleSpeedsv2 <- mutate(MarbleSpeeds, Avg_Speed = ((Avg_Track_Length * 100)/Time_Overall))
ggplot(MarbleSpeedsv2) +
  geom_col(aes(x = reorder(Racer, -Avg_Speed), y = Avg_Speed, fill = Team_Name)) +
  labs(y = "Average Velocity (centimeters per second)", x = "Racer", fill = "Team", title = "Average Overall Speed for Each Marble", caption = "From the marbles dataset") +
  scale_fill_manual(values = c("gold1", "forestgreen", "gray30", "khaki3", "green3", 
                               "yellow", "royalblue4", "orange", "violetred2", 
                               "tomato", "red4", "gray82", "darkmagenta", "greenyellow", 
                               "cyan4", "blue3")) +
  theme_avatar() +
  theme(axis.text.x = element_text(angle = 90))

These results were very interesting. It appeared that nearly every team had one marble in the top 50% of average speeds and one in the bottom 50%. The most notable instance of this is Team Primary, which has both the fastest marble and the slowest marble in the league on the same squad.

To depict the advantage that some marbles have over others, here is a histogram showing the distribution of marble speeds. Note: Some of the observations are stacked on one another because their speeds are very similar, though not equal.

ggplot(MarbleSpeedsv2) +
  geom_histogram(aes(x = Avg_Speed, fill = Team_Name), bins = 50) +
  labs(y = "Count", fill = "Team", title = "Distribution of Marble Speeds", x = "Average Velocity (centimeters per second)", caption = "From the marbles dataset") +
  scale_fill_manual(values = c("gold1", "forestgreen", "gray30", "khaki3", "green3", 
                               "yellow", "royalblue4", "orange", "violetred2", 
                               "tomato", "red4", "gray82", "darkmagenta", "greenyellow", 
                               "cyan4", "blue3")) +
  theme_avatar()

From this, it is clear that some marbles have a considerable advantage over others. Though an advantage of less than ten centimeters per second may not seem like a lot, these centimeters can really add up over the course of a five- to six-minute race.
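
To put a number on how the centimeters add up, distance is simply speed multiplied by time. The short sketch below uses a hypothetical speed gap of 5 centimeters per second and a race of 5.5 minutes. Both inputs are assumptions chosen for illustration, not values taken from the dataset.

```r
# Hypothetical inputs chosen for illustration, not taken from the dataset
speed_gap_cm_s <- 5      # speed advantage in centimeters per second
race_minutes   <- 5.5    # midpoint of a five- to six-minute race

# distance = speed * time, then converted from centimeters to meters
lead_m <- speed_gap_cm_s * race_minutes * 60 / 100
lead_m  # 16.5
```

Under these assumptions the faster marble would finish 16.5 meters ahead, which is more than the length of a typical track in the dataset.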

Though not necessary, I thought it would be cool to see whether some marbles were better at qualifying events or racing events as opposed to their overall speed.

MarbleSpeedsRaces <- mutate(MarbleSpeeds, Avg_Speed = ((Avg_Track_Length * 100)/Time_Races))
ggplot(MarbleSpeedsRaces) +
  geom_col(aes(x = reorder(Racer, -Avg_Speed), y = Avg_Speed, fill = Team_Name)) +
  labs(y = "Average Velocity (centimeters per second)", x = "Racer", fill = "Team", title = "Average Speed in Race Events", caption = "From the marbles dataset") +
  scale_fill_manual(values = c("gold1", "forestgreen", "gray30", "khaki3", "green3", 
                               "yellow", "royalblue4", "orange", "violetred2", 
                               "tomato", "red4", "gray82", "darkmagenta", "greenyellow", 
                               "cyan4", "blue3")) +
  theme_avatar() +
  theme(axis.text.x = element_text(angle = 90))

MarbleSpeedsQualify <- mutate(MarbleSpeeds, Avg_Speed = ((Avg_Track_Length * 100)/Time_Qualifiers))
ggplot(MarbleSpeedsQualify) +
  geom_col(aes(x = reorder(Racer, -Avg_Speed), y = Avg_Speed, fill = Team_Name)) +
  labs(y = "Average Velocity (centimeters per second)", x = "Racer", fill = "Team", title = "Average Speed in Qualifying Events", caption = "From the marbles dataset") +
  scale_fill_manual(values = c("gold1", "forestgreen", "gray30", "khaki3", "green3", 
                               "yellow", "royalblue4", "orange", "violetred2", 
                               "tomato", "red4", "gray82", "darkmagenta", "greenyellow", 
                               "cyan4", "blue3")) +
  theme_avatar() +
  theme(axis.text.x = element_text(angle = 90))
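
The three bar charts differ only in which time column is used and in the title, so the plotting code could be wrapped in a function. This is a sketch under a few assumptions: the helper name plot_speed_bars and its arguments are hypothetical, the demo data frame holds only four rows copied from the data above, and theme_avatar() is left out so that the sketch depends on ggplot2 alone. The palette is written as a named vector so that each color is tied to a team by name rather than by alphabetical position.

```r
library(ggplot2)

# Hypothetical helper: bar chart of average speed for a chosen time column.
plot_speed_bars <- function(df, time_col, title, team_colors) {
  # Speed in cm/s = track length (m) * 100 / lap time (s)
  df$Avg_Speed <- df$Avg_Track_Length * 100 / df[[time_col]]
  ggplot(df) +
    geom_col(aes(x = reorder(Racer, -Avg_Speed), y = Avg_Speed, fill = Team_Name)) +
    labs(y = "Average Velocity (centimeters per second)", x = "Racer",
         fill = "Team", title = title, caption = "From the marbles dataset") +
    scale_fill_manual(values = team_colors) +
    theme(axis.text.x = element_text(angle = 90))
}

# Four rows copied from the data frame above, to keep the sketch self-contained
demo <- data.frame(
  Racer            = c("Prim", "Mary", "Speedy", "Rapidly"),
  Team_Name        = c("Team Primary", "Team Primary",
                       "Savage Speeders", "Savage Speeders"),
  Time_Races       = c(28.065, 37.8, 28.115, 35.253),
  Time_Qualifiers  = c(23.273, 33.448, 23.75, 30.628),
  Avg_Track_Length = c(12.4, 14.166, 12.4, 14.045)
)

# Named palette: the same colors these two teams receive in the plots above
demo_colors <- c("Savage Speeders" = "red4", "Team Primary" = "cyan4")

race_plot    <- plot_speed_bars(demo, "Time_Races",
                                "Average Speed in Race Events", demo_colors)
qualify_plot <- plot_speed_bars(demo, "Time_Qualifiers",
                                "Average Speed in Qualifying Events", demo_colors)
```

With the full MarbleSpeeds data frame and a sixteen-entry named palette, the same function would reproduce each chart, and theme_avatar() could be added to the result with +.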

Numerical Summary

Measure of Center

MarbleSpeeds %>%
  mutate(Speed = (Avg_Track_Length/Time_Overall) * 100) %>%
  summarize("Mean Speed" = mean(Speed))
##   Mean Speed
## 1    44.6954

The mean velocity among all participants is 44.6954 centimeters per second. Knowing this, any marble with an average speed higher than this value is faster than the average competitor.

Measure of Spread

MarbleSpeeds %>%
  mutate(Speed = (Avg_Track_Length/Time_Overall) * 100) %>%
  summarize("Standard Deviation Speed" = sd(Speed))
##   Standard Deviation Speed
## 1                 2.184971

Because the standard deviation is 2.184971, a typical racer is within about 2.18 centimeters per second of the mean. If the speeds are roughly normally distributed, about two-thirds of the racers should fall between 42.51043 and 46.88037 centimeters per second. That being said, those with an average velocity below 42.51043 cm/s are disadvantaged compared to most of the cast, and those with an average velocity above 46.88037 cm/s have an advantage over most of the cast.
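
The one-standard-deviation range above is a rule of thumb, and the share of marbles that actually falls inside it can be checked directly. The helper below is a hypothetical sketch, and the toy_speeds vector is made up for illustration. It would need to be replaced with the Speed column computed above.

```r
# Hypothetical helper: fraction of values within one standard deviation of the mean
share_within_one_sd <- function(x) {
  mean(abs(x - mean(x)) <= sd(x))
}

# Made-up speeds for illustration only
toy_speeds <- c(1, 2, 3, 4, 5)
share_within_one_sd(toy_speeds)  # 0.6
```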

Average Speed Per Team

Now that the difference in average speeds has been established, it needs to be related to how the teams performed in the Marbula One standings to give the data meaning. To do so, the average speed per team must be calculated and compared with how many points the team scored over the course of the season.

For reference, here are the final standings of the Marbula One season:

Points <- c(69, 64, 39, 44, 66, 27, 54, 8, 94, 64, 34, 32, 49, 25, 46, 101)
Team_Name2 <- c("O'rangers", "Team Galactic", "Team Momo", "Mellow Yellow", 
                 "Snowballs", "Raspberry Racers", "Team Primary", "Hornets", 
                 "Hazers", "Green Ducks", "Midnight Wisps","Rojo Rollers", 
                 "Thunderbolts", "Limers", "Balls of Chaos", "Savage Speeders")
Final_Standings <- data.frame(Team_Name2, Points)
ggplot(Final_Standings) +
  geom_col(aes(x = reorder(Team_Name2, -Points), y = Points, fill = Team_Name2), show.legend = FALSE) +
  labs(x = "Team", title = "Final Standings", caption = "From the marbles dataset") +
  scale_fill_manual(values = c("gold1", "forestgreen", "gray30", "khaki3", "green3", 
                               "yellow", "royalblue4", "orange", "violetred2", 
                               "tomato", "red4", "gray82", "darkmagenta", "greenyellow", 
                               "cyan4", "blue3")) +
  theme_avatar() +
  coord_flip()

The average velocity per team was calculated as the mean of the speeds of the team's two members.

MarbleSpeeds %>%
  mutate(Speed = (Avg_Track_Length/Time_Overall) * 100) %>%
  group_by(Team_Name) %>%
  summarize("Mean Speed" = mean(Speed))
## # A tibble: 16 × 2
##    Team_Name        `Mean Speed`
##    <chr>                   <dbl>
##  1 Balls of Chaos           44.0
##  2 Green Ducks              44.7
##  3 Hazers                   45.3
##  4 Hornets                  44.3
##  5 Limers                   44.3
##  6 Mellow Yellow            45.2
##  7 Midnight Wisps           45.5
##  8 O'rangers                45.2
##  9 Raspberry Racers         44.6
## 10 Rojo Rollers             44.1
## 11 Savage Speeders          45.2
## 12 Snowballs                44.8
## 13 Team Galactic            44.4
## 14 Team Momo                44.5
## 15 Team Primary             44.2
## 16 Thunderbolts             44.7
Avg_Speed_Per_Team <- c(45.23416, 44.39382, 44.51431, 45.19272, 44.84005, 
                        44.60761, 44.21142, 44.28548, 45.34038, 44.70561, 
                        45.45842, 44.07610, 44.69527, 44.34082, 44.00334, 45.22682)
SpeedPerTeam <- data.frame(Team_Name2, Avg_Speed_Per_Team, Points)
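
Typing the team averages by hand works, but the order of the values has to match Team_Name2 exactly. A join by team name avoids that risk. The sketch below is one possible alternative, not the method used above. It uses only four marbles and two teams copied from the data in this report, and the object names speeds, standings, and team_speed are hypothetical.

```r
library(dplyr)

# Four marbles and two teams copied from the data above, to keep the sketch short
speeds <- data.frame(
  Racer            = c("Prim", "Mary", "Speedy", "Rapidly"),
  Team_Name        = c("Team Primary", "Team Primary",
                       "Savage Speeders", "Savage Speeders"),
  Time_Overall     = c(25.669, 35.313, 25.933, 32.94),
  Avg_Track_Length = c(12.4, 14.166, 12.4, 14.045)
)
standings <- data.frame(
  Team_Name = c("Team Primary", "Savage Speeders"),
  Points    = c(54, 101)
)

# Average speed per team, then attach the points by matching on team name
team_speed <- speeds %>%
  mutate(Speed = (Avg_Track_Length / Time_Overall) * 100) %>%
  group_by(Team_Name) %>%
  summarize(Avg_Speed_Per_Team = mean(Speed), .groups = "drop") %>%
  left_join(standings, by = "Team_Name")
team_speed
```

Because the rows are matched on Team_Name, the result does not depend on the order in which teams are listed in either data frame.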

Now here is a scatter plot showing the relationship between average overall velocity per team and how many points each team scored during the season. A positive correlation between the two variables would suggest that faster marbles had a competitive advantage this season.

ggplot(SpeedPerTeam) +
  geom_point(aes(x = Avg_Speed_Per_Team, y = Points, color = Team_Name2), show.legend = FALSE, size = 3) +
  labs(x = "Average Velocity Per Team (centimeters per second)", caption = "From the marbles dataset") +
  scale_color_manual(values = c("gold1", "forestgreen", "gray30", "khaki3", "green3", 
                               "yellow", "royalblue4", "orange", "violetred2", 
                               "tomato", "red4", "gray82", "darkmagenta", "greenyellow", 
                               "cyan4", "blue3")) +
  theme_avatar()
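
The strength of the relationship in the scatter plot can also be measured rather than judged by eye. As a minimal sketch, the code below computes the Pearson correlation coefficient with cor() and a significance test with cor.test(), both from base R. The two vectors are copied from the ones defined earlier, and the name speed_points_test is hypothetical.

```r
# Team averages and points copied from the vectors defined earlier,
# both in the order given by Team_Name2
Avg_Speed_Per_Team <- c(45.23416, 44.39382, 44.51431, 45.19272, 44.84005,
                        44.60761, 44.21142, 44.28548, 45.34038, 44.70561,
                        45.45842, 44.07610, 44.69527, 44.34082, 44.00334, 45.22682)
Points <- c(69, 64, 39, 44, 66, 27, 54, 8, 94, 64, 34, 32, 49, 25, 46, 101)

# Pearson correlation coefficient between team speed and season points
r <- cor(Avg_Speed_Per_Team, Points)
r

# cor.test() reports a p-value and confidence interval for the same coefficient
speed_points_test <- cor.test(Avg_Speed_Per_Team, Points)
speed_points_test$p.value
```

A coefficient near 0 would mean no linear relationship, and a value near 1 would mean a strong positive one. With only sixteen teams the confidence interval will be wide.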

Conclusion

Following this analysis of the marbles dataset, it appears that certain marbles do have an advantage over others and that this advantage is reflected in the final standings of this season of Marbula One. After calculating the average velocity per team and comparing it to the points each team scored throughout the season, the results showed a weak but positive correlation between the two variables. This means that, generally speaking, the faster a team's average velocity, the more points it should be capable of scoring. However, this cannot be a perfect correlation given the random nature of marble racing. Either way, it is worth betting that the Savage Speeders, the winners of this year’s championship and one of the fastest teams on average, will do very well next season.