About Dataset


The data this week comes from Jelle’s Marble Runs courtesy of Randy Olson.

Jelle’s Marble Runs started as a quirky YouTube channel back in 2006 and has refined the art of marble racing to the point that many — including sponsor John Oliver from Last Week Tonight — consider marble racing a legitimate contender for the national sports spotlight. Given that Jelle’s Marble Runs just completed their popular Marbula One competition last month, I was curious to look at the race results to see if these races were anything more than chaos.

Do some marbles race better than others? Who would I put my money on in season 2 of Marbula One? … If any of these questions interest you, read on and I’ll answer some of them.

The first step to answering these questions was to get some data. Thankfully, all of the Marbula One videos are organized in a YouTube playlist available here. From every race, my marble racing analytics team recorded each marble racer’s qualifier performance, total race time, average lap time, final rank, and some other statistics. That dataset is available for download on my website here.

Some additional context from the fandom Wiki for Jelle’s Marble Runs and a link to Season 1 courtesy of Georgios Karamanis.

More Information About Dataset.

Loading Library and Data


library(tidyverse)
library(lubridate)
library(cowplot)
library(knitr)

marbles_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-02/marbles.csv')

Data Preparation


marbles <- marbles_raw %>%
  separate(race, c("series", "stage"), sep = 2) %>%
  select(-source, -notes, -host) %>%
  mutate(date = parse_date(date, format = "%d-%b-%y"),
         stage_name = case_when(str_detect(stage, "Q") ~ "Qualifier",
                                str_detect(stage, "R") ~ "Race",
                                TRUE ~ stage),
         speed_mps = track_length_m/avg_time_lap)

marbles %>% 
  select(date, stage_name, site, marble_name, team_name, speed_mps, time_s) %>%
  head(10) %>% 
  kable()
date stage_name site marble_name team_name speed_mps time_s
2020-02-15 Qualifier Savage Speedway Clementin O’rangers 0.46 28
2020-02-15 Qualifier Savage Speedway Starry Team Galactic 0.45 28
2020-02-15 Qualifier Savage Speedway Momo Team Momo 0.45 28
2020-02-15 Qualifier Savage Speedway Yellow Mellow Yellow 0.45 29
2020-02-15 Qualifier Savage Speedway Snowy Snowballs 0.45 29
2020-02-15 Qualifier Savage Speedway Razzy Raspberry Racers 0.45 29
2020-02-15 Qualifier Savage Speedway Prim Team Primary 0.44 29
2020-02-15 Qualifier Savage Speedway Vespa Hornets 0.44 29
2020-02-15 Qualifier Savage Speedway Hazy Hazers 0.44 29
2020-02-15 Qualifier Savage Speedway Mallard Green Ducks 0.43 30

I modified the race column become two coloumn, they are series and stage.

Data Exploration


Mention the name of marble for each team member?

marbles %>% 
  group_by(team_name, marble_name) %>%
  count() %>%
  ggplot(aes(marble_name, n, fill=team_name)) +
  geom_bar(stat = 'identity', alpha=0.7) +
  geom_text(aes(label=marble_name), vjust=2, col = "black", size = 5) +
  facet_wrap(~team_name, scales = 'free', nrow=8) +
  theme_void() +
  theme(legend.position = 'none',
        strip.text.x = element_text(size = 15))

How competition points for each marble?

marbles %>%
  filter(!is.na(points)) %>%
  distinct(team_name, date, points) %>%
  group_by(team_name) %>%
  mutate(total_points = cumsum(points),
         label = if_else(total_points > 90, team_name, NULL)) %>%
  ungroup() %>%
  ggplot(aes(x = date, y = total_points, col = team_name)) +
  #geom_label_repel(aes(label = label), nudge_x = 1, na.rm = TRUE) +
  geom_text(aes(label = label), hjust = 1, size = 5) + 
  geom_line(size = 1, alpha = 0.5) +
  scale_y_continuous(breaks = seq(0, 100, by=10)) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "Dates",
       y = "Cumulative Points")
## Warning: Removed 126 rows containing missing values (geom_text).

How are the points earned for each marble in the team?

marbles %>%
  group_by(team_name, marble_name) %>%
  summarise(total_points = sum(points, na.rm = TRUE)) %>%
  arrange(-total_points) %>%
  ggplot(aes(x = reorder(marble_name, total_points), y = total_points, fill=team_name)) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  facet_wrap(~team_name, scales = 'free') +
  theme_minimal() +
  theme(legend.position = 'none') +
  labs(x = "Marbles Name", 
       y = "\nTotal Points")

How are the points earned for each team and site?

xaxis <- marbles %>%
  group_by(team_name) %>%
  summarise(total_point = sum(points, na.rm = TRUE)) %>%
  arrange(total_point) %>%
  pull(team_name) 

heatmap <- marbles %>%
  select(site, team_name, points) %>%
  group_by(site, team_name) %>%
  na.omit() %>%
  ggplot(aes(x = site, y = team_name)) +
  geom_tile(aes(fill = points)) +
  geom_text(aes(label = points), col="white") +
  scale_fill_gradient(low = "gray95", high = "steelblue") +
  theme_minimal() +
  theme(legend.position = 'none', 
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(x = element_blank(),
       y = "Team Name") +
  ylim(xaxis)

total_points <- marbles %>%
  group_by(team_name) %>%
  summarise(total_point = sum(points, na.rm = TRUE)) %>%
  ggplot(aes(x = team_name, y = total_point)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = total_point, hjust= 2), col="white") +
  coord_flip() +  
  theme_minimal() +
  theme(axis.text.y = element_blank()) +
  labs(x = element_blank(),
       y = element_blank()) +
  xlim(xaxis)
 

plot_grid(heatmap, 
          total_points, 
          align="h", 
          rel_widths = c(1, 0.8))

Is there a relationship between marble speed and points acquisition?

marbles %>% 
  filter(stage_name == "Race",
         points > 1) %>%
  ggplot(aes(points, speed_mps)) +
  geom_point(col="steelblue", size = 1, alpha = 0.7) +
  facet_wrap(~site, scales = 'free') +
  theme_minimal() +
  labs(y = "Speed (m/s)\n",
       x = "\nPoints")

Thank You

Amri Rohman
Sidoarjo, East Java, ID.