Who’s the Star Baker? Analyzing Contestants in the Great British Bake Off.

Authors: Maina and Ocean

Website: ADD LINK

Aim: This project analyzes the demographics of all contestants in series 8 through 13 of the Great British Bake Off, including their age, gender, occupation type, and hometown. It also analyzes how each series’ winner performed across every episode of that series.

Our aim was to answer four main questions:

  1. Is there a difference in contestants’ ages based on their gender?

  2. Is there a difference in contestants’ types of occupation based on their gender?

  3. Are contestants more likely to come from certain regions of the United Kingdom?

  4. Is there a relationship between winning the competition and a contestant’s technical challenge rankings or number of Star Baker awards throughout the series?

Technical Report:

Data Acquisition:

We created two data sets: bakeoff_data and series_data.

bakeoff_data:

  • We began by scraping the “Baker” table for each series from the Wikipedia pages

  • We manually entered values for gender and occupation type

  • We cleaned the data so that first names were replaced by nicknames where applicable, and each hometown was split into country and city/town

  • We joined three data sets containing latitude and longitude values for each city/town

    • Two of the three were found online and one was manually created
  • We handled ties in the rankings and created a series column to record which series each contestant appeared in
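
The cleaning steps above can be sketched in R with the packages listed under Library Tools. This is a minimal sketch, not the project’s actual code: the inline HTML table, the bakers, the `Last episode` column, the coordinate tables, and the tie rule (bakers who leave in the same episode share a rank, a simplification that ignores how the final is decided) are all hypothetical. The real tables would be read from each series’ Wikipedia URL rather than from an inline string.

```r
library(rvest)    # read_html(), html_element(), html_table()
library(janitor)  # clean_names()
library(dplyr)
library(stringr)

# Hypothetical stand-in for one series' "Baker" table; the real tables
# would come from read_html() on each series' Wikipedia URL
baker_html <- '<table>
  <tr><th>Baker</th><th>Age</th><th>Hometown</th><th>Last episode</th></tr>
  <tr><td>Baker A</td><td>30</td><td>Cardiff, Wales</td><td>10</td></tr>
  <tr><td>Baker B</td><td>45</td><td>Leeds, England</td><td>8</td></tr>
  <tr><td>Baker C</td><td>27</td><td>Belfast, Northern Ireland</td><td>8</td></tr>
</table>'

bakers <- read_html(baker_html) |>
  html_element("table") |>
  html_table() |>
  clean_names()  # "Last episode" becomes last_episode

# Hypothetical coordinate lookups standing in for the joined sources
# (approximate city-centre coordinates)
coords_online <- tibble(
  city = c("Cardiff", "Leeds"),
  lat  = c(51.48, 53.80),
  long = c(-3.18, -1.55)
)
coords_manual <- tibble(city = "Belfast", lat = 54.60, long = -5.93)
coords <- bind_rows(coords_online, coords_manual) |>
  distinct(city, .keep_all = TRUE)  # keep one row per city/town

bakers_clean <- bakers |>
  mutate(
    # Split "City, Country" at the last comma
    city    = str_trim(str_remove(hometown, ",[^,]*$")),
    country = str_trim(str_extract(hometown, "[^,]*$")),
    # Tied bakers share the smallest rank number, giving 1, 2, 2
    final_rank = min_rank(desc(last_episode)),
    series  = 8L  # hypothetical series number for these contestants
  ) |>
  left_join(coords, by = "city")
```

Using `min_rank()` means tied bakers share a rank and the next rank is skipped, which is one common convention; `dense_rank()` would instead give 1, 2, 3 with no gap.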

series_data:

  • We began by scraping all episode tables for each series from the Wikipedia pages

  • We kept each contestant’s name, technical challenge ranking, and result for each episode

  • We added the episode number and series number to each row
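
Combining one series’ episode tables and tagging each row might look like the following minimal R sketch. The column names (`baker`, `signature`, `technical`, `result`), the bakers, and the series number are hypothetical, and the sketch assumes the scraped tables are stored in a list in broadcast order, so each table’s list position can serve as its episode number.

```r
library(dplyr)

# Hypothetical stand-ins for two scraped episode tables from one series;
# the scraped tables contain more columns than the three we keep
episode_tables <- list(
  tibble(baker = c("Baker A", "Baker B"), signature = c("Bake 1", "Bake 2"),
         technical = c(2L, 1L), result = c("Safe", "Star Baker")),
  tibble(baker = c("Baker A", "Baker B"), signature = c("Bake 3", "Bake 4"),
         technical = c(1L, 2L), result = c("Star Baker", "Eliminated"))
)

series_rows <- bind_rows(episode_tables, .id = "episode") |>
  # Keep only names, technical challenge rankings, and episode results
  select(episode, baker, technical, result) |>
  mutate(
    episode = as.integer(episode),  # .id stores the list position as text
    series  = 8L                    # hypothetical series number
  )
```

Repeating this for each series and stacking the results with another `bind_rows()` would give one long table with a row per baker per episode.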

Visualizations:

  • Box plots of contestants’ ages by gender

    • These let us compare the genders’ medians, interquartile ranges, and overall ranges
  • Stacked bar plots for proportions of each gender

    • The first plot gave an overview of the data, showing the proportion of each gender in every series

    • The second plot showed how the gender proportions differed by final ranking across all series.

    • We chose stacked bar plots of proportions because we wanted to see whether final rankings differed clearly between genders. Before comparing rankings, we checked that the overall gender proportions were not heavily imbalanced, and found no issues.

  • Bar charts for occupation types

    • The first chart shows the overall count of each occupation type, to give a broad sense of which types were more or less common.

    • The second chart shows occupation type counts by gender as a population pyramid. To build it, we learnt how to use a shared y-axis so that the two genders’ counts are mirrored on either side of the same axis

  • Map of the U.K.

    • Displays where contestants came from, to see whether they were more likely to come from certain regions
  • Line charts

    • Display how each series’ winner ranked in the technical challenge across the 10 episodes, to explore whether there were patterns in how winners ranked over the course of a series.
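
Two of the plots above can be sketched with ggplot2 as follows. This is a minimal sketch with hypothetical contestant data and an assumed `occupation_type` column, not the dashboard’s actual code. The proportional stacked bars use `position = "fill"`; for the population pyramid, this version assumes one common approach: negate one gender’s counts so both genders share the occupation axis, then relabel the count axis with absolute values.

```r
library(dplyr)
library(ggplot2)

# Hypothetical contestants: one row per baker
bakers <- tibble(
  series = factor(c(8, 8, 8, 9, 9, 9)),
  gender = c("Female", "Male", "Female", "Male", "Male", "Female"),
  occupation_type = c("Education", "Healthcare", "Healthcare",
                      "Engineering", "Education", "Healthcare")
)

# Stacked bars scaled to 1, so each bar shows gender proportions per series
proportion_plot <- ggplot(bakers, aes(x = series, fill = gender)) +
  geom_bar(position = "fill") +
  labs(x = "Series", y = "Proportion of contestants", fill = "Gender")

# Population pyramid: count each occupation type by gender, then negate
# one gender's counts so its bars extend to the left of zero
pyramid_counts <- bakers |>
  count(occupation_type, gender) |>
  mutate(plot_n = if_else(gender == "Male", -n, n))

pyramid_plot <- ggplot(pyramid_counts,
                       aes(x = plot_n, y = occupation_type, fill = gender)) +
  geom_col() +
  scale_x_continuous(labels = abs) +  # show counts as positive on both sides
  labs(x = "Number of contestants", y = NULL, fill = "Gender")
```

Putting occupation type on the y-axis makes it the single axis both genders share, while the sign of `plot_n` decides which side each gender’s bars fall on.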

Library Tools:

  • geographr

    • For mapping the U.K.

  • ggplot2

    • For visualizations
  • rvest

    • For scraping websites
  • stringr

    • For cleaning data
  • tidyverse

    • For cleaning data
  • janitor

    • For cleaning names in data sets
  • dplyr

    • For cleaning data
  • gt

    • For displaying data sets in the dashboard

Output:

Our end product is a Quarto dashboard.

We built on what we learnt from the Olympics dashboard tutorial and consulted the Quarto guide (https://quarto.org/docs/guide/) for additional help with the layout of each section, including making tabsets, adding images, and choosing themes.

Files:

The data folder contains all the CSV and RDS files used in this project.

The images folder contains all of the photos that were used for the dashboard.

The dashboard.qmd file contains the code for the Quarto dashboard, and dashboard.html is the rendered webpage. The dashboard_files folder was created automatically and contains PNG files of the visualizations and other supporting assets required by the dashboard.

README.md (this file) explains the project, provides links to all references, and contains the link to the final Quarto dashboard website.