Authors: Maina and Ocean
Website: ADD LINK
Aim: This project analyzes demographic variables (age, gender, occupation, and hometown) of all contestants in series 8 through 13 of The Great British Bake Off. It also analyzes how each series' winner performed across all episodes.
Our aim was to answer four main questions:
Is there a difference in contestants’ ages based on their gender?
Is there a difference in contestants’ types of occupation based on their gender?
Are contestants more likely to come from certain regions of the United Kingdom?
Is there a relationship between winning the competition and a contestant’s rankings or number of Star Baker awards throughout the series?
We created two data sets: bakeoff_data and series_data
bakeoff_data:
We began by scraping the “Baker” table for each series from the Wikipedia pages
We manually entered values for gender and occupation type
We cleaned the data so that first names were replaced by nicknames where applicable, and each hometown was split into country and city/town
We joined three data sets containing latitude and longitude values for each city/town
We handled ties in the rankings and created a series column recording which series each contestant appeared in
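A minimal sketch of the stacking, splitting, and joining steps, assuming each scraped "Baker" table has already been read into a data frame and that hometowns are written as "City, Country". The rows and coordinates below are hypothetical, and a single coordinate lookup stands in for the three data sets we actually joined:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for two scraped "Baker" tables, one per series;
# assumes hometown is written as "City, Country"
baker_tables <- list(
  "8" = tibble(baker    = c("Baker A", "Baker B"),
               hometown = c("Brighton, England", "Aberdeen, Scotland")),
  "9" = tibble(baker    = "Baker C",
               hometown = "Cardiff, Wales")
)

# Hypothetical coordinate lookup (approximate values, for illustration only)
coords <- tibble(
  city      = c("Brighton", "Aberdeen", "Cardiff"),
  latitude  = c(50.82, 57.15, 51.48),
  longitude = c(-0.14, -2.09, -3.18)
)

bakers_geo <- baker_tables |>
  # Stack the tables; the list names ("8", "9") become a series column
  bind_rows(.id = "series") |>
  mutate(series = as.integer(series)) |>
  # Split "City, Country" into two columns
  separate(hometown, into = c("city", "country"), sep = ",\\s*") |>
  # Attach coordinates; cities missing from the lookup get NA instead of being dropped
  left_join(coords, by = "city")

bakers_geo
```

Using left_join() keeps contestants whose town is missing from the lookup (their coordinates become NA), which makes gaps easy to spot before mapping.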
series_data:
We began by scraping all episode tables for each series from the Wikipedia pages
We kept each contestant’s name, technical challenge ranking, and result for each episode
We added the episode number and series number for each row
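The episode-level steps can be sketched the same way. This assumes the episode tables for one series have been scraped into an unnamed list in broadcast order, with Baker, Technical, and Result columns and technical ranks written like "1st"; the column names, values, and the series number are hypothetical:

```r
library(dplyr)

# Hypothetical stand-ins for two scraped episode tables from one series;
# the extra Signature column represents data we do not keep
episode_tables <- list(
  tibble(Baker = c("Baker A", "Baker B"), Technical = c("1st", "2nd"),
         Result = c("STAR BAKER", "OUT"), Signature = c("x", "y")),
  tibble(Baker = "Baker A", Technical = "1st",
         Result = "SAFE", Signature = "z")
)

series_df <- episode_tables |>
  # Unnamed list: .id uses list positions ("1", "2", ...), i.e. episode order
  bind_rows(.id = "episode") |>
  # Keep only the name, technical ranking, and episode result
  select(episode, baker = Baker, technical = Technical, result = Result) |>
  mutate(
    episode   = as.integer(episode),
    technical = as.integer(gsub("[^0-9]", "", technical)),  # "2nd" -> 2
    series    = 8L  # hypothetical: the series these tables came from
  )

series_df
```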
This allowed us to compare the genders numerically using means, interquartile ranges, and overall ranges.
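A sketch of those summary statistics with dplyr, using made-up ages rather than our data:

```r
library(dplyr)

# Hypothetical ages, for illustration only
bakers <- tibble(
  gender = c("F", "F", "F", "M", "M", "M"),
  age    = c(20, 30, 40, 25, 35, 65)
)

age_summary <- bakers |>
  group_by(gender) |>
  summarise(
    mean_age  = mean(age),
    iqr_age   = IQR(age),             # 75th minus 25th percentile (R's default type 7)
    range_age = max(age) - min(age),  # overall range
    .groups   = "drop"
  )

age_summary
```

IQR() uses R's default quantile definition, so results on small groups can differ slightly from other software.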
Stacked bar plots for proportions of each gender
The first plot gave an overview of the data, showing the proportion of each gender in every series.
The second plot showed the difference in the proportions of each gender by final ranking across all series.
We chose stacked bar plots of proportions because we wanted to see whether final rankings differed clearly between genders. Before doing so, we checked that the overall gender proportions were not very different, and we found no issues.
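One way to build the series-by-gender overview is to compute proportions explicitly and stack them with geom_col(). The contestants below are hypothetical:

```r
library(dplyr)
library(ggplot2)

# Hypothetical contestants, for illustration only
bakers <- tibble(
  series = c(8, 8, 8, 8, 9, 9, 9),
  gender = c("F", "F", "M", "M", "F", "M", "M")
)

# Proportion of each gender within each series (each series sums to 1)
gender_props <- bakers |>
  count(series, gender) |>
  group_by(series) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# Stacked bars: one bar per series, segments sized by proportion
p <- ggplot(gender_props, aes(x = factor(series), y = prop, fill = gender)) +
  geom_col() +
  labs(x = "Series", y = "Proportion of contestants", fill = "Gender")

p
```

Passing the raw rows to geom_bar(position = "fill") would draw the same bars; computing prop first simply makes the values easy to check or reuse in a table.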
Bar charts for occupation types
We wanted one chart to show the overall counts of each occupation type, to get a broad sense of which types were more or less common.
We also wanted a chart showing the counts of occupation types by gender, so we created a population pyramid. To do this, we learnt how to use a shared y-axis so that the counts for each gender are mirrored on either side of it.
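A sketch of the mirrored layout: negating one gender's counts places its bars on the opposite side of the shared occupation axis, and abs() restores positive tick labels. The occupation categories and counts are hypothetical:

```r
library(dplyr)
library(ggplot2)

# Hypothetical occupation counts by gender, for illustration only
occ <- tibble(
  occupation = c("Education", "Education", "Healthcare", "Healthcare"),
  gender     = c("F", "M", "F", "M"),
  n          = c(5, 3, 4, 2)
)

# Mirror one gender by making its counts negative
pyramid <- occ |>
  mutate(plot_n = if_else(gender == "M", -n, n))

p <- ggplot(pyramid, aes(x = plot_n, y = occupation, fill = gender)) +
  geom_col() +  # horizontal bars: occupation is the shared y-axis
  # Show absolute counts on both sides of the axis
  scale_x_continuous(labels = abs) +
  labs(x = "Number of contestants", y = NULL, fill = "Gender")

p
```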
Map of the U.K.
Line charts
geographr
This was a new tool for us; we used it to create the map of the U.K.
Reference: https://github.com/humaniverse/geographr
ggplot2
rvest
stringr
tidyverse
janitor
dplyr
gt
Our end product is a Quarto Dashboard.
We built on what we learnt from the Olympics dashboard tutorial and used the Quarto documentation (https://quarto.org/docs/guide/) for additional help with the layout of each section, including making tabsets, adding images, and choosing themes.
The data folder contains all the CSV and RDS files used for this project.
The images folder contains all of the photos that were used for the dashboard.
dashboard.qmd contains the code for the Quarto dashboard, and dashboard.html is the rendered webpage. The dashboard_files folder was created automatically and contains PNG files of the visualizations and other supporting assets required by the dashboard.
README.md is this file, which explains the project, provides links to all references, and contains the link to the final Quarto dashboard website.
For help with data sets:
For code:
https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html
https://www.statology.org/r-rename_with/
https://stackoverflow.com/questions/8161836/how-do-i-replace-na-values-with-zeros-in-an-r-dataframe
https://www.r-bloggers.com/2023/09/creating-population-pyramid-plots-in-r-with-ggplot2
For images: