We will investigate the tidy tuesday dataset for 20/10/2020. The dataset deals with beer awards. See the link for the data source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-10-20/readme.md
Let’s read the data and quickly view what is happening.
library(tidyverse)
library(skimr)
beer_awards <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-20/beer_awards.csv')
beer_awards
## # A tibble: 4,970 x 7
## medal beer_name brewery city state category year
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Gold Volksbier Vienna Wibby Brewi~ Longmo~ CO American Amb~ 2020
## 2 Silver Oktoberfest Founders Br~ Grand ~ MI American Amb~ 2020
## 3 Bronze Amber Lager Skipping Ro~ Staunt~ VA American Amb~ 2020
## 4 Gold Lager at World's End Epidemic Al~ Concord CA American Lag~ 2020
## 5 Silver Seismic Tremor Seismic Bre~ Santa ~ CA American Lag~ 2020
## 6 Bronze Lite Thinking Pollyanna B~ Lemont IL American Lag~ 2020
## 7 Gold Beachscape Ventura Coa~ Ventura CA American Pil~ 2020
## 8 Silver Imagine a World with B~ Freetail Br~ San An~ TX American Pil~ 2020
## 9 Bronze Pilsner Old Town Br~ Portla~ OR American Pil~ 2020
## 10 Gold Tank 7 Boulevard B~ Kansas~ MO American-Bel~ 2020
## # ... with 4,960 more rows
skim(beer_awards)
| Name | beer_awards |
| Number of rows | 4970 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| medal | 0 | 1 | 4 | 6 | 0 | 3 | 0 |
| beer_name | 0 | 1 | 2 | 89 | 0 | 3811 | 0 |
| brewery | 0 | 1 | 6 | 58 | 0 | 1859 | 0 |
| city | 0 | 1 | 3 | 44 | 0 | 803 | 0 |
| state | 0 | 1 | 2 | 2 | 0 | 52 | 0 |
| category | 0 | 1 | 4 | 76 | 0 | 515 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2007.88 | 8.68 | 1987 | 2002 | 2009 | 2015 | 2020 | ▂▃▅▆▇ |
We will factor all the variables
beer_awards_fct = beer_awards %>%
mutate(medal = factor(medal, levels = c("Bronze","Silver","Gold") ),
state = str_to_upper(state),
across(where(is.character),as.factor)
)
beer_awards_fct
## # A tibble: 4,970 x 7
## medal beer_name brewery city state category year
## <fct> <fct> <fct> <fct> <fct> <fct> <dbl>
## 1 Gold Volksbier Vienna Wibby Brewi~ Longmo~ CO American Amb~ 2020
## 2 Silver Oktoberfest Founders Br~ Grand ~ MI American Amb~ 2020
## 3 Bronze Amber Lager Skipping Ro~ Staunt~ VA American Amb~ 2020
## 4 Gold Lager at World's End Epidemic Al~ Concord CA American Lag~ 2020
## 5 Silver Seismic Tremor Seismic Bre~ Santa ~ CA American Lag~ 2020
## 6 Bronze Lite Thinking Pollyanna B~ Lemont IL American Lag~ 2020
## 7 Gold Beachscape Ventura Coa~ Ventura CA American Pil~ 2020
## 8 Silver Imagine a World with B~ Freetail Br~ San An~ TX American Pil~ 2020
## 9 Bronze Pilsner Old Town Br~ Portla~ OR American Pil~ 2020
## 10 Gold Tank 7 Boulevard B~ Kansas~ MO American-Bel~ 2020
## # ... with 4,960 more rows
Let’s see the number of awards given by year
medal_colors = c("#965A38","#BFC1C2","#DFBC00") # bronze, silver, and gold
beer_awards_fct %>%
ggplot(aes(x=year,fill=medal))+
stat_count()+
scale_fill_manual(values=medal_colors)+
theme_minimal()+
theme(legend.position = "top",
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank() )+
labs(title="More Medals Are Given Out As Time Passes",fill="Medal",x="Year",y="Number of Medals")
We clearly see a trend that more medals are awarded over time, though we notice that there has been a recent dip in 2020, though given the known circumstances of a global pandemic, it shouldn’t be too surprising that there was a dip.
beer_awards_fct %>%
count(state,medal) %>%
ggplot(aes(x=reorder(state,n, FUN = sum ),y=n))+
geom_col(aes(fill=medal))+
geom_text(aes(label= stat(y)),
stat = 'summary', fun = sum, vjust = -0.5,size=3 )+
scale_fill_manual(values=medal_colors)+
theme_minimal()+
theme(legend.position = "top",
panel.grid.major.x = element_blank())+
labs(title="Californian Breweries Are The Most Successful",subtitle="Number Of Awards Received By State Of Brewery",x="State",y="Number of Awards",fill="Medal")+
scale_y_continuous(expand = c(0,50) )
This graph shows that the top three most successful states in temrs of medals won for beers are: California, Colorado, and Oregon. The least successful were: Rhode Island, Oklahoma, Mississippi, and North Dakota.
beer_awards_fct %>%
group_by(brewery,medal)%>%
count() %>%
ungroup(medal) %>%
mutate(Total = sum(n)) %>%
arrange(-Total)%>%
ungroup() %>%
slice_max(n=30, order_by=Total)%>% # Slightly hacky - use n=30 because Each Brewery can get 3 medals,and we want the top 10.
ggplot(aes(x=n,y=reorder(brewery,n,FUN=sum)))+
geom_col(aes(fill=medal))+
geom_text(aes(label= n,group=medal),size=3 ,position = position_stack(vjust = 0.5))+
geom_text(aes(label = paste0("(",stat(x),")" ) , group=brewery), stat = 'summary', fun = sum,col="black", nudge_x = 2,size=4,fontface="italic")+
scale_fill_manual(values=medal_colors)+
theme_minimal()+
theme(legend.position = "top",
panel.grid.major.y = element_blank() )+
guides(fill = guide_legend(reverse=T))+
labs(title="Top 10 Breweries by Total Medal Count",subtitle = "Brackets Shows Total",x="Medal Count",y="Brewery",fill="Medal")
Pabst Brewing CO were the most successful brewery in terms of the number of medals.
beer_awards_fct %>%
group_by(brewery,state,medal)%>%
count()%>%
ungroup(medal) %>%
mutate(Total = sum(n)) %>%
arrange(-Total)%>%
ungroup() %>%
slice_max(n=30, order_by=Total) %>% # Slightly hacky - use n=30 because Each Brewery can get 3 medals,and we want the top 10.
ggplot(aes(x=n,y=reorder(brewery,n,FUN=sum)))+
geom_col(aes(fill=medal))+
geom_text(aes(label= n,group=medal),size=3 ,position = position_stack(vjust = 0.5))+
geom_text(aes(label = paste0("(",stat(x),")" ) , group=brewery), stat = 'summary', fun = sum,col="black", nudge_x = 2,size=4,fontface="italic")+
geom_label(aes(x=0, label=state,group=brewery,col=state))+
scale_fill_manual(values=medal_colors)+
theme_minimal()+
theme(legend.position = "top",
panel.grid.major.y = element_blank() )+
guides(fill = guide_legend(reverse=T),col=FALSE)+
labs(title="Top 10 Individual Breweries by Total Medal Count",subtitle = "With State Labelled",x="Medal Count",y="Brewery",fill="Medal")
This time we separate out the breweries at a state level; meaning we are now looking at individuals breweries at a specific state, rather than a group of them.
Anheuser-Busch is the best performing single brewery from the state of Missouri.
Let’s find which cities were best at crafting beers?
beer_awards_fct %>%
count(city,state) %>%
arrange(-n) %>%
slice_max(order_by=n,n=10) %>%
ggplot(aes(x=n,y= reorder(city,n),fill=state ))+
geom_col()+
geom_text(aes(label=n),hjust=-0.05)+
theme_minimal()+
theme(panel.grid.major.y = element_blank() )+
labs(title="Which Cities Are Home To The Most Awarded Breweries?",x="Number of Awards",y="City",fill="State")
We see that Denver from the State of Colorado was the most successful, obtaining 145 medals. Other successful cities from Colorado include: Fort Collins, and Golden.
Which Beer won the most?
beer_awards_fct %>%
count(beer_name,brewery,city,state)%>%
arrange(-n) %>%
slice_head(n=10) %>%
rename("medals_won" = n) %>%
knitr::kable(caption="Top 10 awarded beers by brewery and city")
| beer_name | brewery | city | state | medals_won |
|---|---|---|---|---|
| Alaskan Smoked Porter | Alaskan Brewing and Bottling Co. | Juneau | AK | 11 |
| Samuel Adams Double Bock | Boston Beer Co. | Boston | MA | 9 |
| Raspberry Tart | New Glarus Brewing Co. | New Glarus | WI | 8 |
| Abbey Belgian Style Ale | New Belgium Brewing Co. | Fort Collins | CO | 7 |
| Kiwanda Cream Ale | Pelican Pub & Brewery | Pacific City | OR | 7 |
| Laughing Lab Scottish Ale | Bristol Brewing Co. | Colorado Springs | CO | 7 |
| Miller Genuine Draft | Miller Brewing Co. | Milwaukee | WI | 7 |
| Coors Light | Coors Brewing Co. | Golden | CO | 6 |
| Genesee Cream Ale | Genesee/High Falls Brewing | Rochester | NY | 6 |
| Belgian Red | New Glarus Brewing Co. | New Glarus | WI | 5 |
We see that the Alasken Snoked Porter was the best individual drink by brewery and city; winning 11 awards. Let’s see how it performed over the years.
beer_awards_fct %>%
filter(beer_name == "Alaskan Smoked Porter") %>%
ggplot(aes(x=year,y=medal,group=1))+
geom_line(lty=3)+
geom_point(aes(col=medal),size=6,show.legend = FALSE)+
theme_classic()+
scale_color_manual(values=medal_colors)+
labs(title="The Comeback of Alaskan Smoked Porter",subtitle="In the Catergory of Smoke-Flavored Beer",x="Year",y=NULL)
We see that Alaskan Smoked Porter was very popular during the early to mid 90s, but became less popular over time. It saw a resurgence in popularity in 2005 where it won gold.
top_beers = beer_awards_fct %>%
count(beer_name,brewery,city,state)%>%
slice_max(order_by = n, n=15) %>%
mutate(beer_name = fct_reorder(beer_name,-n))%>%
select(beer_name,city)
top_beers %>%
inner_join(y=beer_awards_fct,by=c("beer_name","city"))%>%
ggplot(aes(x=year,y=medal,group=1))+
geom_line(lty=3)+
geom_point(aes(col=medal),size=3,show.legend = FALSE)+
facet_wrap(~beer_name)+
theme_bw()+
scale_color_manual(values=medal_colors)+
labs(title="Top Award Winning Beers",subtitle="Performance over Time For Beers Made By Unique Breweries In Cities",x="Year",y=NULL)
And finally, we see can see how the top awarding winning beers performed over time. We see that that Samuel Adams Double Bock performed very well and was the second best beer crafted by an individual brewery, it won the most amount of gold medals. We can also see that Belgian Red had all but one if it’s medals as gold, showing a very high performance in the rankings.