We will investigate the tidy tuesday dataset for 20/10/2020. The dataset deals with beer awards. See the link for the data source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-10-20/readme.md

Let’s read the data and quickly view what is happening.

library(tidyverse)
library(skimr)
beer_awards <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-20/beer_awards.csv')
beer_awards
## # A tibble: 4,970 x 7
##    medal  beer_name               brewery      city    state category       year
##    <chr>  <chr>                   <chr>        <chr>   <chr> <chr>         <dbl>
##  1 Gold   Volksbier Vienna        Wibby Brewi~ Longmo~ CO    American Amb~  2020
##  2 Silver Oktoberfest             Founders Br~ Grand ~ MI    American Amb~  2020
##  3 Bronze Amber Lager             Skipping Ro~ Staunt~ VA    American Amb~  2020
##  4 Gold   Lager at World's End    Epidemic Al~ Concord CA    American Lag~  2020
##  5 Silver Seismic Tremor          Seismic Bre~ Santa ~ CA    American Lag~  2020
##  6 Bronze Lite Thinking           Pollyanna B~ Lemont  IL    American Lag~  2020
##  7 Gold   Beachscape              Ventura Coa~ Ventura CA    American Pil~  2020
##  8 Silver Imagine a World with B~ Freetail Br~ San An~ TX    American Pil~  2020
##  9 Bronze Pilsner                 Old Town Br~ Portla~ OR    American Pil~  2020
## 10 Gold   Tank 7                  Boulevard B~ Kansas~ MO    American-Bel~  2020
## # ... with 4,960 more rows
skim(beer_awards)
Data summary
Name beer_awards
Number of rows 4970
Number of columns 7
_______________________
Column type frequency:
character 6
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
medal 0 1 4 6 0 3 0
beer_name 0 1 2 89 0 3811 0
brewery 0 1 6 58 0 1859 0
city 0 1 3 44 0 803 0
state 0 1 2 2 0 52 0
category 0 1 4 76 0 515 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1 2007.88 8.68 1987 2002 2009 2015 2020 ▂▃▅▆▇

We will factor all the variables

beer_awards_fct = beer_awards %>%
  mutate(medal = factor(medal, levels = c("Bronze","Silver","Gold") ),
         state = str_to_upper(state),
         across(where(is.character),as.factor)
         )
beer_awards_fct
## # A tibble: 4,970 x 7
##    medal  beer_name               brewery      city    state category       year
##    <fct>  <fct>                   <fct>        <fct>   <fct> <fct>         <dbl>
##  1 Gold   Volksbier Vienna        Wibby Brewi~ Longmo~ CO    American Amb~  2020
##  2 Silver Oktoberfest             Founders Br~ Grand ~ MI    American Amb~  2020
##  3 Bronze Amber Lager             Skipping Ro~ Staunt~ VA    American Amb~  2020
##  4 Gold   Lager at World's End    Epidemic Al~ Concord CA    American Lag~  2020
##  5 Silver Seismic Tremor          Seismic Bre~ Santa ~ CA    American Lag~  2020
##  6 Bronze Lite Thinking           Pollyanna B~ Lemont  IL    American Lag~  2020
##  7 Gold   Beachscape              Ventura Coa~ Ventura CA    American Pil~  2020
##  8 Silver Imagine a World with B~ Freetail Br~ San An~ TX    American Pil~  2020
##  9 Bronze Pilsner                 Old Town Br~ Portla~ OR    American Pil~  2020
## 10 Gold   Tank 7                  Boulevard B~ Kansas~ MO    American-Bel~  2020
## # ... with 4,960 more rows

Let’s see the number of awards given by year

medal_colors = c("#965A38","#BFC1C2","#DFBC00") # bronze, silver, and gold


beer_awards_fct %>%
  ggplot(aes(x=year,fill=medal))+
  stat_count()+
  scale_fill_manual(values=medal_colors)+
  theme_minimal()+
  theme(legend.position = "top",
        panel.grid.major.x  = element_blank(),
        panel.grid.minor.x  = element_blank() )+
  labs(title="More Medals Are Given Out As Time Passes",fill="Medal",x="Year",y="Number of Medals")

We clearly see a trend that more medals are awarded over time, though we notice that there has been a recent dip in 2020, though given the known circumstances of a global pandemic, it shouldn’t be too surprising that there was a dip.

beer_awards_fct %>%
  count(state,medal) %>%
  ggplot(aes(x=reorder(state,n, FUN = sum ),y=n))+
  geom_col(aes(fill=medal))+
  geom_text(aes(label= stat(y)),
                  stat = 'summary', fun = sum, vjust = -0.5,size=3   )+
  scale_fill_manual(values=medal_colors)+
  theme_minimal()+
  theme(legend.position = "top", 
        panel.grid.major.x = element_blank())+
  labs(title="Californian Breweries Are The Most Successful",subtitle="Number Of Awards Received By State Of Brewery",x="State",y="Number of Awards",fill="Medal")+
 scale_y_continuous(expand = c(0,50) )

This graph shows that the top three most successful states in temrs of medals won for beers are: California, Colorado, and Oregon. The least successful were: Rhode Island, Oklahoma, Mississippi, and North Dakota.

beer_awards_fct %>%
group_by(brewery,medal)%>%
  count() %>%
  ungroup(medal) %>%
  mutate(Total = sum(n)) %>%
  arrange(-Total)%>%
  ungroup() %>%
  slice_max(n=30, order_by=Total)%>%    # Slightly hacky - use n=30 because Each Brewery can get 3 medals,and we want the top 10.
  ggplot(aes(x=n,y=reorder(brewery,n,FUN=sum)))+
  geom_col(aes(fill=medal))+
  geom_text(aes(label= n,group=medal),size=3 ,position = position_stack(vjust = 0.5))+
  geom_text(aes(label = paste0("(",stat(x),")" ) , group=brewery), stat = 'summary', fun = sum,col="black", nudge_x = 2,size=4,fontface="italic")+
  scale_fill_manual(values=medal_colors)+
  theme_minimal()+
  theme(legend.position = "top", 
        panel.grid.major.y = element_blank() )+
  guides(fill = guide_legend(reverse=T))+
  labs(title="Top 10 Breweries by Total Medal Count",subtitle = "Brackets Shows Total",x="Medal Count",y="Brewery",fill="Medal")

Pabst Brewing CO were the most successful brewery in terms of the number of medals.

beer_awards_fct %>%
  group_by(brewery,state,medal)%>%
  count()%>%
  ungroup(medal) %>%
  mutate(Total = sum(n)) %>%
  arrange(-Total)%>%
  ungroup() %>%
  slice_max(n=30, order_by=Total)  %>%    # Slightly hacky - use n=30 because Each Brewery can get 3 medals,and we want the top 10.
  ggplot(aes(x=n,y=reorder(brewery,n,FUN=sum)))+
  geom_col(aes(fill=medal))+
  geom_text(aes(label= n,group=medal),size=3 ,position = position_stack(vjust = 0.5))+
  geom_text(aes(label = paste0("(",stat(x),")" ) , group=brewery), stat = 'summary', fun = sum,col="black", nudge_x = 2,size=4,fontface="italic")+
  geom_label(aes(x=0, label=state,group=brewery,col=state))+
  scale_fill_manual(values=medal_colors)+
  theme_minimal()+
  theme(legend.position = "top", 
        panel.grid.major.y = element_blank() )+
  guides(fill = guide_legend(reverse=T),col=FALSE)+
  labs(title="Top 10 Individual Breweries by Total Medal Count",subtitle = "With State Labelled",x="Medal Count",y="Brewery",fill="Medal")

This time we separate out the breweries at a state level; meaning we are now looking at individuals breweries at a specific state, rather than a group of them.

Anheuser-Busch is the best performing single brewery from the state of Missouri.

Let’s find which cities were best at crafting beers?

beer_awards_fct %>%
  count(city,state) %>%
  arrange(-n) %>%
  slice_max(order_by=n,n=10) %>%
  ggplot(aes(x=n,y= reorder(city,n),fill=state  ))+
  geom_col()+
  geom_text(aes(label=n),hjust=-0.05)+
  theme_minimal()+
  theme(panel.grid.major.y = element_blank() )+
  labs(title="Which Cities Are Home To The Most Awarded Breweries?",x="Number of Awards",y="City",fill="State")

We see that Denver from the State of Colorado was the most successful, obtaining 145 medals. Other successful cities from Colorado include: Fort Collins, and Golden.

Which Beer won the most?

beer_awards_fct %>%
  count(beer_name,brewery,city,state)%>%
  arrange(-n) %>%
  slice_head(n=10) %>%
  rename("medals_won" = n) %>%
 knitr::kable(caption="Top 10 awarded beers by brewery and city")
Top 10 awarded beers by brewery and city
beer_name brewery city state medals_won
Alaskan Smoked Porter Alaskan Brewing and Bottling Co. Juneau AK 11
Samuel Adams Double Bock Boston Beer Co. Boston MA 9
Raspberry Tart New Glarus Brewing Co. New Glarus WI 8
Abbey Belgian Style Ale New Belgium Brewing Co. Fort Collins CO 7
Kiwanda Cream Ale Pelican Pub & Brewery Pacific City OR 7
Laughing Lab Scottish Ale Bristol Brewing Co. Colorado Springs CO 7
Miller Genuine Draft Miller Brewing Co. Milwaukee WI 7
Coors Light Coors Brewing Co. Golden CO 6
Genesee Cream Ale Genesee/High Falls Brewing Rochester NY 6
Belgian Red New Glarus Brewing Co. New Glarus WI 5

We see that the Alasken Snoked Porter was the best individual drink by brewery and city; winning 11 awards. Let’s see how it performed over the years.

beer_awards_fct %>%
  filter(beer_name == "Alaskan Smoked Porter") %>%
  ggplot(aes(x=year,y=medal,group=1))+
  geom_line(lty=3)+
  geom_point(aes(col=medal),size=6,show.legend = FALSE)+
  theme_classic()+
   scale_color_manual(values=medal_colors)+
  labs(title="The Comeback of Alaskan Smoked Porter",subtitle="In the Catergory of Smoke-Flavored Beer",x="Year",y=NULL)

We see that Alaskan Smoked Porter was very popular during the early to mid 90s, but became less popular over time. It saw a resurgence in popularity in 2005 where it won gold.

  top_beers = beer_awards_fct %>%
  count(beer_name,brewery,city,state)%>%
  slice_max(order_by = n, n=15) %>%
  mutate(beer_name = fct_reorder(beer_name,-n))%>%
  select(beer_name,city)
  
  top_beers %>%
  inner_join(y=beer_awards_fct,by=c("beer_name","city"))%>%
  ggplot(aes(x=year,y=medal,group=1))+
  geom_line(lty=3)+
  geom_point(aes(col=medal),size=3,show.legend = FALSE)+
  facet_wrap(~beer_name)+
  theme_bw()+
   scale_color_manual(values=medal_colors)+
  labs(title="Top Award Winning Beers",subtitle="Performance over Time For Beers Made By Unique Breweries In Cities",x="Year",y=NULL)

And finally, we see can see how the top awarding winning beers performed over time. We see that that Samuel Adams Double Bock performed very well and was the second best beer crafted by an individual brewery, it won the most amount of gold medals. We can also see that Belgian Red had all but one if it’s medals as gold, showing a very high performance in the rankings.