This exam will focus on a more fun data set, namely one based upon beer styles. There is a description of the data as a presentation here. The data file has 100 distinct styles, where each row represents a style and each column represents a particular parameter. In general, styles of beer are defined by a range of values (minimum - maximum) of:
- Alcohol by volume (ABV),
- International Bitterness Units (IBU),
- Color (SRM),
- Starting sugar content (original gravity, OG), and
- Sugar content after fermentation (final gravity, FG).
# Load the data
url <- ( "https://docs.google.com/spreadsheets/d/e/2PACX-1vSokQfpoDuwZcllv3gCGOFtdT2OtZ32lQY_729TfQ_36YuTl38YwUvoOtf4092sX7AL9rDeFEfr2pfK/pub?output=csv")
BeerstylesData <- read.csv( url )
# Loading libraries I'll need
library( tidyverse )
## -- Attaching packages ------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts --------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library( ggrepel )
While each type of characteristic is given as both minimum and maximum values, create new columns of data that represent the mid-point between the minimum and maximum. Remove the minimum and maximum values; we will use the mid-point for the rest of the questions.
# Define midpoint for all characteristics
BeerstylesData <- as_tibble( BeerstylesData )
# The midpoint is the average of the minimum and maximum: (Min + Max)/2
BeerstylesData.Mid <- mutate( BeerstylesData,
ABV_Mid = (ABV_Min + ABV_Max)/2, IBU_Mid = (IBU_Min + IBU_Max)/2, SRM_Mid = (SRM_Min + SRM_Max)/2,
OG_Mid = (OG_Min + OG_Max)/2, FG_Mid = (FG_Min + FG_Max)/2 )
# Remove the min & max columns
B <- select( BeerstylesData.Mid, Styles, Yeast, ABV_Mid, IBU_Mid, SRM_Mid, OG_Mid, FG_Mid)
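As a quick sanity check on the midpoint arithmetic, here is a minimal sketch on a made-up two-row data frame. The `toy` values and the `ABV_HalfRange` column are illustrative assumptions and are not taken from the beer data. It contrasts the midpoint, `(min + max) / 2`, with the half-range, `(max - min) / 2`, which are easy to confuse.

```r
# Hypothetical toy data; values are illustrative only
toy <- data.frame( ABV_Min = c( 4.0, 8.0 ), ABV_Max = c( 6.0, 12.0 ) )

# Midpoint: the value halfway between the minimum and the maximum
toy$ABV_Mid <- ( toy$ABV_Min + toy$ABV_Max ) / 2

# Half-range: half the width of the interval (not the midpoint)
toy$ABV_HalfRange <- ( toy$ABV_Max - toy$ABV_Min ) / 2

print( toy )
```

The midpoint always lies between the minimum and the maximum, which is an easy property to check on the real columns as well.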
The process of brewing involves taking grains and extracting the starch from them. Then, in a series of temperature rests where the grain is heated by the brewer, the enzymes in those grains convert the starch into sugar. This is the food for the yeast during fermentation. The amount and style of starting grains determine the style of beer. To begin exploring these styles, make a plot of the density of starting (OG) and ending (FG) gravities.
# Density plots of OG and FG
# Use the OG and FG midpoints from the beer data rather than simulated values
x <- as.data.frame( select( B, OG_Mid, FG_Mid ) )
library( ggplot2 );library( reshape2 )
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
data <- melt( x )
## No id variables; using all as measure variables
ggplot( data, aes( x = value, fill = variable )) + geom_density( alpha = 0.25 )
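The density plot relies on reshaping the two gravity columns from wide to long form, so that one column holds the values and another names the variable. The sketch below shows the same idea with base R's `stack()` on a small made-up data frame. The `wide` values are hypothetical, and `stack()` is a stand-in for `melt()` here, not the method used above.

```r
# Hypothetical wide data: one column per gravity measurement
wide <- data.frame( OG_Mid = c( 1.045, 1.060, 1.080 ),
                    FG_Mid = c( 1.010, 1.012, 1.018 ) )

# stack() returns a long data frame with columns `values` and `ind`
long <- stack( wide )

# Rename to match the column names that melt() produces
names( long ) <- c( "value", "variable" )

print( long )
```

Each row of `long` is one measurement, which is the shape `ggplot()` needs in order to map `variable` to the fill color.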
Is there a relationship between OG and FG? If you make a beer that is very dense in sugars and give it to the yeast, does it always make a sweeter beer (higher FG) after fermentation or a drier beer (lower FG) after fermentation? Create a plot of FG as a function of OG. Does there appear to be a relationship? Describe your answer in words using the output of the figure as a reference.
# Scatter plot of FG ~ OG
as_tibble(B)
## # A tibble: 100 x 7
## Styles Yeast ABV_Mid IBU_Mid SRM_Mid OG_Mid FG_Mid
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 American Light Lager Lager 0.7 2 0.5 0.006 0.005
## 2 American Lager Lager 0.550 5 1 0.005 0.003
## 3 Cream Ale Ale 0.700 6 1.25 0.00650 0.003
## 4 American Wheat Beer Either 0.75 7.5 1.5 0.00750 0.00250
## 5 International Pale Lager Either 0.7 3.5 2 0.004 0.002
## 6 International Amber Lager Lager 0.7 8.5 3.5 0.00650 0.003
## 7 International Dark Lager Lager 0.900 6 4 0.006 0.002
## 8 Czech Pale Lager Lager 0.550 7.5 1.5 0.008 0.003
## 9 Czech Premium Pale Lager Lager 0.800 7.5 1.25 0.008 0.002
## 10 Czech Amber Lager Lager 0.700 7.5 3 0.008 0.002
## # ... with 90 more rows
# FG as a function of OG: OG on the x-axis, FG on the y-axis
ggplot(B, aes( OG_Mid, FG_Mid, color = Yeast )) +
geom_point() +
labs(
x = "Original Gravity",
y = "Final Gravity"
)
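To back up a verbal description of the scatter plot with a number, one option is a correlation coefficient and a simple linear fit. The sketch below uses made-up gravities in a hypothetical `toy` data frame, so the printed values say nothing about the real beer data. It only shows the calls, `cor()` and `lm()`, that could be applied to `B`.

```r
# Hypothetical midpoint gravities; made-up values for illustration only
toy <- data.frame( OG_Mid = c( 1.040, 1.050, 1.060, 1.075, 1.090 ),
                   FG_Mid = c( 1.008, 1.011, 1.013, 1.016, 1.022 ) )

# Pearson correlation: values near +1 indicate a strong positive linear association
r <- cor( toy$OG_Mid, toy$FG_Mid )

# Simple linear regression of FG on OG; the slope is the change in FG per unit of OG
fit <- lm( FG_Mid ~ OG_Mid, data = toy )

print( r )
print( coef( fit ) )
```

A positive slope would mean that styles starting with more sugar tend to finish with more residual sugar.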
The strength of a beer is dependent upon the amount of sugar in it before fermentation: more available sugar means more fuel. I’m sure you all remember the fermentation equation \(C_6H_{12}O_6 \to 2C_2H_5OH + 2CO_2\) (sugar \(\to\) alcohol and carbon dioxide).
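As a worked check of the fermentation reaction, the sketch below computes molar masses for glucose, ethanol, and carbon dioxide from standard atomic masses. It confirms that one glucose molecule balances against two ethanol plus two carbon dioxide molecules, and that roughly half of the sugar mass ends up as ethanol. The helper `molar_mass()` is a hypothetical function written for this illustration.

```r
# Standard atomic masses in g/mol
mass <- c( C = 12.011, H = 1.008, O = 15.999 )

# Molar mass from element counts, e.g. c( C = 6, H = 12, O = 6 ) for glucose
molar_mass <- function( counts ) sum( mass[ names( counts ) ] * counts )

glucose <- molar_mass( c( C = 6, H = 12, O = 6 ) )  # C6H12O6
ethanol <- molar_mass( c( C = 2, H = 6, O = 1 ) )   # C2H5OH
co2     <- molar_mass( c( C = 1, O = 2 ) )          # CO2

# Mass is conserved: 1 glucose -> 2 ethanol + 2 CO2
print( c( reactant = glucose, products = 2 * ethanol + 2 * co2 ) )

# Fraction of the sugar mass that ends up as ethanol
ethanol_fraction <- 2 * ethanol / glucose
print( ethanol_fraction )
```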
Make a plot of the final amount of alcohol by volume as a function of starting gravity (OG). For this plot, color the yeast types (ale vs. lager) differently and label all the styles whose starting gravity exceeds 1.09.
# Beer Strength plot
# ABV as a function of OG: OG on the x-axis, ABV on the y-axis
ggplot(B, aes( OG_Mid, ABV_Mid, color = Yeast )) +
geom_point() +
labs(
x = "Original Gravity",
y = "Alcohol By Volume"
)
# Now I need to label all the styles whose starting gravity exceeds 1.09
styles2label <- c("Specialty IPA - Belgian IPA", "English IPA", "American Stout", "Belgian Golden Strong Ale", "British Strong Ale", "Weizenbock", "American Strong Ale", "Baltic Porter", "Old Ale", "Oud Bruin", "Specialty IPA - Black IPA", "Belgian Dark Strong Ale", "Doppelbock", "English Barleywine", "Imperial Stout", "American Barleywine", "Wheatwine", "Eisbock", "Sahti", "Wee Heavy")
df <- B[ B$Styles %in% styles2label, ]
summary(df)
## Styles Yeast ABV_Mid
## American Barleywine : 1 Ale :17 Min. :1.000
## American Stout : 1 Either: 0 1st Qu.:1.500
## American Strong Ale : 1 Lager : 3 Median :1.750
## Baltic Porter : 1 Mean :1.725
## Belgian Dark Strong Ale : 1 3rd Qu.:2.000
## Belgian Golden Strong Ale: 1 Max. :2.500
## (Other) :14
## IBU_Mid SRM_Mid OG_Mid FG_Mid
## Min. : 2.50 Min. :1.500 Min. :0.01100 Min. :0.002000
## 1st Qu.: 7.25 1st Qu.:4.875 1st Qu.:0.01287 1st Qu.:0.003875
## Median :12.50 Median :5.750 Median :0.01725 Median :0.004500
## Mean :13.22 Mean :5.825 Mean :0.01723 Mean :0.005125
## 3rd Qu.:20.00 3rd Qu.:7.000 3rd Qu.:0.02000 3rd Qu.:0.006250
## Max. :25.00 Max. :9.500 Max. :0.03000 Max. :0.011000
##
# Same plot with the selected styles labeled; axis limits are left to ggplot so no points are dropped
ggplot(B, aes( OG_Mid, ABV_Mid, color = Yeast )) +
geom_point() +
labs(
x = "Original Gravity",
y = "Alcohol By Volume"
) +
geom_text_repel( aes(label=Styles), data=df, force=75, size=3 )
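A hand-typed list of style names can drift out of sync with the data, so an alternative is to select the rows to label with a logical condition. The sketch below is a minimal illustration on a made-up data frame. The style names, gravities, and the `cutoff` variable are placeholders, and it assumes `OG_Mid` is stored on the usual gravity scale (values near 1.0).

```r
# Hypothetical rows; style names and gravities are placeholders
toy <- data.frame( Styles = c( "Style A", "Style B", "Style C" ),
                   OG_Mid = c( 1.050, 1.095, 1.110 ),
                   stringsAsFactors = FALSE )

# Keep only the rows whose original gravity exceeds the cutoff
cutoff <- 1.09
to_label <- toy[ toy$OG_Mid > cutoff, ]

print( to_label$Styles )
```

The resulting subset can be passed as the `data` argument of `geom_text_repel()` in the same way as `df` above.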
There is a notion in the public that dark beers are strong. Make a plot of color (SRM) and alcohol content (ABV) and describe this relationship in words.
# Plot of color and strength
ggplot(B, aes( SRM_Mid, ABV_Mid, color = Yeast )) +
geom_point( aes( color = Yeast )) +
labs(
x = "Color",
y = "Alcohol By Volume"
)