Exam 1

This exam will focus on some more fun data sets, namely one based upon beer styles. There is a description of the data as a presentation here. That data file has 100 distinct styles, where each row represents a style and each column represents a particular parameter. In general, styles of beer are defined by a range of values (minimum - maximum) of:

- Alcohol by volume (ABV),
- International Bitterness Units (IBU),
- Color (SRM),
- Starting sugar content (original gravity, OG), and
- Sugar content after fermentation (final gravity, FG).

# Load the data
url <- ( "https://docs.google.com/spreadsheets/d/e/2PACX-1vSokQfpoDuwZcllv3gCGOFtdT2OtZ32lQY_729TfQ_36YuTl38YwUvoOtf4092sX7AL9rDeFEfr2pfK/pub?output=csv")
BeerstylesData <- read.csv( url )

# Loading libraries I'll need
library( tidyverse )
## -- Attaching packages ------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts --------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library( ggrepel )

Question 1

While each type of characteristic is given as both minimum and maximum values, create new columns of data that represent the mid-point between the minimum and maximum. Remove the minimum and maximum values; we will use the mid-point for the rest of the questions.

# Define midpoint for all characteristics

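BeerstylesData <- as_tibble( BeerstylesData )  # tbl_df() is deprecated
# The midpoint of a range is (min + max)/2; (max - min)/2 would give half the range width
BeerstylesData.Mid <- mutate( BeerstylesData,
        ABV_Mid = (ABV_Min + ABV_Max)/2,
        IBU_Mid = (IBU_Min + IBU_Max)/2,
        SRM_Mid = (SRM_Min + SRM_Max)/2,
        OG_Mid = (OG_Min + OG_Max)/2,
        FG_Mid = (FG_Min + FG_Max)/2 )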

# Remove the min & max columns

B <- select( BeerstylesData.Mid, Styles, Yeast, ABV_Mid, IBU_Mid, SRM_Mid, OG_Mid, FG_Mid)
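
As a sanity check on the midpoint arithmetic, the following sketch applies it to two made-up styles (the names and values are hypothetical, not taken from the data file). The midpoint of a range is (min + max) / 2; half of the difference, (max - min) / 2, measures the width of the range instead.

```r
# Toy data: hypothetical styles and ABV ranges, for illustration only
toy <- data.frame(
  Styles  = c("Style A", "Style B"),
  ABV_Min = c(3.0, 5.0),
  ABV_Max = c(4.0, 8.0)
)

# Midpoint: the value halfway between minimum and maximum
toy$ABV_Mid <- (toy$ABV_Min + toy$ABV_Max) / 2

# Half-width of the range, shown only for contrast with the midpoint
toy$ABV_HalfWidth <- (toy$ABV_Max - toy$ABV_Min) / 2

print(toy)
```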

Question 2

The process of brewing involves taking grains and extracting the starch from them. Then, in a series of temperature rests where the grain is heated by the brewer, the proteins in those grains convert the starch into sugar. This is the food for the yeast during fermentation. The amount and style of starting grains determine the style of beer. To begin exploring these styles, make a plot of the density of starting (OG) and ending (FG) gravities.

# Density plots of OG and FG

# Use the OG and FG midpoints from the beer data rather than simulated values
x <- select( B, OG_Mid, FG_Mid )
library( reshape2 )  # ggplot2 is already attached by tidyverse
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
data <- melt( x )
## No id variables; using all as measure variables
ggplot( data, aes( x = value, fill = variable )) + geom_density( alpha = 0.25 )

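
Overlaying two densities with a single `fill` aesthetic needs the data in long form: one column of values and one column naming the variable. Here is a base-R sketch of that reshaping on made-up gravities (hypothetical values), using `stack()` in place of `melt()`:

```r
# Toy wide data: hypothetical gravity midpoints, for illustration only
wide <- data.frame(
  OG_Mid = c(1.040, 1.050, 1.065),
  FG_Mid = c(1.008, 1.010, 1.014)
)

# stack() returns a data frame with columns `values` and `ind`;
# rename them to match the names that melt() produces
long <- stack(wide)
names(long) <- c("value", "variable")

print(long)

# Kernel density estimate for one variable, the quantity geom_density() draws
og_density <- density(long$value[long$variable == "OG_Mid"])
```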

Question 3

Is there a relationship between OG and FG? If you make a beer that is very dense in sugars and give it to the yeast, does it always make a sweeter beer (higher FG) after fermentation, or a drier beer (lower FG)? Create a plot of FG as a function of OG. Does there appear to be a relationship? Describe your answer in words using the output of the figure as a reference.

# Scatter plot of FG ~ OG 
as_tibble(B)
## # A tibble: 100 x 7
##    Styles                    Yeast  ABV_Mid IBU_Mid SRM_Mid  OG_Mid  FG_Mid
##    <fct>                     <fct>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 American Light Lager      Lager    0.7       2      0.5  0.006   0.005  
##  2 American Lager            Lager    0.550     5      1    0.005   0.003  
##  3 Cream Ale                 Ale      0.700     6      1.25 0.00650 0.003  
##  4 American Wheat Beer       Either   0.75      7.5    1.5  0.00750 0.00250
##  5 International Pale Lager  Either   0.7       3.5    2    0.004   0.002  
##  6 International Amber Lager Lager    0.7       8.5    3.5  0.00650 0.003  
##  7 International Dark Lager  Lager    0.900     6      4    0.006   0.002  
##  8 Czech Pale Lager          Lager    0.550     7.5    1.5  0.008   0.003  
##  9 Czech Premium Pale Lager  Lager    0.800     7.5    1.25 0.008   0.002  
## 10 Czech Amber Lager         Lager    0.700     7.5    3    0.008   0.002  
## # ... with 90 more rows
# FG as a function of OG: OG goes on the x-axis, FG on the y-axis
ggplot(B, aes( OG_Mid, FG_Mid, color = Yeast )) +
        geom_point() +
        labs( 
        x = "Original Gravity",
        y = "Final Gravity"
        )

There is a strong relationship between original gravity and final gravity. The plot shows that, on average, a higher original gravity yields a drier beer (lower FG) after fermentation.
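
A verbal description of a scatter plot can be backed up with a fitted line: the sign of the slope shows whether FG rises or falls as OG increases. The sketch below uses made-up numbers (hypothetical, chosen only so the code runs), so its output says nothing about the actual beer data; the same calls could be applied to `B`.

```r
# Toy data: hypothetical gravity midpoints, for illustration only
toy <- data.frame(
  OG_Mid = c(1.035, 1.045, 1.055, 1.070, 1.090),
  FG_Mid = c(1.006, 1.009, 1.011, 1.015, 1.020)
)

# Least-squares line of FG on OG
fit <- lm(FG_Mid ~ OG_Mid, data = toy)

# Slope: change in FG per unit change in OG
slope <- unname(coef(fit)["OG_Mid"])
slope
```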

Question 4

The strength of a beer depends on the amount of sugar in it before fermentation: more available sugar means more fuel. I’m sure you all remember the fermentation equation \(C_6H_{12}O_6 \to 2C_2H_5OH + 2CO_2\) (sugar \(\to\) alcohol and carbon dioxide).
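
To connect the gravities to alcohol content, homebrewers often use the rule of thumb \(\text{ABV} \approx (OG - FG) \times 131.25\). This approximation is not part of the exam text or data; the function name and the example gravities below are illustrative assumptions.

```r
# Rule-of-thumb ABV estimate (an assumption, not taken from the exam data)
estimate_abv <- function(og, fg) {
  (og - fg) * 131.25
}

# Hypothetical beer: starts at 1.050 and finishes at 1.010
abv <- estimate_abv(og = 1.050, fg = 1.010)
abv
```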

Make a plot of the final amount of alcohol by volume as a function of starting gravity (OG). For this plot, color the yeast types (ale vs. lager) differently and label all the styles whose starting gravity exceeds 1.09 (OG > 1.09).

# Beer Strength plot

# ABV as a function of OG: OG goes on the x-axis, ABV on the y-axis
ggplot(B, aes( OG_Mid, ABV_Mid, color = Yeast )) +
        geom_point() +
        labs( 
        x = "Original Gravity",
        y = "Alcohol By Volume"
        )

# Now I need to label all the styles whose starting gravity exceeds 1.09

styles2label <- c("Specialty IPA - Belgian IPA", "English IPA", "American Stout", "Belgian Golden Strong Ale", "British Strong Ale", "Weizenbock", "American Strong Ale", "Baltic Porter", "Old Ale", "Oud Bruin",  "Specialty IPA - Black IPA", "Belgian Dark Strong Ale", "Doppelbock", "English Barleywine", "Imperial Stout", "American Barleywine", "Wheatwine", "Eisbock", "Sahti", "Wee Heavy")

df <- B[ B$Styles %in% styles2label, ]
summary(df)
##                        Styles      Yeast       ABV_Mid     
##  American Barleywine      : 1   Ale   :17   Min.   :1.000  
##  American Stout           : 1   Either: 0   1st Qu.:1.500  
##  American Strong Ale      : 1   Lager : 3   Median :1.750  
##  Baltic Porter            : 1               Mean   :1.725  
##  Belgian Dark Strong Ale  : 1               3rd Qu.:2.000  
##  Belgian Golden Strong Ale: 1               Max.   :2.500  
##  (Other)                  :14                              
##     IBU_Mid         SRM_Mid          OG_Mid            FG_Mid        
##  Min.   : 2.50   Min.   :1.500   Min.   :0.01100   Min.   :0.002000  
##  1st Qu.: 7.25   1st Qu.:4.875   1st Qu.:0.01287   1st Qu.:0.003875  
##  Median :12.50   Median :5.750   Median :0.01725   Median :0.004500  
##  Mean   :13.22   Mean   :5.825   Mean   :0.01723   Mean   :0.005125  
##  3rd Qu.:20.00   3rd Qu.:7.000   3rd Qu.:0.02000   3rd Qu.:0.006250  
##  Max.   :25.00   Max.   :9.500   Max.   :0.03000   Max.   :0.011000  
## 
# Same plot with the selected styles labeled; axis limits are left to ggplot
ggplot(B, aes( OG_Mid, ABV_Mid, color = Yeast )) +
        geom_point() +
        labs( 
        x = "Original Gravity",
        y = "Alcohol By Volume"
        ) +
        geom_text_repel( aes(label=Styles), data=df, force=75, size=3 )
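
The styles to label can also be selected by the threshold itself rather than by typing their names. This sketch assumes `OG_Mid` holds the true midpoint gravity (values around 1.0 to 1.1); the toy rows are hypothetical.

```r
# Toy data: hypothetical styles and gravity midpoints, for illustration only
toy <- data.frame(
  Styles = c("Style A", "Style B", "Style C"),
  OG_Mid = c(1.045, 1.095, 1.110),
  stringsAsFactors = FALSE
)

# Keep only the rows whose starting gravity exceeds 1.09
to_label <- toy[toy$OG_Mid > 1.09, ]
to_label$Styles
```

With the exam data, a subset built this way could be passed as the `data` argument of `geom_text_repel()` in place of `df`.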

Question 5

There is a notion in the public that dark beers are strong. Make a plot of color (SRM) and alcohol content (ABV) and describe this relationship in words.

# Plot of color and strength
ggplot(B, aes( SRM_Mid, ABV_Mid, color = Yeast )) +
        geom_point( aes( color =  Yeast )) +
        labs( 
        x = "Color",
        y = "Alcohol By Volume"
        )

A darker beer is not necessarily a stronger one. In the plot, beer color shows little to no correlation with alcohol content: both dark and light styles appear at high and low alcohol by volume, so the relationship between the two variables is weak at best.
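
The strength of the color/alcohol relationship can be summarized with a correlation coefficient. Here is a sketch on made-up values (hypothetical, unrelated to the real data); Spearman's rank correlation is used as an assumption, since it does not require the relationship to be linear.

```r
# Toy data: hypothetical color and alcohol midpoints, for illustration only
toy <- data.frame(
  SRM_Mid = c(3, 5, 9, 14, 22, 35),
  ABV_Mid = c(4.5, 8.0, 5.0, 6.5, 4.8, 9.0)
)

# Rank correlation: near 0 means little monotonic relationship,
# near +1 or -1 means a strong one
rho <- cor(toy$SRM_Mid, toy$ABV_Mid, method = "spearman")
rho
```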