About the Data:
The Beer Profile and Ratings dataset from Kaggle was used for the project. The main data set (beer_profile_and_ratings.csv) contains the following columns:
(General)
• Name: Beer name (label)
• Style: Beer style
• Brewery: Brewery name
• Beer Name (Full): Complete beer name (Brewery + Brew Name)
• Description: Notes on the beer, if available
• ABV: Alcohol content of the beer (% by volume)
• Min IBU: The minimum IBU value each beer can possess
• Max IBU: The maximum IBU value each beer can possess
(Mouth feel)
• Astringency
• Body
• Alcohol
(Taste)
• Bitter
• Sweet
• Sour
• Salty
(Flavor and Aroma)
• Fruits
• Hoppy
• Spices
• Malty
(Reviews)
• review_aroma
• review_appearance
• review_palate
• review_taste
• review_overall
• number_of_reviews
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(dplyr)
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(boot)
library(pwr)
# Load the dataset into the beers data frame
beers <- read.csv("/Users/bhavyakalra/Desktop/Stats R/Final_project_beer/beer_profile_and_ratings.csv")
head(beers)
## Name Style
## 1 Amber Altbier
## 2 Double Bag Altbier
## 3 Long Trail Ale Altbier
## 4 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6 Sticke Altbier
## Brewery
## 1 Alaskan Brewing Co.
## 2 Long Trail Brewing Co.
## 3 Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5 Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## Beer.Name..Full.
## 1 Alaskan Brewing Co. Alaskan Amber
## 2 Long Trail Brewing Co. Double Bag
## 3 Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
## Description
## 1 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3 Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4 Notes:
## 5 Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6 Notes:
## ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3 25 50 13 32 9 47 74 33 0 33
## 2 7.2 25 50 12 57 18 33 55 16 0 24
## 3 5.0 25 50 14 37 6 42 43 11 0 10
## 4 8.5 25 50 13 55 31 47 101 18 1 49
## 5 7.2 25 50 25 51 26 44 45 9 1 11
## 6 6.0 25 50 22 45 13 46 62 25 1 34
## Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1 57 8 111 3.498994 3.636821 3.556338 3.643863
## 2 35 12 84 3.798337 3.846154 3.904366 4.024948
## 3 54 4 62 3.409814 3.667109 3.600796 3.631300
## 4 40 16 119 4.148098 4.033967 4.150815 4.205163
## 5 51 20 95 3.625000 3.973958 3.734375 3.765625
## 6 60 4 103 4.007937 4.007937 4.087302 4.192063
## review_overall number_of_reviews
## 1 3.847082 497
## 2 4.034304 481
## 3 3.830239 377
## 4 4.005435 368
## 5 3.817708 96
## 6 4.230159 315
beers_num <- subset(beers, select = c(ABV, Min.IBU, Max.IBU, Astringency, Body, Alcohol, Bitter, Sweet, Sour, Salty, Fruits, Hoppy, Spices, Malty))
correlation_matrix <- cor(beers_num)
# Print the correlation matrix
print(correlation_matrix)
## ABV Min.IBU Max.IBU Astringency Body
## ABV 1.00000000 0.43200478 0.50103728 -0.16952052 0.24167312
## Min.IBU 0.43200478 1.00000000 0.85424806 -0.07150127 0.32533790
## Max.IBU 0.50103728 0.85424806 1.00000000 -0.12027328 0.31061666
## Astringency -0.16952052 -0.07150127 -0.12027328 1.00000000 -0.05953971
## Body 0.24167312 0.32533790 0.31061666 -0.05953971 1.00000000
## Alcohol 0.65490813 0.32369408 0.39281140 -0.17198688 0.26888501
## Bitter 0.06738792 0.53945216 0.47808036 0.11468598 0.54223642
## Sweet 0.46348750 0.22713878 0.27729218 -0.02145640 0.45884180
## Sour 0.10079488 -0.07309796 -0.04327508 0.57102991 -0.12673331
## Salty -0.12008883 -0.05751177 -0.08321352 0.34715504 -0.09927735
## Fruits 0.29100109 0.06633532 0.17292851 0.34523213 -0.04815457
## Hoppy -0.05259620 0.40747532 0.34516752 0.33095085 0.07013823
## Spices 0.19146831 -0.04615179 0.04453346 -0.08379502 0.18512299
## Malty 0.16206018 0.30004067 0.28821916 -0.08208537 0.75422818
## Alcohol Bitter Sweet Sour Salty
## ABV 0.654908132 0.067387915 0.46348750 0.100794883 -0.120088829
## Min.IBU 0.323694076 0.539452155 0.22713878 -0.073097956 -0.057511768
## Max.IBU 0.392811402 0.478080363 0.27729218 -0.043275080 -0.083213518
## Astringency -0.171986878 0.114685977 -0.02145640 0.571029913 0.347155038
## Body 0.268885007 0.542236421 0.45884180 -0.126733314 -0.099277352
## Alcohol 1.000000000 0.009087782 0.52703889 0.048767388 -0.094329293
## Bitter 0.009087782 1.000000000 0.09170547 -0.136913688 0.004692825
## Sweet 0.527038889 0.091705467 1.00000000 0.257912561 -0.131917834
## Sour 0.048767388 -0.136913688 0.25791256 1.000000000 0.098172842
## Salty -0.094329293 0.004692825 -0.13191783 0.098172842 1.000000000
## Fruits 0.254299063 -0.093449864 0.48202994 0.785882542 0.026919585
## Hoppy -0.079949288 0.712886753 -0.03432745 0.068894607 0.172606178
## Spices 0.252875793 -0.084048103 0.10754762 0.001831036 -0.023078823
## Malty 0.270105608 0.565570029 0.47103197 -0.303266373 -0.028241289
## Fruits Hoppy Spices Malty
## ABV 0.29100109 -0.05259620 0.191468311 0.16206018
## Min.IBU 0.06633532 0.40747532 -0.046151786 0.30004067
## Max.IBU 0.17292851 0.34516752 0.044533456 0.28821916
## Astringency 0.34523213 0.33095085 -0.083795019 -0.08208537
## Body -0.04815457 0.07013823 0.185122992 0.75422818
## Alcohol 0.25429906 -0.07994929 0.252875793 0.27010561
## Bitter -0.09344986 0.71288675 -0.084048103 0.56557003
## Sweet 0.48202994 -0.03432745 0.107547623 0.47103197
## Sour 0.78588254 0.06889461 0.001831036 -0.30326637
## Salty 0.02691958 0.17260618 -0.023078823 -0.02824129
## Fruits 1.00000000 0.11040734 0.148281264 -0.19688969
## Hoppy 0.11040734 1.00000000 -0.131963707 0.19576698
## Spices 0.14828126 -0.13196371 1.000000000 0.06139856
## Malty -0.19688969 0.19576698 0.061398556 1.00000000
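A 14 x 14 matrix is hard to scan by eye, so it can help to flatten it into a ranked list of variable pairs. The sketch below is an illustration only: it uses R's built-in `mtcars` data as a stand-in (an assumption, so that the example runs without the beer CSV), and the names `cm` and `top_pairs` are hypothetical. Applied to `correlation_matrix` above, the same steps would bring pairs such as Min.IBU/Max.IBU (0.854) and Sour/Fruits (0.786) to the top.

```r
# Stand-in numeric data (assumption: mtcars replaces beers_num for illustration)
num <- mtcars[, c("mpg", "disp", "hp", "wt")]
cm <- cor(num)

# Keep each unordered pair once by taking the upper triangle only
idx <- which(upper.tri(cm), arr.ind = TRUE)
top_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)

# Sort by absolute correlation, strongest first
top_pairs <- top_pairs[order(-abs(top_pairs$r)), ]
print(top_pairs)
```

Sorting by the absolute value keeps strong negative correlations next to strong positive ones, which is usually what matters when screening for collinearity.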
sour_threshold <- 20
beers$IsSour <- ifelse(beers$Sour > sour_threshold, 1, 0)
head(beers)
## Name Style
## 1 Amber Altbier
## 2 Double Bag Altbier
## 3 Long Trail Ale Altbier
## 4 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6 Sticke Altbier
## Brewery
## 1 Alaskan Brewing Co.
## 2 Long Trail Brewing Co.
## 3 Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5 Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## Beer.Name..Full.
## 1 Alaskan Brewing Co. Alaskan Amber
## 2 Long Trail Brewing Co. Double Bag
## 3 Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
## Description
## 1 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3 Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4 Notes:
## 5 Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6 Notes:
## ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3 25 50 13 32 9 47 74 33 0 33
## 2 7.2 25 50 12 57 18 33 55 16 0 24
## 3 5.0 25 50 14 37 6 42 43 11 0 10
## 4 8.5 25 50 13 55 31 47 101 18 1 49
## 5 7.2 25 50 25 51 26 44 45 9 1 11
## 6 6.0 25 50 22 45 13 46 62 25 1 34
## Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1 57 8 111 3.498994 3.636821 3.556338 3.643863
## 2 35 12 84 3.798337 3.846154 3.904366 4.024948
## 3 54 4 62 3.409814 3.667109 3.600796 3.631300
## 4 40 16 119 4.148098 4.033967 4.150815 4.205163
## 5 51 20 95 3.625000 3.973958 3.734375 3.765625
## 6 60 4 103 4.007937 4.007937 4.087302 4.192063
## review_overall number_of_reviews IsSour
## 1 3.847082 497 1
## 2 4.034304 481 0
## 3 3.830239 377 0
## 4 4.005435 368 0
## 5 3.817708 96 0
## 6 4.230159 315 1
# Counting the Binary values in the IsSour variable
value_counts <- table(beers$IsSour)
print(value_counts)
##
## 0 1
## 1510 1687
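The counts above can be turned into class proportions, which show how balanced the two groups are at a threshold of 20. As a small worked check (a sketch that simply re-enters the two printed counts; `counts`, `props`, and `null_dev` are illustrative names), the deviance of an intercept-only logistic model follows directly from these proportions.

```r
# Counts copied from the table above
counts <- c(not_sour = 1510, sour = 1687)

# Share of beers in each class
props <- counts / sum(counts)
print(round(props, 4))

# Deviance of an intercept-only binomial model for 0/1 data:
# -2 * sum over classes of n_k * log(p_k)
null_dev <- -2 * sum(counts * log(props))
print(null_dev)
```

Roughly 53% of the beers fall in the sour class, so the two groups are close to balanced.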
# Logistic regression of the binary IsSour variable on ABV (the explanatory variable)
model <- glm(IsSour ~ ABV, data = beers, family = "binomial")
summary(model)
##
## Call:
## glm(formula = IsSour ~ ABV, family = "binomial", data = beers)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.04389 0.11446 -9.12 <2e-16 ***
## ABV 0.17888 0.01703 10.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4422.2 on 3196 degrees of freedom
## Residual deviance: 4297.3 on 3195 degrees of freedom
## AIC: 4301.3
##
## Number of Fisher Scoring iterations: 3
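The ABV coefficient is on the log-odds scale, which is not easy to read directly. The sketch below re-enters the two estimates printed in the summary above (rather than refitting the model) and converts them into an odds ratio and a few predicted probabilities. The ABV values chosen are arbitrary illustrations.

```r
# Estimates copied from the summary output above
b0 <- -1.04389  # intercept
b1 <- 0.17888   # ABV coefficient (change in log-odds per 1 point of ABV)

# Odds ratio: multiplicative change in the odds of IsSour = 1 per extra ABV point
odds_ratio <- exp(b1)
print(odds_ratio)

# Predicted probabilities at a few illustrative ABV values
abv <- c(4, 6, 8, 10)
p_hat <- plogis(b0 + b1 * abv)
print(data.frame(ABV = abv, p_sour = round(p_hat, 3)))
```

An odds ratio of about 1.20 means each additional percentage point of ABV is associated with roughly 20% higher odds of a beer being classed as sour under this model.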
# Fitted logistic curve: estimated probability of being sour as a function of ABV
ggplot(beers, aes(ABV, IsSour)) +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  ylab("Probability of Being Sour") +
  xlab("ABV (Alcohol by Volume)")
## `geom_smooth()` using formula = 'y ~ x'
# 95% Wald confidence interval for the 'ABV' coefficient
se_ABV <- summary(model)$coefficients["ABV", "Std. Error"]
alpha <- 0.05 # significance level (gives a 95% confidence interval)
z <- qnorm(1 - alpha/2)
coef_ABV <- coef(model)["ABV"]
margin_error <- z * se_ABV
lower_bound <- coef_ABV - margin_error
upper_bound <- coef_ABV + margin_error
cat("95% Confidence Interval for 'ABV' Coefficient:", lower_bound, ", ", upper_bound, "\n")
## 95% Confidence Interval for 'ABV' Coefficient: 0.1455155 , 0.2122537
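The interval above is a Wald interval (estimate plus or minus z times the standard error). Base R's `confint.default()` computes the same quantity, and exponentiating the bounds gives an interval for the odds ratio. The sketch below checks this equivalence on simulated data, since the beer CSV is not assumed to be available; `x`, `y`, and `fit` are hypothetical stand-ins for ABV, IsSour, and the fitted model.

```r
set.seed(123)
n <- 500
x <- rnorm(n, mean = 6, sd = 2)          # hypothetical predictor
y <- rbinom(n, 1, plogis(-1 + 0.2 * x))  # hypothetical binary outcome
fit <- glm(y ~ x, family = binomial)

# Manual Wald interval, following the same steps as above
est <- coef(fit)["x"]
se  <- summary(fit)$coefficients["x", "Std. Error"]
z   <- qnorm(0.975)
manual_ci <- c(est - z * se, est + z * se)

# Built-in Wald interval; should match the manual computation
wald_ci <- confint.default(fit, parm = "x", level = 0.95)
print(wald_ci)

# Interval on the odds-ratio scale
or_ci <- exp(wald_ci)
print(or_ci)
```

Applying `exp()` to the bounds reported above (0.1455, 0.2123) gives an odds-ratio interval of about 1.16 to 1.24 per percentage point of ABV.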
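# Model with multiple explanatory variables.
# Note: IsSour is derived directly from Sour (Sour > sour_threshold), so including Sour
# separates the two classes perfectly; this is what triggers the warnings below.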
model_multi <- glm(IsSour ~ Sour + ABV + Bitter + Sweet + Fruits, data = beers, family = "binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_multi)
##
## Call:
## glm(formula = IsSour ~ Sour + ABV + Bitter + Sweet + Fruits,
## family = "binomial", data = beers)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.918e+02 1.091e+04 -0.063 0.949
## Sour 3.374e+01 5.303e+02 0.064 0.949
## ABV 7.823e-02 1.521e+02 0.001 1.000
## Bitter -8.691e-03 1.350e+01 -0.001 0.999
## Sweet 2.333e-03 1.134e+01 0.000 1.000
## Fruits -6.022e-03 1.750e+01 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4.4222e+03 on 3196 degrees of freedom
## Residual deviance: 1.0487e-05 on 3191 degrees of freedom
## AIC: 12
##
## Number of Fisher Scoring iterations: 25
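The warnings, the enormous standard errors, and the near-zero residual deviance in this output are the usual signs of perfect separation: IsSour was defined by thresholding Sour, so Sour alone predicts it exactly. The sketch below reproduces the effect on simulated data (the variables `x` and `y` are hypothetical, not the beer data) and collects the warnings that `glm()` raises.

```r
set.seed(1)
x <- runif(200, 0, 40)    # hypothetical score, loosely analogous to Sour
y <- as.integer(x > 20)   # outcome defined by thresholding the predictor

# Fit the model while recording any warnings instead of printing them
w <- character()
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(cond) {
    w <<- c(w, conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)

print(w)               # warnings raised during fitting
print(deviance(fit))   # residual deviance collapses towards zero
```

In a case like this the coefficients are not meaningful; a common remedy is to leave out the variable from which the outcome was constructed.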
# Applying a square root transformation to 'Fruits' variable.
beers$sqrt_Fruits <- sqrt(beers$Fruits)
model <- glm(IsSour ~ sqrt_Fruits, family = "binomial", data = beers)
summary(model)
##
## Call:
## glm(formula = IsSour ~ sqrt_Fruits, family = "binomial", data = beers)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.52569 0.23159 -28.18 <2e-16 ***
## sqrt_Fruits 1.25022 0.04361 28.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4422.2 on 3196 degrees of freedom
## Residual deviance: 1916.7 on 3195 degrees of freedom
## AIC: 1920.7
##
## Number of Fisher Scoring iterations: 6
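One quick way to compare this fit with the earlier ABV-only model is the share of the null deviance each model removes. For ungrouped 0/1 outcomes this equals McFadden's pseudo R-squared. The sketch below only re-enters the deviances printed in the two summaries; the object names are illustrative.

```r
# Deviances copied from the summary outputs in this report
null_dev  <- 4422.2
resid_dev <- c(ABV = 4297.3, sqrt_Fruits = 1916.7)

# Proportion of null deviance explained (McFadden's pseudo R-squared for 0/1 data)
pseudo_r2 <- 1 - resid_dev / null_dev
print(round(pseudo_r2, 4))
```

By this measure ABV explains under 3% of the null deviance, while the square root of Fruits explains about 57%, consistent with the much lower AIC reported above (1920.7 versus 4301.3).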
## Why we need to apply the sqrt transformation:
ggplot(beers, aes(x = Fruits, y = Sour)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'blue', linetype = 'dashed') +
  labs(x = "Fruits", y = "Sourness") +
  ggtitle("Relationship Between Sourness and Fruits")
## `geom_smooth()` using formula = 'y ~ x'
## Sqrt transformation:
beers$sqrt_Sour <- sqrt(beers$Sour)
ggplot(beers, aes(x = sqrt_Fruits, y = sqrt_Sour)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'blue', linetype = 'dashed') +
  labs(x = "Square Root of Fruits", y = "Square Root of Sourness") +
  ggtitle("Relationship Between Square Root of Sourness and Square Root of Fruits")
## `geom_smooth()` using formula = 'y ~ x'
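A common reason for taking square roots of count-like scores is that they are right-skewed, and the square root pulls in the long upper tail. The report does not state this motivation explicitly, so the sketch below is an assumption-laden illustration: it uses simulated exponential data rather than the beer columns, and `skewness()` is a small helper defined here, not a base R function.

```r
set.seed(42)
x <- rexp(5000, rate = 1 / 30)  # hypothetical right-skewed, non-negative scores

# Sample skewness: third central moment divided by variance^(3/2)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

skew_raw  <- skewness(x)
skew_sqrt <- skewness(sqrt(x))
print(c(raw = skew_raw, sqrt = skew_sqrt))
```

The same helper could be applied to `beers$Fruits` and `beers$Sour` to check whether the transformation reduces skew in the actual data.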