Beer Profile and Ratings Analysis

About the Data:

The Beer Profile and Ratings dataset from Kaggle was used for the project. The main data set (beer_profile_and_ratings.csv) contains the following columns:

(General)
• Name: Beer name (label)
• Style: Beer style
• Brewery: Brewery name
• Beer Name: Complete beer name (Brewery + Brew Name)
• Description: Notes on the beer if available
• ABV: Alcohol content of beer (% by volume)
• Min IBU: The minimum IBU value each beer can possess
• Max IBU: The maximum IBU value each beer can possess

(Mouth feel)
• Astringency
• Body
• Alcohol

(Taste)
• Bitter
• Sweet
• Sour
• Salty

(Flavor And Aroma)
• Fruits
• Hoppy
• Spices
• Malty

(Reviews)
• review_aroma
• review_appearance
• review_palate
• review_taste
• review_overall
• number_of_reviews

Loading the libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(dplyr)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(boot)
library(pwr)
# Loading the dataset into the beers data frame

beers <- read.csv("/Users/bhavyakalra/Desktop/Stats R/Final_project_beer/beer_profile_and_ratings.csv")
head(beers)
##                           Name   Style
## 1                        Amber Altbier
## 2                   Double Bag Altbier
## 3               Long Trail Ale Altbier
## 4                 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6                       Sticke Altbier
##                                            Brewery
## 1                              Alaskan Brewing Co.
## 2                           Long Trail Brewing Co.
## 3                           Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5                          Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
##                                                       Beer.Name..Full.
## 1                                    Alaskan Brewing Co. Alaskan Amber
## 2                                    Long Trail Brewing Co. Double Bag
## 3                                Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5                 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6       Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3                                                                                                                                                                                                                                                                                           Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
## 5                                                                                                                                                     Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
##   ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3      25      50          13   32       9     47    74   33     0     33
## 2 7.2      25      50          12   57      18     33    55   16     0     24
## 3 5.0      25      50          14   37       6     42    43   11     0     10
## 4 8.5      25      50          13   55      31     47   101   18     1     49
## 5 7.2      25      50          25   51      26     44    45    9     1     11
## 6 6.0      25      50          22   45      13     46    62   25     1     34
##   Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1    57      8   111     3.498994          3.636821      3.556338     3.643863
## 2    35     12    84     3.798337          3.846154      3.904366     4.024948
## 3    54      4    62     3.409814          3.667109      3.600796     3.631300
## 4    40     16   119     4.148098          4.033967      4.150815     4.205163
## 5    51     20    95     3.625000          3.973958      3.734375     3.765625
## 6    60      4   103     4.007937          4.007937      4.087302     4.192063
##   review_overall number_of_reviews
## 1       3.847082               497
## 2       4.034304               481
## 3       3.830239               377
## 4       4.005435               368
## 5       3.817708                96
## 6       4.230159               315
beers_num <- subset(beers, select = c(ABV, Min.IBU, Max.IBU, Astringency, Body, Alcohol, Bitter, Sweet, Sour, Salty, Fruits, Hoppy, Spices, Malty))
correlation_matrix <- cor(beers_num)

# Print the correlation matrix
print(correlation_matrix)
##                     ABV     Min.IBU     Max.IBU Astringency        Body
## ABV          1.00000000  0.43200478  0.50103728 -0.16952052  0.24167312
## Min.IBU      0.43200478  1.00000000  0.85424806 -0.07150127  0.32533790
## Max.IBU      0.50103728  0.85424806  1.00000000 -0.12027328  0.31061666
## Astringency -0.16952052 -0.07150127 -0.12027328  1.00000000 -0.05953971
## Body         0.24167312  0.32533790  0.31061666 -0.05953971  1.00000000
## Alcohol      0.65490813  0.32369408  0.39281140 -0.17198688  0.26888501
## Bitter       0.06738792  0.53945216  0.47808036  0.11468598  0.54223642
## Sweet        0.46348750  0.22713878  0.27729218 -0.02145640  0.45884180
## Sour         0.10079488 -0.07309796 -0.04327508  0.57102991 -0.12673331
## Salty       -0.12008883 -0.05751177 -0.08321352  0.34715504 -0.09927735
## Fruits       0.29100109  0.06633532  0.17292851  0.34523213 -0.04815457
## Hoppy       -0.05259620  0.40747532  0.34516752  0.33095085  0.07013823
## Spices       0.19146831 -0.04615179  0.04453346 -0.08379502  0.18512299
## Malty        0.16206018  0.30004067  0.28821916 -0.08208537  0.75422818
##                  Alcohol       Bitter       Sweet         Sour        Salty
## ABV          0.654908132  0.067387915  0.46348750  0.100794883 -0.120088829
## Min.IBU      0.323694076  0.539452155  0.22713878 -0.073097956 -0.057511768
## Max.IBU      0.392811402  0.478080363  0.27729218 -0.043275080 -0.083213518
## Astringency -0.171986878  0.114685977 -0.02145640  0.571029913  0.347155038
## Body         0.268885007  0.542236421  0.45884180 -0.126733314 -0.099277352
## Alcohol      1.000000000  0.009087782  0.52703889  0.048767388 -0.094329293
## Bitter       0.009087782  1.000000000  0.09170547 -0.136913688  0.004692825
## Sweet        0.527038889  0.091705467  1.00000000  0.257912561 -0.131917834
## Sour         0.048767388 -0.136913688  0.25791256  1.000000000  0.098172842
## Salty       -0.094329293  0.004692825 -0.13191783  0.098172842  1.000000000
## Fruits       0.254299063 -0.093449864  0.48202994  0.785882542  0.026919585
## Hoppy       -0.079949288  0.712886753 -0.03432745  0.068894607  0.172606178
## Spices       0.252875793 -0.084048103  0.10754762  0.001831036 -0.023078823
## Malty        0.270105608  0.565570029  0.47103197 -0.303266373 -0.028241289
##                  Fruits       Hoppy       Spices       Malty
## ABV          0.29100109 -0.05259620  0.191468311  0.16206018
## Min.IBU      0.06633532  0.40747532 -0.046151786  0.30004067
## Max.IBU      0.17292851  0.34516752  0.044533456  0.28821916
## Astringency  0.34523213  0.33095085 -0.083795019 -0.08208537
## Body        -0.04815457  0.07013823  0.185122992  0.75422818
## Alcohol      0.25429906 -0.07994929  0.252875793  0.27010561
## Bitter      -0.09344986  0.71288675 -0.084048103  0.56557003
## Sweet        0.48202994 -0.03432745  0.107547623  0.47103197
## Sour         0.78588254  0.06889461  0.001831036 -0.30326637
## Salty        0.02691958  0.17260618 -0.023078823 -0.02824129
## Fruits       1.00000000  0.11040734  0.148281264 -0.19688969
## Hoppy        0.11040734  1.00000000 -0.131963707  0.19576698
## Spices       0.14828126 -0.13196371  1.000000000  0.06139856
## Malty       -0.19688969  0.19576698  0.061398556  1.00000000

The correlation matrix shows that, apart from Min.IBU and Max.IBU (0.85), which are the lower and upper bounds of the same IBU range, the strongest relationship is between Fruits and Sour (0.79). Let's take one of those two as our response variable and convert it into a binary variable.
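Ranking the off-diagonal entries makes this comparison explicit. The sketch below is only an illustration: the small matrix `cm` is hand-copied (rounded to three decimals) from five of the variables in the printed output, and the names `cm` and `ranked` are introduced here, not taken from the original analysis.

```r
# Subset of the printed correlation matrix, rounded to three decimals
vars <- c("ABV", "Min.IBU", "Max.IBU", "Sour", "Fruits")
cm <- matrix(c(
  1.000,  0.432,  0.501,  0.101, 0.291,
  0.432,  1.000,  0.854, -0.073, 0.066,
  0.501,  0.854,  1.000, -0.043, 0.173,
  0.101, -0.073, -0.043,  1.000, 0.786,
  0.291,  0.066,  0.173,  0.786, 1.000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and sort by absolute correlation
idx <- which(upper.tri(cm), arr.ind = TRUE)
ranked <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
ranked <- ranked[order(-abs(ranked$r)), ]
print(head(ranked, 3))
```

The same ranking could be run on the full `correlation_matrix` object by replacing `cm` with it.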

Binary conversion of the Sour variable based on a threshold value:

IsSour: 1 if the Sour score is above the threshold (the beer is treated as sour), 0 if it is at or below the threshold (the beer is treated as not sour).

sour_threshold <- 20
beers$IsSour <- ifelse(beers$Sour > sour_threshold, 1, 0)
head(beers)
##                           Name   Style
## 1                        Amber Altbier
## 2                   Double Bag Altbier
## 3               Long Trail Ale Altbier
## 4                 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6                       Sticke Altbier
##                                            Brewery
## 1                              Alaskan Brewing Co.
## 2                           Long Trail Brewing Co.
## 3                           Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5                          Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
##                                                       Beer.Name..Full.
## 1                                    Alaskan Brewing Co. Alaskan Amber
## 2                                    Long Trail Brewing Co. Double Bag
## 3                                Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5                 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6       Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3                                                                                                                                                                                                                                                                                           Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
## 5                                                                                                                                                     Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
##   ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3      25      50          13   32       9     47    74   33     0     33
## 2 7.2      25      50          12   57      18     33    55   16     0     24
## 3 5.0      25      50          14   37       6     42    43   11     0     10
## 4 8.5      25      50          13   55      31     47   101   18     1     49
## 5 7.2      25      50          25   51      26     44    45    9     1     11
## 6 6.0      25      50          22   45      13     46    62   25     1     34
##   Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1    57      8   111     3.498994          3.636821      3.556338     3.643863
## 2    35     12    84     3.798337          3.846154      3.904366     4.024948
## 3    54      4    62     3.409814          3.667109      3.600796     3.631300
## 4    40     16   119     4.148098          4.033967      4.150815     4.205163
## 5    51     20    95     3.625000          3.973958      3.734375     3.765625
## 6    60      4   103     4.007937          4.007937      4.087302     4.192063
##   review_overall number_of_reviews IsSour
## 1       3.847082               497      1
## 2       4.034304               481      0
## 3       3.830239               377      0
## 4       4.005435               368      0
## 5       3.817708                96      0
## 6       4.230159               315      1

The beers data frame now has a new variable, IsSour, consisting of binary values based on the threshold.
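As a small check of the thresholding rule, the sketch below applies it to the six Sour scores shown in the head() output. The vector `sour_scores` is typed in by hand from that output, and `is_sour` and `boundary` are names used only for this illustration.

```r
# Sour scores of the first six beers, copied from the head() output
sour_scores <- c(33, 16, 11, 18, 9, 25)
sour_threshold <- 20

# Strict comparison: only scores above the threshold are coded as 1
is_sour <- as.integer(sour_scores > sour_threshold)
print(is_sour)

# A score exactly equal to the threshold is coded as 0 (not sour)
boundary <- as.integer(20 > sour_threshold)
print(boundary)

# Share of each class among these six beers
print(prop.table(table(is_sour)))
```

The result matches the IsSour column printed above for rows 1 to 6. The threshold of 20 is a modeling choice; a different cutoff would change the class balance.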

# Counting the Binary values in the IsSour variable
value_counts <- table(beers$IsSour)
print(value_counts)
## 
##    0    1 
## 1510 1687
# Logistic regression of IsSour (response) on ABV (explanatory variable)
model <- glm(IsSour ~ ABV, data = beers, family = "binomial")
summary(model)
## 
## Call:
## glm(formula = IsSour ~ ABV, family = "binomial", data = beers)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.04389    0.11446   -9.12   <2e-16 ***
## ABV          0.17888    0.01703   10.51   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4422.2  on 3196  degrees of freedom
## Residual deviance: 4297.3  on 3195  degrees of freedom
## AIC: 4301.3
## 
## Number of Fisher Scoring iterations: 3
# Fitted logistic curve: ABV vs. estimated probability of being sour
ggplot(beers, aes(ABV, IsSour)) +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  ylab("Probability of Being Sour") +
  xlab("ABV (Alcohol by Volume)")
## `geom_smooth()` using formula = 'y ~ x'

In my model, I find a statistically significant positive relationship between ABV and the likelihood of a beer being sour (p < 2e-16). Specifically, the ABV coefficient of 0.179 means that each one-point increase in ABV raises the log-odds of the beer being sour by about 0.179, which multiplies the odds by exp(0.179) ≈ 1.20.
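To make the log-odds coefficient easier to read, it can be converted to an odds ratio and to predicted probabilities. The sketch below hard-codes the two estimates from the summary output; the ABV grid and the names `b0`, `b1`, `abv_grid`, and `p_sour` are illustrative choices, not part of the original analysis.

```r
# Estimates copied from the summary(model) output above
b0 <- -1.04389  # intercept
b1 <- 0.17888   # ABV coefficient (log-odds per ABV point)

# Odds ratio for a one-point increase in ABV
odds_ratio <- exp(b1)
print(round(odds_ratio, 3))

# Predicted probability of IsSour = 1 at a few ABV values;
# plogis() is the inverse logit, 1 / (1 + exp(-x))
abv_grid <- c(4, 6, 8, 10)
p_sour <- plogis(b0 + b1 * abv_grid)
print(data.frame(ABV = abv_grid, p_sour = round(p_sour, 3)))
```

With the fitted model object available, `predict(model, newdata = data.frame(ABV = abv_grid), type = "response")` should give the same probabilities up to rounding of the coefficients.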

# Standard error of the 'ABV' coefficient
se_ABV <- summary(model)$coefficients["ABV", "Std. Error"]

alpha <- 0.05 # significance level (gives a 95% confidence level)
z <- qnorm(1 - alpha/2)

coef_ABV <- coef(model)["ABV"]

margin_error <- z * se_ABV

lower_bound <- coef_ABV - margin_error
upper_bound <- coef_ABV + margin_error

cat("95% Confidence Interval for 'ABV' Coefficient:", lower_bound, ", ", upper_bound, "\n")
## 95% Confidence Interval for 'ABV' Coefficient: 0.1455155 ,  0.2122537

This interval implies that when the ABV of a beer increases by one percentage point, the log-odds of the beer being sour increase by between approximately 0.1455 and 0.2123. Because the entire interval lies above zero, the positive association between ABV and the likelihood of a beer being sour is statistically significant at the 5% level. In simpler terms, beers with a higher ABV are more likely to be classified as sour.
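The interval computed by hand above is a Wald interval. As a sketch, the code below checks on simulated data (not the beer data) that the same calculation agrees with `confint.default()` from base R's stats package, and then converts the reported interval to the odds-ratio scale. The simulation parameters and the names `abv`, `is_sour`, and `fit` are assumptions made for this example only.

```r
# Simulated stand-in data; the true slope 0.18 is an arbitrary choice
set.seed(42)
n <- 500
abv <- runif(n, 3, 12)
is_sour <- rbinom(n, 1, plogis(-1 + 0.18 * abv))
fit <- glm(is_sour ~ abv, family = binomial)

# Manual Wald interval: estimate +/- z * standard error
est <- coef(fit)["abv"]
se <- summary(fit)$coefficients["abv", "Std. Error"]
z <- qnorm(0.975)
manual_ci <- unname(c(est - z * se, est + z * se))

# Built-in Wald interval
builtin_ci <- unname(confint.default(fit)["abv", ])
print(rbind(manual = manual_ci, builtin = builtin_ci))

# Interval reported above for the beer data, on the odds-ratio scale
or_ci <- exp(c(0.1455155, 0.2122537))
print(round(or_ci, 3))
```

On the odds-ratio scale the reported interval runs from roughly 1.16 to 1.24, so each additional ABV point is associated with about a 16% to 24% increase in the odds of being classified as sour.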

# Model with multiple explanatory variables
model_multi <- glm(IsSour ~ Sour + ABV + Bitter + Sweet + Fruits, data = beers, family = "binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_multi)
## 
## Call:
## glm(formula = IsSour ~ Sour + ABV + Bitter + Sweet + Fruits, 
##     family = "binomial", data = beers)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.918e+02  1.091e+04  -0.063    0.949
## Sour         3.374e+01  5.303e+02   0.064    0.949
## ABV          7.823e-02  1.521e+02   0.001    1.000
## Bitter      -8.691e-03  1.350e+01  -0.001    0.999
## Sweet        2.333e-03  1.134e+01   0.000    1.000
## Fruits      -6.022e-03  1.750e+01   0.000    1.000
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4.4222e+03  on 3196  degrees of freedom
## Residual deviance: 1.0487e-05  on 3191  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 25

I investigated the impact of several predictor variables: Sour, ABV, Bitter, Sweet, and Fruits. However, the two warnings and the summary show that this fit is not usable. Because IsSour was defined directly from Sour (IsSour is 1 exactly when Sour > 20), Sour separates the two classes perfectly: the algorithm did not converge within 25 iterations, the residual deviance is essentially zero, and the standard errors are enormous. The p-values close to 1.000 are therefore an artifact of perfect separation, not evidence that these variables have no influence on a beer's sourness. A meaningful multi-predictor model would have to leave Sour out of the predictors.
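The warnings above are typical of perfect separation, which arises here because IsSour is computed from Sour and Sour is also used as a predictor. The sketch below reproduces the situation on simulated data (not the beer data); the variable names and the warning-capturing code are illustrative assumptions.

```r
# Simulated scores and a binary outcome derived from them by thresholding
set.seed(1)
sour <- round(runif(200, 0, 60))
is_sour <- as.integer(sour > 20)

# Fit the model while recording any warnings glm() raises
warnings_seen <- character(0)
fit <- withCallingHandlers(
  glm(is_sour ~ sour, family = binomial),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warnings_seen)

# Estimates and standard errors from a separated fit are not interpretable
print(summary(fit)$coefficients)
```

Dropping the variable that defines the outcome from the predictor list avoids the problem.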

# Applying a square root transformation to 'Fruits' variable.
beers$sqrt_Fruits <- sqrt(beers$Fruits)

model <- glm(IsSour ~ sqrt_Fruits, family = "binomial", data = beers)
summary(model)
## 
## Call:
## glm(formula = IsSour ~ sqrt_Fruits, family = "binomial", data = beers)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.52569    0.23159  -28.18   <2e-16 ***
## sqrt_Fruits  1.25022    0.04361   28.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4422.2  on 3196  degrees of freedom
## Residual deviance: 1916.7  on 3195  degrees of freedom
## AIC: 1920.7
## 
## Number of Fisher Scoring iterations: 6

In this logistic regression model, I used the square root of Fruits as the predictor of whether a beer is sour. The results indicate a strong, statistically significant positive relationship between the square root of Fruits and the probability of a beer being sour. Specifically, each one-unit increase in the square root of Fruits raises the log-odds of the beer being sour by approximately 1.25. The residual deviance (1916.7) is also far lower than that of the ABV-only model (4297.3), so sqrt_Fruits is a much stronger predictor than ABV.
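Because the predictor is on a square-root scale, the coefficient is easier to interpret as predicted probabilities at a few raw Fruits scores. The sketch below hard-codes the estimates from the summary output; the grid of Fruits values and the names `fruits_grid` and `fruits_at_half` are illustrative assumptions.

```r
# Estimates copied from the summary output above
b0 <- -6.52569  # intercept
b1 <- 1.25022   # coefficient of sqrt_Fruits

# Predicted probability of IsSour = 1 at selected raw Fruits scores
fruits_grid <- c(9, 16, 25, 36, 49)
p_sour <- plogis(b0 + b1 * sqrt(fruits_grid))
print(data.frame(Fruits = fruits_grid, p_sour = round(p_sour, 3)))

# Fruits score at which the predicted probability equals 0.5:
# solve b0 + b1 * sqrt(Fruits) = 0 for Fruits
fruits_at_half <- (-b0 / b1)^2
print(round(fruits_at_half, 1))
```

Under this model the predicted probability crosses 0.5 at a Fruits score of about 27.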

Applying a square root transformation to both the Fruits and Sour variables to achieve a more linear relationship between them.

# Why we apply the sqrt transformation: untransformed Fruits vs. Sour
ggplot(beers, aes(x = Fruits, y = Sour)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'blue', linetype = 'dashed') +
  labs(x = "Fruits", y = "Sourness") +
  ggtitle("Relationship Between Sourness and Fruits")
## `geom_smooth()` using formula = 'y ~ x'

The graph above shows that Fruits and Sour have a roughly linear relationship. Applying a square root transformation to a variable can nevertheless lead to a better fit of the logistic regression model to the data, resulting in a model that predicts the outcome variable more accurately.
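One way to check whether a transformed predictor fits better is to compare the AIC of the two logistic models. The sketch below shows only the mechanics on simulated data: the outcome is generated from a square-root model, so the transformed fit is favored by construction, and nothing here is a result for the beer data. All names and simulation parameters are assumptions.

```r
# Simulated stand-in for Fruits and IsSour (not the beer data)
set.seed(7)
n <- 1000
fruits <- round(runif(n, 0, 120))
is_sour <- rbinom(n, 1, plogis(-6.5 + 1.25 * sqrt(fruits)))

# Same response, raw versus square-root-transformed predictor
fit_raw  <- glm(is_sour ~ fruits, family = binomial)
fit_sqrt <- glm(is_sour ~ sqrt(fruits), family = binomial)

# Lower AIC indicates the better trade-off between fit and complexity
aic_table <- AIC(fit_raw, fit_sqrt)
print(aic_table)
```

On the real data, the same comparison would use `glm(IsSour ~ Fruits, ...)` against `glm(IsSour ~ sqrt_Fruits, ...)` on the `beers` data frame.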

# Sqrt transformation of both variables:
beers$sqrt_Sour <- sqrt(beers$Sour)
ggplot(beers, aes(x = sqrt_Fruits, y = sqrt_Sour)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'blue', linetype = 'dashed') +
  labs(x = "Square Root of Fruits", y = "Square Root of Sourness") +
  ggtitle("Relationship Between Square Root of Sourness and Square Root of Fruits")
## `geom_smooth()` using formula = 'y ~ x'