Beer Profile and Ratings Analysis

About the Data:

The Beer Profile and Ratings dataset from Kaggle was used for this project. The main data set (beer_profile_and_ratings.csv) contains the following columns:

(General)
• Name: Beer name (label)
• Style: Beer style
• Brewery: Brewery name
• Beer Name: Complete beer name (Brewery + Brew Name)
• Description: Notes on the beer, if available
• ABV: Alcohol content of the beer (% by volume)
• Min IBU: The minimum IBU value each beer can possess
• Max IBU: The maximum IBU value each beer can possess

(Mouth feel)
• Astringency
• Body
• Alcohol

(Taste)
• Bitter
• Sweet
• Sour
• Salty

(Flavor And Aroma)
• Fruits
• Hoppy
• Spices
• Malty

(Reviews)
• review_aroma
• review_appearance
• review_palate
• review_taste
• review_overall
• number_of_reviews

Loading the libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(dplyr)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(boot)
library(pwr)
# Load the dataset into the beers data frame

beers <- read.csv("/Users/bhavyakalra/Desktop/Stats R/Final_project_beer/beer_profile_and_ratings.csv")
head(beers)
##                           Name   Style
## 1                        Amber Altbier
## 2                   Double Bag Altbier
## 3               Long Trail Ale Altbier
## 4                 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6                       Sticke Altbier
##                                            Brewery
## 1                              Alaskan Brewing Co.
## 2                           Long Trail Brewing Co.
## 3                           Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5                          Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
##                                                       Beer.Name..Full.
## 1                                    Alaskan Brewing Co. Alaskan Amber
## 2                                    Long Trail Brewing Co. Double Bag
## 3                                Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5                 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6       Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3                                                                                                                                                                                                                                                                                           Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
## 5                                                                                                                                                     Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
##   ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3      25      50          13   32       9     47    74   33     0     33
## 2 7.2      25      50          12   57      18     33    55   16     0     24
## 3 5.0      25      50          14   37       6     42    43   11     0     10
## 4 8.5      25      50          13   55      31     47   101   18     1     49
## 5 7.2      25      50          25   51      26     44    45    9     1     11
## 6 6.0      25      50          22   45      13     46    62   25     1     34
##   Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1    57      8   111     3.498994          3.636821      3.556338     3.643863
## 2    35     12    84     3.798337          3.846154      3.904366     4.024948
## 3    54      4    62     3.409814          3.667109      3.600796     3.631300
## 4    40     16   119     4.148098          4.033967      4.150815     4.205163
## 5    51     20    95     3.625000          3.973958      3.734375     3.765625
## 6    60      4   103     4.007937          4.007937      4.087302     4.192063
##   review_overall number_of_reviews
## 1       3.847082               497
## 2       4.034304               481
## 3       3.830239               377
## 4       4.005435               368
## 5       3.817708                96
## 6       4.230159               315

Binary conversion of the Alcohol variable based on a threshold value:

IsAlcoholic: 1 if the Alcohol rating is above the threshold (indicating the beer is more alcoholic), 0 otherwise (indicating the beer is less alcoholic). Note that Alcohol here is the mouth feel score from the dataset, not the ABV percentage.
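As a quick check of this rule, the sketch below applies it to the Alcohol scores of the first six beers shown in the head(beers) output above. The vector alcohol_scores and the name is_alcoholic are illustrative only and are not part of the analysis code; as.integer() on the logical comparison is used here as an equivalent alternative to ifelse().

```r
# Alcohol scores of the first six rows, copied from the head(beers) output
alcohol_scores <- c(9, 18, 6, 31, 26, 13)
alcohol_threshold <- 10

# TRUE/FALSE comparison converted to 1/0; a score exactly equal to the
# threshold is coded 0 because the comparison is strict (>)
is_alcoholic <- as.integer(alcohol_scores > alcohol_threshold)
print(is_alcoholic)
```

Only the first and third beers fall at or below the threshold, so they are the only ones coded 0.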

alcohol_threshold <- 10
beers$IsAlcoholic <- ifelse(beers$Alcohol > alcohol_threshold, 1, 0)
head(beers)
##                           Name   Style
## 1                        Amber Altbier
## 2                   Double Bag Altbier
## 3               Long Trail Ale Altbier
## 4                 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6                       Sticke Altbier
##                                            Brewery
## 1                              Alaskan Brewing Co.
## 2                           Long Trail Brewing Co.
## 3                           Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5                          Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
##                                                       Beer.Name..Full.
## 1                                    Alaskan Brewing Co. Alaskan Amber
## 2                                    Long Trail Brewing Co. Double Bag
## 3                                Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5                 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6       Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3                                                                                                                                                                                                                                                                                           Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
## 5                                                                                                                                                     Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
##   ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3      25      50          13   32       9     47    74   33     0     33
## 2 7.2      25      50          12   57      18     33    55   16     0     24
## 3 5.0      25      50          14   37       6     42    43   11     0     10
## 4 8.5      25      50          13   55      31     47   101   18     1     49
## 5 7.2      25      50          25   51      26     44    45    9     1     11
## 6 6.0      25      50          22   45      13     46    62   25     1     34
##   Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1    57      8   111     3.498994          3.636821      3.556338     3.643863
## 2    35     12    84     3.798337          3.846154      3.904366     4.024948
## 3    54      4    62     3.409814          3.667109      3.600796     3.631300
## 4    40     16   119     4.148098          4.033967      4.150815     4.205163
## 5    51     20    95     3.625000          3.973958      3.734375     3.765625
## 6    60      4   103     4.007937          4.007937      4.087302     4.192063
##   review_overall number_of_reviews IsAlcoholic
## 1       3.847082               497           0
## 2       4.034304               481           1
## 3       3.830239               377           0
## 4       4.005435               368           1
## 5       3.817708                96           1
## 6       4.230159               315           1
# Counting the binary values in the IsAlcoholic variable
value_counts <- table(beers$IsAlcoholic)
print(value_counts)
## 
##    0    1 
## 1548 1649
# Logistic regression on IsAlcoholic with ABV as the explanatory variable
model <- glm(IsAlcoholic ~ ABV, data = beers, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model)
## 
## Call:
## glm(formula = IsAlcoholic ~ ABV, family = "binomial", data = beers)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.65125    0.22111  -25.56   <2e-16 ***
## ABV          0.92016    0.03635   25.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4428.8  on 3196  degrees of freedom
## Residual deviance: 3125.5  on 3195  degrees of freedom
## AIC: 3129.5
## 
## Number of Fisher Scoring iterations: 6

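The logistic regression model indicates that Alcohol by Volume (ABV) is a highly significant predictor of whether a beer is classified as more alcoholic (IsAlcoholic = 1). The ABV coefficient is positive with p < 2e-16, so the probability of that classification rises with ABV. Adding ABV reduces the deviance from 4428.8 (null) to 3125.5 (residual) at the cost of a single degree of freedom, a substantial improvement over the intercept-only model. The warning about fitted probabilities numerically 0 or 1 suggests that a few beers with extreme ABV values are predicted almost with certainty.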
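The drop in deviance can be turned into a likelihood-ratio test against the intercept-only model. The sketch below is a minimal illustration that works only from the numbers printed in the summary output; the variable names are my own, and McFadden's pseudo R-squared is added here as one common descriptive measure, not something computed in the original analysis.

```r
# Values copied from the summary(model) output
null_deviance     <- 4428.8
residual_deviance <- 3125.5
df_diff           <- 3196 - 3195  # one extra parameter (ABV)

# Likelihood-ratio statistic: reduction in deviance from adding ABV
lr_stat <- null_deviance - residual_deviance

# Upper-tail chi-squared p-value with df_diff degrees of freedom
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# McFadden's pseudo R-squared (for 0/1 data the deviance equals -2 * logLik)
mcfadden_r2 <- 1 - residual_deviance / null_deviance

cat("LR statistic:", lr_stat, "\n")
cat("p-value:", p_value, "\n")
cat("McFadden pseudo R-squared:", round(mcfadden_r2, 3), "\n")
```

A pseudo R-squared of roughly 0.29 describes the improvement over the null model; it should not be read as a proportion of variance explained.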

Interpreting the coefficient:

In the logistic regression model for “IsAlcoholic” in the “beers” dataset, the coefficient for the “ABV” variable is approximately 0.92016. This means that for every one-unit increase in ABV (one percentage point of alcohol by volume), the log-odds of a beer being classified as “IsAlcoholic” (versus less alcoholic) increase by about 0.92.
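Log-odds are hard to read directly, so it can help to convert them to probabilities. The sketch below plugs the printed intercept and slope into the logistic function for a few ABV values chosen purely for illustration; intercept, slope_abv, abv_values and abv_at_half are names introduced here, not objects from the analysis.

```r
# Coefficients copied from the summary(model) output
intercept <- -5.65125
slope_abv <- 0.92016

# Illustrative ABV values (not taken from the data)
abv_values <- c(4, 5, 6, 7, 8)

# Linear predictor (log-odds) and its logistic transform (probability)
log_odds <- intercept + slope_abv * abv_values
prob     <- plogis(log_odds)

# ABV at which the fitted probability equals 0.5 (log-odds of zero)
abv_at_half <- -intercept / slope_abv

print(data.frame(ABV = abv_values,
                 log_odds = round(log_odds, 3),
                 probability = round(prob, 3)))
cat("Fitted probability is 0.5 at ABV of about", round(abv_at_half, 2), "\n")
```

Under this fitted model the classification flips from mostly 0 to mostly 1 a little above 6% ABV.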

In more understandable terms, exponentiating the coefficient gives an odds ratio of roughly 2.5. For each additional unit of ABV, the odds of a beer being categorized as alcoholic, as opposed to less alcoholic, are multiplied by about 2.5. Higher ABV is therefore associated with a greater likelihood of a beer being classified as alcoholic in this model.
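The factor of about 2.5 comes from exponentiating the coefficient. A minimal sketch, using only the printed estimate (the variable names are illustrative):

```r
# ABV coefficient copied from the summary(model) output
coef_abv <- 0.92016

# Odds ratio for a one-unit increase in ABV
odds_ratio <- exp(coef_abv)

# Odds ratio for a half-unit increase; effects multiply on the odds scale
odds_ratio_half <- exp(0.5 * coef_abv)

# Same effect expressed as a percentage increase in the odds
pct_increase <- (odds_ratio - 1) * 100

cat("Odds ratio per 1 unit of ABV:", round(odds_ratio, 2), "\n")
cat("Odds ratio per 0.5 unit of ABV:", round(odds_ratio_half, 2), "\n")
cat("Percent increase in odds per unit:", round(pct_increase), "\n")
```

Because the model is linear on the log-odds scale, two half-unit steps multiply to the same odds ratio as a single one-unit step.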

ggplot(beers, aes(x = ABV, y = IsAlcoholic)) +
  geom_point() +
  labs(
    title = "Scatter Plot of ABV vs. IsAlcoholic",
    x = "ABV (Alcohol by Volume)",
    y = "IsAlcoholic"
  )

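We don't apply any transformations here. Logistic regression does not require the predictor or the response to be normally distributed, so no transformation is needed on those grounds. It does assume that the log-odds of IsAlcoholic are linear in ABV.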

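# Bin ABV into ranges; include.lowest = TRUE keeps beers with ABV exactly 0 in the first bin
beers$ABV_Range <- cut(beers$ABV, breaks = c(0, 5, 10, 15, 20, max(beers$ABV)), include.lowest = TRUE)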


ggplot(beers, aes(x = ABV_Range, fill = factor(IsAlcoholic))) +
  geom_bar(position = "dodge") +
  labs(title = "Distribution of More Alcoholic vs. Less-Alcoholic Beers by ABV Range",
       x = "ABV Range", y = "Count") +
  scale_fill_discrete(name = "IsAlcoholic")

Beers in the 0-5 ABV range are mostly classified as less alcoholic, whereas in the 5-10 range and above most beers are classified as more alcoholic. This pattern is one of the reasons we chose 10 as the threshold on the Alcohol score for IsAlcoholic.

# Standard error of the 'ABV' coefficient
se_ABV <- summary(model)$coefficients["ABV", "Std. Error"]

alpha <- 0.05 # significance level (for a 95% confidence level)
z <- qnorm(1 - alpha/2)

coef_ABV <- coef(model)["ABV"]

margin_error <- z * se_ABV

lower_bound <- coef_ABV - margin_error
upper_bound <- coef_ABV + margin_error

cat("95% Confidence Interval for 'ABV' Coefficient:", lower_bound, ", ", upper_bound, "\n")
## 95% Confidence Interval for 'ABV' Coefficient: 0.8489137 ,  0.9913964
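The interval above is on the log-odds scale. Exponentiating its endpoints gives an interval for the odds ratio, which is often easier to report. The sketch below uses only the printed bounds; or_ci is a name introduced here for illustration.

```r
# Wald interval bounds for the ABV coefficient, copied from the output above
lower_bound <- 0.8489137
upper_bound <- 0.9913964

# exp() is monotone, so the endpoints map directly to the odds-ratio scale
or_ci <- exp(c(lower = lower_bound, upper = upper_bound))
print(round(or_ci, 3))

# With the fitted model object available, exp(confint.default(model))
# should give the same kind of Wald interval for every coefficient.
```

The whole interval lies well above 1, consistent with the strongly significant positive effect of ABV.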

EXERCISE FROM CLASS

Build two nested GLM models using the beers data (i.e., both sharing one variable, but one with an added variable not in the other). Compare the above model comparison methods by running them with the two models you’ve built. Do they all agree?

# Base Model: ABV as the single predictor
base_model <- glm(IsAlcoholic ~ ABV, data = beers, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Extended Model: ABV and Sweet variable as predictors
extended_model <- glm(IsAlcoholic ~ ABV + Sweet, data = beers, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
AIC_base <- AIC(base_model)
AIC_extended <- AIC(extended_model)
BIC_base <- BIC(base_model)
BIC_extended <- BIC(extended_model)

cat("Base Model AIC:", AIC_base, "\n")
## Base Model AIC: 3129.514
cat("Extended Model AIC:", AIC_extended, "\n")
## Extended Model AIC: 2941.963
cat("Base Model BIC:", BIC_base, "\n")
## Base Model BIC: 3141.654
cat("Extended Model BIC:", BIC_extended, "\n")
## Extended Model BIC: 2960.173
# Comparing models
if (AIC_base < AIC_extended) {
  cat("Base Model is preferred based on AIC\n")
} else {
  cat("Extended Model is preferred based on AIC\n")
}
## Extended Model is preferred based on AIC
if (BIC_base < BIC_extended) {
  cat("Base Model is preferred based on BIC\n")
} else {
  cat("Extended Model is preferred based on BIC\n")
}
## Extended Model is preferred based on BIC

Based on the AIC and BIC values, the extended model, which includes both the “ABV” variable and the “Sweet” variable, is preferred over the base model, which only includes the “ABV” variable.

The AIC of the extended model (2941.963) is lower than that of the base model (3129.514), a difference of about 188. A lower AIC indicates a better trade-off between fit and model complexity, suggesting that the extended model is more appropriate for the data.

The BIC of the extended model (2960.173) is also lower than that of the base model (3141.654), a difference of about 181. BIC penalizes the extra parameter more heavily than AIC, so its agreement further supports the extended model.

Overall, both criteria agree: adding the “Sweet” variable improves the model by far more than the penalty for the additional parameter.
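Because the two models are nested, a likelihood-ratio test is a third way to compare them. The sketch below reconstructs it from the printed AIC values alone, assuming both models were fitted to the same 3197 rows (that is, no missing values in Sweet) and using the fact that for a 0/1 response the deviance equals AIC minus twice the number of parameters. All variable names are illustrative.

```r
# Values copied from the output above
aic_base <- 3129.514
aic_ext  <- 2941.963
n_obs    <- 3197      # null deviance had 3196 degrees of freedom
k_base   <- 2         # intercept + ABV
k_ext    <- 3         # intercept + ABV + Sweet

# For binary data: deviance = -2 * logLik = AIC - 2 * k
dev_base <- aic_base - 2 * k_base
dev_ext  <- aic_ext  - 2 * k_ext

# Likelihood-ratio test for the added Sweet term
lr_stat <- dev_base - dev_ext
p_value <- pchisq(lr_stat, df = k_ext - k_base, lower.tail = FALSE)

# Consistency check: BIC = deviance + k * log(n) should match the printed BICs
bic_base_check <- dev_base + k_base * log(n_obs)
bic_ext_check  <- dev_ext  + k_ext  * log(n_obs)

cat("LR statistic:", lr_stat, " p-value:", p_value, "\n")
cat("Reconstructed BICs:", round(bic_base_check, 3), round(bic_ext_check, 3), "\n")
```

With the fitted objects at hand, anova(base_model, extended_model, test = "Chisq") performs the same test directly. Here all three methods point the same way.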