Beer Profile and Ratings Analysis

About the Data:

The Beer Profile and Ratings dataset from Kaggle was used for the project. The main dataset (beer_profile_and_ratings.csv) contains the following columns:

(General)
• Name: Beer name (label)
• Style: Beer style
• Brewery: Brewery name
• Beer Name (Full): Complete beer name (Brewery + Brew Name)
• Description: Notes on the beer, if available
• ABV: Alcohol content of the beer (% by volume)
• Min IBU: The minimum IBU value each beer can possess
• Max IBU: The maximum IBU value each beer can possess

(Mouth feel) • Astringency • Body • Alcohol

(Taste) • Bitter • Sweet • Sour • Salty

(Flavor and Aroma) • Fruits • Hoppy • Spices • Malty

(Reviews) • review_aroma • review_appearance • review_palate • review_taste • review_overall • number_of_reviews

Loading the libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(dplyr)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
# Loading the dataset in beers data frame

beers <- read.csv("/Users/bhavyakalra/Desktop/Stats R/Final_project_beer/beer_profile_and_ratings.csv")
head(beers)
##                           Name   Style
## 1                        Amber Altbier
## 2                   Double Bag Altbier
## 3               Long Trail Ale Altbier
## 4                 Doppelsticke Altbier
## 5 Sleigh'r Dark Doüble Alt Ale Altbier
## 6                       Sticke Altbier
##                                            Brewery
## 1                              Alaskan Brewing Co.
## 2                           Long Trail Brewing Co.
## 3                           Long Trail Brewing Co.
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
## 5                          Ninkasi Brewing Company
## 6 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige
##                                                       Beer.Name..Full.
## 1                                    Alaskan Brewing Co. Alaskan Amber
## 2                                    Long Trail Brewing Co. Double Bag
## 3                                Long Trail Brewing Co. Long Trail Ale
## 4 Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke
## 5                 Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale
## 6       Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored "alt" style beer notably well balanced.\\t
## 2 Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t
## 3                                                                                                                                                                                                                                                                                           Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
## 5                                                                                                                                                     Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Notes:
##   ABV Min.IBU Max.IBU Astringency Body Alcohol Bitter Sweet Sour Salty Fruits
## 1 5.3      25      50          13   32       9     47    74   33     0     33
## 2 7.2      25      50          12   57      18     33    55   16     0     24
## 3 5.0      25      50          14   37       6     42    43   11     0     10
## 4 8.5      25      50          13   55      31     47   101   18     1     49
## 5 7.2      25      50          25   51      26     44    45    9     1     11
## 6 6.0      25      50          22   45      13     46    62   25     1     34
##   Hoppy Spices Malty review_aroma review_appearance review_palate review_taste
## 1    57      8   111     3.498994          3.636821      3.556338     3.643863
## 2    35     12    84     3.798337          3.846154      3.904366     4.024948
## 3    54      4    62     3.409814          3.667109      3.600796     3.631300
## 4    40     16   119     4.148098          4.033967      4.150815     4.205163
## 5    51     20    95     3.625000          3.973958      3.734375     3.765625
## 6    60      4   103     4.007937          4.007937      4.087302     4.192063
##   review_overall number_of_reviews
## 1       3.847082               497
## 2       4.034304               481
## 3       3.830239               377
## 4       4.005435               368
## 5       3.817708                96
## 6       4.230159               315

From the beers dataset, here are three columns (or values) that were unclear until I read the documentation:

Description Column: The “Description” column contains textual descriptions of the beers, including details about their taste, ingredients, and history. Without reading the descriptions or understanding their format, this column is hard to interpret. The descriptions vary in length and style, and some contain nothing beyond the “Notes:” prefix, which makes automated analysis of this column complex.

head(beers$Description)
## [1] "Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored \"alt\" style beer notably well balanced.\\t"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [2] "Notes:This malty, full-bodied double alt is also known as “Stickebier” – German slang for “secret brew”. Long Trail Double Bag was originally offered only in our brewery taproom as a special treat to our visitors. With an alcohol content of 7.2%, please indulge in moderation. The Long Trail Brewing Company is proud to have Double Bag named Malt Advocate’s “Beer of the Year” in 2001. Malt Advocate is a national magazine devoted to “expanding the boundaries of fine drinks”. Their panel of judges likes to keep things simple, and therefore of thousands of eligible competitors they award only two categories: “Imported” and “Domestic”. It is a great honor to receive this recognition.33 IBU\\t"
## [3] "Notes:Long Trail Ale is a full-bodied amber ale modeled after the “Alt-biers” of Düsseldorf, Germany. Our top fermenting yeast and cold finishing temperature result in a complex, yet clean, full flavor. Originally introduced in November of 1989, Long Trail Ale beer quickly became, and remains, the largest selling craft-brew in Vermont. It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t"                                                                                                                                                                                                                                                                                          
## [4] "Notes:"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
## [5] "Notes:Called 'Dark Double Alt' on the label.Seize the season with Sleigh'r. Layers of deeply toasted malt are balanced by just enough hop bitterness to make it deceivingly drinkable. Paired with a dry finish, Sleigh’r is anything but your typical winter brew.An Alt ferments with Ale yeast at colder lagering temperatures. This effect gives Alts a more refined, crisp lager-like flavor than traditional ales. The Alt has been “Ninkasified” raising the ABV and IBUs. Sleigh'r has a deep, toasted malt flavor that finishes dry and balanced.50 IBU\\t"                                                                                                                                                    
## [6] "Notes:"
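
The cleanup this column would need before any text analysis can be sketched as follows. This is only a minimal sketch on copies of strings printed above (the second one shortened to its last sentence). It assumes every entry starts with "Notes:" and may end with a literal backslash-t, sometimes preceded by an "NN IBU" token. The variable names `desc`, `clean` and `ibu_from_text` are illustrative and not part of the dataset.

```r
# Strings taken from the head(beers$Description) output (second one shortened).
# "\\t" in R source is a literal backslash followed by "t", as in the data.
desc <- c(
  "Notes:Richly malty and long on the palate, with just enough hop backing to make this beautiful amber colored \"alt\" style beer notably well balanced.\\t",
  "Notes:It is a multiple medal winner at the Great American Beer Festival.25 IBU\\t",
  "Notes:"
)

# Remove the leading "Notes:" label and the trailing literal "\t" residue.
clean <- sub("^Notes:", "", desc)
clean <- trimws(sub("\\\\t$", "", clean))

# Pull out a trailing "NN IBU" token where one exists (NA otherwise).
m <- regexpr("[0-9]+(?= IBU$)", clean, perl = TRUE)
ibu_from_text <- rep(NA_real_, length(clean))
ibu_from_text[m > 0] <- as.numeric(regmatches(clean, m))

# Treat descriptions that held only the "Notes:" label as missing.
clean[clean == ""] <- NA

ibu_from_text
is.na(clean)
```

Extracting the IBU token before marking empty strings as NA keeps the logical index `m > 0` free of missing values.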

Beer Name (Full) Column: The “Beer Name (Full)” column appears to contain both the brewery name and the beer name in a single cell. This combined format might lead to confusion without proper parsing or understanding of the structure.

head(beers$Beer.Name..Full.)
## [1] "Alaskan Brewing Co. Alaskan Amber"                                   
## [2] "Long Trail Brewing Co. Double Bag"                                   
## [3] "Long Trail Brewing Co. Long Trail Ale"                               
## [4] "Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Doppelsticke"
## [5] "Ninkasi Brewing Company Sleigh'r Dark Doüble Alt Ale"                
## [6] "Uerige Obergärige Hausbrauerei GmbH / Zum Uerige Uerige Sticke"
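
One way to separate the two parts is to remove the value of the Brewery column from the front of the full name. The sketch below uses two rows copied from the output above. It assumes the full name always begins with the exact brewery string, which has not been checked for the whole dataset, and `brew_name` is a made-up column name.

```r
# Two rows copied from head(beers); column names follow read.csv()'s renaming.
toy <- data.frame(
  Brewery = c("Alaskan Brewing Co.", "Long Trail Brewing Co."),
  Beer.Name..Full. = c("Alaskan Brewing Co. Alaskan Amber",
                       "Long Trail Brewing Co. Double Bag"),
  stringsAsFactors = FALSE
)

# Only strip the prefix where the full name really starts with the brewery.
has_prefix <- startsWith(toy$Beer.Name..Full., toy$Brewery)

# Keep everything after the brewery string; otherwise leave the name untouched.
toy$brew_name <- ifelse(
  has_prefix,
  trimws(substring(toy$Beer.Name..Full., nchar(toy$Brewery) + 1)),
  toy$Beer.Name..Full.
)

toy$brew_name
```

For the first row this yields "Alaskan Amber", while the Name column holds only "Amber", so the remainder is not guaranteed to equal Name.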

IBU Range: The “Min IBU” and “Max IBU” columns specify the International Bitterness Units (IBU) range for each beer. It is easy to assume that the values are stored together with the unit “IBU”, as they are in the Description text (for example “33 IBU”), which would require additional processing to extract the numbers. In the data frame loaded here, both columns are already numeric, as the output below shows.

print("Min IBU values are:")
## [1] "Min IBU values are:"
head(beers$Min.IBU)
## [1] 25 25 25 25 25 25
print("Max IBU values are:")
## [1] "Max IBU values are:"
tail(beers$Max.IBU)
## [1] 50 50 50 50 50 50

The choice to encode the data in this way may be due to the convenience of presenting information in a human-readable format. Combining the brewery and beer name in a single cell might make the dataset more compact and user-friendly for manual inspection. Similarly, writing the unit “IBU” next to a value, as the Description text does, could help clarify the unit of measurement for those viewing the data casually.

If I hadn’t read the documentation or weren’t aware of the encoding choices, it could have led to misinterpretations and errors in data analysis. For example, performing arithmetic on IBU values that still carry an “IBU” suffix, without first stripping the unit, could result in incorrect calculations. Similarly, trying to extract brewery names from the “Beer Name (Full)” column without understanding the format might lead to inaccurate information.
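
A defensive conversion guards against this kind of mistake. The helper below, `to_ibu()`, is hypothetical and not part of the original analysis. It is a minimal sketch that assumes an IBU value arrives either as a number or as text with an optional trailing "IBU".

```r
# Hypothetical helper: return numeric IBU values whether or not the input
# still carries a trailing "IBU" unit.
to_ibu <- function(x) {
  if (is.numeric(x)) {
    return(x)  # already numeric, as Min.IBU and Max.IBU are in the output above
  }
  # Drop an optional trailing unit, then convert the remainder to a number.
  as.numeric(sub("\\s*IBU\\s*$", "", as.character(x), ignore.case = TRUE))
}

to_ibu(c(25, 50))          # numeric input passes through unchanged
to_ibu(c("25 IBU", "50"))  # text input is stripped and converted
```

Calling `is.numeric(beers$Min.IBU)` first shows which branch applies to the real columns.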

At least one element of your data that is unclear even after reading the documentation:

I found one column whose meaning is clear but whose origin remains a mystery.

“number_of_reviews” Column: While the column name suggests that it represents the number of reviews for each beer, the documentation might not explain the source of these reviews, the time frame they cover, or the methodology used for collecting and aggregating these reviews. Without this additional context, it may be challenging to assess the reliability and relevance of the review count for each beer.

unclear_df <- subset(beers, select = c(Name, number_of_reviews))

unclear_df_10 <- head(unclear_df, 10)
unclear_df_10
##                            Name number_of_reviews
## 1                         Amber               497
## 2                    Double Bag               481
## 3                Long Trail Ale               377
## 4                  Doppelsticke               368
## 5  Sleigh'r Dark Doüble Alt Ale                96
## 6                        Sticke               315
## 7             Okto Festival Ale               124
## 8           Southampton Altbier               445
## 9                        Copper                46
## 10          Organic Münster Alt               245
plot <- ggplot(unclear_df_10, aes(x = Name, y = number_of_reviews)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  
  # Highlight the issue with an annotation
  annotate("text", x = 9, y = 550, label = "Source/Context Unclear",
           color = "red", size = 4) +
  
  # Adding labels and title
  labs(x = "Beer Name", y = "Number of Reviews",
       title = "Number of Reviews for Different Beers") +
  
  # Rotate x-axis labels for better readability
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Display the plot
print(plot)

This visualization makes it clear that there is uncertainty about the source and context of the review data, which is essential for viewers to understand the limitations of this particular dataset column.

Other columns, such as review_aroma, review_palate, review_taste and review_appearance, would provide better insight than “number_of_reviews”, because reading them does not depend on this missing context.
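
Even without knowing its source, the review count can still serve as a rough reliability weight for the rating columns. The sketch below uses the six rows of review_overall and number_of_reviews printed by head(beers). The threshold `min_reviews` is an arbitrary assumption made for illustration, not a value from the documentation.

```r
# Values copied from the head(beers) output (rows 1 to 6).
toy <- data.frame(
  row = 1:6,
  review_overall = c(3.847082, 4.034304, 3.830239, 4.005435, 3.817708, 4.230159),
  number_of_reviews = c(497, 481, 377, 368, 96, 315)
)

# Assumed cut-off: ratings based on fewer reviews are treated as less reliable.
min_reviews <- 100
toy$enough_reviews <- toy$number_of_reviews >= min_reviews

# Average rating where each beer counts in proportion to its review count.
weighted_overall <- weighted.mean(toy$review_overall, toy$number_of_reviews)

toy[!toy$enough_reviews, ]
weighted_overall
```

Only row 5 (96 reviews) falls below the assumed threshold. The weighted mean is dominated by the most heavily reviewed beers.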