Predicting IBU (Bitterness) Based on ABV and Style

getwd()
## [1] "C:/Users/manue/OneDrive/Desktop"
# Read in data
beers <- read.csv("beers.csv", header = TRUE, sep = ",")
str(beers)
## 'data.frame':    2410 obs. of  15 variables:
##  $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ count.x   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ abv       : num  0.05 0.066 0.071 0.09 0.075 0.077 0.045 0.065 0.055 0.086 ...
##  $ ibu       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ id        : int  1436 2265 2264 2263 2262 2261 2260 2259 2258 2131 ...
##  $ beer      : chr  "Pub Beer" "Devil's Cup" "Rise of the Phoenix" "Sinister" ...
##  $ style     : chr  "American Pale Lager" "American Pale Ale (APA)" "American IPA" "American Double / Imperial IPA" ...
##  $ brewery_id: int  408 177 177 177 177 177 177 177 177 177 ...
##  $ ounces    : num  12 12 12 12 12 12 12 12 12 12 ...
##  $ style2    : chr  NA "American Pale Ale (APA)" "American IPA" "American Double / Imperial IPA" ...
##  $ count.y   : int  409 178 178 178 178 178 178 178 178 178 ...
##  $ brewery   : chr  "10 Barrel Brewing Company" "18th Street Brewery" "18th Street Brewery" "18th Street Brewery" ...
##  $ city      : chr  "Bend" "Gary" "Gary" "Gary" ...
##  $ state     : chr  "OR" "IN" "IN" "IN" ...
##  $ label     : chr  "Pub Beer (10 Barrel Brewing Company)" "Devil's Cup (18th Street Brewery)" "Rise of the Phoenix (18th Street Brewery)" "Sinister (18th Street Brewery)" ...

Finding missing values in beers:

colSums(is.na(beers))
##          X    count.x        abv        ibu         id       beer      style 
##          0          0         62       1005          0          0          5 
## brewery_id     ounces     style2    count.y    brewery       city      state 
##          0          0       1298          0          0          0          0 
##      label 
##          0

Counting unique values in style and style2 (note that unique() counts NA as a value of its own):

length(unique(beers$style))
## [1] 100
length(unique(beers$style2))
## [1] 7
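
Because style and style2 both contain missing values, a count that excludes NA can differ from the one above. A minimal sketch of the difference, using a small made-up vector (the `style` values here are invented for illustration and stand in for `beers$style`):

```r
# Toy stand-in for beers$style (values invented for illustration)
style <- c("American IPA", "American Pale Lager", NA, "American IPA", NA)

# unique() keeps NA as a separate value, so it is included in the count
n_with_na <- length(unique(style))

# Drop missing entries first to count only the styles actually present
n_without_na <- length(unique(style[!is.na(style)]))

print(c(with_na = n_with_na, without_na = n_without_na))
```

On the real data the same pattern would be `length(unique(beers$style[!is.na(beers$style)]))`.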

Check for duplicate rows:

beers[duplicated(beers), ]
##  [1] X          count.x    abv        ibu        id         beer      
##  [7] style      brewery_id ounces     style2     count.y    brewery   
## [13] city       state      label     
## <0 rows> (or 0-length row.names)

Check for duplicate columns:

duplicate_cols <- duplicated(as.list(beers))
duplicate_cols
##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
# Show names of duplicate columns:
names(beers)[duplicate_cols]
## [1] "count.x"
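
Once a duplicate column has been identified, the same logical vector can be used to drop it. A minimal sketch on a toy data frame (the values are invented, and `df` and `df_dedup` are hypothetical names, not objects from the analysis above):

```r
# Toy data frame in which count.x is an exact copy of X
df <- data.frame(X = 1:3, count.x = 1:3, abv = c(0.05, 0.066, 0.071))

# duplicated() on the list of columns flags later copies of an earlier column
dup <- duplicated(as.list(df))

# Keep only the first occurrence of each column
df_dedup <- df[, !dup, drop = FALSE]

print(names(df_dedup))
```

Applied to `beers`, this would keep `X` and remove `count.x`.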

Make Predictions Using Linear Regression

Predicting IBU (Bitterness) Based on ABV and Style

This is a practical model because bitterness often correlates with alcohol content and beer style.

Clean Dataset “beers”

We’ll remove rows with missing values in ibu, abv, or style.

clean_beers <- beers[complete.cases(beers[, c("abv", "ibu", "style")]), ]

# Convert style to factor for modeling
clean_beers$style <- as.factor(clean_beers$style)

Build the model

We use linear regression to predict IBU from ABV and style.

ibu_model <- lm(ibu ~ abv + style, data = clean_beers)
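
When a factor such as style appears in an lm() formula, R expands it into one indicator column per level except the first, which acts as the baseline. The sketch below shows that expansion with model.matrix() on a toy data frame; the rows and the three style names are invented for illustration.

```r
# Toy data (invented values) with three styles
toy <- data.frame(
  ibu   = c(60, 65, 20, 25, 40, 45),
  abv   = c(0.061, 0.070, 0.045, 0.050, 0.055, 0.060),
  style = factor(c("IPA", "IPA", "Lager", "Lager", "Pale Ale", "Pale Ale"))
)

# Design matrix that lm() builds internally for this formula
X <- model.matrix(ibu ~ abv + style, data = toy)
print(colnames(X))
# "(Intercept)" "abv" "styleLager" "stylePale Ale"

# The fitted model has one coefficient per design-matrix column
toy_fit <- lm(ibu ~ abv + style, data = toy)
print(names(coef(toy_fit)))
```

"IPA" is the baseline here because factor levels are sorted alphabetically by default, so each style coefficient is an offset relative to that first level.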

Make predictions

clean_beers$predicted_ibu <- predict(ibu_model, clean_beers)

# View actual vs predicted
head(clean_beers[, c("ibu", "predicted_ibu", "abv", "style")])
##    ibu predicted_ibu   abv                   style
## 15  60      49.20298 0.061 American Pale Ale (APA)
## 22  92      96.00000 0.099     American Barleywine
## 23  45      31.34359 0.079           Winter Warmer
## 25  42      37.18025 0.044 American Pale Ale (APA)
## 26  17      12.54982 0.049  Fruit / Vegetable Beer
## 27  17      12.54982 0.049  Fruit / Vegetable Beer

Evaluate the model

# Calculate RMSE (Root Mean Squared Error)
rmse <- sqrt(mean((clean_beers$ibu - clean_beers$predicted_ibu)^2))
print(paste("RMSE:", round(rmse, 2)))
## [1] "RMSE: 11.82"

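The RMSE is 11.82 IBU, so predictions typically miss the actual bitterness by roughly that much. RMSE weights large misses more heavily than a plain average of the absolute errors would. Because it was computed on the same rows used to fit the model, it likely understates the error on beers the model has not seen.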
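
An RMSE computed on the rows used for fitting can be compared against one from a hold-out split. The sketch below is a minimal version of that idea on simulated data, not the analysis above: the data-generating formula, the 80/20 split, the seed, and the names `toy`, `train`, and `test` are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the split is reproducible

# Simulated stand-in for clean_beers (not real beer data)
n <- 200
toy <- data.frame(
  abv   = runif(n, 0.04, 0.10),
  style = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
toy$ibu <- 10 + 500 * toy$abv + 15 * (toy$style == "B") + rnorm(n, sd = 5)

# 80/20 train/test split
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# predict() fails on factor levels absent from training,
# so keep only test rows whose style was seen in the training set
test <- test[test$style %in% unique(train$style), ]

# Fit on the training rows only, then score on the held-out rows
fit  <- lm(ibu ~ abv + style, data = train)
pred <- predict(fit, newdata = test)

rmse <- sqrt(mean((test$ibu - pred)^2))  # penalizes large misses more
mae  <- mean(abs(test$ibu - pred))       # average absolute miss

print(round(c(RMSE = rmse, MAE = mae), 2))
```

With around 100 styles in the real data, some styles will probably have very few rows, so the filtering step may discard a noticeable share of the test set; reporting how many rows were dropped would make the hold-out figure easier to interpret.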