Predicting IBU (Bitterness) Based on ABV and Style
getwd()
## [1] "C:/Users/manue/OneDrive/Desktop"
# Read in data
beers <- read.csv("beers.csv", header = TRUE, sep = ",")
str(beers)
## 'data.frame': 2410 obs. of 15 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ count.x : int 1 2 3 4 5 6 7 8 9 10 ...
## $ abv : num 0.05 0.066 0.071 0.09 0.075 0.077 0.045 0.065 0.055 0.086 ...
## $ ibu : int NA NA NA NA NA NA NA NA NA NA ...
## $ id : int 1436 2265 2264 2263 2262 2261 2260 2259 2258 2131 ...
## $ beer : chr "Pub Beer" "Devil's Cup" "Rise of the Phoenix" "Sinister" ...
## $ style : chr "American Pale Lager" "American Pale Ale (APA)" "American IPA" "American Double / Imperial IPA" ...
## $ brewery_id: int 408 177 177 177 177 177 177 177 177 177 ...
## $ ounces : num 12 12 12 12 12 12 12 12 12 12 ...
## $ style2 : chr NA "American Pale Ale (APA)" "American IPA" "American Double / Imperial IPA" ...
## $ count.y : int 409 178 178 178 178 178 178 178 178 178 ...
## $ brewery : chr "10 Barrel Brewing Company" "18th Street Brewery" "18th Street Brewery" "18th Street Brewery" ...
## $ city : chr "Bend" "Gary" "Gary" "Gary" ...
## $ state : chr "OR" "IN" "IN" "IN" ...
## $ label : chr "Pub Beer (10 Barrel Brewing Company)" "Devil's Cup (18th Street Brewery)" "Rise of the Phoenix (18th Street Brewery)" "Sinister (18th Street Brewery)" ...
Finding missing values in beers:
colSums(is.na(beers))
## X count.x abv ibu id beer style
## 0 0 62 1005 0 0 5
## brewery_id ounces style2 count.y brewery city state
## 0 0 1298 0 0 0 0
## label
## 0
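Raw counts are easier to judge as shares of the 2410 rows: ibu is missing in 1005 / 2410, or about 41.7% of beers, while abv is missing in 62 / 2410, or about 2.6%. A minimal sketch of computing such shares directly, using a small made-up data frame (toy_missing and its values are illustrative, not taken from beers.csv):

```r
# Toy data frame standing in for beers (values are invented for illustration)
toy_missing <- data.frame(
  abv = c(0.05, NA, 0.071, 0.09),
  ibu = c(NA, NA, 60, NA)
)

# is.na() gives a logical matrix; colMeans() turns it into a share per column
missing_share <- colMeans(is.na(toy_missing))
print(round(missing_share, 3))
```

On the real data the same call would be colMeans(is.na(beers)).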
Counting unique values in style and style2:
length(unique(beers$style))
## [1] 100
length(unique(beers$style2))
## [1] 7
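Note that unique() treats NA as a value of its own, so for columns that contain missing values (both style and style2 do) the counts above include NA. A small sketch of the difference, using a made-up vector (style_example is illustrative, not taken from the data):

```r
# Toy vector standing in for a column such as beers$style
style_example <- c("American IPA", "Winter Warmer", NA, "American IPA", NA)

# Counts NA as one of the unique values
n_with_na <- length(unique(style_example))

# Drops NA first, so only observed styles are counted
n_without_na <- length(unique(na.omit(style_example)))

print(c(with_na = n_with_na, without_na = n_without_na))
```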
Check for duplicate rows:
beers[duplicated(beers), ]
## [1] X count.x abv ibu id beer
## [7] style brewery_id ounces style2 count.y brewery
## [13] city state label
## <0 rows> (or 0-length row.names)
Check for duplicate columns:
duplicate_cols <- duplicated(as.list(beers))
duplicate_cols
## [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
# Show names of duplicate columns:
names(beers)[duplicate_cols]
## [1] "count.x"
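Since count.x repeats X exactly, it could be dropped before modeling. One possible way to do that, sketched on a made-up data frame (toy_cols is illustrative; the same indexing should apply to beers):

```r
# Toy data frame in which count.x duplicates X (values are invented)
toy_cols <- data.frame(
  X       = 1:3,
  count.x = 1:3,
  abv     = c(0.05, 0.066, 0.071)
)

# TRUE marks a column whose contents repeat an earlier column
dup <- duplicated(as.list(toy_cols))

# Keep only the first occurrence of each column; drop = FALSE keeps a data frame
toy_dedup <- toy_cols[, !dup, drop = FALSE]
print(names(toy_dedup))
```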
Make Predictions using Linear Regression
Predicting IBU (bitterness) from ABV and style is a practical model because bitterness often correlates with alcohol content and beer style.
Clean Dataset “beers”
We remove rows with missing values in ibu, abv, or style. Because ibu is missing in 1005 of the 2410 rows, at most 1405 rows remain.
clean_beers <- beers[complete.cases(beers[, c("abv", "ibu", "style")]), ]
# Convert style to factor for modeling
clean_beers$style <- as.factor(clean_beers$style)
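With up to 100 styles in the data, some are likely to have only a handful of beers, and a per-style coefficient estimated from one or two rows is not very informative. A hedged sketch of how sparse styles could be spotted, using a made-up factor (toy_style and the threshold of 2 are assumptions for illustration):

```r
# Toy factor standing in for clean_beers$style (values are invented)
toy_style <- factor(c("American IPA", "American IPA", "Winter Warmer",
                      "American Barleywine", "American IPA"))

# Number of beers per style, smallest groups first
style_counts <- sort(table(toy_style))

# Styles with fewer than 2 beers (threshold chosen arbitrarily here)
rare_styles <- names(style_counts[style_counts < 2])
print(rare_styles)
```

On the real data, table(clean_beers$style) would give the corresponding counts.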
Build the model
We use linear regression to predict IBU from ABV and style.
ibu_model <- lm(ibu ~ abv + style, data = clean_beers)
Make predictions
clean_beers$predicted_ibu <- predict(ibu_model, clean_beers)
# View actual vs predicted
head(clean_beers[, c("ibu", "predicted_ibu", "abv", "style")])
## ibu predicted_ibu abv style
## 15 60 49.20298 0.061 American Pale Ale (APA)
## 22 92 96.00000 0.099 American Barleywine
## 23 45 31.34359 0.079 Winter Warmer
## 25 42 37.18025 0.044 American Pale Ale (APA)
## 26 17 12.54982 0.049 Fruit / Vegetable Beer
## 27 17 12.54982 0.049 Fruit / Vegetable Beer
Evaluate the model
# Calculate RMSE (Root Mean Squared Error)
rmse <- sqrt(mean((clean_beers$ibu - clean_beers$predicted_ibu)^2))
print(paste("RMSE:", round(rmse, 2)))
## [1] "RMSE: 11.82"
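The RMSE is 11.82 IBU. RMSE squares the errors before averaging, so it weights large misses more heavily than a plain average of absolute errors would. It is also computed on the same rows used to fit the model, so it likely understates the error on beers the model has not seen.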
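One way to get a less optimistic estimate is to fit on part of the data and score on the rest. The sketch below is not part of the original analysis: it uses synthetic data (toy_beers, the seed, the 80/20 split, and the numbers used to generate ibu are all assumptions) so that it runs on its own, and it reports mean absolute error alongside RMSE.

```r
set.seed(42)  # assumption: any fixed seed, used only for reproducibility

# Synthetic stand-in for clean_beers (invented values, not the real data)
n <- 200
styles <- c("American IPA", "American Pale Ale (APA)", "Winter Warmer")
toy_beers <- data.frame(
  abv   = runif(n, 0.04, 0.10),
  style = factor(sample(styles, n, replace = TRUE))
)
style_shift <- c("American IPA" = 30, "American Pale Ale (APA)" = 10,
                 "Winter Warmer" = 0)
toy_beers$ibu <- 10 + 400 * toy_beers$abv +
  unname(style_shift[as.character(toy_beers$style)]) + rnorm(n, sd = 8)

# Hold out 20% of the rows for evaluation
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- toy_beers[train_idx, ]
test  <- toy_beers[-train_idx, ]

# predict() stops with an error on styles absent from the training rows,
# so keep only test rows whose style was seen during fitting
test <- test[test$style %in% train$style, ]

fit  <- lm(ibu ~ abv + style, data = train)
pred <- predict(fit, newdata = test)

rmse_test <- sqrt(mean((test$ibu - pred)^2))  # penalizes large misses more
mae_test  <- mean(abs(test$ibu - pred))       # average absolute miss
print(round(c(RMSE = rmse_test, MAE = mae_test), 2))
```

Replacing toy_beers with clean_beers would apply the same procedure to the real data. The style filter matters more there, since styles with only one or two beers can easily end up in the test set alone.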