Homework 2: Good and Bad Wine Predictions

Authors: Josef Liem and Andrew Kriter

In this assignment, we learn to predict the “best” of the best and “worst” of the worst wines. We will consider such methods as linear discriminant analysis, decision trees, and knn to arrive at our final list of “good” and “bad” wines. In other words, we care more about precision rather than recall; we must be EXTREMELY selective when making a positive prediction for a given wine being “good” or “bad”. We will be considering a one versus the rest approach: a model for “good” wines vs. the rest, “bad” wines vs the rest, etc…

Feature Selection: Alcohol, Volatile Acidity, & Chlorides

First let us see the kinds of features that we will be able to work with:

library(ggplot2)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.4     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(rpart)
library(rpart.plot)
library(ipred)
library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

library(ISLR)
library(class)
library(caret)

## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift

# All data
show_col_types = FALSE
all_wine_data = read_csv("Wine_Data.csv")

## Rows: 5997 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): type
## dbl (12): fixed acidity, volatile acidity, citric acid, residual sugar, chlo...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

all_wine_data = all_wine_data |> drop_na(quality, 
                                         alcohol, 
                                         `volatile acidity`, 
                                         chlorides)
str(all_wine_data)

## tibble [5,987 × 13] (S3: tbl_df/tbl/data.frame)
##  $ type                : chr [1:5987] "white" "white" "white" "white" ...
##  $ fixed acidity       : num [1:5987] 7 6.3 7.2 7.2 8.1 6.2 7 6.3 8.1 8.1 ...
##  $ volatile acidity    : num [1:5987] 0.27 0.3 0.23 0.23 0.28 0.32 0.27 0.3 0.22 0.27 ...
##  $ citric acid         : num [1:5987] 0.36 0.34 0.32 0.32 0.4 0.16 0.36 0.34 0.43 0.41 ...
##  $ residual sugar      : num [1:5987] 20.7 1.6 8.5 8.5 6.9 7 20.7 1.6 1.5 1.45 ...
##  $ chlorides           : num [1:5987] 0.045 0.049 0.058 0.058 0.05 0.045 0.045 0.049 0.044 0.033 ...
##  $ free sulfur dioxide : num [1:5987] 45 14 47 47 30 30 45 14 28 11 ...
##  $ total sulfur dioxide: num [1:5987] 170 132 186 186 97 136 170 132 129 63 ...
##  $ density             : num [1:5987] 1.001 0.994 0.996 0.996 0.995 ...
##  $ pH                  : num [1:5987] 3 3.3 3.19 3.19 3.26 3.18 3 3.3 3.22 2.99 ...
##  $ sulphates           : num [1:5987] 0.45 0.49 0.4 0.4 0.44 0.47 0.45 0.49 0.45 0.56 ...
##  $ alcohol             : num [1:5987] 8.8 9.5 9.9 9.9 10.1 9.6 8.8 9.5 11 12 ...
##  $ quality             : num [1:5987] 6 6 6 6 6 6 6 6 6 5 ...

We now know that all of our values are numerical except for type. We can additionally see that quality is the “label” that we will be using when gauging how well our models perform at inference.

A helpful indicator for which features we might care about would be to see what features a tree decides to split on:

tree = rpart(quality ~ `fixed acidity` + `volatile acidity` + `citric acid` + 
              `residual sugar` + chlorides + `free sulfur dioxide` + 
              `total sulfur dioxide` + density + pH + sulphates + alcohol, 
              data = all_wine_data)
rpart.plot(tree)

Notice that our decision tree chooses volatile acidity and alcohol as the general features to split on if we were to treat our quality scores in a continuous range.

# https://stackoverflow.com/questions/49889403/loop-through-dataframe-column-names
for (col_name in colnames(all_wine_data)) {
  if (!col_name %in% c("quality", "type")) {
    correlation_value = cor(all_wine_data[[col_name]], 
                             all_wine_data$quality, 
                             use = "complete.obs")
    print(paste("Correlation between", col_name, "and quality:", correlation_value))
  }
}

## [1] "Correlation between fixed acidity and quality: -0.0718283752553146"
## [1] "Correlation between volatile acidity and quality: -0.266997343154087"
## [1] "Correlation between citric acid and quality: 0.0847277344389174"
## [1] "Correlation between residual sugar and quality: -0.0403426933181702"
## [1] "Correlation between chlorides and quality: -0.201318340881084"
## [1] "Correlation between free sulfur dioxide and quality: 0.0469242634492181"
## [1] "Correlation between total sulfur dioxide and quality: -0.0449432889814409"
## [1] "Correlation between density and quality: -0.305755325742749"
## [1] "Correlation between pH and quality: 0.0112343690174921"
## [1] "Correlation between sulphates and quality: 0.0380564179339317"
## [1] "Correlation between alcohol and quality: 0.444824578867225"

#Correlation coefficient winners: alcohol, chlorides, volatile acidity

It appears that alcohol, chlorides, and volatile acidity have decent enough correlations with the quality of a given wine, so these might be our features.

Further Feature Exploration

Now that we have narrowed down Alcohol, Volatile Acidity, & Chlorides as our features of interest in understanding quality, let us take a deeper dive to gain an intuition into which methods might work best for us.

We know that we ultimately want the “good” and “bad” wines defined as quality ratings in the intervals [3, 4] and [7, 9] (I include 9 even though they are extremely rare), respectively. Let us first see the unique kinds of labels we are working with:

# get unique quality values
sort(unique(all_wine_data$quality))

## [1] 3 4 5 6 7 8 9

How do these “quality” ratings hold up against “alcohol” and “volatile acidity”?

all_wine_data |>
  filter(quality %in% c(3, 4, 7, 8, 9)) |>
  ggplot() +
  geom_point(aes(x = alcohol, y = `volatile acidity`, color = factor(quality)))

Interesting! It appears that many of the “good” wines (7, 8, 9) will have lower volatile acidity (generally) than the “bad” (3, 4) wines. Additionally, the “bad” wines tend to have a slightly lower alcohol content. This might be something useful for a k-NN model to work with, as there seems to be some amount of separability based on quality and these features… Also notice that our bad wines are far fewer in number. I believe that predicting the bad wines will be the harder of the two tasks.

all_wine_data |>
  filter(quality %in% c(3, 4, 7, 8, 9)) |>
  ggplot() +
  geom_point(aes(x = alcohol, y = `chlorides`, color = factor(quality)))

For bad wines, we have slightly higher chloride levels with low alcohol content.

all_wine_data |>
  filter(quality %in% c(3, 4, 7, 8, 9)) |>
  ggplot() +
  geom_point(aes(x = chlorides, y = `volatile acidity`, color = factor(quality)))

It appears that again, high volatile acidity is a pretty good indicator of of bad wines, with a very weak relationship for chlorides. Let us now move on to preparation of our data.

Data Preparation

Let us first reload in our data now that we are done with the exploration phase. Our training data will consist of “all_wine_data”, and our inference data will consist of “our_wine_data”:

# All data
all_wine_data = read_csv("Wine_Data.csv")

## Rows: 5997 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): type
## dbl (12): fixed acidity, volatile acidity, citric acid, residual sugar, chlo...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Our data: alcohol volatile acidity    chlorides
our_wine_data = read_csv("Andrew_Josef_wine_data.csv")

## Rows: 500 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): type
## dbl (4): wine_id, alcohol, volatile acidity, chlorides
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

As always, lets make sure that we handle missing NA values. We can do so by dropping rows with such values. Additionally, we will want to split our data into training and testing subsets. For now, a 50/50 split is good enough for me:

# Lets remove the NAs
all_wine_data = all_wine_data |> drop_na(quality, 
                                         alcohol, 
                                         `volatile acidity`, 
                                         chlorides)

# train and testing
set.seed(1)
training_data_rows = sample(1:nrow(all_wine_data), 
                            size = nrow(all_wine_data)/2)

training_data = all_wine_data[training_data_rows, ]
testing_data = all_wine_data[-training_data_rows, ]

We are not quite done yet! Previously, we saw that our labels are reported on a scale 3-9. Firstly, at testing, we don’t really care about the “average” wines, so why should I include performance over them in any of the metrics I end up calculating? Additionally, when we are gauging our models’ performance, we merely care about whether wines fall into the “good” bucket or the “bad” bucket; whether the wine is rated as a 3 or a 4 is of little concern. I will create a new label column that will be used at inference that reflects these considerations in the “testing_data”. However, I will still find “average” wines to be useful for training purposes in the “training_data”:

# filter testing data for only 3-4 and 7-9 quality wines. Create a new column
# with hard quality labels for each bucket of "good", "avg", and "bad"
labels = c("bad", "average", "good")
testing_data = testing_data |>
  filter(quality %in% c(3, 4, 7, 8, 9)) |>
  mutate(hard_quality = cut(quality, c(0, 4, 6, Inf), labels = labels))

# We will do same thing for training_data w/o excluding the avg wines this time:
labels = c("bad", "average", "good")
training_data = training_data |>
  mutate(hard_quality = cut(quality, c(0, 4, 6, Inf), labels = labels))

LDA

Let us start by fitting our LDA model on our features of interest with the training data. We will worry about the precision/recall trade-off at a later stage.

LDA_model = lda(quality ~ alcohol + `volatile acidity` + chlorides,
                   data=training_data)

LDA_preds = predict(LDA_model, testing_data)

Without doing any thresholding of our predictions, let us first see how our model performs:

# Confusion matrix
table(LDA_preds$class, testing_data$quality)

##    
##       3   4   7   8   9
##   3   1   1   0   0   0
##   4   0   0   0   0   0
##   5   6  52  37  10   0
##   6   8  39 313  44   1
##   7   1   5 141  40   4
##   8   0   0   0   0   0

# Accuracy
mean(LDA_preds$class == testing_data$quality)

## [1] 0.2019915

As previously explained, our testing dataset only consists of “good” and “bad” wines. As you can see, our model is only 20% accurate on this testing subset. Additionally, many of the “good” wines (7s & 8s) are being predicted as average. That is not to say, however, that we should necessarily give up on LDA already. I claim this is still salvageable; our model can have terrible accuracy, as long as we are fairly certain that all of the positive predictions it makes are absolutely correct. In other words, we need to play around with the thresholding for an LDA for “good” wines, and an LDA for “bad” wines. First, I will use a dirty trick of summing up the probabilities for predictions of “3 or 4” (bad) together, “5 or 6” (avg) together, and “7, 8 or 9” (good) together; we do not care about distinctions on quality within each category.

# https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/cbind
# https://forum.posit.co/t/sum-of-columns-into-a-new-column/86500 
summed_posterior = cbind(
  bad_wine = rowSums(LDA_preds$posterior[, c("3", "4")]), 
  average_wine = rowSums(LDA_preds$posterior[, c("5", "6")]), 
  good_wine = rowSums(LDA_preds$posterior[, c("7", "8")]))

LDA and Thresholding for “Good” Wines

Let us start with LDA over the “good” wines. I define my thresholding as such:

threshold = 0.7

good_wine_LDA_thresholded_preds = summed_posterior[, "good_wine"] > threshold
best_wine_targets = testing_data$hard_quality == "good"
table(good_wine_LDA_thresholded_preds, best_wine_targets)

##                                best_wine_targets
## good_wine_LDA_thresholded_preds FALSE TRUE
##                           FALSE   113  562
##                           TRUE      0   28

Our model is still pretty awful, in my opinion. But, notice that if we set our threshold really high, LDA rarely, if ever, predicts a non-best wine as a best wine (even if many best wines aren’t also predicted as best wines). Nevertheless, the small number of actually good wines that we predicted as good is possibly enough to work with.

LDA and Thresholding for “Bad” Wines

Let us repeat the same thing but with bad wines this time…

threshold = 0.21

bad_wine_LDA_thresholded_preds = summed_posterior[, "bad_wine"] > threshold
worst_wine_targets = testing_data$hard_quality == "bad"
table(bad_wine_LDA_thresholded_preds, worst_wine_targets)

##                               worst_wine_targets
## bad_wine_LDA_thresholded_preds FALSE TRUE
##                          FALSE   590  101
##                          TRUE      0   12

Notice that I had to set a much lower threshold value… This reflects the fact that predicting bad wines seems to be the harder task for our LDA model. Nevertheless, this threshold is sufficiently high enough to never predict an actually non-bad wine as a bad wine, even if we miss many of the actually bad wines.

Great! We now have an LDA model with sufficiently high precision for good and bad wines each, though terrible recall (determined by confusion matrix visual inspection as opposed to calculation). We will now consider a different approach.

Decision Trees

We again fit a model. I pass “prob” to predict such that it should output probabilities as opposed to hard labels. We will again use these probabilities for thresholding to maximize precision.

bagging_model = bagging(factor(quality) ~ alcohol + `volatile acidity` + chlorides, 
                        data=training_data)
bagging_preds = predict(bagging_model, testing_data, type="prob")

Again, I bin the prediction probabilities into their corresponding “bad”, “average”, and “good” bins.

bagging_preds = cbind(
  bad_wine = rowSums(bagging_preds[, c("3", "4")]), 
  average_wine = rowSums(bagging_preds[, c("5", "6")]), 
  good_wine = rowSums(bagging_preds[, c("7", "8")]))

Good Wines Decision Trees

Again, I apply my thresholding. This time I choose a very high threshold such that no false positives appear.

threshold = 0.95

good_wine_bag_thresholded_preds = bagging_preds[, "good_wine"] > threshold
best_wine_targets = testing_data$hard_quality == "good"
table(good_wine_bag_thresholded_preds, best_wine_targets)

##                                best_wine_targets
## good_wine_bag_thresholded_preds FALSE TRUE
##                           FALSE   113  558
##                           TRUE      0   32

mean(good_wine_bag_thresholded_preds == best_wine_targets)

## [1] 0.2062589

Again, also not a very good accuracy, but it hardly will ever make a false positive prediction.

Bad Wines Decision Trees

For the bad wines, we repeat the same, though with a lower threshold (“bad” predictions is the harder task):

threshold = 0.25

bad_wine_bag_thresholded_preds = bagging_preds[, "bad_wine"] > threshold
worst_wine_targets = testing_data$hard_quality == "bad"
table(bad_wine_bag_thresholded_preds, worst_wine_targets)

##                               worst_wine_targets
## bad_wine_bag_thresholded_preds FALSE TRUE
##                          FALSE   586   88
##                          TRUE      4   25

mean(bad_wine_bag_thresholded_preds == worst_wine_targets)

## [1] 0.8691323

While the accuracy may appear good, this is only because our model is really good at excluding not-bad wines from the bad wines prediction category. Nevertheless, it is still very precise for bad wines.

k-NN

We will now repeat the same using kNN. The probabilities are a function of proportions of nearest neighbors. I refer to this thread: https://stackoverflow.com/questions/30102017/probabilities-of-all-classifications-in-knn-in-r for understanding how to get probabilities.

k = 3

# https://rdrr.io/cran/caret/man/knn3.html
# https://stackoverflow.com/questions/30102017/probabilities-of-all-classifications-in-knn-in-r
knn3_model = caret::knn3(x = training_data[, c("alcohol", "volatile acidity", "chlorides")], 
                          y = factor(training_data$quality), 
                          k = 3)
knn_probs = predict(knn3_model, testing_data[, c("alcohol", "volatile acidity", "chlorides")], type = "prob")

We repeat the same “good”, “avg”, “bad” binning scheme for probabilities over 3, 4, 5, 6, 7, 8, 9:

knn_preds = cbind(
  bad_wine = rowSums(knn_probs[, c("3", "4")]), 
  average_wine = rowSums(knn_probs[, c("5", "6")]), 
  good_wine = rowSums(knn_probs[, c("7", "8")])
)

Good Wines kNN

Note that my choice of threshold here is not only arbitrarily chosen to whatever best maximizes my precision, but to also consider discrete possible probabilities given the value of k (my probabilities are always 0, 0.333, 0.666, 1.0 for \(k=3\)).

threshold = 0.6

good_wine_knn_thresholded_preds = knn_preds[, "good_wine"] > threshold
best_wine_targets = testing_data$hard_quality == "good"
table(good_wine_knn_thresholded_preds, best_wine_targets)

##                                best_wine_targets
## good_wine_knn_thresholded_preds FALSE TRUE
##                           FALSE   104  365
##                           TRUE      9  225

Bad Wines Decision kNN

Since bad wines are fewer in abundance in my training set, I am obliged to use a smaller threshold if I want to catch any of them:

threshold = 0.3

bad_wine_knn_thresholded_preds = knn_preds[, "bad_wine"] > threshold
worst_wine_targets = testing_data$hard_quality == "bad"
table(bad_wine_knn_thresholded_preds, worst_wine_targets)

##                               worst_wine_targets
## bad_wine_knn_thresholded_preds FALSE TRUE
##                          FALSE   572   84
##                          TRUE     18   29

Evidently, this the model with the worst precision of all. I couldn’t really improve this by increasing k either.

Combining Models/Overall Test Results

Wouldn’t it be nice to combine the results of the previous models we just discussed? Each model has its own set of issues: decision trees have high variance, and low bias. LDA has high bias, and lower variance. It would be nice to cross-reference the predicted “good” and “bad” wines across each model to give us a little bit more confidence in those predictions.

table((good_wine_bag_thresholded_preds & good_wine_LDA_thresholded_preds & good_wine_knn_thresholded_preds), best_wine_targets)

##        best_wine_targets
##         FALSE TRUE
##   FALSE   113  586
##   TRUE      0    4

table((bad_wine_bag_thresholded_preds & bad_wine_LDA_thresholded_preds & bad_wine_knn_thresholded_preds), worst_wine_targets)

##        worst_wine_targets
##         FALSE TRUE
##   FALSE   590  108
##   TRUE      0    5

It appears that our models thresholded on the best wine data are collectively able to predict 24 positives in the exact same manner! Our negative thresholding for models was worse… It was only able to agree on 6 bad wines.

Another thing we could consider is to count the number of models that are in agreement on a positive predictions for “good” wines. This will require some logical operations, where we treat booleans as numerics:

# Situations where all models predict true are given a score of 3, 2 models together predicting true are given 2, etc...
good_wine_agreement_score =  (good_wine_bag_thresholded_preds &     
                              good_wine_LDA_thresholded_preds & 
                              good_wine_knn_thresholded_preds) * 3 + 
                             ((good_wine_bag_thresholded_preds +   
                               good_wine_LDA_thresholded_preds + 
                              good_wine_knn_thresholded_preds) == 2) * 2 + 
                            ((good_wine_bag_thresholded_preds + 
                              good_wine_LDA_thresholded_preds + 
                              good_wine_knn_thresholded_preds) == 1) * 1

good_wine_result = data.frame(good_wine_agreement_score, best_wine_targets)
sorted_good_wine_result = good_wine_result[order(-good_wine_result$good_wine_agreement_score), ]
head(sorted_good_wine_result, 30)

##     good_wine_agreement_score best_wine_targets
## 166                         3              TRUE
## 167                         3              TRUE
## 310                         3              TRUE
## 375                         3              TRUE
## 44                          2              TRUE
## 74                          2              TRUE
## 92                          2              TRUE
## 98                          2              TRUE
## 129                         2              TRUE
## 146                         2              TRUE
## 177                         2              TRUE
## 281                         2              TRUE
## 286                         2              TRUE
## 305                         2              TRUE
## 311                         2              TRUE
## 313                         2              TRUE
## 320                         2              TRUE
## 339                         2              TRUE
## 340                         2              TRUE
## 341                         2              TRUE
## 353                         2              TRUE
## 354                         2              TRUE
## 360                         2              TRUE
## 366                         2              TRUE
## 377                         2              TRUE
## 378                         2              TRUE
## 391                         2              TRUE
## 417                         2              TRUE
## 424                         2              TRUE
## 447                         2              TRUE

As you can see, for good wines, when all three models agree on positive predictions (3s under good_wine_agreement_score), the actual target (best_wine_target) is also truly positive in pretty much every case. The same holds for when 2 of the models agree. Indices of the instances are on the left.

Same for positive predictions on “bad” wines:

bad_wine_agreement_score = (bad_wine_bag_thresholded_preds & 
                            bad_wine_LDA_thresholded_preds & 
                            bad_wine_knn_thresholded_preds) * 3 + 
                            ((bad_wine_bag_thresholded_preds + 
                              bad_wine_LDA_thresholded_preds + 
                              bad_wine_knn_thresholded_preds) == 2) * 2 + 
                            ((bad_wine_bag_thresholded_preds + 
                              bad_wine_LDA_thresholded_preds + 
                              bad_wine_knn_thresholded_preds) == 1) * 1

bad_wine_result = data.frame(bad_wine_agreement_score, worst_wine_targets)
sorted_bad_wine_result = bad_wine_result[order(-bad_wine_result$bad_wine_agreement_score), ]
head(sorted_bad_wine_result, 30)

##     bad_wine_agreement_score worst_wine_targets
## 244                        3               TRUE
## 571                        3               TRUE
## 615                        3               TRUE
## 635                        3               TRUE
## 683                        3               TRUE
## 14                         2               TRUE
## 37                         2              FALSE
## 106                        2              FALSE
## 119                        2               TRUE
## 262                        2               TRUE
## 270                        2               TRUE
## 275                        2               TRUE
## 280                        2               TRUE
## 545                        2               TRUE
## 561                        2               TRUE
## 563                        2               TRUE
## 567                        2               TRUE
## 611                        2               TRUE
## 621                        2               TRUE
## 625                        2               TRUE
## 638                        2               TRUE
## 677                        2               TRUE
## 687                        2               TRUE
## 3                          1              FALSE
## 11                         1              FALSE
## 16                         1               TRUE
## 60                         1              FALSE
## 64                         1              FALSE
## 78                         1              FALSE
## 114                        1              FALSE

For “bad” wines, when all three models are in agreement (3s), our positive predictions are generally correct. However, when only 2 agree, there seems to be a little room for error. I claim that this has to do with the relatively poor precision from our k-NN model being treated as an equally “good”/precise estimator against the other models when tallying predictions.

Inference on New Data

New Good Wines

We now apply the exact same models/thresholds on our_wine_data to determine the best wines:

LDA_new_preds = predict(LDA_model, our_wine_data)
bagging_new_preds = predict(bagging_model, our_wine_data, type="prob")
knn_new_preds = predict(knn3_model, our_wine_data[, c("alcohol", "volatile acidity", "chlorides")], type = "prob")

bagging_new_preds = cbind(
  bad_wine = rowSums(bagging_new_preds[, c("3", "4")]), 
  average_wine = rowSums(bagging_new_preds[, c("5", "6")]), 
  good_wine = rowSums(bagging_new_preds[, c("7", "8")]))

LDA_new_posterior = cbind(
  bad_wine = rowSums(LDA_new_preds$posterior[, c("3", "4")]), 
  average_wine = rowSums(LDA_new_preds$posterior[, c("5", "6")]), 
  good_wine = rowSums(LDA_new_preds$posterior[, c("7", "8")]))

knn_new_preds = cbind(
  bad_wine = rowSums(knn_new_preds[, c("3", "4")]), 
  average_wine = rowSums(knn_new_preds[, c("5", "6")]), 
  good_wine = rowSums(knn_new_preds[, c("7", "8")]))

threshold = 0.6
good_wine_LDA_thresholded_new_preds = LDA_new_posterior[, "good_wine"] > threshold

threshold = 0.85
good_wine_bag_thresholded_new_preds = bagging_new_preds[, "good_wine"] > threshold

threshold = 0.6
good_wine_knn_thresholded_new_preds = knn_new_preds[, "good_wine"] > threshold

good_wine_agreement_new_score = (good_wine_LDA_thresholded_new_preds & good_wine_bag_thresholded_new_preds & good_wine_knn_thresholded_new_preds) * 3 + 
  ((good_wine_LDA_thresholded_new_preds + good_wine_bag_thresholded_new_preds + good_wine_knn_thresholded_new_preds) == 2) * 2 + 
  ((good_wine_LDA_thresholded_new_preds + good_wine_bag_thresholded_new_preds + good_wine_knn_thresholded_new_preds) == 1) * 1
good_wine_new_result = data.frame(good_wine_agreement_new_score, our_wine_data$wine_id)
sorted_good_wine_new_result = good_wine_new_result[order(-good_wine_new_result$good_wine_agreement_new_score), ]
head(sorted_good_wine_new_result, 10)

##     good_wine_agreement_new_score our_wine_data.wine_id
## 275                             3                   275
## 359                             3                   359
## 415                             3                   415
## 21                              2                    21
## 28                              2                    28
## 39                              2                    39
## 48                              2                    48
## 63                              2                    63
## 86                              2                    86
## 91                              2                    91

New Bad Wines

And for bad wines:

threshold = 0.10
bad_wine_LDA_thresholded_new_preds = LDA_new_posterior[, "bad_wine"] > threshold

threshold = 0.20
bad_wine_bag_thresholded_new_preds = bagging_new_preds[, "bad_wine"] > threshold

threshold = 0.3
bad_wine_knn_thresholded_new_preds = knn_new_preds[, "bad_wine"] > threshold

bad_wine_agreement_new_score = (bad_wine_LDA_thresholded_new_preds & bad_wine_bag_thresholded_new_preds & bad_wine_knn_thresholded_new_preds) * 3 + 
  ((bad_wine_LDA_thresholded_new_preds + bad_wine_bag_thresholded_new_preds + bad_wine_knn_thresholded_new_preds) == 2) * 2 + 
  ((bad_wine_LDA_thresholded_new_preds + bad_wine_bag_thresholded_new_preds + bad_wine_knn_thresholded_new_preds) == 1) * 1
bad_wine_new_result = data.frame(bad_wine_agreement_new_score, our_wine_data$wine_id)
sorted_bad_wine_new_result = bad_wine_new_result[order(-bad_wine_new_result$bad_wine_agreement_new_score), ]
head(sorted_bad_wine_new_result, 10)

##     bad_wine_agreement_new_score our_wine_data.wine_id
## 141                            3                   141
## 156                            3                   156
## 3                              2                     3
## 9                              2                     9
## 35                             2                    35
## 52                             2                    52
## 55                             2                    55
## 75                             2                    75
## 136                            2                   136
## 170                            2                   170

Our Final List of Best and Worst Wines by I.D.

Good Wine I.D.s

Wine I.D.	Num. of Models in Agreement on Good Wine
275	3
359	3
415	3
21	2
28	2
39	2
48	2
63	2
86	2
91	2

Bad Wine I.D.s

Wine I.D.	Num. of Models in Agreement on Bad Wine
141	3
156	3
3	2
9	2
35	2
52	2
55	2
75	2
136	2
170	2