Estimating Wine Quality
In this activity I will use the winequality-red dataset. Activity 7 includes the dataset whitewines.csv.
Exploring and preparing the data
wine <- read.csv("winequality-red.csv")
Examine the wine data
str(wine)
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
The distribution of quality ratings
hist(wine$quality)
Summary statistics of the wine data
summary(wine)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide density
## Min. :0.01200 Min. : 1.00 Min. : 6.00 Min. :0.9901
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00 1st Qu.:0.9956
## Median :0.07900 Median :14.00 Median : 38.00 Median :0.9968
## Mean :0.08747 Mean :15.87 Mean : 46.47 Mean :0.9967
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00 3rd Qu.:0.9978
## Max. :0.61100 Max. :72.00 Max. :289.00 Max. :1.0037
## pH sulphates alcohol quality
## Min. :2.740 Min. :0.3300 Min. : 8.40 Min. :3.000
## 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 1st Qu.:5.000
## Median :3.310 Median :0.6200 Median :10.20 Median :6.000
## Mean :3.311 Mean :0.6581 Mean :10.42 Mean :5.636
## 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :4.010 Max. :2.0000 Max. :14.90 Max. :8.000
Split the dataset: the first 1279 rows (80%) for training and the remaining 320 rows (20%) for testing
wine_train <- wine[1:1279, ]
wine_test <- wine[1280:1599, ]
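The split above takes rows in file order, which is fine only if the file is not sorted in some meaningful way. If that is in doubt, a random split is a common alternative. A minimal sketch on a made-up stand-in data frame (wine_demo, train_idx, demo_train, demo_test and the seed are illustrative names and values of my own, not part of the activity):

```r
# Hypothetical stand-in for the wine data: 1599 rows, as reported by str(wine).
set.seed(123)  # arbitrary seed, used only to make the sketch reproducible
wine_demo <- data.frame(
  alcohol = runif(1599, min = 8.4, max = 14.9),
  quality = sample(3:8, 1599, replace = TRUE)
)

# Draw 80% of the row indices at random instead of taking the first 1279 rows.
train_idx  <- sample(nrow(wine_demo), size = floor(0.8 * nrow(wine_demo)))
demo_train <- wine_demo[train_idx, ]
demo_test  <- wine_demo[-train_idx, ]

nrow(demo_train)  # 1279
nrow(demo_test)   # 320
```

All numbers reported in the rest of this activity come from the sequential split, so they would change under a random one.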
Training a model on the data
Regression tree using rpart
Build a tree model to predict quality using all other variables
library(rpart)
m.rpart <- rpart(quality ~ ., data = wine_train)
Get basic information about the tree
m.rpart
## n= 1279
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 1279 843.43390 5.663800
## 2) alcohol< 10.55 804 332.70020 5.386816
## 4) sulphates< 0.575 303 93.07591 5.171617 *
## 5) sulphates>=0.575 501 217.10580 5.516966
## 10) volatile.acidity>=0.405 372 141.51610 5.403226
## 20) total.sulfur.dioxide>=46.5 171 48.42105 5.210526 *
## 21) total.sulfur.dioxide< 46.5 201 81.34328 5.567164 *
## 11) volatile.acidity< 0.405 129 56.89922 5.844961 *
## 3) alcohol>=10.55 475 344.64420 6.132632
## 6) sulphates< 0.625 183 132.92900 5.743169
## 12) volatile.acidity>=1.015 8 4.87500 4.125000 *
## 13) volatile.acidity< 1.015 175 106.14860 5.817143
## 26) volatile.acidity>=0.385 123 70.26016 5.642276 *
## 27) volatile.acidity< 0.385 52 23.23077 6.230769 *
## 7) sulphates>=0.625 292 166.56160 6.376712
## 14) alcohol< 11.55 161 82.26087 6.130435
## 28) total.sulfur.dioxide>=85.5 8 2.00000 5.000000 *
## 29) total.sulfur.dioxide< 85.5 153 69.50327 6.189542
## 58) volatile.acidity>=0.395 83 28.89157 5.963855 *
## 59) volatile.acidity< 0.395 70 31.37143 6.457143 *
## 15) alcohol>=11.55 131 62.53435 6.679389 *
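To read the printout, follow the splits from the root until a line marked with * is reached; the yval of that terminal node is the predicted quality. The sketch below hard-codes the printed thresholds and leaf means in a helper of my own (predict_by_hand is a hypothetical name, not an rpart function) and applies it to the first and fourth wines shown in the str(wine) listing:

```r
# Hand-coded copy of the printed tree. Thresholds and leaf means are copied
# from the output above; predict_by_hand is a hypothetical helper name.
predict_by_hand <- function(alcohol, sulphates, volatile.acidity,
                            total.sulfur.dioxide) {
  if (alcohol < 10.55) {                                  # node 2
    if (sulphates < 0.575) return(5.171617)               # node 4
    if (volatile.acidity < 0.405) return(5.844961)        # node 11
    if (total.sulfur.dioxide >= 46.5) return(5.210526)    # node 20
    return(5.567164)                                      # node 21
  }
  if (sulphates < 0.625) {                                # node 6
    if (volatile.acidity >= 1.015) return(4.125)          # node 12
    if (volatile.acidity >= 0.385) return(5.642276)       # node 26
    return(6.230769)                                      # node 27
  }
  if (alcohol >= 11.55) return(6.679389)                  # node 15
  if (total.sulfur.dioxide >= 85.5) return(5.0)           # node 28
  if (volatile.acidity >= 0.395) return(5.963855)         # node 58
  return(6.457143)                                        # node 59
}

# First wine: alcohol 9.4, sulphates 0.56, volatile.acidity 0.70, total SO2 34
first_wine  <- predict_by_hand(9.4, 0.56, 0.70, 34)
# Fourth wine: alcohol 9.8, sulphates 0.58, volatile.acidity 0.28, total SO2 60
fourth_wine <- predict_by_hand(9.8, 0.58, 0.28, 60)

first_wine   # 5.171617 (node 4)
fourth_wine  # 5.844961 (node 11)
```

Because the tree has only 11 terminal nodes, it can produce only 11 distinct predictions.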
Get more detailed information about the tree
summary(m.rpart)
## Call:
## rpart(formula = quality ~ ., data = wine_train)
## n= 1279
##
## CP nsplit rel error xerror xstd
## 1 0.19692055 0 1.0000000 1.0022141 0.04019729
## 2 0.05353544 1 0.8030795 0.8197999 0.03930843
## 3 0.02669866 2 0.7495440 0.8100993 0.03826911
## 4 0.02597167 3 0.7228454 0.8127205 0.03914630
## 5 0.02580691 4 0.6968737 0.8033330 0.03910486
## 6 0.02215993 5 0.6710668 0.7966474 0.03903822
## 7 0.01500727 6 0.6489068 0.7619556 0.03628075
## 8 0.01393327 7 0.6338996 0.7477414 0.03621780
## 9 0.01275453 8 0.6199663 0.7470788 0.03623467
## 10 0.01095554 9 0.6072118 0.7406798 0.03558996
## 11 0.01000000 10 0.5962562 0.7379698 0.03546135
##
## Variable importance
## alcohol volatile.acidity density
## 33 16 14
## sulphates total.sulfur.dioxide chlorides
## 12 8 6
## fixed.acidity citric.acid free.sulfur.dioxide
## 5 4 1
## residual.sugar pH
## 1 1
##
## Node number 1: 1279 observations, complexity param=0.1969205
## mean=5.6638, MSE=0.659448
## left son=2 (804 obs) right son=3 (475 obs)
## Primary splits:
## alcohol < 10.55 to the left, improve=0.19692050, (0 missing)
## volatile.acidity < 0.425 to the right, improve=0.11756210, (0 missing)
## sulphates < 0.645 to the left, improve=0.11517210, (0 missing)
## density < 0.995565 to the right, improve=0.08420511, (0 missing)
## citric.acid < 0.305 to the left, improve=0.07136804, (0 missing)
## Surrogate splits:
## density < 0.995585 to the right, agree=0.774, adj=0.392, (0 split)
## chlorides < 0.0685 to the right, agree=0.689, adj=0.162, (0 split)
## volatile.acidity < 0.3675 to the right, agree=0.673, adj=0.120, (0 split)
## total.sulfur.dioxide < 12.5 to the right, agree=0.660, adj=0.084, (0 split)
## fixed.acidity < 6.55 to the right, agree=0.655, adj=0.072, (0 split)
##
## Node number 2: 804 observations, complexity param=0.02669866
## mean=5.386816, MSE=0.4138063
## left son=4 (303 obs) right son=5 (501 obs)
## Primary splits:
## sulphates < 0.575 to the left, improve=0.06768421, (0 missing)
## volatile.acidity < 0.325 to the right, improve=0.06764124, (0 missing)
## alcohol < 9.85 to the left, improve=0.06585239, (0 missing)
## total.sulfur.dioxide < 83.5 to the right, improve=0.03973483, (0 missing)
## fixed.acidity < 10.85 to the left, improve=0.03522151, (0 missing)
## Surrogate splits:
## density < 0.996285 to the left, agree=0.670, adj=0.125, (0 split)
## volatile.acidity < 0.7975 to the right, agree=0.649, adj=0.069, (0 split)
## fixed.acidity < 6.15 to the left, agree=0.633, adj=0.026, (0 split)
## citric.acid < 0.105 to the left, agree=0.632, adj=0.023, (0 split)
## chlorides < 0.055 to the left, agree=0.629, adj=0.017, (0 split)
##
## Node number 3: 475 observations, complexity param=0.05353544
## mean=6.132632, MSE=0.7255668
## left son=6 (183 obs) right son=7 (292 obs)
## Primary splits:
## sulphates < 0.625 to the left, improve=0.13101510, (0 missing)
## volatile.acidity < 0.87 to the right, improve=0.12433980, (0 missing)
## alcohol < 11.55 to the left, improve=0.12135200, (0 missing)
## citric.acid < 0.295 to the left, improve=0.10370610, (0 missing)
## pH < 3.355 to the right, improve=0.05891088, (0 missing)
## Surrogate splits:
## citric.acid < 0.205 to the left, agree=0.716, adj=0.262, (0 split)
## fixed.acidity < 8.15 to the left, agree=0.695, adj=0.208, (0 split)
## volatile.acidity < 0.665 to the right, agree=0.691, adj=0.197, (0 split)
## total.sulfur.dioxide < 14.5 to the left, agree=0.665, adj=0.131, (0 split)
## density < 0.99493 to the left, agree=0.644, adj=0.077, (0 split)
##
## Node number 4: 303 observations
## mean=5.171617, MSE=0.3071812
##
## Node number 5: 501 observations, complexity param=0.02215993
## mean=5.516966, MSE=0.4333449
## left son=10 (372 obs) right son=11 (129 obs)
## Primary splits:
## volatile.acidity < 0.405 to the right, improve=0.08608907, (0 missing)
## total.sulfur.dioxide < 50.5 to the right, improve=0.07568808, (0 missing)
## fixed.acidity < 10.95 to the left, improve=0.06485526, (0 missing)
## alcohol < 9.85 to the left, improve=0.06078331, (0 missing)
## free.sulfur.dioxide < 14.5 to the right, improve=0.03644560, (0 missing)
## Surrogate splits:
## fixed.acidity < 10.45 to the left, agree=0.780, adj=0.147, (0 split)
## citric.acid < 0.365 to the left, agree=0.754, adj=0.047, (0 split)
## chlorides < 0.0595 to the right, agree=0.754, adj=0.047, (0 split)
## free.sulfur.dioxide < 2.5 to the right, agree=0.750, adj=0.031, (0 split)
##
## Node number 6: 183 observations, complexity param=0.02597167
## mean=5.743169, MSE=0.7263878
## left son=12 (8 obs) right son=13 (175 obs)
## Primary splits:
## volatile.acidity < 1.015 to the right, improve=0.16479020, (0 missing)
## alcohol < 11.65 to the left, improve=0.13019140, (0 missing)
## citric.acid < 0.255 to the left, improve=0.12880670, (0 missing)
## pH < 3.435 to the right, improve=0.11508110, (0 missing)
## density < 0.99548 to the right, improve=0.07174394, (0 missing)
##
## Node number 7: 292 observations, complexity param=0.02580691
## mean=6.376712, MSE=0.5704166
## left son=14 (161 obs) right son=15 (131 obs)
## Primary splits:
## alcohol < 11.55 to the left, improve=0.13068090, (0 missing)
## total.sulfur.dioxide < 96 to the right, improve=0.07207060, (0 missing)
## volatile.acidity < 0.335 to the right, improve=0.05598551, (0 missing)
## chlorides < 0.0785 to the right, improve=0.05341253, (0 missing)
## density < 0.99985 to the right, improve=0.05290635, (0 missing)
## Surrogate splits:
## density < 0.995315 to the right, agree=0.695, adj=0.321, (0 split)
## fixed.acidity < 5.75 to the right, agree=0.610, adj=0.130, (0 split)
## chlorides < 0.053 to the right, agree=0.606, adj=0.122, (0 split)
## residual.sugar < 4.25 to the left, agree=0.596, adj=0.099, (0 split)
## total.sulfur.dioxide < 21.5 to the right, agree=0.596, adj=0.099, (0 split)
##
## Node number 10: 372 observations, complexity param=0.01393327
## mean=5.403226, MSE=0.3804197
## left son=20 (171 obs) right son=21 (201 obs)
## Primary splits:
## total.sulfur.dioxide < 46.5 to the right, improve=0.08304207, (0 missing)
## alcohol < 9.85 to the left, improve=0.05705613, (0 missing)
## free.sulfur.dioxide < 26.5 to the right, improve=0.04378900, (0 missing)
## fixed.acidity < 11 to the left, improve=0.03353691, (0 missing)
## chlorides < 0.0975 to the right, improve=0.02770083, (0 missing)
## Surrogate splits:
## free.sulfur.dioxide < 14.5 to the right, agree=0.801, adj=0.567, (0 split)
## residual.sugar < 2.55 to the right, agree=0.637, adj=0.211, (0 split)
## chlorides < 0.0975 to the right, agree=0.610, adj=0.152, (0 split)
## pH < 3.235 to the left, agree=0.610, adj=0.152, (0 split)
## citric.acid < 0.255 to the right, agree=0.602, adj=0.135, (0 split)
##
## Node number 11: 129 observations
## mean=5.844961, MSE=0.4410793
##
## Node number 12: 8 observations
## mean=4.125, MSE=0.609375
##
## Node number 13: 175 observations, complexity param=0.01500727
## mean=5.817143, MSE=0.6065633
## left son=26 (123 obs) right son=27 (52 obs)
## Primary splits:
## volatile.acidity < 0.385 to the right, improve=0.11924460, (0 missing)
## alcohol < 11.65 to the left, improve=0.10922390, (0 missing)
## citric.acid < 0.255 to the left, improve=0.10301000, (0 missing)
## pH < 3.265 to the right, improve=0.08144755, (0 missing)
## density < 0.99548 to the right, improve=0.07341612, (0 missing)
## Surrogate splits:
## citric.acid < 0.255 to the left, agree=0.800, adj=0.327, (0 split)
## pH < 3.275 to the right, agree=0.777, adj=0.250, (0 split)
## density < 0.99156 to the right, agree=0.737, adj=0.115, (0 split)
## free.sulfur.dioxide < 35 to the left, agree=0.731, adj=0.096, (0 split)
## total.sulfur.dioxide < 8.5 to the right, agree=0.731, adj=0.096, (0 split)
##
## Node number 14: 161 observations, complexity param=0.01275453
## mean=6.130435, MSE=0.5109371
## left son=28 (8 obs) right son=29 (153 obs)
## Primary splits:
## total.sulfur.dioxide < 85.5 to the right, improve=0.13077420, (0 missing)
## volatile.acidity < 0.395 to the right, improve=0.11559190, (0 missing)
## citric.acid < 0.335 to the left, improve=0.06879310, (0 missing)
## density < 1.0009 to the right, improve=0.06347986, (0 missing)
## chlorides < 0.0945 to the right, improve=0.05919751, (0 missing)
## Surrogate splits:
## volatile.acidity < 0.8525 to the right, agree=0.957, adj=0.125, (0 split)
##
## Node number 15: 131 observations
## mean=6.679389, MSE=0.4773615
##
## Node number 20: 171 observations
## mean=5.210526, MSE=0.2831641
##
## Node number 21: 201 observations
## mean=5.567164, MSE=0.404693
##
## Node number 26: 123 observations
## mean=5.642276, MSE=0.5712208
##
## Node number 27: 52 observations
## mean=6.230769, MSE=0.4467456
##
## Node number 28: 8 observations
## mean=5, MSE=0.25
##
## Node number 29: 153 observations, complexity param=0.01095554
## mean=6.189542, MSE=0.4542697
## left son=58 (83 obs) right son=59 (70 obs)
## Primary splits:
## volatile.acidity < 0.395 to the right, improve=0.13294730, (0 missing)
## chlorides < 0.0945 to the right, improve=0.05520964, (0 missing)
## citric.acid < 0.525 to the left, improve=0.05486624, (0 missing)
## total.sulfur.dioxide < 49.5 to the right, improve=0.03973369, (0 missing)
## sulphates < 0.885 to the left, improve=0.03706797, (0 missing)
## Surrogate splits:
## citric.acid < 0.315 to the left, agree=0.706, adj=0.357, (0 split)
## residual.sugar < 1.85 to the right, agree=0.647, adj=0.229, (0 split)
## chlorides < 0.0775 to the right, agree=0.641, adj=0.214, (0 split)
## sulphates < 0.805 to the left, agree=0.641, adj=0.214, (0 split)
## alcohol < 11.05 to the right, agree=0.641, adj=0.214, (0 split)
##
## Node number 58: 83 observations
## mean=5.963855, MSE=0.3480912
##
## Node number 59: 70 observations
## mean=6.457143, MSE=0.4481633
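Several figures in the summary can be checked by hand from the node deviances. The sketch below, which copies the deviances of nodes 1, 2 and 3 from the tree printout, recomputes the root MSE and the complexity parameter of the first split. It also reads the final rel error as one minus the training R-squared, which is my interpretation of the CP table rather than something the output states (variable names are my own):

```r
# Deviances and observation count copied from the tree printout
root_dev  <- 843.43390   # node 1 (root)
left_dev  <- 332.70020   # node 2 (alcohol < 10.55)
right_dev <- 344.64420   # node 3 (alcohol >= 10.55)
n_root    <- 1279

# MSE at the root is the deviance divided by the number of observations
root_mse <- root_dev / n_root

# Share of the root deviance removed by the first split
first_cp <- (root_dev - left_dev - right_dev) / root_dev

# Final rel error from the CP table, read as 1 - R^2 on the training data
train_r2 <- 1 - 0.5962562

round(root_mse, 4)  # 0.6594, matching MSE=0.659448 for node 1
round(first_cp, 4)  # 0.1969, matching the CP in row 1
round(train_r2, 4)  # 0.4037
```

The xerror column is the cross-validated counterpart of rel error. It stays well above rel error in every row, which suggests the training fit is optimistic.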
Installing packages
install.packages("rpart.plot")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
Use the rpart.plot package to create a visualization
library(rpart.plot)
A basic decision tree diagram
rpart.plot(m.rpart, digits = 3)
A few adjustments to the diagram
rpart.plot(m.rpart, digits = 4, fallen.leaves = TRUE, type = 3, extra = 101)
Evaluating model performance
# generate predictions for the testing dataset
p.rpart <- predict(m.rpart, wine_test)
Compare the distribution of predicted values vs. actual values
summary(p.rpart)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.125 5.172 5.567 5.632 5.964 6.679
summary(wine_test$quality)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.525 6.000 8.000
Compare the correlation
cor(p.rpart, wine_test$quality)
## [1] 0.4901703
Function to calculate the mean absolute error
MAE <- function(actual, predicted) {
mean(abs(actual - predicted))
}
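A small check of the function on made-up numbers (toy_actual and toy_predicted are illustrative vectors, not wine data). The absolute errors are 0.5, 0 and 1, so the mean is 0.5. Because of the absolute value, swapping the two arguments gives the same result:

```r
# Same definition as above, repeated so the sketch runs on its own
MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Made-up ratings and predictions
toy_actual    <- c(5, 6, 7)
toy_predicted <- c(5.5, 6, 6)

MAE(toy_actual, toy_predicted)  # (0.5 + 0 + 1) / 3 = 0.5
MAE(toy_predicted, toy_actual)  # 0.5 again: the order does not matter
```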
Mean absolute error between actual and predicted values
MAE(wine_test$quality, p.rpart)
## [1] 0.5332276
Baseline: mean absolute error when every test wine is predicted to have the mean quality of the training data (rounded to 5.66)
mean(wine_train$quality)
## [1] 5.6638
MAE(wine_test$quality, 5.66)
## [1] 0.652125
Improving model performance
install.packages("plyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("Cubist")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
Train a Cubist Model Tree
library(Cubist)
## Loading required package: lattice
m.cubist <- cubist(x = wine_train[-12], y = wine_train$quality)
Display basic information about the model tree
m.cubist
##
## Call:
## cubist.default(x = wine_train[-12], y = wine_train$quality)
##
## Number of samples: 1279
## Number of predictors: 11
##
## Number of committees: 1
## Number of rules: 13
Display the tree itself
summary(m.cubist)
##
## Call:
## cubist.default(x = wine_train[-12], y = wine_train$quality)
##
##
## Cubist [Release 2.07 GPL Edition] Thu Feb 5 20:27:12 2026
## ---------------------------------
##
## Target attribute `outcome'
##
## Read 1279 cases (12 attributes) from undefined.data
##
## Model:
##
## Rule 1: [18 cases, mean 5.2, range 3 to 7, est err 0.7]
##
## if
## volatile.acidity > 0.31
## chlorides > 0.092
## total.sulfur.dioxide > 32
## density > 0.99824
## sulphates > 0.63
## alcohol > 9.8
## then
## outcome = -454.5 - 26.3 chlorides + 464 density - 2.64 volatile.acidity
## + 0.012 alcohol - 0.0003 total.sulfur.dioxide + 0.05 sulphates
##
## Rule 2: [541 cases, mean 5.3, range 3 to 8, est err 0.4]
##
## if
## alcohol <= 9.8
## then
## outcome = 5
##
## Rule 3: [502 cases, mean 5.3, range 3 to 8, est err 0.4]
##
## if
## total.sulfur.dioxide <= 119
## alcohol <= 9.8
## then
## outcome = 5.2 - 1.31 volatile.acidity - 0.69 citric.acid
## + 0.47 sulphates + 0.07 alcohol + 0.039 fixed.acidity
## - 0.0011 total.sulfur.dioxide - 0.0032 free.sulfur.dioxide
## - 0.4 chlorides - 0.09 pH
##
## Rule 4: [44 cases, mean 5.4, range 4 to 7, est err 0.5]
##
## if
## volatile.acidity > 0.31
## pH <= 3.17
## sulphates <= 0.63
## alcohol > 9.8
## then
## outcome = 5.7 + 0.327 alcohol + 1.44 sulphates - 1.15 pH
## - 0.0032 total.sulfur.dioxide - 0.06 fixed.acidity
## + 0.0101 free.sulfur.dioxide - 0.41 volatile.acidity
## - 0.04 residual.sugar
##
## Rule 5: [145 cases, mean 5.6, range 4 to 7, est err 0.5]
##
## if
## volatile.acidity > 0.31
## total.sulfur.dioxide > 25
## pH > 3.17
## sulphates <= 0.63
## alcohol > 9.8
## then
## outcome = 6.6 + 3.02 sulphates + 0.267 alcohol - 1.55 pH
## - 0.0061 total.sulfur.dioxide + 0.0042 free.sulfur.dioxide
##
## Rule 6: [95 cases, mean 5.6, range 3 to 7, est err 0.7]
##
## if
## total.sulfur.dioxide <= 17
## sulphates <= 0.63
## alcohol > 9.8
## then
## outcome = 23.4 + 0.46 alcohol - 2.95 pH - 0.174 fixed.acidity
## - 0.95 volatile.acidity - 0.106 residual.sugar
## + 0.18 sulphates - 11 density - 0.0003 total.sulfur.dioxide
## + 0.0009 free.sulfur.dioxide
##
## Rule 7: [34 cases, mean 5.7, range 4 to 7, est err 0.5]
##
## if
## volatile.acidity > 0.31
## total.sulfur.dioxide > 17
## total.sulfur.dioxide <= 25
## pH > 3.17
## sulphates <= 0.63
## alcohol > 9.8
## then
## outcome = 20 - 0.2439 total.sulfur.dioxide + 0.1262 free.sulfur.dioxide
## - 2.42 volatile.acidity - 2.68 pH
##
## Rule 8: [27 cases, mean 5.9, range 5 to 7, est err 0.6]
##
## if
## chlorides > 0.092
## total.sulfur.dioxide > 32
## density <= 0.99824
## sulphates > 0.63
## alcohol > 9.8
## then
## outcome = 7 - 6.2 chlorides - 0.59 volatile.acidity
##
## Rule 9: [135 cases, mean 6.1, range 5 to 8, est err 0.5]
##
## if
## volatile.acidity > 0.31
## total.sulfur.dioxide <= 32
## pH > 3.17
## sulphates > 0.63
## then
## outcome = 67.7 + 0.291 alcohol + 0.164 residual.sugar - 65 density
## - 0.69 volatile.acidity
##
## Rule 10: [150 cases, mean 6.1, range 5 to 8, est err 0.5]
##
## if
## volatile.acidity > 0.31
## chlorides <= 0.092
## total.sulfur.dioxide > 32
## sulphates > 0.63
## alcohol > 9.8
## then
## outcome = 1.8 - 0.0144 total.sulfur.dioxide + 0.359 alcohol
## + 1.2 sulphates + 0.072 residual.sugar
##
## Rule 11: [35 cases, mean 6.3, range 5 to 8, est err 0.5]
##
## if
## volatile.acidity > 0.31
## total.sulfur.dioxide <= 32
## pH <= 3.17
## sulphates > 0.63
## alcohol > 9.8
## then
## outcome = 151.3 + 4.11 pH + 11.2 chlorides - 159 density
## + 0.014 residual.sugar - 0.09 volatile.acidity + 0.013 alcohol
##
## Rule 12: [56 cases, mean 6.3, range 5 to 8, est err 0.6]
##
## if
## volatile.acidity <= 0.31
## sulphates <= 0.73
## alcohol > 9.8
## then
## outcome = 327.8 + 7.64 volatile.acidity + 3.95 sulphates - 315 density
## - 3.07 pH - 0.229 alcohol - 0.0052 total.sulfur.dioxide
## + 0.13 residual.sugar
##
## Rule 13: [75 cases, mean 6.5, range 5 to 8, est err 0.4]
##
## if
## volatile.acidity <= 0.31
## sulphates > 0.73
## then
## outcome = 1.7 + 0.407 alcohol + 1.48 citric.acid + 0.22 volatile.acidity
## + 0.13 sulphates - 0.1 pH
##
##
## Evaluation on training data (1279 cases):
##
## Average |error| 0.4
## Relative |error| 0.57
## Correlation coefficient 0.66
##
##
## Attribute usage:
## Conds Model
##
## 89% 68% alcohol
## 61% 56% total.sulfur.dioxide
## 44% 58% sulphates
## 37% 55% volatile.acidity
## 21% 53% pH
## 11% 31% chlorides
## 2% 18% density
## 44% free.sulfur.dioxide
## 35% fixed.acidity
## 31% citric.acid
## 28% residual.sugar
##
##
## Time: 0.0 secs
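Each rule pairs a set of conditions with a linear model that applies only to the wines meeting them. As a sketch of how one rule is evaluated, the code below applies Rule 13 to a made-up wine (demo_wine and both helper functions are hypothetical, and the coefficients are the rounded values printed above). As far as I understand, Cubist averages the outputs when several rules cover the same wine, so predict() may not reproduce this number exactly:

```r
# Rule 13 conditions, copied from the summary output
rule13_applies <- function(w) {
  w$volatile.acidity <= 0.31 && w$sulphates > 0.73
}

# Rule 13 linear model, using the rounded coefficients as printed
rule13_outcome <- function(w) {
  1.7 + 0.407 * w$alcohol + 1.48 * w$citric.acid +
    0.22 * w$volatile.acidity + 0.13 * w$sulphates - 0.1 * w$pH
}

# Hypothetical wine, not a row of the dataset
demo_wine <- list(alcohol = 12, citric.acid = 0.4, volatile.acidity = 0.3,
                  sulphates = 0.8, pH = 3.3)

rule13_applies(demo_wine)  # TRUE
rule13_outcome(demo_wine)  # 1.7 + 4.884 + 0.592 + 0.066 + 0.104 - 0.33 = 7.016
```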
Generate predictions for the model
p.cubist <- predict(m.cubist, wine_test)
Summary statistics about the predictions
summary(p.cubist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.174 5.095 5.607 5.621 6.021 7.400
Correlation between the predicted and true values
cor(p.cubist, wine_test$quality)
## [1] 0.5310026
Mean absolute error of predicted and true values
# (uses a custom function defined above)
MAE(wine_test$quality, p.cubist)
## [1] 0.5037106
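To put the three error figures side by side, the sketch below expresses each model's MAE as a percentage reduction relative to the mean-only baseline. The MAE values are copied from the outputs in this activity; the improvement helper and variable names are my own:

```r
# MAE values copied from the outputs above
mae_baseline <- 0.652125   # predicting 5.66 for every test wine
mae_rpart    <- 0.5332276  # regression tree
mae_cubist   <- 0.5037106  # Cubist model tree

# Fractional reduction in MAE relative to a reference model
improvement <- function(mae_model, mae_ref) {
  (mae_ref - mae_model) / mae_ref
}

rpart_gain  <- round(100 * improvement(mae_rpart, mae_baseline), 1)
cubist_gain <- round(100 * improvement(mae_cubist, mae_baseline), 1)

rpart_gain   # 18.2 (percent lower MAE than the baseline)
cubist_gain  # 22.8
```

On this test set the model tree improves on both the baseline and the regression tree, and its correlation with the true ratings (0.531 versus 0.490) points the same way.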