Overview
In this homework assignment, we will explore, analyze, and model a data set containing information on approximately 12,000 commercially available wines. The variables are mostly related to the chemical properties of the wine being sold. The response variable is the number of sample cases of wine that were purchased by wine distribution companies after sampling a wine. These cases would be used to provide tasting samples to restaurants and wine stores around the United States. The more sample cases purchased, the more likely a wine is to be sold at a high-end restaurant. A large wine manufacturer is studying the data in order to predict the number of wine cases ordered based upon the wine characteristics. If the wine manufacturer can predict the number of cases, then it will be able to adjust its wine offering to maximize sales.
Our objective is to build a count regression model to predict the number of cases of wine that will be sold given certain properties of the wine. HINT: Sometimes, the fact that a variable is missing is actually predictive of the target. We will only use the variables given to us (or variables that we derive from the variables provided). Below is a short description of the variables of interest in the data set:
VARIABLE NAME DEFINITION THEORETICAL EFFECT
INDEX
: Identification Variable (do not use)
TARGET
: Number of Cases Purchased
AcidIndex
: Proprietary method of testing total acidity of wine by using a weighted average
Alcohol
: Alcohol Content
Chlorides
: Chloride content of wine
CitricAcid
: Citric Acid Content
Density
: Density of Wine
FixedAcidity
: Fixed Acidity of Wine
FreeSulfurDioxide
: Sulfur Dioxide content of wine
LabelAppeal
: Marketing Score indicating the appeal of label design for consumers. High numbers suggest customers like the label design. Negative numbers suggest customers don't like the design.
ResidualSugar
: Residual Sugar of wine
STARS
: Wine rating by a team of experts. 4 Stars = Excellent, 1 Star = Poor
Sulphates
: Sulfate content of wine
TotalSulfurDioxide
: Total Sulfur Dioxide of Wine
VolatileAcidity
: Volatile Acid content of wine
pH
: pH of wine
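The hint in the overview (that a missing value can itself be predictive) can be sketched as follows. This is a minimal illustration on a toy data frame, not the preparation actually used for the models below; the helper `add_missing_flags()` and the `_missing` suffix are hypothetical names introduced here.

```r
# Toy stand-in for wine_train; the values are illustrative only.
toy <- data.frame(
  TARGET  = c(3, 0, 4, 0, 5),
  STARS   = c(2, NA, 3, NA, 4),
  Alcohol = c(9.9, NA, 12.1, 10.4, 13.0)
)

# Hypothetical helper: add a 0/1 "<name>_missing" flag for every column
# that contains at least one NA.
add_missing_flags <- function(df) {
  has_na <- names(df)[vapply(df, anyNA, logical(1))]
  for (col in has_na) {
    df[[paste0(col, "_missing")]] <- as.integer(is.na(df[[col]]))
  }
  df
}

toy_flagged <- add_missing_flags(toy)

# Compare the mean TARGET for rows with and without a STARS rating.
tapply(toy_flagged$TARGET, toy_flagged$STARS_missing, mean)
```

If the two group means differ noticeably on the real data, the flag may be worth keeping as a derived predictor alongside whatever imputation is applied.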
# Data wrangling and plotting (tidyverse already attaches dplyr, purrr, and ggplot2)
library(tidyverse)
library(reshape2)   # melt(), used for the density plots below
library(gridExtra)
library(corrplot)
# Tables and reporting
library(kableExtra)
library(DT)
# Summaries, missing data, and imputation
library(psych)
library(Hmisc)
library(VIM)
library(mice)
# Modeling and diagnostics
library(caret)
library(e1071)
library(pracma)
library(pROC)
library(VIF)
library(FactoMineR)
library(MASS)       # glm.nb(); note that MASS::select masks dplyr::select
library(lindia)
library(car)
wine_train <- read.csv("https://raw.githubusercontent.com/javernw/DATA621-Business-Analytics-and-Data-Mining/master/wine-training-data.csv")
wine_eval <- read.csv("https://raw.githubusercontent.com/javernw/DATA621-Business-Analytics-and-Data-Mining/master/wine-evaluation-data.csv")
## 'data.frame': 12795 obs. of 16 variables:
## $ ï..INDEX : int 1 2 4 5 6 7 8 11 12 13 ...
## $ TARGET : int 3 3 5 3 4 0 0 4 3 6 ...
## $ FixedAcidity : num 3.2 4.5 7.1 5.7 8 11.3 7.7 6.5 14.8 5.5 ...
## $ VolatileAcidity : num 1.16 0.16 2.64 0.385 0.33 0.32 0.29 -1.22 0.27 -0.22 ...
## $ CitricAcid : num -0.98 -0.81 -0.88 0.04 -1.26 0.59 -0.4 0.34 1.05 0.39 ...
## $ ResidualSugar : num 54.2 26.1 14.8 18.8 9.4 ...
## $ Chlorides : num -0.567 -0.425 0.037 -0.425 NA 0.556 0.06 0.04 -0.007 -0.277 ...
## $ FreeSulfurDioxide : num NA 15 214 22 -167 -37 287 523 -213 62 ...
## $ TotalSulfurDioxide: num 268 -327 142 115 108 15 156 551 NA 180 ...
## $ Density : num 0.993 1.028 0.995 0.996 0.995 ...
## $ pH : num 3.33 3.38 3.12 2.24 3.12 3.2 3.49 3.2 4.93 3.09 ...
## $ Sulphates : num -0.59 0.7 0.48 1.83 1.77 1.29 1.21 NA 0.26 0.75 ...
## $ Alcohol : num 9.9 NA 22 6.2 13.7 15.4 10.3 11.6 15 12.6 ...
## $ LabelAppeal : int 0 -1 -1 -1 0 0 0 1 0 0 ...
## $ AcidIndex : int 8 7 8 6 9 11 8 7 6 8 ...
## $ STARS : int 2 3 3 1 2 NA NA 3 NA 4 ...
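The first column name prints as `ï..INDEX`, which is the usual symptom of a UTF-8 byte-order mark at the start of a CSV file. Assuming that is the cause here, the sketch below shows one way to read such a file cleanly. It writes its own small temporary file rather than touching the course data.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (EF BB BF),
# mimicking the suspected cause of the "ï..INDEX" column name.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("INDEX,TARGET\n1,3\n2,3\n")), tmp)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark while reading.
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)

# Fallback if the data are already loaded: rename the first column.
# names(wine_train)[1] <- "INDEX"
```

Passing the same `fileEncoding` argument to the two `read.csv()` calls above should give a plain `INDEX` column, provided the files do carry a byte-order mark.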
## ï..INDEX TARGET FixedAcidity VolatileAcidity
## Min. : 1 Min. :0.000 Min. :-18.100 Min. :-2.7900
## 1st Qu.: 4038 1st Qu.:2.000 1st Qu.: 5.200 1st Qu.: 0.1300
## Median : 8110 Median :3.000 Median : 6.900 Median : 0.2800
## Mean : 8070 Mean :3.029 Mean : 7.076 Mean : 0.3241
## 3rd Qu.:12106 3rd Qu.:4.000 3rd Qu.: 9.500 3rd Qu.: 0.6400
## Max. :16129 Max. :8.000 Max. : 34.400 Max. : 3.6800
##
## CitricAcid ResidualSugar Chlorides FreeSulfurDioxide
## Min. :-3.2400 Min. :-127.800 Min. :-1.1710 Min. :-555.00
## 1st Qu.: 0.0300 1st Qu.: -2.000 1st Qu.:-0.0310 1st Qu.: 0.00
## Median : 0.3100 Median : 3.900 Median : 0.0460 Median : 30.00
## Mean : 0.3084 Mean : 5.419 Mean : 0.0548 Mean : 30.85
## 3rd Qu.: 0.5800 3rd Qu.: 15.900 3rd Qu.: 0.1530 3rd Qu.: 70.00
## Max. : 3.8600 Max. : 141.150 Max. : 1.3510 Max. : 623.00
## NA's :616 NA's :638 NA's :647
## TotalSulfurDioxide Density pH Sulphates
## Min. :-823.0 Min. :0.8881 Min. :0.480 Min. :-3.1300
## 1st Qu.: 27.0 1st Qu.:0.9877 1st Qu.:2.960 1st Qu.: 0.2800
## Median : 123.0 Median :0.9945 Median :3.200 Median : 0.5000
## Mean : 120.7 Mean :0.9942 Mean :3.208 Mean : 0.5271
## 3rd Qu.: 208.0 3rd Qu.:1.0005 3rd Qu.:3.470 3rd Qu.: 0.8600
## Max. :1057.0 Max. :1.0992 Max. :6.130 Max. : 4.2400
## NA's :682 NA's :395 NA's :1210
## Alcohol LabelAppeal AcidIndex STARS
## Min. :-4.70 Min. :-2.000000 Min. : 4.000 Min. :1.000
## 1st Qu.: 9.00 1st Qu.:-1.000000 1st Qu.: 7.000 1st Qu.:1.000
## Median :10.40 Median : 0.000000 Median : 8.000 Median :2.000
## Mean :10.49 Mean :-0.009066 Mean : 7.773 Mean :2.042
## 3rd Qu.:12.40 3rd Qu.: 1.000000 3rd Qu.: 8.000 3rd Qu.:3.000
## Max. :26.50 Max. : 2.000000 Max. :17.000 Max. :4.000
## NA's :653 NA's :3359
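The summary reports missing values in eight predictors (STARS alone has 3,359 of 12,795 rows missing, about 26%) and negative minimums for several chemical measurements. A compact per-column tally of both can be sketched as below; the toy data frame stands in for `wine_train` and its values are illustrative only.

```r
# Toy stand-in for a few wine_train columns.
toy <- data.frame(
  Alcohol   = c(9.9, NA, -4.7, 12.6),
  Chlorides = c(-0.567, 0.037, NA, NA),
  STARS     = c(2, NA, NA, 4)
)

# Count missing and negative entries per column.
na_count  <- colSums(is.na(toy))
neg_count <- colSums(toy < 0, na.rm = TRUE)

quality <- data.frame(
  variable    = names(toy),
  n_missing   = as.vector(na_count),
  pct_missing = round(100 * as.vector(na_count) / nrow(toy), 1),
  n_negative  = as.vector(neg_count)
)
quality
```

Running the same tally on `wine_train[, -1]` would give one row per variable, which is easier to scan than the full `summary()` output.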
cases_purchased <- table(wine_train$TARGET) %>% data.frame()
cases_purchased %>% ggplot(aes(x = Var1, y = Freq)) + geom_bar(stat = "identity", fill = "blue") + labs(x = "Cases", y = "Counts")
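# Reshape to long format (one row per variable/value pair), dropping the index column
w1 <- reshape2::melt(wine_train[, -1])
ggplot(w1, aes(x = value)) +
  geom_density(fill = 'blue') +
  facet_wrap(~variable, scales = 'free')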
A few of the variables have multimodal distributions (TARGET, LabelAppeal, STARS), while the others appear approximately normally distributed, judging by the bell-shaped curves they display.
m_scores <- wine_train$LabelAppeal %>% table() %>% data.frame() %>% mutate(per = (Freq/sum(Freq))*100)
names(m_scores)[1]<-"score"
lbls <- paste(m_scores$score, "\n", round(m_scores$per, 2)) # add percents to labels
lbls <- paste(lbls,"%",sep="") # add % to labels
pie(m_scores$Freq,labels = lbls, col= c("#990000", "#336600", "#CC6600", "#CCCC00", "#4CC099"), main="Marketing Scores Proportioned")
About 28% of the wines have a negative LabelAppeal score, meaning customers do not favor their label designs.
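The share quoted above is the proportion of wines with a LabelAppeal score below zero. A minimal sketch of that calculation, using a made-up vector rather than the real column:

```r
# Toy LabelAppeal scores; the real column is wine_train$LabelAppeal.
label_appeal <- c(-2, -1, -1, 0, 0, 0, 0, 1, 1, 2)

# Percentage of wines at each score.
share <- prop.table(table(label_appeal))
round(100 * share, 1)

# Percentage of wines with a negative (disliked) label score.
pct_negative <- 100 * mean(label_appeal < 0)
pct_negative
```

Replacing the toy vector with `wine_train$LabelAppeal` should reproduce the figure shown in the pie chart.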
ggplot(stack(wine_train[,-1]), aes(x = ind, y = values)) +
geom_boxplot() +
theme(legend.position="none") +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(panel.background = element_rect(fill = 'grey'))
## Warning: Removed 8200 rows containing non-finite values (stat_boxplot).
We can see that there is a moderate but positive correlation between the target variable and the predictors STARS and LabelAppeal.
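One way to rank predictors by their correlation with TARGET is sketched below. The data are simulated purely for illustration (the coefficients used to generate them are arbitrary assumptions), and `use = "pairwise.complete.obs"` is one of several reasonable ways to handle the missing STARS values.

```r
set.seed(621)
n <- 200

# Simulated stand-in for wine_train: TARGET depends on STARS and LabelAppeal.
stars <- sample(1:4, n, replace = TRUE)
label <- sample(-2:2, n, replace = TRUE)
toy <- data.frame(
  TARGET      = rpois(n, lambda = exp(0.5 + 0.2 * stars + 0.2 * label)),
  STARS       = stars,
  LabelAppeal = label,
  pH          = rnorm(n, mean = 3.2, sd = 0.3)
)
toy$STARS[sample(n, 40)] <- NA   # mimic missing ratings

# Correlation of every column with TARGET, ignoring NAs pair by pair.
cors <- cor(toy, use = "pairwise.complete.obs")[, "TARGET"]
sort(cors[names(cors) != "TARGET"], decreasing = TRUE)
```

The full matrix from `cor()` can also be passed to `corrplot::corrplot()` for a visual version.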
Build Models (at least two of each type: Poisson regression, negative binomial regression, and multiple linear regression)
##
## Call:
## glm(formula = TARGET ~ ., family = "poisson", data = wine_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.8802 -0.4949 0.2216 0.6259 2.6109
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.167e+00 1.952e-01 11.099 < 2e-16 ***
## FixedAcidity -4.152e-04 8.196e-04 -0.507 0.61240
## VolatileAcidity -5.151e-02 6.487e-03 -7.940 2.02e-15 ***
## CitricAcid 1.393e-02 5.894e-03 2.364 0.01810 *
## ResidualSugar 1.854e-04 1.505e-04 1.232 0.21811
## Chlorides -6.561e-02 1.593e-02 -4.119 3.80e-05 ***
## FreeSulfurDioxide 1.360e-04 3.431e-05 3.965 7.35e-05 ***
## TotalSulfurDioxide 9.759e-05 2.196e-05 4.445 8.80e-06 ***
## Density -4.446e-01 1.919e-01 -2.317 0.02051 *
## pH -2.442e-02 7.520e-03 -3.247 0.00117 **
## Sulphates -1.481e-02 5.460e-03 -2.712 0.00668 **
## Alcohol 5.430e-03 1.373e-03 3.954 7.70e-05 ***
## LabelAppeal 1.999e-01 6.118e-03 32.680 < 2e-16 ***
## AcidIndex -1.235e-01 4.464e-03 -27.672 < 2e-16 ***
## STARS 1.606e-01 5.836e-03 27.523 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 22861 on 12794 degrees of freedom
## Residual deviance: 18809 on 12780 degrees of freedom
## AIC: 50782
##
## Number of Fisher Scoring iterations: 5
##
## Call:
## glm(formula = TARGET ~ VolatileAcidity + CitricAcid + Chlorides +
## FreeSulfurDioxide + TotalSulfurDioxide + Density + pH + Sulphates +
## Alcohol + LabelAppeal + AcidIndex + STARS, family = "poisson",
## data = wine_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.8956 -0.4974 0.2212 0.6256 2.6176
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.168e+00 1.952e-01 11.102 < 2e-16 ***
## VolatileAcidity -5.161e-02 6.487e-03 -7.956 1.78e-15 ***
## CitricAcid 1.383e-02 5.893e-03 2.346 0.01898 *
## Chlorides -6.560e-02 1.593e-02 -4.119 3.80e-05 ***
## FreeSulfurDioxide 1.363e-04 3.431e-05 3.974 7.08e-05 ***
## TotalSulfurDioxide 9.806e-05 2.195e-05 4.467 7.94e-06 ***
## Density -4.444e-01 1.919e-01 -2.316 0.02057 *
## pH -2.427e-02 7.519e-03 -3.228 0.00125 **
## Sulphates -1.491e-02 5.459e-03 -2.732 0.00630 **
## Alcohol 5.397e-03 1.373e-03 3.931 8.46e-05 ***
## LabelAppeal 2.000e-01 6.118e-03 32.689 < 2e-16 ***
## AcidIndex -1.239e-01 4.411e-03 -28.088 < 2e-16 ***
## STARS 1.607e-01 5.835e-03 27.535 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 22861 on 12794 degrees of freedom
## Residual deviance: 18811 on 12782 degrees of freedom
## AIC: 50779
##
## Number of Fisher Scoring iterations: 5
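Two quick readings of the Poisson summaries can be worked out by hand. The sketch below copies figures from the reduced model's output: it computes the ratio of residual deviance to residual degrees of freedom (an informal dispersion check, not a formal test) and exponentiates three coefficients to express them as multiplicative effects on the expected count.

```r
# Figures copied from the reduced Poisson summary above.
resid_dev <- 18811
resid_df  <- 12782

# Informal dispersion check: values well above 1 are often read as a
# sign that the Poisson variance assumption may not hold.
dispersion_ratio <- resid_dev / resid_df
round(dispersion_ratio, 2)

# With a log link, exp(coefficient) is the multiplicative change in the
# expected count for a one-unit increase in the predictor.
coefs <- c(LabelAppeal = 0.2000, STARS = 0.1607, AcidIndex = -0.1239)
rate_ratios <- exp(coefs)
round(rate_ratios, 2)

# For a fitted model object `fit`, the same quantities would be
# deviance(fit) / df.residual(fit) and exp(coef(fit)).
```

Read this way, each additional LabelAppeal point is associated with roughly 22% more expected cases and each additional star with roughly 17% more, while each AcidIndex point is associated with roughly 12% fewer, holding the other predictors fixed.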
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = TARGET ~ ., data = wine_train, init.theta = 32958.95846,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.8800 -0.4949 0.2215 0.6259 2.6109
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.167e+00 1.952e-01 11.099 < 2e-16 ***
## FixedAcidity -4.153e-04 8.197e-04 -0.507 0.61240
## VolatileAcidity -5.151e-02 6.488e-03 -7.940 2.02e-15 ***
## CitricAcid 1.393e-02 5.894e-03 2.364 0.01810 *
## ResidualSugar 1.854e-04 1.505e-04 1.232 0.21812
## Chlorides -6.561e-02 1.593e-02 -4.119 3.80e-05 ***
## FreeSulfurDioxide 1.360e-04 3.431e-05 3.965 7.35e-05 ***
## TotalSulfurDioxide 9.759e-05 2.196e-05 4.445 8.80e-06 ***
## Density -4.446e-01 1.919e-01 -2.317 0.02052 *
## pH -2.442e-02 7.521e-03 -3.247 0.00117 **
## Sulphates -1.481e-02 5.460e-03 -2.712 0.00668 **
## Alcohol 5.430e-03 1.373e-03 3.953 7.71e-05 ***
## LabelAppeal 1.999e-01 6.118e-03 32.678 < 2e-16 ***
## AcidIndex -1.235e-01 4.464e-03 -27.671 < 2e-16 ***
## STARS 1.606e-01 5.836e-03 27.522 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(32958.96) family taken to be 1)
##
## Null deviance: 22859 on 12794 degrees of freedom
## Residual deviance: 18808 on 12780 degrees of freedom
## AIC: 50784
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 32959
## Std. Err.: 59343
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -50751.61
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in glm.nb(formula = TARGET ~ FixedAcidity + VolatileAcidity + CitricAcid
## + : alternation limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = TARGET ~ VolatileAcidity + CitricAcid + Chlorides +
## FreeSulfurDioxide + TotalSulfurDioxide + Density + pH + Sulphates +
## Alcohol + LabelAppeal + AcidIndex + STARS, data = wine_train,
## init.theta = 32956.86552, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.8954 -0.4974 0.2212 0.6256 2.6176
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.168e+00 1.953e-01 11.101 < 2e-16 ***
## VolatileAcidity -5.161e-02 6.487e-03 -7.956 1.78e-15 ***
## CitricAcid 1.383e-02 5.894e-03 2.346 0.01899 *
## Chlorides -6.561e-02 1.593e-02 -4.119 3.81e-05 ***
## FreeSulfurDioxide 1.363e-04 3.431e-05 3.973 7.08e-05 ***
## TotalSulfurDioxide 9.806e-05 2.195e-05 4.467 7.94e-06 ***
## Density -4.444e-01 1.919e-01 -2.316 0.02057 *
## pH -2.427e-02 7.519e-03 -3.228 0.00125 **
## Sulphates -1.491e-02 5.459e-03 -2.732 0.00630 **
## Alcohol 5.397e-03 1.373e-03 3.931 8.47e-05 ***
## LabelAppeal 2.000e-01 6.118e-03 32.687 < 2e-16 ***
## AcidIndex -1.239e-01 4.411e-03 -28.087 < 2e-16 ***
## STARS 1.607e-01 5.835e-03 27.534 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(32956.87) family taken to be 1)
##
## Null deviance: 22859 on 12794 degrees of freedom
## Residual deviance: 18810 on 12782 degrees of freedom
## AIC: 50781
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 32957
## Std. Err.: 59361
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -50753.41
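The negative binomial fits report a very large theta (about 32,959, with the iteration limit reached) and coefficients that match the Poisson fits almost digit for digit. Under the `glm.nb` parameterization the variance is mu + mu^2 / theta, so a large theta means almost no extra variance beyond the Poisson. The arithmetic below illustrates this, using the overall mean of TARGET from the summary table as an illustrative value of mu.

```r
mu    <- 3.029   # mean of TARGET from the summary table (illustrative mu)
theta <- 32959   # theta reported by glm.nb for the full model

# Negative binomial variance in the glm.nb parameterization.
nb_var <- mu + mu^2 / theta

# Extra variance relative to a Poisson with the same mean.
extra <- nb_var - mu
signif(extra, 3)
```

The extra term is on the order of 0.0003, which is consistent with the two model families producing nearly identical estimates and AIC values here.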
##
## Call:
## lm(formula = TARGET ~ ., data = wine_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9212 -0.7015 0.3945 1.1197 4.3514
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.829e+00 5.628e-01 10.356 < 2e-16 ***
## FixedAcidity -9.759e-04 2.365e-03 -0.413 0.67985
## VolatileAcidity -1.555e-01 1.878e-02 -8.283 < 2e-16 ***
## CitricAcid 3.964e-02 1.709e-02 2.320 0.02038 *
## ResidualSugar 5.833e-04 4.355e-04 1.339 0.18045
## Chlorides -2.042e-01 4.587e-02 -4.452 8.59e-06 ***
## FreeSulfurDioxide 4.137e-04 9.914e-05 4.173 3.03e-05 ***
## TotalSulfurDioxide 2.792e-04 6.324e-05 4.415 1.02e-05 ***
## Density -1.286e+00 5.544e-01 -2.319 0.02042 *
## pH -6.329e-02 2.166e-02 -2.922 0.00348 **
## Sulphates -4.275e-02 1.576e-02 -2.712 0.00669 **
## Alcohol 1.857e-02 3.954e-03 4.697 2.67e-06 ***
## LabelAppeal 6.010e-01 1.756e-02 34.217 < 2e-16 ***
## AcidIndex -3.256e-01 1.144e-02 -28.463 < 2e-16 ***
## STARS 5.223e-01 1.751e-02 29.826 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.661 on 12780 degrees of freedom
## Multiple R-squared: 0.257, Adjusted R-squared: 0.2562
## F-statistic: 315.7 on 14 and 12780 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = TARGET ~ VolatileAcidity + CitricAcid + Chlorides +
## FreeSulfurDioxide + TotalSulfurDioxide + Density + pH + Sulphates +
## Alcohol + LabelAppeal + AcidIndex + STARS, data = wine_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9462 -0.7008 0.3930 1.1189 4.3585
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.828e+00 5.628e-01 10.356 < 2e-16 ***
## VolatileAcidity -1.557e-01 1.878e-02 -8.290 < 2e-16 ***
## CitricAcid 3.945e-02 1.709e-02 2.308 0.02099 *
## Chlorides -2.046e-01 4.587e-02 -4.460 8.25e-06 ***
## FreeSulfurDioxide 4.151e-04 9.912e-05 4.188 2.84e-05 ***
## TotalSulfurDioxide 2.808e-04 6.323e-05 4.441 9.04e-06 ***
## Density -1.282e+00 5.544e-01 -2.313 0.02075 *
## pH -6.294e-02 2.166e-02 -2.906 0.00367 **
## Sulphates -4.308e-02 1.575e-02 -2.735 0.00625 **
## Alcohol 1.847e-02 3.953e-03 4.672 3.01e-06 ***
## LabelAppeal 6.010e-01 1.756e-02 34.221 < 2e-16 ***
## AcidIndex -3.265e-01 1.126e-02 -28.999 < 2e-16 ***
## STARS 5.226e-01 1.751e-02 29.844 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.661 on 12782 degrees of freedom
## Multiple R-squared: 0.2569, Adjusted R-squared: 0.2562
## F-statistic: 368.2 on 12 and 12782 DF, p-value: < 2.2e-16