Starbucks Nutritional Menu Analysis: An application of linear regression, support vector machines, CART, and random forests algorithms

Introduction

We conducted an analysis of the nutritional facts of the Starbucks Drink Menu. The data set was pulled from the online repository kaggle.com, and the following report outlines a brief breakdown of the data using various regression and classification models. The uncleaned data set is composed of 18 variables with 242 observations. The two variables of interest in this analysis are sugar content (measured in grams) and beverage category (nine possible classifications). For our linear regression, we regress sugar content on various independent variables to determine which nutritional characteristics have the largest effect on the amount of sugar in an item from the Starbucks Drink Menu. For our classification models, we use various algorithms to predict which beverage category a drink is placed into based on its nutritional characteristics.

Libraries and Reproducibility

Below are the libraries used in this analysis, as well as the function set.seed, which allows for consistent reproducibility throughout the project.

library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

library(rpart) 
library(rattle)

## Rattle: A free graphical interface for data science with R.
## Version 5.2.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.

library(e1071)

library(randomForest)

## randomForest 4.6-14

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:rattle':
## 
##     importance

## The following object is masked from 'package:ggplot2':
## 
##     margin

set.seed(123)
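As a minimal illustration of what set.seed provides (the object names below are illustrative and not part of the analysis), resetting the seed before a random draw reproduces that draw exactly:

```r
# Resetting the seed makes a random draw repeatable.
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)
second_draw <- sample(1:100, 5)

# Without resetting, the generator simply continues from its current state.
third_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE
```

One caveat worth keeping in mind: the default algorithm behind sample() changed in R 3.6.0, so the same seed is only expected to reproduce the same partitions within a given R version.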

Importing the Data

We need to import the raw data set into our R environment so that we can clean and preprocess it before we begin partitioning the data.

SBUX_Data<-read.csv('/Users/samizaia/Downloads/starbucks-menu (1)/starbucks_drinkMenu_expanded_revised.csv')
head(SBUX_Data)

##         Beverage_category      Beverage     Beverage_prep Calories
## 1                  Coffee Brewed Coffee             Short        3
## 2                  Coffee Brewed Coffee              Tall        4
## 3                  Coffee Brewed Coffee            Grande        5
## 4                  Coffee Brewed Coffee             Venti        5
## 5 Classic Espresso Drinks   Caffè Latte Short Nonfat Milk       70
## 6 Classic Espresso Drinks   Caffè Latte           2% Milk      100
##   Total.Fat..g. Trans.Fat..g. Saturated.Fat..g. Sodium..mg.
## 1           0.1           0.0               0.0           0
## 2           0.1           0.0               0.0           0
## 3           0.1           0.0               0.0           0
## 4           0.1           0.0               0.0           0
## 5           0.1           0.1               0.0           5
## 6           3.5           2.0               0.1          15
##   Total.Carbohydrates..g. Cholesterol..mg. Dietary.Fibre..g. Sugars..g.
## 1                       5                0                 0          0
## 2                      10                0                 0          0
## 3                      10                0                 0          0
## 4                      10                0                 0          0
## 5                      75               10                 0          9
## 6                      85               10                 0          9
##   Protein..g. Vitamin.A....DV. Vitamin.C....DV. Calcium....DV. Iron....DV.
## 1         0.3              0.0                0           0.00           0
## 2         0.5              0.0                0           0.00           0
## 3         1.0              0.0                0           0.00           0
## 4         1.0              0.0                0           0.02           0
## 5         6.0              0.1                0           0.20           0
## 6         6.0              0.1                0           0.20           0
##   Caffeine..mg.
## 1           175
## 2           260
## 3           330
## 4           410
## 5            75
## 6            75

dim(SBUX_Data)

## [1] 242  18

Cleaning and Pre-Processing the Data Set

In order to manipulate the data using various algorithms, we must first delete unnecessary variables. Furthermore, for the sake of efficiency, we will rename the variable headers to make it easier to call specific variables in our code.

###Deletion###
SBUX_Data$Beverage<-NULL
SBUX_Data$Beverage_prep<-NULL
dim(SBUX_Data)

## [1] 242  16

###Given that we only needed one 'factor' variable for our analysis we decided to delete the above two variables.

###Renaming Headers###
SBUX_Data$Total_Fat<-SBUX_Data$Total.Fat..g.
SBUX_Data$Trans_Fat<-SBUX_Data$Trans.Fat..g.
SBUX_Data$Saturated_Fat<-SBUX_Data$Saturated.Fat..g.
SBUX_Data$Sodium<-SBUX_Data$Sodium..mg.
SBUX_Data$Carbs<-SBUX_Data$Total.Carbohydrates..g.
SBUX_Data$Cholesterol<-SBUX_Data$Cholesterol..mg.
SBUX_Data$Fibre<-SBUX_Data$Dietary.Fibre..g.
SBUX_Data$Sugars<-SBUX_Data$Sugars..g.
SBUX_Data$Protein<-SBUX_Data$Protein..g.
SBUX_Data$Vitamin_A<-SBUX_Data$Vitamin.A....DV.
SBUX_Data$Vitamin_C<-SBUX_Data$Vitamin.C....DV.
SBUX_Data$Calcium<-SBUX_Data$Calcium....DV.
SBUX_Data$Iron<-SBUX_Data$Iron....DV.
SBUX_Data$Caffeine<-SBUX_Data$Caffeine..mg.

dim(SBUX_Data)

## [1] 242  30

###The assignments above copied each column under a new name rather than renaming it in place, so now we have to delete the columns with the old headers.

###Deleting all the original columns that we renamed above
SBUX_Data$Total.Fat..g.<-NULL
SBUX_Data$Trans.Fat..g.<-NULL
SBUX_Data$Saturated.Fat..g.<-NULL
SBUX_Data$Sodium..mg.<-NULL
SBUX_Data$Total.Carbohydrates..g.<-NULL
SBUX_Data$Cholesterol..mg.<-NULL
SBUX_Data$Dietary.Fibre..g.<-NULL
SBUX_Data$Sugars..g.<-NULL
SBUX_Data$Protein..g.<-NULL
SBUX_Data$Vitamin.A....DV.<-NULL
SBUX_Data$Vitamin.C....DV.<-NULL
SBUX_Data$Calcium....DV.<-NULL
SBUX_Data$Iron....DV.<-NULL
SBUX_Data$Caffeine..mg.<-NULL
dim(SBUX_Data)

## [1] 242  16
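The copy-then-delete approach above can also be done in one step by renaming in place. The sketch below is a hedged alternative on a toy data frame, not the code used in this report. The raw headers are our guess at what the CSV contains: headers like Total.Fat..g. look like what read.csv produces when it sanitizes column names with make.names. The names toy and rename_map are hypothetical.

```r
# Hypothetical raw CSV headers; make.names() shows how R sanitizes them.
raw_headers <- c("Total Fat (g)", "Sugars (g)", "Caffeine (mg)")
mangled <- make.names(raw_headers)
mangled  # "Total.Fat..g." "Sugars..g." "Caffeine..mg."

# Toy stand-in for SBUX_Data with the sanitized headers.
toy <- data.frame(matrix(1:6, nrow = 2))
names(toy) <- mangled

# Map old names to new names, then rename in place (no duplicate columns).
rename_map <- c("Total.Fat..g." = "Total_Fat",
                "Sugars..g."    = "Sugars",
                "Caffeine..mg." = "Caffeine")
names(toy)[match(names(rename_map), names(toy))] <- unname(rename_map)

names(toy)
dim(toy)  # column count is unchanged by the rename
```

Because the columns are renamed rather than copied, the intermediate 30-column state and the second round of deletions are not needed.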

###Confirming our new data set
head(SBUX_Data)

##         Beverage_category Calories Total_Fat Trans_Fat Saturated_Fat
## 1                  Coffee        3       0.1       0.0           0.0
## 2                  Coffee        4       0.1       0.0           0.0
## 3                  Coffee        5       0.1       0.0           0.0
## 4                  Coffee        5       0.1       0.0           0.0
## 5 Classic Espresso Drinks       70       0.1       0.1           0.0
## 6 Classic Espresso Drinks      100       3.5       2.0           0.1
##   Sodium Carbs Cholesterol Fibre Sugars Protein Vitamin_A Vitamin_C
## 1      0     5           0     0      0     0.3       0.0         0
## 2      0    10           0     0      0     0.5       0.0         0
## 3      0    10           0     0      0     1.0       0.0         0
## 4      0    10           0     0      0     1.0       0.0         0
## 5      5    75          10     0      9     6.0       0.1         0
## 6     15    85          10     0      9     6.0       0.1         0
##   Calcium Iron Caffeine
## 1    0.00    0      175
## 2    0.00    0      260
## 3    0.00    0      330
## 4    0.02    0      410
## 5    0.20    0       75
## 6    0.20    0       75

###Compared to the original data set, we now have a cleaned set ready to be partitioned.

Data Partitioning

Below we will be partitioning our cleaned data set so as to allow for regularization and validation of the models we will build.

###Creates a matrix of row indices for a 70% sample, stratified on the Sugars column
Row_Train<-createDataPartition(SBUX_Data$Sugars, p=.70, list = FALSE) 
###Creating the Training Set from the rows indexed by Row_Train
Train_SBUX<-SBUX_Data[Row_Train,] 
dim(Train_SBUX)

## [1] 171  16

###Creating the Holdout set (30% of the cleaned data set)
Holdout_SBUX<-SBUX_Data[-Row_Train,]
dim(Holdout_SBUX)

## [1] 71 16

###Partitioning the 30% data from the holdout data set to create a validation and test set
Validation_Interval<-createDataPartition(y=Holdout_SBUX$Sugars, p=.50, list = FALSE) 
Val_SBUX<-Holdout_SBUX[Validation_Interval,] 
dim(Val_SBUX)

## [1] 36 16

###Validation Set: stores 50% of the data pulled from the holdout data into our validation set

###Test Set
Test_SBUX<-Holdout_SBUX[-Validation_Interval,] 
dim(Test_SBUX)

## [1] 35 16

###stores the other 50% of the data pulled from the holdout data, creating our test set
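The three-way split above can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it uses plain random sampling, whereas createDataPartition samples within percentile groups of the outcome, so the set sizes here will not match the 171/36/35 shown above. All object names are illustrative.

```r
set.seed(123)
n_obs <- 242                      # number of rows in the cleaned data set
shuffled <- sample(n_obs)         # random permutation of the row indices

n_train <- round(0.70 * n_obs)    # 70% for training
n_val   <- round(0.15 * n_obs)    # half of the remaining 30% for validation

train_rows <- shuffled[seq_len(n_train)]
val_rows   <- shuffled[n_train + seq_len(n_val)]
test_rows  <- shuffled[(n_train + n_val + 1):n_obs]

# The three sets should be disjoint and together cover every row.
all_rows <- c(train_rows, val_rows, test_rows)
stopifnot(!anyDuplicated(all_rows), length(all_rows) == n_obs)
```

The final check is the useful part: whichever partitioning function is used, confirming that no row appears in more than one set guards against leakage between training and evaluation data.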

Linear Regression Models

Now we will run several linear regression models where sugars (grams) will be the dependent variable. In other words, we are looking to see what model best explains the variations in sugar content for items on the Starbucks Drink Menu.

###Regression #1
Model_1<-lm(Sugars~Calories+Total_Fat+Trans_Fat+Saturated_Fat+Sodium+Carbs+Cholesterol+Fibre+Protein+Vitamin_A+Vitamin_C+Calcium+Iron+Caffeine, Train_SBUX)
summary(Model_1)

## 
## Call:
## lm(formula = Sugars ~ Calories + Total_Fat + Trans_Fat + Saturated_Fat + 
##     Sodium + Carbs + Cholesterol + Fibre + Protein + Vitamin_A + 
##     Vitamin_C + Calcium + Iron + Caffeine, data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9427 -0.5494 -0.0476  0.5464  3.8558 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.060401   0.193071  -0.313  0.75482    
## Calories       0.049534   0.015906   3.114  0.00219 ** 
## Total_Fat     -1.060619   0.201247  -5.270 4.47e-07 ***
## Trans_Fat      0.691896   0.166270   4.161 5.21e-05 ***
## Saturated_Fat  6.873646   3.618059   1.900  0.05930 .  
## Sodium         0.018111   0.035287   0.513  0.60850    
## Carbs          0.003359   0.001655   2.030  0.04408 *  
## Cholesterol    0.786661   0.062293  12.628  < 2e-16 ***
## Fibre         -0.973950   0.216018  -4.509 1.27e-05 ***
## Protein       -0.853339   0.097704  -8.734 3.59e-15 ***
## Vitamin_A     -2.654221   1.980987  -1.340  0.18224    
## Vitamin_C      1.546975   1.000513   1.546  0.12409    
## Calcium       18.050381   2.901700   6.221 4.35e-09 ***
## Iron          -2.192690   1.887465  -1.162  0.24713    
## Caffeine      -0.002025   0.001171  -1.728  0.08591 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9339 on 156 degrees of freedom
## Multiple R-squared:  0.998,  Adjusted R-squared:  0.9978 
## F-statistic:  5466 on 14 and 156 DF,  p-value: < 2.2e-16

###R-squared=.998
###Left out Beverage_category because it is categorical
###Calories significant @ the .01 level
###Total_Fat & Trans_Fat significant @ the .001 level
###Carbs significant @ the .05 level
###Cholesterol & Fibre & Protein & Calcium significant @ the .001 level

##Refined Model
Model_2<-lm(Sugars~Calories+Total_Fat+Trans_Fat+Carbs+Cholesterol+Fibre+Protein+Calcium, Train_SBUX)
summary(Model_2) ###R-squared=.9973

## 
## Call:
## lm(formula = Sugars ~ Calories + Total_Fat + Trans_Fat + Carbs + 
##     Cholesterol + Fibre + Protein + Calcium, data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2379 -0.5144  0.0822  0.4502  5.1405 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.1855653  0.1929857  -0.962 0.337709    
## Calories     0.0689675  0.0123318   5.593 9.32e-08 ***
## Total_Fat   -1.0208702  0.1708485  -5.975 1.41e-08 ***
## Trans_Fat    0.6673095  0.1829494   3.648 0.000357 ***
## Carbs        0.0008673  0.0017518   0.495 0.621204    
## Cholesterol  0.7156455  0.0485084  14.753  < 2e-16 ***
## Fibre       -1.3930039  0.1614641  -8.627 5.53e-15 ***
## Protein     -0.6727513  0.0903477  -7.446 5.38e-12 ***
## Calcium      8.7572744  2.4723594   3.542 0.000519 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.058 on 162 degrees of freedom
## Multiple R-squared:  0.9973, Adjusted R-squared:  0.9972 
## F-statistic:  7453 on 8 and 162 DF,  p-value: < 2.2e-16

###all variables are significant @ the .001 level except for Carbs

##Refined Model 2
Model_3<-lm(Sugars~Calories+Total_Fat+Trans_Fat+Cholesterol+Fibre+Protein+Calcium, Train_SBUX)
summary(Model_3) ###R-squared=.9973

## 
## Call:
## lm(formula = Sugars ~ Calories + Total_Fat + Trans_Fat + Cholesterol + 
##     Fibre + Protein + Calcium, data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3937 -0.5017  0.0818  0.4861  5.0824 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.18411    0.19252  -0.956 0.340328    
## Calories     0.07003    0.01212   5.780 3.71e-08 ***
## Total_Fat   -1.03773    0.16703  -6.213 4.18e-09 ***
## Trans_Fat    0.68529    0.17889   3.831 0.000182 ***
## Cholesterol  0.71386    0.04826  14.791  < 2e-16 ***
## Fibre       -1.38765    0.16073  -8.634 5.16e-15 ***
## Protein     -0.67938    0.08914  -7.621 1.95e-12 ***
## Calcium      8.93430    2.44070   3.661 0.000340 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 163 degrees of freedom
## Multiple R-squared:  0.9973, Adjusted R-squared:  0.9972 
## F-statistic:  8558 on 7 and 163 DF,  p-value: < 2.2e-16

##Refined Model 3
Model_4<-lm(Sugars~Calories+Total_Fat+Trans_Fat+Cholesterol, Train_SBUX)
summary(Model_4) ###R-Squared = .9712

## 
## Call:
## lm(formula = Sugars ~ Calories + Total_Fat + Trans_Fat + Cholesterol, 
##     data = Train_SBUX)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.1589  -0.6712   0.7179   1.8484   7.1784 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.95001    0.59819   1.588  0.11416    
## Calories    -0.04912    0.01760  -2.791  0.00588 ** 
## Total_Fat   -0.15619    0.30382  -0.514  0.60787    
## Trans_Fat    1.07691    0.37517   2.870  0.00463 ** 
## Cholesterol  1.12713    0.07180  15.699  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.404 on 166 degrees of freedom
## Multiple R-squared:  0.9712, Adjusted R-squared:  0.9705 
## F-statistic:  1401 on 4 and 166 DF,  p-value: < 2.2e-16

###going to add back in Fibre, Protein, and Calcium and going to remove Total_Fat, Trans_Fat & Calories

##Refined Model 4
Model_5<-lm(Sugars~Cholesterol+Fibre+Protein+Calcium, Train_SBUX)
summary(Model_5) ##R-Squared = .9966

## 
## Call:
## lm(formula = Sugars ~ Cholesterol + Fibre + Protein + Calcium, 
##     data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3553 -0.5608  0.0685  0.5227  6.2126 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.19990    0.19704  -1.015   0.3118    
## Cholesterol  0.99365    0.00475 209.183  < 2e-16 ***
## Fibre       -1.97531    0.11325 -17.442  < 2e-16 ***
## Protein     -0.27442    0.05914  -4.640 7.04e-06 ***
## Calcium      4.12106    1.73566   2.374   0.0187 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.17 on 166 degrees of freedom
## Multiple R-squared:  0.9966, Adjusted R-squared:  0.9965 
## F-statistic: 1.217e+04 on 4 and 166 DF,  p-value: < 2.2e-16

###Going to revise again and take out Calcium (Significant @ the .05 level) & add back calories 

##Refined Model 5
Model_6<-lm(Sugars~Cholesterol+Fibre+Protein+ Calories, Train_SBUX)
summary(Model_6) ###R-Squared=.9965

## 
## Call:
## lm(formula = Sugars ~ Cholesterol + Fibre + Protein + Calories, 
##     data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3547 -0.4882  0.0859  0.5290  6.0513 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.141873   0.198191  -0.716    0.475    
## Cholesterol  0.965883   0.018700  51.653  < 2e-16 ***
## Fibre       -2.130770   0.080511 -26.466  < 2e-16 ***
## Protein     -0.189444   0.036012  -5.261 4.38e-07 ***
## Calories     0.006731   0.004196   1.604    0.111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.181 on 166 degrees of freedom
## Multiple R-squared:  0.9965, Adjusted R-squared:  0.9965 
## F-statistic: 1.195e+04 on 4 and 166 DF,  p-value: < 2.2e-16

##Refined Model 6
Model_7<-lm(Sugars~Cholesterol+Fibre+Protein+Total_Fat+Trans_Fat, Train_SBUX)
summary(Model_7) ###R-Squared=.9965

## 
## Call:
## lm(formula = Sugars ~ Cholesterol + Fibre + Protein + Total_Fat + 
##     Trans_Fat, data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5032 -0.5277  0.0765  0.4953  6.3331 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.181411   0.215063  -0.844    0.400    
## Cholesterol  0.995494   0.005366 185.511  < 2e-16 ***
## Fibre       -2.187983   0.081768 -26.759  < 2e-16 ***
## Protein     -0.142006   0.026962  -5.267 4.28e-07 ***
## Total_Fat    0.017771   0.076874   0.231    0.817    
## Trans_Fat   -0.042968   0.139006  -0.309    0.758    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.193 on 165 degrees of freedom
## Multiple R-squared:  0.9965, Adjusted R-squared:  0.9964 
## F-statistic:  9366 on 5 and 165 DF,  p-value: < 2.2e-16

##Refined Model 7
Model_8<-lm(Sugars~I(Fibre^2)+Fibre+Cholesterol+Protein+Calcium, Train_SBUX)
summary(Model_8) ###R-Squared=.9967

## 
## Call:
## lm(formula = Sugars ~ I(Fibre^2) + Fibre + Cholesterol + Protein + 
##     Calcium, data = Train_SBUX)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4897 -0.5659  0.0341  0.4942  6.4665 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.217999   0.195414  -1.116  0.26623    
## I(Fibre^2)   0.073778   0.036316   2.032  0.04380 *  
## Fibre       -2.277312   0.186246 -12.227  < 2e-16 ***
## Cholesterol  0.995824   0.004827 206.319  < 2e-16 ***
## Protein     -0.378106   0.077705  -4.866 2.64e-06 ***
## Calcium      7.478941   2.385104   3.136  0.00203 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.159 on 165 degrees of freedom
## Multiple R-squared:  0.9967, Adjusted R-squared:  0.9966 
## F-statistic:  9923 on 5 and 165 DF,  p-value: < 2.2e-16

###Will validate Models 1, 2, 3, and 5, which are among the models with the highest R-Squared values
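Raw R-squared never decreases when predictors are added, so ranking candidates by it alone tends to favour the larger models. The sketch below shows one way to tabulate adjusted R-squared and AIC side by side, and to test nested models directly. It uses the built-in mtcars data as a stand-in for Train_SBUX, and the model formulas are illustrative only.

```r
# Stand-in data: mtcars replaces Train_SBUX purely for illustration.
candidates <- list(
  full    = lm(mpg ~ wt + hp + disp + qsec, data = mtcars),
  reduced = lm(mpg ~ wt + hp, data = mtcars)
)

# One row per candidate model, with complexity-aware fit measures.
comparison <- data.frame(
  model         = names(candidates),
  adj_r_squared = sapply(candidates, function(m) summary(m)$adj.r.squared),
  aic           = sapply(candidates, AIC),
  row.names     = NULL
)
comparison

# For nested models, an F-test asks whether the dropped terms mattered.
anova(candidates$reduced, candidates$full)
```

The same pattern would apply to Model_1 through Model_8 by placing them in the list, assuming they are all fitted on the same training rows.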

Validation of Linear Regression Models

We will use our partitioned data to attempt to estimate out of sample error by running the models against our holdout data set.

###Validating Model_1
pred_values<-predict(Model_1, Holdout_SBUX)

#compute in-sample error
E_IN<-sqrt(sum((Model_1$fitted.values-Train_SBUX$Sugars)^2)/(length(Model_1$residuals)-3))
E_IN ##Returns .8999143

## [1] 0.8999143

E_OUT<-sqrt(sum((pred_values-Holdout_SBUX$Sugars)^2)/(length(pred_values)-3))
E_OUT ##Returns 1.063469

## [1] 1.063469

###Validating Model_2
pred_values<-predict(Model_2, Holdout_SBUX)

#compute in-sample error
E_IN<-sqrt(sum((Model_2$fitted.values-Train_SBUX$Sugars)^2)/(length(Model_2$residuals)-3))
E_IN ##Returns 1.038591

## [1] 1.038591

E_OUT<-sqrt(sum((pred_values-Holdout_SBUX$Sugars)^2)/(length(pred_values)-3))
E_OUT ##Returns 1.281684... Model 1, although it is more complicated, returns a lower out of sample error

## [1] 1.281684

###Validating Model_3
pred_values<-predict(Model_3, Holdout_SBUX)

#compute in-sample error
E_IN<-sqrt(sum((Model_3$fitted.values-Train_SBUX$Sugars)^2)/(length(Model_3$residuals)-3))
E_IN ##Returns 1.039376

## [1] 1.039376

E_OUT<-sqrt(sum((pred_values-Holdout_SBUX$Sugars)^2)/(length(pred_values)-3))
E_OUT ##Returns 1.290094

## [1] 1.290094

###Validating Model_5
pred_values<-predict(Model_5, Holdout_SBUX)

#compute in-sample error
E_IN<-sqrt(sum((Model_5$fitted.values-Train_SBUX$Sugars)^2)/(length(Model_5$residuals)-3))
E_IN ##Returns 1.163006

## [1] 1.163006

E_OUT<-sqrt(sum((pred_values-Holdout_SBUX$Sugars)^2)/(length(pred_values)-3))
E_OUT ##Returns 1.563584

## [1] 1.563584

##Concluding comments: Model_1 has the lowest E_OUT value, followed by Models 2, 3, and 5, in that order
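The repeated E_IN and E_OUT calculations above can be wrapped in a small helper. The sketch below is a hypothetical refactoring rather than the code used in this report. The helper name rmse and its penalty argument are our own; the calculations above subtract 3 from the number of observations in the denominator, while the conventional root-mean-square error subtracts nothing, so the helper exposes that choice. The built-in mtcars data stands in for the Starbucks sets.

```r
# Root-mean-square error of a fitted model on a data frame.
# `penalty` is subtracted from n in the denominator: the calculations
# above use 3, while the conventional RMSE uses 0.
rmse <- function(model, data, response, penalty = 0) {
  predicted <- predict(model, newdata = data)
  sqrt(sum((predicted - data[[response]])^2) / (length(predicted) - penalty))
}

# Stand-in data: mtcars split 70/30 in place of Train_SBUX / Holdout_SBUX.
set.seed(123)
train_idx <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train   <- mtcars[train_idx, ]
holdout <- mtcars[-train_idx, ]

fit   <- lm(mpg ~ wt + hp, data = train)
e_in  <- rmse(fit, train, "mpg")
e_out <- rmse(fit, holdout, "mpg")
c(E_IN = e_in, E_OUT = e_out)
```

Note that Holdout_SBUX contains both Val_SBUX and Test_SBUX. One option, which is our suggestion and not what the analysis above does, would be to pass Val_SBUX to such a helper during model selection so that the test rows remain unseen until the final evaluation.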

For now, Model_1 is chosen as the best candidate for regression. While it is our most complex model, it returns the lowest out-of-sample error when tested against the holdout data set (1.063, versus 1.282 to 1.564 for the reduced models). In the case of sugar in Starbucks drinks, the out-of-sample error increases as variables are removed, which suggests that even the predictors that are not individually significant contribute enough to be worth keeping. We therefore run the model against our Test_SBUX data, partitioned earlier, to estimate the model's out-of-sample error.

pred_values<-predict(Model_1, Test_SBUX)
E_OUT<-sqrt(sum((pred_values-Test_SBUX$Sugars)^2)/(length(pred_values)-3))
E_OUT

## [1] 0.958966

What we see is that Model_1, when run against the Test_SBUX data, returns an E_OUT value of about 0.959, meaning its predictions are typically off by roughly one gram of sugar. Therefore, Model_1 is our best linear regression model for explaining variation in the amount of sugar in a Starbucks menu item.

Algorithms

We will now implement CART, Random Forest, and SVM models for our predictive analysis in an attempt to classify which beverage category each observation belongs to.

CART Implementation

SBUX_Data$Beverage_category<-factor(SBUX_Data$Beverage_category)
##sets our classification variable to a factor variable
class(Test_SBUX$Beverage_category)

## [1] "factor"

CART_Model <- train(Beverage_category ~ ., data = Train_SBUX, method = "rpart",
                trControl = trainControl("cv", number = 10),
                tuneLength = 10) #tuneLength sets how many candidate values of the complexity parameter (cp) are evaluated
##the "cv", number = 10 refers to 10-fold cross validation on the training data
plot(CART_Model) #produces plot of cross-validation results

CART_Model$bestTune #returns optimal complexity parameter

##   cp
## 1  0
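A selected complexity parameter of cp = 0 means that no split is rejected for improving the fit too little, so the chosen tree is effectively unpruned. The sketch below uses rpart directly on the built-in iris data as a stand-in for Train_SBUX to show where the candidate cp values and their cross-validated errors can be inspected, and how a larger cp prunes the tree. The object names and the cp value of 0.1 are illustrative.

```r
library(rpart)

# Stand-in data: iris replaces Train_SBUX purely for illustration.
set.seed(123)
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0))

# Each row is a candidate subtree: its cp threshold, number of splits,
# training error, and cross-validated error (xerror).
full_tree$cptable

# Pruning with a larger cp collapses splits that improve the fit too little.
pruned_tree <- prune(full_tree, cp = 0.1)
```

Comparing the xerror column across rows gives a sense of whether the unpruned tree is genuinely best or only marginally better than a simpler one.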

confusionMatrix(predict(CART_Model, Test_SBUX), Test_SBUX$Beverage_category) ##Evaluation on the test set

## Confusion Matrix and Statistics
## 
##                                    Reference
## Prediction                          Classic Espresso Drinks Coffee
##   Classic Espresso Drinks                                 8      0
##   Coffee                                                  0      0
##   Frappuccino® Blended Coffee                             0      0
##   Frappuccino® Blended Crème                              0      0
##   Frappuccino® Light Blended Coffee                       0      0
##   Shaken Iced Beverages                                   0      0
##   Signature Espresso Drinks                               2      0
##   Smoothies                                               0      0
##   Tazo® Tea Drinks                                        0      0
##                                    Reference
## Prediction                          Frappuccino® Blended Coffee
##   Classic Espresso Drinks                                     0
##   Coffee                                                      0
##   Frappuccino® Blended Coffee                                 5
##   Frappuccino® Blended Crème                                  0
##   Frappuccino® Light Blended Coffee                           0
##   Shaken Iced Beverages                                       0
##   Signature Espresso Drinks                                   0
##   Smoothies                                                   0
##   Tazo® Tea Drinks                                            1
##                                    Reference
## Prediction                          Frappuccino® Blended Crème
##   Classic Espresso Drinks                                    0
##   Coffee                                                     0
##   Frappuccino® Blended Coffee                                0
##   Frappuccino® Blended Crème                                 0
##   Frappuccino® Light Blended Coffee                          0
##   Shaken Iced Beverages                                      0
##   Signature Espresso Drinks                                  0
##   Smoothies                                                  1
##   Tazo® Tea Drinks                                           0
##                                    Reference
## Prediction                          Frappuccino® Light Blended Coffee
##   Classic Espresso Drinks                                           0
##   Coffee                                                            0
##   Frappuccino® Blended Coffee                                       0
##   Frappuccino® Blended Crème                                        0
##   Frappuccino® Light Blended Coffee                                 0
##   Shaken Iced Beverages                                             3
##   Signature Espresso Drinks                                         0
##   Smoothies                                                         0
##   Tazo® Tea Drinks                                                  0
##                                    Reference
## Prediction                          Shaken Iced Beverages
##   Classic Espresso Drinks                               0
##   Coffee                                                0
##   Frappuccino® Blended Coffee                           1
##   Frappuccino® Blended Crème                            0
##   Frappuccino® Light Blended Coffee                     0
##   Shaken Iced Beverages                                 1
##   Signature Espresso Drinks                             0
##   Smoothies                                             0
##   Tazo® Tea Drinks                                      0
##                                    Reference
## Prediction                          Signature Espresso Drinks Smoothies
##   Classic Espresso Drinks                                   3         0
##   Coffee                                                    0         0
##   Frappuccino® Blended Coffee                               0         0
##   Frappuccino® Blended Crème                                0         0
##   Frappuccino® Light Blended Coffee                         0         0
##   Shaken Iced Beverages                                     0         0
##   Signature Espresso Drinks                                 1         0
##   Smoothies                                                 1         1
##   Tazo® Tea Drinks                                          0         0
##                                    Reference
## Prediction                          Tazo® Tea Drinks
##   Classic Espresso Drinks                          0
##   Coffee                                           0
##   Frappuccino® Blended Coffee                      0
##   Frappuccino® Blended Crème                       0
##   Frappuccino® Light Blended Coffee                0
##   Shaken Iced Beverages                            0
##   Signature Espresso Drinks                        2
##   Smoothies                                        0
##   Tazo® Tea Drinks                                 5
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6             
##                  95% CI : (0.4211, 0.7613)
##     No Information Rate : 0.2857          
##     P-Value [Acc > NIR] : 0.0001039       
##                                           
##                   Kappa : 0.5105          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Classic Espresso Drinks Class: Coffee
## Sensitivity                                  0.8000            NA
## Specificity                                  0.8800             1
## Pos Pred Value                               0.7273            NA
## Neg Pred Value                               0.9167            NA
## Prevalence                                   0.2857             0
## Detection Rate                               0.2286             0
## Detection Prevalence                         0.3143             0
## Balanced Accuracy                            0.8400            NA
##                      Class: Frappuccino® Blended Coffee
## Sensitivity                                      0.8333
## Specificity                                      0.9655
## Pos Pred Value                                   0.8333
## Neg Pred Value                                   0.9655
## Prevalence                                       0.1714
## Detection Rate                                   0.1429
## Detection Prevalence                             0.1714
## Balanced Accuracy                                0.8994
##                      Class: Frappuccino® Blended Crème
## Sensitivity                                    0.00000
## Specificity                                    1.00000
## Pos Pred Value                                     NaN
## Neg Pred Value                                 0.97143
## Prevalence                                     0.02857
## Detection Rate                                 0.00000
## Detection Prevalence                           0.00000
## Balanced Accuracy                              0.50000
##                      Class: Frappuccino® Light Blended Coffee
## Sensitivity                                           0.00000
## Specificity                                           1.00000
## Pos Pred Value                                            NaN
## Neg Pred Value                                        0.91429
## Prevalence                                            0.08571
## Detection Rate                                        0.00000
## Detection Prevalence                                  0.00000
## Balanced Accuracy                                     0.50000
##                      Class: Shaken Iced Beverages
## Sensitivity                               0.50000
## Specificity                               0.90909
## Pos Pred Value                            0.25000
## Neg Pred Value                            0.96774
## Prevalence                                0.05714
## Detection Rate                            0.02857
## Detection Prevalence                      0.11429
## Balanced Accuracy                         0.70455
##                      Class: Signature Espresso Drinks Class: Smoothies
## Sensitivity                                   0.20000          1.00000
## Specificity                                   0.86667          0.94118
## Pos Pred Value                                0.20000          0.33333
## Neg Pred Value                                0.86667          1.00000
## Prevalence                                    0.14286          0.02857
## Detection Rate                                0.02857          0.02857
## Detection Prevalence                          0.14286          0.08571
## Balanced Accuracy                             0.53333          0.97059
##                      Class: Tazo® Tea Drinks
## Sensitivity                           0.7143
## Specificity                           0.9643
## Pos Pred Value                        0.8333
## Neg Pred Value                        0.9310
## Prevalence                            0.2000
## Detection Rate                        0.1429
## Detection Prevalence                  0.1714
## Balanced Accuracy                     0.8393

par(xpd=NA)
plot(CART_Model$finalModel)
text(CART_Model$finalModel, digits = 3)

###The above commands plot the decision tree for CART_Model

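The above confusion matrix gives an accuracy of 0.6: the CART model places 21 of the 35 test observations (60%) into the correct beverage category, with a 95% confidence interval of roughly 0.42 to 0.76. This is well above the no-information rate of 0.2857, the accuracy obtained by always predicting the most common category.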
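The headline statistics can be recomputed by hand from the confusion matrix counts. The sketch below transcribes the CART counts reported above into a matrix; the short class labels are our own abbreviations, used to keep the code ASCII-only.

```r
# Class abbreviations are ours; the order follows the confusion matrix above.
classes <- c("ClassicEsp", "Coffee", "FrapCoffee", "FrapCreme", "FrapLight",
             "ShakenIced", "SignatureEsp", "Smoothies", "TazoTea")

# Counts transcribed column by column (one Reference class per line);
# matrix() fills by column, so each line below becomes one column.
cm <- matrix(c(
  8, 0, 0, 0, 0, 0, 2, 0, 0,   # Reference: Classic Espresso Drinks
  0, 0, 0, 0, 0, 0, 0, 0, 0,   # Reference: Coffee
  0, 0, 5, 0, 0, 0, 0, 0, 1,   # Reference: Frappuccino Blended Coffee
  0, 0, 0, 0, 0, 0, 0, 1, 0,   # Reference: Frappuccino Blended Creme
  0, 0, 0, 0, 0, 3, 0, 0, 0,   # Reference: Frappuccino Light Blended Coffee
  0, 0, 1, 0, 0, 1, 0, 0, 0,   # Reference: Shaken Iced Beverages
  3, 0, 0, 0, 0, 0, 1, 1, 0,   # Reference: Signature Espresso Drinks
  0, 0, 0, 0, 0, 0, 0, 1, 0,   # Reference: Smoothies
  0, 0, 0, 0, 0, 0, 2, 0, 5    # Reference: Tazo Tea Drinks
), nrow = 9, dimnames = list(Prediction = classes, Reference = classes))

accuracy     <- sum(diag(cm)) / sum(cm)         # 21 / 35 = 0.6
no_info_rate <- max(colSums(cm)) / sum(cm)      # 10 / 35, about 0.2857
sensitivity  <- diag(cm) / colSums(cm)          # NaN for Coffee (no test cases)
```

The per-class sensitivities make the weak spots visible: no Coffee drinks appear in the test set at all, and only 1 of the 5 Signature Espresso Drinks is classified correctly.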

Random Forest Implementation

#caret package implementation with 3-fold cross validation
Forest_Model <- train(Beverage_category ~ ., method="rf", trControl=trainControl(method = "cv", number = 3), preProcess=c("center", "scale"), data=Train_SBUX)
print(Forest_Model)

## Random Forest 
## 
## 171 samples
##  15 predictor
##   9 classes: 'Classic Espresso Drinks', 'Coffee', 'Frappuccino® Blended Coffee', 'Frappuccino® Blended Crème', 'Frappuccino® Light Blended Coffee', 'Shaken Iced Beverages', 'Signature Espresso Drinks', 'Smoothies', 'Tazo® Tea Drinks' 
## 
## Pre-processing: centered (15), scaled (15) 
## Resampling: Cross-Validated (3 fold) 
## Summary of sample sizes: 115, 114, 113 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.7716778  0.7268637
##    8    0.8010256  0.7623052
##   15    0.8012272  0.7627555
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 15.
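Since the train call above passes no tuneGrid, the three mtry candidates (2, 8, 15) were chosen by caret. The following is a minimal sketch of how an explicit grid could be supplied instead. It is not part of the original analysis: the built-in iris data stands in for Train_SBUX so the code is self-contained, and the names mtry_grid and Grid_Model are hypothetical.

```r
library(caret)
library(randomForest)

set.seed(123)

# Hypothetical grid of mtry candidates; iris has 4 predictors, so mtry <= 4.
mtry_grid <- expand.grid(mtry = c(1, 2, 4))

# Stand-in data: iris replaces Train_SBUX so the sketch runs on its own.
Grid_Model <- train(Species ~ ., data = iris, method = "rf",
                    trControl = trainControl(method = "cv", number = 3),
                    tuneGrid = mtry_grid)

# One row of cross-validated results per mtry candidate
print(Grid_Model$results[, c("mtry", "Accuracy", "Kappa")])

# The mtry value with the highest cross-validated accuracy
print(Grid_Model$bestTune)
```

With only 3 folds and a small training set, differences between neighbouring mtry values may be within resampling noise, so a finer grid does not necessarily produce a better model.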

Note that Forest_Model, fit with the caret implementation, reaches a cross-validated accuracy of 80.1% at mtry = 15, essentially tied with mtry = 8 (also 80.1%). Now we will run it against the test data to see how well it performs on new data.

confusionMatrix(predict(Forest_Model, Test_SBUX), Test_SBUX$Beverage_category)

## Confusion Matrix and Statistics
## 
##                                    Reference
## Prediction                          Classic Espresso Drinks Coffee
##   Classic Espresso Drinks                                 8      0
##   Coffee                                                  0      0
##   Frappuccino® Blended Coffee                             0      0
##   Frappuccino® Blended Crème                              0      0
##   Frappuccino® Light Blended Coffee                       0      0
##   Shaken Iced Beverages                                   0      0
##   Signature Espresso Drinks                               2      0
##   Smoothies                                               0      0
##   Tazo® Tea Drinks                                        0      0
##                                    Reference
## Prediction                          Frappuccino® Blended Coffee
##   Classic Espresso Drinks                                     0
##   Coffee                                                      0
##   Frappuccino® Blended Coffee                                 5
##   Frappuccino® Blended Crème                                  1
##   Frappuccino® Light Blended Coffee                           0
##   Shaken Iced Beverages                                       0
##   Signature Espresso Drinks                                   0
##   Smoothies                                                   0
##   Tazo® Tea Drinks                                            0
##                                    Reference
## Prediction                          Frappuccino® Blended Crème
##   Classic Espresso Drinks                                    0
##   Coffee                                                     0
##   Frappuccino® Blended Coffee                                0
##   Frappuccino® Blended Crème                                 1
##   Frappuccino® Light Blended Coffee                          0
##   Shaken Iced Beverages                                      0
##   Signature Espresso Drinks                                  0
##   Smoothies                                                  0
##   Tazo® Tea Drinks                                           0
##                                    Reference
## Prediction                          Frappuccino® Light Blended Coffee
##   Classic Espresso Drinks                                           0
##   Coffee                                                            0
##   Frappuccino® Blended Coffee                                       0
##   Frappuccino® Blended Crème                                        0
##   Frappuccino® Light Blended Coffee                                 3
##   Shaken Iced Beverages                                             0
##   Signature Espresso Drinks                                         0
##   Smoothies                                                         0
##   Tazo® Tea Drinks                                                  0
##                                    Reference
## Prediction                          Shaken Iced Beverages
##   Classic Espresso Drinks                               0
##   Coffee                                                0
##   Frappuccino® Blended Coffee                           0
##   Frappuccino® Blended Crème                            0
##   Frappuccino® Light Blended Coffee                     0
##   Shaken Iced Beverages                                 2
##   Signature Espresso Drinks                             0
##   Smoothies                                             0
##   Tazo® Tea Drinks                                      0
##                                    Reference
## Prediction                          Signature Espresso Drinks Smoothies
##   Classic Espresso Drinks                                   3         0
##   Coffee                                                    0         0
##   Frappuccino® Blended Coffee                               0         0
##   Frappuccino® Blended Crème                                0         0
##   Frappuccino® Light Blended Coffee                         0         0
##   Shaken Iced Beverages                                     0         0
##   Signature Espresso Drinks                                 2         0
##   Smoothies                                                 0         1
##   Tazo® Tea Drinks                                          0         0
##                                    Reference
## Prediction                          Tazo® Tea Drinks
##   Classic Espresso Drinks                          0
##   Coffee                                           0
##   Frappuccino® Blended Coffee                      0
##   Frappuccino® Blended Crème                       0
##   Frappuccino® Light Blended Coffee                0
##   Shaken Iced Beverages                            0
##   Signature Espresso Drinks                        0
##   Smoothies                                        0
##   Tazo® Tea Drinks                                 7
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8286          
##                  95% CI : (0.6635, 0.9344)
##     No Information Rate : 0.2857          
##     P-Value [Acc > NIR] : 3.901e-11       
##                                           
##                   Kappa : 0.79            
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Classic Espresso Drinks Class: Coffee
## Sensitivity                                  0.8000            NA
## Specificity                                  0.8800             1
## Pos Pred Value                               0.7273            NA
## Neg Pred Value                               0.9167            NA
## Prevalence                                   0.2857             0
## Detection Rate                               0.2286             0
## Detection Prevalence                         0.3143             0
## Balanced Accuracy                            0.8400            NA
##                      Class: Frappuccino® Blended Coffee
## Sensitivity                                      0.8333
## Specificity                                      1.0000
## Pos Pred Value                                   1.0000
## Neg Pred Value                                   0.9667
## Prevalence                                       0.1714
## Detection Rate                                   0.1429
## Detection Prevalence                             0.1429
## Balanced Accuracy                                0.9167
##                      Class: Frappuccino® Blended Crème
## Sensitivity                                    1.00000
## Specificity                                    0.97059
## Pos Pred Value                                 0.50000
## Neg Pred Value                                 1.00000
## Prevalence                                     0.02857
## Detection Rate                                 0.02857
## Detection Prevalence                           0.05714
## Balanced Accuracy                              0.98529
##                      Class: Frappuccino® Light Blended Coffee
## Sensitivity                                           1.00000
## Specificity                                           1.00000
## Pos Pred Value                                        1.00000
## Neg Pred Value                                        1.00000
## Prevalence                                            0.08571
## Detection Rate                                        0.08571
## Detection Prevalence                                  0.08571
## Balanced Accuracy                                     1.00000
##                      Class: Shaken Iced Beverages
## Sensitivity                               1.00000
## Specificity                               1.00000
## Pos Pred Value                            1.00000
## Neg Pred Value                            1.00000
## Prevalence                                0.05714
## Detection Rate                            0.05714
## Detection Prevalence                      0.05714
## Balanced Accuracy                         1.00000
##                      Class: Signature Espresso Drinks Class: Smoothies
## Sensitivity                                   0.40000          1.00000
## Specificity                                   0.93333          1.00000
## Pos Pred Value                                0.50000          1.00000
## Neg Pred Value                                0.90323          1.00000
## Prevalence                                    0.14286          0.02857
## Detection Rate                                0.05714          0.02857
## Detection Prevalence                          0.11429          0.02857
## Balanced Accuracy                             0.66667          1.00000
##                      Class: Tazo® Tea Drinks
## Sensitivity                              1.0
## Specificity                              1.0
## Pos Pred Value                           1.0
## Neg Pred Value                           1.0
## Prevalence                               0.2
## Detection Rate                           0.2
## Detection Prevalence                     0.2
## Balanced Accuracy                        1.0

On the test set, the accuracy of Forest_Model increases to 82.86% (29 of 35 drinks classified correctly).
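As a check on how these statistics are derived, the sketch below re-enters the Forest_Model test confusion matrix by hand and recomputes the overall accuracy and per-class sensitivity in base R. The counts are copied from the output above; the shortened class labels and the object names cm, accuracy, and sensitivity are only illustrative.

```r
# Shortened class labels, in the same order as the confusionMatrix output
classes <- c("Classic", "Coffee", "FrapCoffee", "FrapCreme", "FrapLight",
             "Shaken", "Signature", "Smoothies", "Tazo")

# matrix() fills column by column: each group of 9 values is one Reference
# (true) class, and each position within the group is a Prediction.
cm <- matrix(c(8, 0, 0, 0, 0, 0, 2, 0, 0,   # Reference: Classic
               0, 0, 0, 0, 0, 0, 0, 0, 0,   # Reference: Coffee
               0, 0, 5, 1, 0, 0, 0, 0, 0,   # Reference: FrapCoffee
               0, 0, 0, 1, 0, 0, 0, 0, 0,   # Reference: FrapCreme
               0, 0, 0, 0, 3, 0, 0, 0, 0,   # Reference: FrapLight
               0, 0, 0, 0, 0, 2, 0, 0, 0,   # Reference: Shaken
               3, 0, 0, 0, 0, 0, 2, 0, 0,   # Reference: Signature
               0, 0, 0, 0, 0, 0, 0, 1, 0,   # Reference: Smoothies
               0, 0, 0, 0, 0, 0, 0, 0, 7),  # Reference: Tazo
             nrow = 9, dimnames = list(Prediction = classes,
                                       Reference = classes))

# Accuracy = correctly classified drinks (diagonal) / all test drinks
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 4)   # 0.8286

# Sensitivity = correct predictions for a class / true members of that class
sensitivity <- diag(cm) / colSums(cm)
round(sensitivity, 4)
```

The Coffee entry of sensitivity is 0/0 because no Coffee drinks landed in the test set, which is why confusionMatrix prints NA for that class.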

#random forest package implementation
Forest_Model_2 <- randomForest(Beverage_category ~., Train_SBUX)
print(Forest_Model_2)

## 
## Call:
##  randomForest(formula = Beverage_category ~ ., data = Train_SBUX) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 17.54%
## Confusion matrix:
##                                   Classic Espresso Drinks Coffee
## Classic Espresso Drinks                                30      2
## Coffee                                                  0      4
## Frappuccino® Blended Coffee                             0      0
## Frappuccino® Blended Crème                              0      0
## Frappuccino® Light Blended Coffee                       0      0
## Shaken Iced Beverages                                   0      0
## Signature Espresso Drinks                               8      0
## Smoothies                                               0      0
## Tazo® Tea Drinks                                        0      0
##                                   Frappuccino® Blended Coffee
## Classic Espresso Drinks                                     0
## Coffee                                                      0
## Frappuccino® Blended Coffee                                23
## Frappuccino® Blended Crème                                  0
## Frappuccino® Light Blended Coffee                           3
## Shaken Iced Beverages                                       0
## Signature Espresso Drinks                                   0
## Smoothies                                                   0
## Tazo® Tea Drinks                                            0
##                                   Frappuccino® Blended Crème
## Classic Espresso Drinks                                    0
## Coffee                                                     0
## Frappuccino® Blended Coffee                                0
## Frappuccino® Blended Crème                                 9
## Frappuccino® Light Blended Coffee                          0
## Shaken Iced Beverages                                      0
## Signature Espresso Drinks                                  0
## Smoothies                                                  0
## Tazo® Tea Drinks                                           0
##                                   Frappuccino® Light Blended Coffee
## Classic Espresso Drinks                                           0
## Coffee                                                            0
## Frappuccino® Blended Coffee                                       0
## Frappuccino® Blended Crème                                        0
## Frappuccino® Light Blended Coffee                                 3
## Shaken Iced Beverages                                             0
## Signature Espresso Drinks                                         0
## Smoothies                                                         0
## Tazo® Tea Drinks                                                  0
##                                   Shaken Iced Beverages
## Classic Espresso Drinks                               0
## Coffee                                                0
## Frappuccino® Blended Coffee                           1
## Frappuccino® Blended Crème                            0
## Frappuccino® Light Blended Coffee                     0
## Shaken Iced Beverages                                15
## Signature Espresso Drinks                             1
## Smoothies                                             0
## Tazo® Tea Drinks                                      0
##                                   Signature Espresso Drinks Smoothies
## Classic Espresso Drinks                                   8         0
## Coffee                                                    0         0
## Frappuccino® Blended Coffee                               0         0
## Frappuccino® Blended Crème                                0         0
## Frappuccino® Light Blended Coffee                         0         0
## Shaken Iced Beverages                                     1         0
## Signature Espresso Drinks                                16         0
## Smoothies                                                 0         7
## Tazo® Tea Drinks                                          3         0
##                                   Tazo® Tea Drinks class.error
## Classic Espresso Drinks                          0  0.25000000
## Coffee                                           0  0.00000000
## Frappuccino® Blended Coffee                      0  0.04166667
## Frappuccino® Blended Crème                       0  0.00000000
## Frappuccino® Light Blended Coffee                1  0.57142857
## Shaken Iced Beverages                            0  0.06250000
## Signature Espresso Drinks                        2  0.40740741
## Smoothies                                        0  0.00000000
## Tazo® Tea Drinks                                34  0.08108108
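The overall OOB error rate in the output above can be recovered from the per-class rows of the confusion matrix. The sketch below does that arithmetic in base R. The counts are read off the printed matrix, where each row is a true class; the shortened labels and the names class_size, misclassified, class_error, and oob_error are only illustrative.

```r
# Number of training drinks in each true class (row sums of the matrix above)
class_size <- c(Classic = 40, Coffee = 4, FrapCoffee = 24, FrapCreme = 9,
                FrapLight = 7, Shaken = 16, Signature = 27,
                Smoothies = 7, Tazo = 37)

# Off-diagonal count in each row, i.e. drinks of that class predicted wrongly
misclassified <- c(Classic = 10, Coffee = 0, FrapCoffee = 1, FrapCreme = 0,
                   FrapLight = 4, Shaken = 1, Signature = 11,
                   Smoothies = 0, Tazo = 3)

# Per-class error, matching the class.error column
class_error <- misclassified / class_size
round(class_error, 4)

# Overall OOB error: all misclassified drinks over all 171 training drinks
oob_error <- sum(misclassified) / sum(class_size)
round(100 * oob_error, 2)        # 17.54
round(100 * (1 - oob_error), 2)  # 82.46
```

The overall rate is weighted by class size, so the large Classic Espresso, Signature Espresso, and Tazo classes dominate it even though Frappuccino Light Blended Coffee has the highest per-class error.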

The OOB (out-of-bag) estimate gives an error rate of 17.54%, implying a model accuracy of 82.46%. Now we will run Forest_Model_2 against the testing set.

confusionMatrix(predict(Forest_Model_2, Test_SBUX), Test_SBUX$Beverage_category)

## Confusion Matrix and Statistics
## 
##                                    Reference
## Prediction                          Classic Espresso Drinks Coffee
##   Classic Espresso Drinks                                 8      0
##   Coffee                                                  0      0
##   Frappuccino® Blended Coffee                             0      0
##   Frappuccino® Blended Crème                              0      0
##   Frappuccino® Light Blended Coffee                       0      0
##   Shaken Iced Beverages                                   0      0
##   Signature Espresso Drinks                               2      0
##   Smoothies                                               0      0
##   Tazo® Tea Drinks                                        0      0
##                                    Reference
## Prediction                          Frappuccino® Blended Coffee
##   Classic Espresso Drinks                                     0
##   Coffee                                                      0
##   Frappuccino® Blended Coffee                                 5
##   Frappuccino® Blended Crème                                  1
##   Frappuccino® Light Blended Coffee                           0
##   Shaken Iced Beverages                                       0
##   Signature Espresso Drinks                                   0
##   Smoothies                                                   0
##   Tazo® Tea Drinks                                            0
##                                    Reference
## Prediction                          Frappuccino® Blended Crème
##   Classic Espresso Drinks                                    0
##   Coffee                                                     0
##   Frappuccino® Blended Coffee                                0
##   Frappuccino® Blended Crème                                 1
##   Frappuccino® Light Blended Coffee                          0
##   Shaken Iced Beverages                                      0
##   Signature Espresso Drinks                                  0
##   Smoothies                                                  0
##   Tazo® Tea Drinks                                           0
##                                    Reference
## Prediction                          Frappuccino® Light Blended Coffee
##   Classic Espresso Drinks                                           0
##   Coffee                                                            0
##   Frappuccino® Blended Coffee                                       0
##   Frappuccino® Blended Crème                                        0
##   Frappuccino® Light Blended Coffee                                 3
##   Shaken Iced Beverages                                             0
##   Signature Espresso Drinks                                         0
##   Smoothies                                                         0
##   Tazo® Tea Drinks                                                  0
##                                    Reference
## Prediction                          Shaken Iced Beverages
##   Classic Espresso Drinks                               0
##   Coffee                                                0
##   Frappuccino® Blended Coffee                           0
##   Frappuccino® Blended Crème                            0
##   Frappuccino® Light Blended Coffee                     0
##   Shaken Iced Beverages                                 2
##   Signature Espresso Drinks                             0
##   Smoothies                                             0
##   Tazo® Tea Drinks                                      0
##                                    Reference
## Prediction                          Signature Espresso Drinks Smoothies
##   Classic Espresso Drinks                                   3         0
##   Coffee                                                    0         0
##   Frappuccino® Blended Coffee                               0         0
##   Frappuccino® Blended Crème                                0         0
##   Frappuccino® Light Blended Coffee                         0         0
##   Shaken Iced Beverages                                     0         0
##   Signature Espresso Drinks                                 2         0
##   Smoothies                                                 0         1
##   Tazo® Tea Drinks                                          0         0
##                                    Reference
## Prediction                          Tazo® Tea Drinks
##   Classic Espresso Drinks                          0
##   Coffee                                           0
##   Frappuccino® Blended Coffee                      0
##   Frappuccino® Blended Crème                       0
##   Frappuccino® Light Blended Coffee                0
##   Shaken Iced Beverages                            0
##   Signature Espresso Drinks                        1
##   Smoothies                                        0
##   Tazo® Tea Drinks                                 6
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8             
##                  95% CI : (0.6306, 0.9156)
##     No Information Rate : 0.2857          
##     P-Value [Acc > NIR] : 4.113e-10       
##                                           
##                   Kappa : 0.7555          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Classic Espresso Drinks Class: Coffee
## Sensitivity                                  0.8000            NA
## Specificity                                  0.8800             1
## Pos Pred Value                               0.7273            NA
## Neg Pred Value                               0.9167            NA
## Prevalence                                   0.2857             0
## Detection Rate                               0.2286             0
## Detection Prevalence                         0.3143             0
## Balanced Accuracy                            0.8400            NA
##                      Class: Frappuccino® Blended Coffee
## Sensitivity                                      0.8333
## Specificity                                      1.0000
## Pos Pred Value                                   1.0000
## Neg Pred Value                                   0.9667
## Prevalence                                       0.1714
## Detection Rate                                   0.1429
## Detection Prevalence                             0.1429
## Balanced Accuracy                                0.9167
##                      Class: Frappuccino® Blended Crème
## Sensitivity                                    1.00000
## Specificity                                    0.97059
## Pos Pred Value                                 0.50000
## Neg Pred Value                                 1.00000
## Prevalence                                     0.02857
## Detection Rate                                 0.02857
## Detection Prevalence                           0.05714
## Balanced Accuracy                              0.98529
##                      Class: Frappuccino® Light Blended Coffee
## Sensitivity                                           1.00000
## Specificity                                           1.00000
## Pos Pred Value                                        1.00000
## Neg Pred Value                                        1.00000
## Prevalence                                            0.08571
## Detection Rate                                        0.08571
## Detection Prevalence                                  0.08571
## Balanced Accuracy                                     1.00000
##                      Class: Shaken Iced Beverages
## Sensitivity                               1.00000
## Specificity                               1.00000
## Pos Pred Value                            1.00000
## Neg Pred Value                            1.00000
## Prevalence                                0.05714
## Detection Rate                            0.05714
## Detection Prevalence                      0.05714
## Balanced Accuracy                         1.00000
##                      Class: Signature Espresso Drinks Class: Smoothies
## Sensitivity                                   0.40000          1.00000
## Specificity                                   0.90000          1.00000
## Pos Pred Value                                0.40000          1.00000
## Neg Pred Value                                0.90000          1.00000
## Prevalence                                    0.14286          0.02857
## Detection Rate                                0.05714          0.02857
## Detection Prevalence                          0.14286          0.02857
## Balanced Accuracy                             0.65000          1.00000
##                      Class: Tazo® Tea Drinks
## Sensitivity                           0.8571
## Specificity                           1.0000
## Pos Pred Value                        1.0000
## Neg Pred Value                        0.9655
## Prevalence                            0.2000
## Detection Rate                        0.1714
## Detection Prevalence                  0.1714
## Balanced Accuracy                     0.9286

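On the test set, the accuracy of Forest_Model_2 decreases slightly to 80% (28 of 35 drinks classified correctly), compared with its OOB estimate of 82.46%.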

Support Vector Machines Implementation

SVM1<-svm(Beverage_category~., data = Train_SBUX, cost=1000, cross = 10, gamma=.001)
confusionMatrix(predict(SVM1, Test_SBUX), Test_SBUX$Beverage_category)

## Confusion Matrix and Statistics
## 
##                                    Reference
## Prediction                          Classic Espresso Drinks Coffee
##   Classic Espresso Drinks                                 9      0
##   Coffee                                                  1      0
##   Frappuccino® Blended Coffee                             0      0
##   Frappuccino® Blended Crème                              0      0
##   Frappuccino® Light Blended Coffee                       0      0
##   Shaken Iced Beverages                                   0      0
##   Signature Espresso Drinks                               0      0
##   Smoothies                                               0      0
##   Tazo® Tea Drinks                                        0      0
##                                    Reference
## Prediction                          Frappuccino® Blended Coffee
##   Classic Espresso Drinks                                     0
##   Coffee                                                      0
##   Frappuccino® Blended Coffee                                 5
##   Frappuccino® Blended Crème                                  1
##   Frappuccino® Light Blended Coffee                           0
##   Shaken Iced Beverages                                       0
##   Signature Espresso Drinks                                   0
##   Smoothies                                                   0
##   Tazo® Tea Drinks                                            0
##                                    Reference
## Prediction                          Frappuccino® Blended Crème
##   Classic Espresso Drinks                                    0
##   Coffee                                                     0
##   Frappuccino® Blended Coffee                                0
##   Frappuccino® Blended Crème                                 1
##   Frappuccino® Light Blended Coffee                          0
##   Shaken Iced Beverages                                      0
##   Signature Espresso Drinks                                  0
##   Smoothies                                                  0
##   Tazo® Tea Drinks                                           0
##                                    Reference
## Prediction                          Frappuccino® Light Blended Coffee
##   Classic Espresso Drinks                                           0
##   Coffee                                                            0
##   Frappuccino® Blended Coffee                                       0
##   Frappuccino® Blended Crème                                        0
##   Frappuccino® Light Blended Coffee                                 3
##   Shaken Iced Beverages                                             0
##   Signature Espresso Drinks                                         0
##   Smoothies                                                         0
##   Tazo® Tea Drinks                                                  0
##                                    Reference
## Prediction                          Shaken Iced Beverages
##   Classic Espresso Drinks                               0
##   Coffee                                                0
##   Frappuccino® Blended Coffee                           0
##   Frappuccino® Blended Crème                            0
##   Frappuccino® Light Blended Coffee                     0
##   Shaken Iced Beverages                                 2
##   Signature Espresso Drinks                             0
##   Smoothies                                             0
##   Tazo® Tea Drinks                                      0
##                                    Reference
## Prediction                          Signature Espresso Drinks Smoothies
##   Classic Espresso Drinks                                   3         0
##   Coffee                                                    0         0
##   Frappuccino® Blended Coffee                               0         0
##   Frappuccino® Blended Crème                                0         0
##   Frappuccino® Light Blended Coffee                         0         0
##   Shaken Iced Beverages                                     0         0
##   Signature Espresso Drinks                                 2         0
##   Smoothies                                                 0         1
##   Tazo® Tea Drinks                                          0         0
##                                    Reference
## Prediction                          Tazo® Tea Drinks
##   Classic Espresso Drinks                          0
##   Coffee                                           0
##   Frappuccino® Blended Coffee                      0
##   Frappuccino® Blended Crème                       0
##   Frappuccino® Light Blended Coffee                0
##   Shaken Iced Beverages                            0
##   Signature Espresso Drinks                        0
##   Smoothies                                        0
##   Tazo® Tea Drinks                                 7
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8571          
##                  95% CI : (0.6974, 0.9519)
##     No Information Rate : 0.2857          
##     P-Value [Acc > NIR] : 3.071e-12       
##                                           
##                   Kappa : 0.825           
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Classic Espresso Drinks Class: Coffee
## Sensitivity                                  0.9000            NA
## Specificity                                  0.8800       0.97143
## Pos Pred Value                               0.7500            NA
## Neg Pred Value                               0.9565            NA
## Prevalence                                   0.2857       0.00000
## Detection Rate                               0.2571       0.00000
## Detection Prevalence                         0.3429       0.02857
## Balanced Accuracy                            0.8900            NA
##                      Class: Frappuccino® Blended Coffee
## Sensitivity                                      0.8333
## Specificity                                      1.0000
## Pos Pred Value                                   1.0000
## Neg Pred Value                                   0.9667
## Prevalence                                       0.1714
## Detection Rate                                   0.1429
## Detection Prevalence                             0.1429
## Balanced Accuracy                                0.9167
##                      Class: Frappuccino® Blended Crème
## Sensitivity                                    1.00000
## Specificity                                    0.97059
## Pos Pred Value                                 0.50000
## Neg Pred Value                                 1.00000
## Prevalence                                     0.02857
## Detection Rate                                 0.02857
## Detection Prevalence                           0.05714
## Balanced Accuracy                              0.98529
##                      Class: Frappuccino® Light Blended Coffee
## Sensitivity                                           1.00000
## Specificity                                           1.00000
## Pos Pred Value                                        1.00000
## Neg Pred Value                                        1.00000
## Prevalence                                            0.08571
## Detection Rate                                        0.08571
## Detection Prevalence                                  0.08571
## Balanced Accuracy                                     1.00000
##                      Class: Shaken Iced Beverages
## Sensitivity                               1.00000
## Specificity                               1.00000
## Pos Pred Value                            1.00000
## Neg Pred Value                            1.00000
## Prevalence                                0.05714
## Detection Rate                            0.05714
## Detection Prevalence                      0.05714
## Balanced Accuracy                         1.00000
##                      Class: Signature Espresso Drinks Class: Smoothies
## Sensitivity                                   0.40000          1.00000
## Specificity                                   1.00000          1.00000
## Pos Pred Value                                1.00000          1.00000
## Neg Pred Value                                0.90909          1.00000
## Prevalence                                    0.14286          0.02857
## Detection Rate                                0.05714          0.02857
## Detection Prevalence                          0.05714          0.02857
## Balanced Accuracy                             0.70000          1.00000
##                      Class: Tazo® Tea Drinks
## Sensitivity                              1.0
## Specificity                              1.0
## Pos Pred Value                           1.0
## Neg Pred Value                           1.0
## Prevalence                               0.2
## Detection Rate                           0.2
## Detection Prevalence                     0.2
## Balanced Accuracy                        1.0

The SVM1 model produces an accuracy rate of 85.71% or, equivalently, a 14.29% out-of-sample error rate when attempting to classify beverage category based on nutritional characteristics.
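
As a quick check on these figures, the accuracy and error rate can be recomputed from the number of correct predictions. The sketch below assumes 30 of 35 test observations were classified correctly. Both counts are inferred from the reported accuracy and prevalence values rather than printed in the output, and the variable names are illustrative. The interval uses binom.test, which is the function confusionMatrix relies on for its 95% CI.

```r
# Counts inferred from the reported output (assumption, not printed above):
# 7 Tazo test drinks at a prevalence of 0.2 implies 35 test observations,
# and an accuracy of 0.8571 on 35 observations implies 30 correct predictions.
n_test    <- 35
n_correct <- 30

accuracy   <- n_correct / n_test
error_rate <- 1 - accuracy

# Exact (Clopper-Pearson) binomial interval for the accuracy
ci <- binom.test(n_correct, n_test)$conf.int

round(c(accuracy = accuracy, error_rate = error_rate), 4)
round(ci, 4)
```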

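# Tune the SVM by 10-fold cross-validation over a grid of cost and gamma values
# (column 1 of Train_SBUX is taken to be Beverage_category, the response)
svm_tune <- tune(svm, train.x=Train_SBUX[,-1], train.y=Train_SBUX[,1], 
                 kernel="radial", ranges=list(cost=10^(-1:2), gamma=c(.5,1,2)))
print(svm_tune) # best parameters: cost=10 and gamma=.5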

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##    10   0.5
## 
## - best performance: 0.2689542
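
For a classification problem, the "best performance" that tune reports is the cross-validated misclassification error, so 0.2689542 corresponds to a cross-validated accuracy of roughly 73%. The sketch below shows how that value and the full grid of results can be read off a tuning object. It uses the built-in iris data as a stand-in for Train_SBUX, and the object name iris_tune is illustrative.

```r
library(e1071)
set.seed(123)

# Stand-in data (assumption): iris replaces Train_SBUX purely for illustration
iris_tune <- tune(svm, train.x = iris[, -5], train.y = iris[, 5],
                  kernel = "radial",
                  ranges = list(cost = 10^(-1:2), gamma = c(.5, 1, 2)))

iris_tune$best.parameters            # cost/gamma pair with the lowest CV error
iris_tune$best.performance           # lowest cross-validated misclassification error
cv_accuracy <- 1 - iris_tune$best.performance

# Full grid, sorted so that near-ties with the best setting are visible
perf <- iris_tune$performances
perf[order(perf$error), ]
```

Sorting the performances table is a simple way to see whether the winning pair is clearly better than its neighbours or only marginally so.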

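# Re-estimate the model with the tuned parameters
# (cross = 10 additionally runs 10-fold cross-validation on the training set)
SVM_RETUNE <- svm(Beverage_category~., data = Train_SBUX, cost=10, cross = 10, gamma=.5)
# Evaluate the re-estimated model on the held-out test set
confusionMatrix(predict(SVM_RETUNE, Test_SBUX), Test_SBUX$Beverage_category)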

## Confusion Matrix and Statistics
## 
##                                    Reference
## Prediction                          Classic Espresso Drinks Coffee
##   Classic Espresso Drinks                                 7      0
##   Coffee                                                  1      0
##   Frappuccino® Blended Coffee                             0      0
##   Frappuccino® Blended Crème                              0      0
##   Frappuccino® Light Blended Coffee                       0      0
##   Shaken Iced Beverages                                   0      0
##   Signature Espresso Drinks                               2      0
##   Smoothies                                               0      0
##   Tazo® Tea Drinks                                        0      0
##                                    Reference
## Prediction                          Frappuccino® Blended Coffee
##   Classic Espresso Drinks                                     0
##   Coffee                                                      0
##   Frappuccino® Blended Coffee                                 4
##   Frappuccino® Blended Crème                                  1
##   Frappuccino® Light Blended Coffee                           0
##   Shaken Iced Beverages                                       0
##   Signature Espresso Drinks                                   1
##   Smoothies                                                   0
##   Tazo® Tea Drinks                                            0
##                                    Reference
## Prediction                          Frappuccino® Blended Crème
##   Classic Espresso Drinks                                    0
##   Coffee                                                     0
##   Frappuccino® Blended Coffee                                0
##   Frappuccino® Blended Crème                                 1
##   Frappuccino® Light Blended Coffee                          0
##   Shaken Iced Beverages                                      0
##   Signature Espresso Drinks                                  0
##   Smoothies                                                  0
##   Tazo® Tea Drinks                                           0
##                                    Reference
## Prediction                          Frappuccino® Light Blended Coffee
##   Classic Espresso Drinks                                           0
##   Coffee                                                            0
##   Frappuccino® Blended Coffee                                       1
##   Frappuccino® Blended Crème                                        0
##   Frappuccino® Light Blended Coffee                                 2
##   Shaken Iced Beverages                                             0
##   Signature Espresso Drinks                                         0
##   Smoothies                                                         0
##   Tazo® Tea Drinks                                                  0
##                                    Reference
## Prediction                          Shaken Iced Beverages
##   Classic Espresso Drinks                               0
##   Coffee                                                0
##   Frappuccino® Blended Coffee                           0
##   Frappuccino® Blended Crème                            0
##   Frappuccino® Light Blended Coffee                     0
##   Shaken Iced Beverages                                 1
##   Signature Espresso Drinks                             1
##   Smoothies                                             0
##   Tazo® Tea Drinks                                      0
##                                    Reference
## Prediction                          Signature Espresso Drinks Smoothies
##   Classic Espresso Drinks                                   4         0
##   Coffee                                                    0         0
##   Frappuccino® Blended Coffee                               0         0
##   Frappuccino® Blended Crème                                0         0
##   Frappuccino® Light Blended Coffee                         0         0
##   Shaken Iced Beverages                                     0         0
##   Signature Espresso Drinks                                 1         0
##   Smoothies                                                 0         1
##   Tazo® Tea Drinks                                          0         0
##                                    Reference
## Prediction                          Tazo® Tea Drinks
##   Classic Espresso Drinks                          1
##   Coffee                                           0
##   Frappuccino® Blended Coffee                      0
##   Frappuccino® Blended Crème                       0
##   Frappuccino® Light Blended Coffee                0
##   Shaken Iced Beverages                            0
##   Signature Espresso Drinks                        1
##   Smoothies                                        0
##   Tazo® Tea Drinks                                 5
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6286          
##                  95% CI : (0.4492, 0.7853)
##     No Information Rate : 0.2857          
##     P-Value [Acc > NIR] : 2.555e-05       
##                                           
##                   Kappa : 0.5445          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Classic Espresso Drinks Class: Coffee
## Sensitivity                                  0.7000            NA
## Specificity                                  0.8000       0.97143
## Pos Pred Value                               0.5833            NA
## Neg Pred Value                               0.8696            NA
## Prevalence                                   0.2857       0.00000
## Detection Rate                               0.2000       0.00000
## Detection Prevalence                         0.3429       0.02857
## Balanced Accuracy                            0.7500            NA
##                      Class: Frappuccino® Blended Coffee
## Sensitivity                                      0.6667
## Specificity                                      0.9655
## Pos Pred Value                                   0.8000
## Neg Pred Value                                   0.9333
## Prevalence                                       0.1714
## Detection Rate                                   0.1143
## Detection Prevalence                             0.1429
## Balanced Accuracy                                0.8161
##                      Class: Frappuccino® Blended Crème
## Sensitivity                                    1.00000
## Specificity                                    0.97059
## Pos Pred Value                                 0.50000
## Neg Pred Value                                 1.00000
## Prevalence                                     0.02857
## Detection Rate                                 0.02857
## Detection Prevalence                           0.05714
## Balanced Accuracy                              0.98529
##                      Class: Frappuccino® Light Blended Coffee
## Sensitivity                                           0.66667
## Specificity                                           1.00000
## Pos Pred Value                                        1.00000
## Neg Pred Value                                        0.96970
## Prevalence                                            0.08571
## Detection Rate                                        0.05714
## Detection Prevalence                                  0.05714
## Balanced Accuracy                                     0.83333
##                      Class: Shaken Iced Beverages
## Sensitivity                               0.50000
## Specificity                               1.00000
## Pos Pred Value                            1.00000
## Neg Pred Value                            0.97059
## Prevalence                                0.05714
## Detection Rate                            0.02857
## Detection Prevalence                      0.02857
## Balanced Accuracy                         0.75000
##                      Class: Signature Espresso Drinks Class: Smoothies
## Sensitivity                                   0.20000          1.00000
## Specificity                                   0.83333          1.00000
## Pos Pred Value                                0.16667          1.00000
## Neg Pred Value                                0.86207          1.00000
## Prevalence                                    0.14286          0.02857
## Detection Rate                                0.02857          0.02857
## Detection Prevalence                          0.17143          0.02857
## Balanced Accuracy                             0.51667          1.00000
##                      Class: Tazo® Tea Drinks
## Sensitivity                           0.7143
## Specificity                           1.0000
## Pos Pred Value                        1.0000
## Neg Pred Value                        0.9333
## Prevalence                            0.2000
## Detection Rate                        0.1429
## Detection Prevalence                  0.1429
## Balanced Accuracy                     0.8571

After re-running the SVM model with the tuned parameters, we see an accuracy rate for SVM_RETUNE of 62.86%, a decrease of roughly 23 percentage points from our original SVM1 model.
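
One possible contributor to this drop is that the tuning grid did not contain the settings the first model was fitted with. If SVM1 used svm's defaults (an assumption, since its call is not repeated here), its gamma of 1 / (number of predictors) was smaller than any value in the grid. The sketch below shows a grid that includes the defaults so they compete directly with the other candidates. The iris data is used as a stand-in, and the names x, y, and wide_tune are illustrative.

```r
library(e1071)
set.seed(123)

# Stand-in data (assumption): iris replaces Train_SBUX for illustration
x <- iris[, -5]
y <- iris[, 5]

# svm() defaults for a radial kernel: cost = 1, gamma = 1 / (number of predictors)
default_gamma <- 1 / ncol(x)

wide_tune <- tune(svm, train.x = x, train.y = y, kernel = "radial",
                  ranges = list(cost  = 10^(-1:2),
                                gamma = c(default_gamma, .5, 1, 2)))

# Compare the default setting with the winning setting from the same tuning run
perf <- wide_tune$performances
perf[perf$cost == 1 & perf$gamma == default_gamma, ]
wide_tune$best.parameters
```

With the defaults in the grid, the selected parameters can do no worse than the defaults in cross-validation, although test-set accuracy may still differ.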

Comparison of Classification Models

When performing our analysis for the classification models, we used the following methods: CART, Random Forest, and SVM. We ran two Random Forest models because we wanted to test for a difference between the packages (caret vs. randomForest). In the end, the SVM model was the most accurate, so we chose it as the best candidate for classification. That said, our SVM model was more accurate in its first stage (SVM1): when we re-estimated it with the tuned parameters, the resulting model had lower test accuracy. This could be a result of the true optimal parameters lying outside the grid we searched. In addition, the tuning step selects parameters by cross-validation on the training set, and that choice need not carry over to a small test set. Our (Cohen’s) Kappa value followed a similar pattern. It was also highest for our SVM model (0.825 for SVM1), meaning that this model still shows the strongest agreement with the true categories after accounting for the agreement expected by chance.
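
To make the Kappa statistic concrete, the sketch below recomputes it for SVM_RETUNE from the confusion matrix printed above, using the usual definition kappa = (po - pe) / (1 - pe), where po is the observed agreement (the accuracy) and pe is the agreement expected by chance from the row and column totals. The counts are transcribed by hand from that output, and the variable names are illustrative.

```r
# Confusion matrix for SVM_RETUNE, transcribed from the printed output
# (rows = predictions, columns = reference classes, same order as printed)
cm <- matrix(c(
  7, 0, 0, 0, 0, 0, 4, 0, 1,   # Classic Espresso Drinks
  1, 0, 0, 0, 0, 0, 0, 0, 0,   # Coffee
  0, 0, 4, 0, 1, 0, 0, 0, 0,   # Frappuccino Blended Coffee
  0, 0, 1, 1, 0, 0, 0, 0, 0,   # Frappuccino Blended Creme
  0, 0, 0, 0, 2, 0, 0, 0, 0,   # Frappuccino Light Blended Coffee
  0, 0, 0, 0, 0, 1, 0, 0, 0,   # Shaken Iced Beverages
  2, 0, 1, 0, 0, 1, 1, 0, 1,   # Signature Espresso Drinks
  0, 0, 0, 0, 0, 0, 0, 1, 0,   # Smoothies
  0, 0, 0, 0, 0, 0, 0, 0, 5    # Tazo Tea Drinks
), nrow = 9, byrow = TRUE)

n  <- sum(cm)                                 # number of test observations
po <- sum(diag(cm)) / n                       # observed agreement (accuracy)
pe <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa_hat <- (po - pe) / (1 - pe)

round(c(accuracy = po, expected = pe, kappa = kappa_hat), 4)
```

The result matches the reported accuracy of 0.6286 and Kappa of 0.5445, which shows that Kappa rescales accuracy relative to chance agreement rather than measuring performance under random predictions.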

Conclusion

Our linear regression model did a solid job of explaining the variation in sugar content across items on the Starbucks Drink Menu. Some variables, such as calcium and fibre, explained more of the variation than one would expect. In contrast, other variables, such as carbohydrate and calorie content, varied in their explanatory power as we adjusted the regression models.

For our classification models, we see that the Support Vector Machine algorithm does a good job of classifying an observation into the correct beverage category, reaching 85.71% accuracy on the test set before tuning.