In this paper we look at data collected using a Fitbit-type device. In the study, subjects were asked to perform barbell lifts correctly and in four incorrect ways. These five techniques were recorded in the variable classe and labeled "A", "B", "C", "D", and "E". We are going to build machine learning models that use the data collected from the device to predict which way the subject performed the exercise. We will build three different models, then choose the best one to predict classe on a separate set of data.
Here we set the seed for the project to make the analysis reproducible. We then load the libraries that we will use to manipulate, graph, and model the data. The training data is stored in the variable data, and the data that we will use at the end of the paper to test our final model is stored in the variable test_test.
set.seed(628436)
library(dplyr)
library(ggplot2)
library(gridExtra)
library(caret)
data <- read.csv("traindata.csv", stringsAsFactors = FALSE)
test_test <- read.csv("testdata.csv", stringsAsFactors = FALSE)
Here we begin exploratory analysis on the data. First we split the data into two parts, train and test. The train set contains 80% of the observations and the test set contains the other 20%. The models will be built on the train set; we will then check their accuracy by predicting the classe variable in the held-out test set. We check the dimensions of both the train and test data frames.
# Stratified 80/20 split on the outcome variable classe
inTrain <- createDataPartition(y = data$classe, p = 0.8, list = FALSE)
train <- data[inTrain, ]
test <- data[-inTrain, ]
dim(train)
## [1] 15699 160
dim(test)
## [1] 3923 160
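Because createDataPartition samples within each level of the outcome, the split should preserve the class balance. A quick optional check, not part of the original analysis:
# Confirm the stratified split kept the classe proportions similar
round(prop.table(table(train$classe)), 3)
round(prop.table(table(test$classe)), 3)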
We see that this is a large dataset with 160 variables, so we remove variables that have near zero variance. These are variables that are nearly constant and contribute little to a model. Note that we compute the near-zero-variance metrics on the quiz set (test_test), where the many summary-statistic columns are essentially empty, and then drop the flagged columns from both train and test.
# Flag near-zero-variance columns on the quiz set, then drop the
# same columns from both splits
nzv <- nearZeroVar(test_test, saveMetrics = TRUE)
train <- train[, nzv$nzv == FALSE]
test <- test[, nzv$nzv == FALSE]
glimpse(train)
## Observations: 15,699
## Variables: 59
## $ X (int) 1, 2, 3, 5, 6, 7, 10, 12, 13, 14, 16, 17,...
## $ user_name (chr) "carlitos", "carlitos", "carlitos", "carl...
## $ raw_timestamp_part_1 (int) 1323084231, 1323084231, 1323084231, 13230...
## $ raw_timestamp_part_2 (int) 788290, 808298, 820366, 196328, 304277, 3...
## $ cvtd_timestamp (chr) "05/12/2011 11:23", "05/12/2011 11:23", "...
## $ num_window (int) 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 1...
## $ roll_belt (dbl) 1.41, 1.41, 1.42, 1.48, 1.45, 1.42, 1.45,...
## $ pitch_belt (dbl) 8.07, 8.07, 8.07, 8.07, 8.06, 8.09, 8.17,...
## $ yaw_belt (dbl) -94.4, -94.4, -94.4, -94.4, -94.4, -94.4,...
## $ total_accel_belt (int) 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,...
## $ gyros_belt_x (dbl) 0.00, 0.02, 0.00, 0.02, 0.02, 0.02, 0.03,...
## $ gyros_belt_y (dbl) 0.00, 0.00, 0.00, 0.02, 0.00, 0.00, 0.00,...
## $ gyros_belt_z (dbl) -0.02, -0.02, -0.02, -0.02, -0.02, -0.02,...
## $ accel_belt_x (int) -21, -22, -20, -21, -21, -22, -21, -22, -...
## $ accel_belt_y (int) 4, 4, 5, 2, 4, 3, 4, 2, 4, 4, 4, 4, 5, 5,...
## $ accel_belt_z (int) 22, 22, 23, 24, 21, 21, 22, 23, 21, 21, 2...
## $ magnet_belt_x (int) -3, -7, -2, -6, 0, -4, -3, -2, -3, -8, 0,...
## $ magnet_belt_y (int) 599, 608, 600, 600, 603, 599, 609, 602, 6...
## $ magnet_belt_z (int) -313, -311, -305, -302, -312, -311, -308,...
## $ roll_arm (dbl) -128, -128, -128, -128, -128, -128, -128,...
## $ pitch_arm (dbl) 22.5, 22.5, 22.5, 22.1, 22.0, 21.9, 21.6,...
## $ yaw_arm (dbl) -161, -161, -161, -161, -161, -161, -161,...
## $ total_accel_arm (int) 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 3...
## $ gyros_arm_x (dbl) 0.00, 0.02, 0.02, 0.00, 0.02, 0.00, 0.02,...
## $ gyros_arm_y (dbl) 0.00, -0.02, -0.02, -0.03, -0.03, -0.03, ...
## $ gyros_arm_z (dbl) -0.02, -0.02, -0.02, 0.00, 0.00, 0.00, -0...
## $ accel_arm_x (int) -288, -290, -289, -289, -289, -289, -288,...
## $ accel_arm_y (int) 109, 110, 110, 111, 111, 111, 110, 111, 1...
## $ accel_arm_z (int) -123, -125, -126, -123, -122, -125, -124,...
## $ magnet_arm_x (int) -368, -369, -368, -374, -369, -373, -376,...
## $ magnet_arm_y (int) 337, 337, 344, 337, 342, 336, 334, 343, 3...
## $ magnet_arm_z (int) 516, 513, 513, 506, 513, 509, 516, 520, 5...
## $ roll_dumbbell (dbl) 13.05217, 13.13074, 12.85075, 13.37872, 1...
## $ pitch_dumbbell (dbl) -70.49400, -70.63751, -70.27812, -70.4285...
## $ yaw_dumbbell (dbl) -84.87394, -84.71065, -85.14078, -84.8530...
## $ total_accel_dumbbell (int) 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 3...
## $ gyros_dumbbell_x (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,...
## $ gyros_dumbbell_y (dbl) -0.02, -0.02, -0.02, -0.02, -0.02, -0.02,...
## $ gyros_dumbbell_z (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,...
## $ accel_dumbbell_x (int) -234, -233, -232, -233, -234, -232, -235,...
## $ accel_dumbbell_y (int) 47, 47, 46, 48, 48, 47, 48, 47, 48, 48, 4...
## $ accel_dumbbell_z (int) -271, -269, -270, -270, -269, -270, -270,...
## $ magnet_dumbbell_x (int) -559, -555, -561, -554, -558, -551, -558,...
## $ magnet_dumbbell_y (int) 293, 296, 298, 292, 294, 295, 291, 291, 3...
## $ magnet_dumbbell_z (dbl) -65, -64, -63, -68, -66, -70, -69, -65, -...
## $ roll_forearm (dbl) 28.4, 28.3, 28.3, 28.0, 27.9, 27.9, 27.7,...
## $ pitch_forearm (dbl) -63.9, -63.9, -63.9, -63.9, -63.9, -63.9,...
## $ yaw_forearm (dbl) -153, -153, -152, -152, -152, -152, -152,...
## $ total_accel_forearm (int) 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 3...
## $ gyros_forearm_x (dbl) 0.03, 0.02, 0.03, 0.02, 0.02, 0.02, 0.02,...
## $ gyros_forearm_y (dbl) 0.00, 0.00, -0.02, 0.00, -0.02, 0.00, 0.0...
## $ gyros_forearm_z (dbl) -0.02, -0.02, 0.00, -0.02, -0.03, -0.02, ...
## $ accel_forearm_x (int) 192, 192, 196, 189, 193, 195, 190, 191, 1...
## $ accel_forearm_y (int) 203, 203, 204, 206, 203, 205, 205, 203, 2...
## $ accel_forearm_z (int) -215, -216, -213, -214, -215, -215, -215,...
## $ magnet_forearm_x (int) -17, -18, -18, -17, -9, -18, -22, -11, -1...
## $ magnet_forearm_y (dbl) 654, 661, 658, 655, 660, 659, 656, 657, 6...
## $ magnet_forearm_z (dbl) 476, 473, 469, 473, 478, 470, 473, 478, 4...
## $ classe (chr) "A", "A", "A", "A", "A", "A", "A", "A", "...
After taking a look at the variables we decide to remove the first five columns of the data, which contain ID-type variables. These are X, which simply numbers the observations; user_name, which records the name of the person performing the exercise; and raw_timestamp_part_1, raw_timestamp_part_2, and cvtd_timestamp, which record the time and date the activities were performed. We apply these transformations to both the train and test data frames, because whatever is done to the training set must also be done to the test set.
# Drop the five ID-type columns (X, user_name, and the three timestamps)
train <- select(train, -(1:5))
test <- select(test, -(1:5))
We then change the classe variable from a character variable to a factor variable.
train$classe <- as.factor(train$classe)
test$classe <- as.factor(test$classe)
We check the dimensions of the data frames again.
dim(train)
## [1] 15699 54
dim(test)
## [1] 3923 54
We have now prepared the data so that we can use it to build our models.
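As an aside, these cleaning steps can be wrapped in a single helper so that exactly the same transformation is applied to any future data. A minimal sketch, assuming the same raw column layout (clean_data is an illustrative name, not part of the original analysis):
# Apply the same column drops to any data frame with the raw layout
clean_data <- function(df, nzv_flags) {
  df <- df[, nzv_flags == FALSE]   # drop near-zero-variance columns
  dplyr::select(df, -(1:5))        # drop the five ID-type columns
}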
We build the first model, a classification tree (CART), using the following code. We preprocess with knn imputation to fill in any missing values; in caret this also centers and scales the predictors. We also use ten-fold cross-validation: during modeling the training set is split into ten folds, each fold serves once as a validation set while the model is fit on the remaining nine, and the ten accuracy estimates are averaged. This gives an estimate of the out-of-sample error before we ever touch our test set.
model_rpart <- train(classe ~ ., train, method = "rpart",
                     preProcess = "knnImpute",
                     trControl = trainControl(method = "cv", number = 10))
model_rpart
## CART
##
## 15699 samples
## 53 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## Pre-processing: num_window nearest neighbor imputation (53),
## centered (53), scaled (53)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 14129, 14129, 14128, 14128, 14129, 14129, ...
## Resampling results across tuning parameters:
##
## cp Accuracy Kappa Accuracy SD Kappa SD
## 0.03929684 0.5494083 0.42258991 0.02795134 0.04210466
## 0.05536271 0.4808658 0.31245587 0.06294576 0.10190795
## 0.11526480 0.3313507 0.07157808 0.04035547 0.06164077
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.03929684.
Looking at this model we see that it used ten-fold cross-validation as its resampling tool. The cross-validated accuracy of the selected model (cp = 0.0393) is about 54.9%.
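The per-fold results behind that estimate are stored on the fitted train object; an optional way to inspect the spread of the cross-validated accuracy:
# Per-fold accuracy and kappa from the ten cross-validation folds
model_rpart$resample
summary(model_rpart$resample$Accuracy)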
We then use the predict function to test the model on the test split of the dataset, and use a confusion matrix to compare our predictions with the actual values of classe in the test set.
predictions_rpart <- predict(model_rpart, test)
confusionMatrix(predictions_rpart, test$classe)
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1027 320 323 272 73
## B 9 255 20 123 60
## C 77 184 341 221 156
## D 0 0 0 0 0
## E 3 0 0 27 432
##
## Overall Statistics
##
## Accuracy : 0.5238
## 95% CI : (0.5081, 0.5396)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.3781
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9203 0.3360 0.49854 0.0000 0.5992
## Specificity 0.6480 0.9330 0.80303 1.0000 0.9906
## Pos Pred Value 0.5097 0.5460 0.34831 NaN 0.9351
## Neg Pred Value 0.9534 0.8542 0.88349 0.8361 0.9165
## Prevalence 0.2845 0.1935 0.17436 0.1639 0.1838
## Detection Rate 0.2618 0.0650 0.08692 0.0000 0.1101
## Detection Prevalence 0.5136 0.1190 0.24955 0.0000 0.1178
## Balanced Accuracy 0.7841 0.6345 0.65078 0.5000 0.7949
From the confusion matrix we see that the predictions were 52.38% accurate, in line with the cross-validated estimate. According to the 95% confidence interval, we can say with about 95% certainty that this model will predict classe with between 50.81% and 53.96% accuracy. Notably, the model never predicts class D.
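Optionally, the final tree can be plotted to see which sensor variables drive the splits. A minimal base-graphics sketch (rpart objects support plot() and text()):
# Visualize the splits chosen by the final classification tree
plot(model_rpart$finalModel, uniform = TRUE, margin = 0.1)
text(model_rpart$finalModel, cex = 0.7)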
Next we build a random forest model with the following code. Again we use ten-fold cross-validation and knn imputation. Passing verbose = FALSE keeps ranger's progress messages out of the report.
model_rf <- train(classe ~ ., train, method = "ranger", verbose = FALSE,
                  preProcess = "knnImpute",
                  trControl = trainControl(method = "cv", number = 10))
model_rf
## Random Forest
##
## 15699 samples
## 53 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## Pre-processing: num_window nearest neighbor imputation (53),
## centered (53), scaled (53)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 14130, 14128, 14129, 14129, 14128, 14127, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa Accuracy SD Kappa SD
## 2 0.9957316 0.9946005 0.001902188 0.002406826
## 27 0.9978336 0.9972597 0.001651312 0.002088905
## 53 0.9958593 0.9947620 0.001783387 0.002256155
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 27.
This model also used ten-fold cross-validation as its resampling tool. The cross-validated accuracy of the selected model (mtry = 27) is about 99.78%.
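As an optional check, caret's plot method for train objects shows the cross-validated accuracy profile across the tuning values that were tried:
# Cross-validated accuracy as a function of mtry
plot(model_rf)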
We then test the model with the following code and compare our predicted outcomes with the actual outcomes using a confusion matrix.
predictions_rf <- predict(model_rf, test)
confusionMatrix(predictions_rf, test$classe)
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1115 1 0 0 0
## B 0 758 1 0 0
## C 0 0 683 1 0
## D 0 0 0 641 0
## E 1 0 0 1 721
##
## Overall Statistics
##
## Accuracy : 0.9987
## 95% CI : (0.997, 0.9996)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9984
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9991 0.9987 0.9985 0.9969 1.0000
## Specificity 0.9996 0.9997 0.9997 1.0000 0.9994
## Pos Pred Value 0.9991 0.9987 0.9985 1.0000 0.9972
## Neg Pred Value 0.9996 0.9997 0.9997 0.9994 1.0000
## Prevalence 0.2845 0.1935 0.1744 0.1639 0.1838
## Detection Rate 0.2842 0.1932 0.1741 0.1634 0.1838
## Detection Prevalence 0.2845 0.1935 0.1744 0.1634 0.1843
## Balanced Accuracy 0.9994 0.9992 0.9991 0.9984 0.9997
The confusion matrix shows that this model predicted with 99.87% accuracy: only five of the 3,923 test observations were misclassified. According to the 95% confidence interval, this model will predict with between 99.70% and 99.96% accuracy about 95% of the time.
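The estimated out-of-sample error is simply one minus this accuracy, and it can be pulled directly from the confusion matrix object:
# Estimated out-of-sample error rate on the held-out test split
1 - confusionMatrix(predictions_rf, test$classe)$overall["Accuracy"]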
We then build a gradient boosted machine (GBM) model with the following code. Again we use ten-fold cross-validation and knn imputation. Passing verbose = FALSE suppresses gbm's long iteration log.
model_gbm <- train(classe ~ ., train, method = "gbm", verbose = FALSE,
                   preProcess = "knnImpute",
                   trControl = trainControl(method = "cv", number = 10))
model_gbm
## Stochastic Gradient Boosting
##
## 15699 samples
## 53 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## Pre-processing: num_window nearest neighbor imputation (53),
## centered (53), scaled (53)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 14128, 14131, 14128, 14130, 14130, 14130, ...
## Resampling results across tuning parameters:
##
##   interaction.depth  n.trees  Accuracy   Kappa      Accuracy SD  Kappa SD
##   1                   50      0.7572414  0.6921248  0.010781740  0.013570491
##   1                  100      0.8310718  0.7861098  0.008251060  0.010341014
##   1                  150      0.8710122  0.8367512  0.007516531  0.009492510
##   2                   50      0.8888440  0.8592505  0.007208724  0.009070638
##   2                  100      0.9420982  0.9267382  0.006847320  0.008664593
##   2                  150      0.9650940  0.9558321  0.004713744  0.005963822
##   3                   50      0.9343269  0.9168797  0.005743863  0.007272041
##   3                  100      0.9721639  0.9647846  0.002881328  0.003639594
##   3                  150      0.9875147  0.9842065  0.001883892  0.002381420
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were n.trees = 150,
## interaction.depth = 3, shrinkage = 0.1 and n.minobsinnode = 10.
This model also used ten-fold cross-validation to resample. The cross-validated accuracy of the selected model (150 trees, interaction depth 3) is about 98.75%.
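Boosted models are hard to interpret directly, but caret exposes gbm's relative-influence variable importance, which at least shows which sensors matter most. An optional sketch:
# Relative-influence variable importance for the boosted model
varImp(model_gbm)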
We test the model with the following code and compare our predicted outcomes with the actual outcomes using a confusion matrix.
predictions_gbm <- predict(model_gbm, test)
confusionMatrix(predictions_gbm, test$classe)
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1114 5 0 0 0
## B 2 742 1 3 2
## C 0 12 681 5 2
## D 0 0 1 634 6
## E 0 0 1 1 711
##
## Overall Statistics
##
## Accuracy : 0.9895
## 95% CI : (0.9858, 0.9925)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9868
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9982 0.9776 0.9956 0.9860 0.9861
## Specificity 0.9982 0.9975 0.9941 0.9979 0.9994
## Pos Pred Value 0.9955 0.9893 0.9729 0.9891 0.9972
## Neg Pred Value 0.9993 0.9946 0.9991 0.9973 0.9969
## Prevalence 0.2845 0.1935 0.1744 0.1639 0.1838
## Detection Rate 0.2840 0.1891 0.1736 0.1616 0.1812
## Detection Prevalence 0.2852 0.1912 0.1784 0.1634 0.1817
## Balanced Accuracy 0.9982 0.9875 0.9949 0.9919 0.9928
This confusion matrix shows that the GBM model predicted with 98.95% accuracy. According to the 95% confidence interval, this model will predict with between 98.58% and 99.25% accuracy about 95% of the time.
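Before settling on a final model, the three cross-validated fits can be compared on a common footing with caret's resamples(). A minimal sketch:
# Compare per-fold accuracy and kappa across the three models
results <- resamples(list(CART = model_rpart, RF = model_rf, GBM = model_gbm))
summary(results)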
The GBM and the random forest models are both about 99% accurate. We use both to predict the answers to the Coursera quiz, and they yield the same answers; when submitted, those answers scored 100%. Both models are accurate predictors of the classe variable, although, being built from sophisticated ensemble algorithms, they are difficult to interpret.
predictions_test_rf <- predict(model_rf, test_test)
predictions_test_gbm <- predict(model_gbm, test_test)
data.frame("Random Forrest Predictions" = predictions_test_rf,
"GBM Predictions" = predictions_test_gbm)
## Random.Forest.Predictions GBM.Predictions
## 1 B B
## 2 A A
## 3 B B
## 4 A A
## 5 A A
## 6 E E
## 7 D D
## 8 B B
## 9 A A
## 10 A A
## 11 B B
## 12 C C
## 13 B B
## 14 A A
## 15 E E
## 16 E E
## 17 A A
## 18 B B
## 19 B B
## 20 B B
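If the quiz predictions need to be saved as individual files, a minimal sketch along these lines would do it (the file-name pattern is our own assumption, not a requirement of the quiz):
# Write each of the 20 quiz predictions to its own text file
for (i in seq_along(predictions_test_rf)) {
  writeLines(as.character(predictions_test_rf[i]),
             sprintf("problem_id_%02d.txt", i))
}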