6/29/2018

Overview

The goal of this analysis is to develop a model that predicts the manner in which an exercise was performed.

- We will build a model to quantify how well people exercise, based on data from accelerometers on the belt, arm, forearm, and dumbbell of six participants who performed exercises both correctly and incorrectly.

- We will assess the model by using it to predict 20 held-out test cases.

The analysis loads the caret (with lattice and ggplot2), rpart.plot, randomForest (version 4.6-14), and gbm (version 2.1.3) packages under R 3.4.x.
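
The setup chunk itself does not appear in the output; a minimal sketch of the library calls it implies (package names are taken from the startup messages, and rpart is an assumption, since caret's "rpart" method requires it):

library(caret)         # model training and cross-validation (pulls in lattice, ggplot2)
library(rpart)         # classification tree (assumed; needed by caret's "rpart" method)
library(rpart.plot)    # plotting the fitted tree
library(randomForest)  # random forest
library(gbm)           # gradient boosting (pulls in survival, splines, parallel)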

Training the Model

The data for this project come from this source: link.

We will use the data to train three models.
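
The cleaning and partitioning code is not shown in the output; the following is a minimal sketch under two assumptions: the file is the assignment's pml-training.csv, and the split is 70/30, which reproduces the 13737-sample training set reported in the model summaries below. The exact column filtering that leaves 51 predictors is not recoverable from the output.

# File name and cleaning steps are assumptions; the 70/30 split matches
# the 13737-sample training set reported below.
pml <- read.csv("pml-training.csv", na.strings = c("NA", ""))

# Keep only complete columns; dropping identifier and timestamp columns
# as well would leave roughly the 51 predictors reported below.
pml <- pml[, colSums(is.na(pml)) == 0]

set.seed(1234)                       # seed value is illustrative
inTrain  <- createDataPartition(pml$classe, p = 0.7, list = FALSE)
training <- pml[inTrain, ]
testing  <- pml[-inTrain, ]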

Methods

  1. Classification Tree
  2. Random Forest
  3. Gradient Boosting Model
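
The training chunks are omitted from the rendered output; a sketch of how the three fits could be produced with caret, using the 5-fold cross-validation reported in the summaries below (object names are illustrative):

# trainControl matches the "Cross-Validated (5 fold)" resampling
# reported in the model summaries that follow.
ctrl <- trainControl(method = "cv", number = 5)

fitTree <- train(classe ~ ., data = training, method = "rpart", trControl = ctrl)
fitRF   <- train(classe ~ ., data = training, method = "rf",    trControl = ctrl)
fitGBM  <- train(classe ~ ., data = training, method = "gbm",
                 trControl = ctrl, verbose = FALSE)

Printing fitRF and fitGBM yields the resampling summaries shown below.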

Classification Tree

Random Forest

## Random Forest 
## 
## 13737 samples
##    51 predictor
##     5 classes: 'A', 'B', 'C', 'D', 'E' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 10990, 10989, 10989, 10989, 10991 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9913375  0.9890414
##   26    0.9917741  0.9895939
##   51    0.9883523  0.9852640
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 26.

Gradient Boosting Model

## Stochastic Gradient Boosting 
## 
## 13737 samples
##    51 predictor
##     5 classes: 'A', 'B', 'C', 'D', 'E' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 10991, 10990, 10989, 10990, 10988 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  Accuracy   Kappa    
##   1                   50      0.7334194  0.6619464
##   1                  100      0.8074534  0.7562767
##   1                  150      0.8403575  0.7979034
##   2                   50      0.8494573  0.8093067
##   2                  100      0.9054378  0.8803153
##   2                  150      0.9292420  0.9104648
##   3                   50      0.8931357  0.8646879
##   3                  100      0.9412536  0.9256585
##   3                  150      0.9593807  0.9486071
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were n.trees = 150,
##  interaction.depth = 3, shrinkage = 0.1 and n.minobsinnode = 10.

The Models and the Confusion Matrix

Classification Tree
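
The evaluation code is likewise omitted; a sketch of how each confusion matrix and accuracy figure below could be obtained on the held-out test set, reusing the object names from the earlier sketches (the same pattern applies to fitRF and fitGBM):

# Predict on the held-out 30% and tabulate predictions against the truth.
predTree <- predict(fitTree, newdata = testing)
cmTree   <- confusionMatrix(predTree, testing$classe)
cmTree$table                 # confusion matrix
cmTree$overall["Accuracy"]   # overall accuracy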

##           Reference
## Prediction    A    B    C    D    E
##          A 1504   26  108   29    7
##          B  464  394  241   40    0
##          C  489   33  403  101    0
##          D  411  174  122  257    0
##          E  256  203  264   47  312
##  Accuracy 
## 0.4876805


Random Forest

##           Reference
## Prediction    A    B    C    D    E
##          A 1669    4    0    0    1
##          B    9 1128    2    0    0
##          C    0    7 1017    2    0
##          D    0    0    8  956    0
##          E    0    0    3    4 1075
##  Accuracy 
## 0.9932031


Gradient Boosting Model

##           Reference
## Prediction    A    B    C    D    E
##          A 1650   17    6    1    0
##          B   28 1080   27    1    3
##          C    0   27  984   10    5
##          D    2    1   34  918    9
##          E    3    4   10   11 1054
##  Accuracy 
## 0.9661852

Conclusion

The random forest model achieved the highest accuracy of the three (0.9932 on the held-out set), so we will use it to predict the 20 test cases.
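
The prediction chunk is not shown; a minimal sketch, assuming the quiz cases come in the assignment's pml-testing.csv file and reusing fitRF from the training sketch:

# File name is an assumption; the quiz data must receive the same
# column cleaning as the training set before predicting.
quiz <- read.csv("pml-testing.csv", na.strings = c("NA", ""))
quiz <- quiz[, colSums(is.na(quiz)) == 0]
predict(fitRF, newdata = quiz)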

##  [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E

All 20 predictions match the known classes, which were kept separate for testing purposes: the model is 100% accurate on these cases.