Introduction

For this project, I determine the manner in which participants performed an exercise, using data collected by devices that track exercise. In the past, these devices have tracked how much exercise was done, but not how it was done. The goal of this project is to find out whether we can predict the manner in which the exercise was done from the data these devices gather. The data come from sensors worn by participants as they exercised; sometimes the participants performed the exercise correctly and sometimes not. Finally, I predict the manner of exercise for previously unseen data. Below are the dimensions of the dataset. As you can see, there are many features and many of them contain NAs, so it was necessary to remove features with very little information in them. In addition, several columns describe the subject and the recording session rather than the movement itself, leading to highly correlated attributes; these can be reduced to the single variable identifying the subject of the study, so I drop them. I split the data into a training set and a test set to perform predictions and assess the results.

library(caret)
## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.4.4
## Loading required package: ggplot2
library(rattle)
## Warning: package 'rattle' was built under R version 3.4.4
## Rattle: A free graphical interface for data science with R.
## Version 5.1.0 Copyright (c) 2006-2017 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
library(plyr)
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.4.4
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:rattle':
## 
##     importance
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(nnet)
library(AppliedPredictiveModeling)
## Warning: package 'AppliedPredictiveModeling' was built under R version
## 3.4.4
library(e1071)
## Warning: package 'e1071' was built under R version 3.4.4
library(ElemStatLearn)
## Warning: package 'ElemStatLearn' was built under R version 3.4.4
## 
## Attaching package: 'ElemStatLearn'
## The following object is masked from 'package:plyr':
## 
##     ozone
df<-read.csv(file="pml-training.csv", header=TRUE)
cat("Dimensions of entire training dataset: ",dim(df))
## Dimensions of entire training dataset:  19622 160
# Columns that are more than 90% NA or empty carry almost no information; drop them.
train.dat <- which(colSums(is.na(df) | df == "") > 0.9*dim(df)[1])
train.data <- df[, -train.dat]

# The first seven columns (row id, subject name, timestamps, windows) describe the
# subject and the recording session rather than the movement, so drop them as well.
train.data <- train.data[, -c(1:7)]
cat("Dimensions of cleaned training dataset: ",dim(train.data))
## Dimensions of cleaned training dataset:  19622 53
set.seed(123)

inTrain = createDataPartition(train.data$classe, p = 3/4)[[1]]

training = train.data[ inTrain,]
dim(training)
## [1] 14718    53
testing=train.data[-inTrain,]
dim(testing)
## [1] 4904   53
x_train <- training[, -ncol(training)]
x_test <- testing[, -ncol(testing)]
y_train <- training$classe
y_test<-testing$classe
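
As a quick sanity check (a sketch not included in the original output), the class proportions in the two partitions can be compared to confirm that createDataPartition preserved the class balance.

# Compare the distribution of the outcome across the training and test partitions.
round(prop.table(table(y_train)), 3)
round(prop.table(table(y_test)), 3)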

Lasso Regression

Because many of the features are likely to carry little predictive information, I chose lasso regression, which applies a penalty that shrinks the coefficients of features with little effect on the prediction, many of them all the way to zero. As you can see below, many features end up with zero importance. I set up the model to search for the best parameters using the tuneGrid and trControl arguments, with five-fold cross validation on the training set, and then display the variable importance from the lasso fit. The model with the best lambda value, the size of the penalty that controls how many features remain in the model, is retained. Lasso regression is usually applied to numeric outcomes, but glmnet supports a multinomial family, so I included it to see how a penalized linear model performs on this classification problem.

control<-trainControl(method="cv", number=5, classProbs = TRUE)

tuneGrid=expand.grid(.alpha=1,.lambda=10^seq(10, -2, length = 100))

lasso.mod <- train(classe~.,data=training, method="glmnet",trControl=control, metric="Accuracy", tuneGrid=tuneGrid,  family="multinomial")

# caret stores the cross-validated best penalty in bestTune (lambda.min exists on
# cv.glmnet objects, not on a caret train object).
bestlam <- lasso.mod$bestTune$lambda

# predict() on a train object automatically uses the best tuning parameters.
lasso.pred <- predict(lasso.mod, newdata = testing, type = "raw")

lasso.con<-confusionMatrix( lasso.pred, testing$classe)

head(varImp(lasso.mod), 53)
## $importance
##                                 A          B           C          D
## roll_belt            0.000000e+00 0.00000000  0.00000000 0.37493523
## pitch_belt           0.000000e+00 0.00000000  0.00000000 0.00000000
## yaw_belt             0.000000e+00 0.00000000  0.00000000 0.00000000
## total_accel_belt     0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_belt_x         1.000000e+02 0.00000000  0.00000000 0.00000000
## gyros_belt_y         5.531670e+01 0.00000000  0.00000000 0.00000000
## gyros_belt_z         0.000000e+00 0.00000000 18.11713037 0.00000000
## accel_belt_x         0.000000e+00 0.00000000  0.00000000 0.00000000
## accel_belt_y         0.000000e+00 0.71465368  0.00000000 0.00000000
## accel_belt_z         4.869178e-01 0.00000000  0.00000000 0.00000000
## magnet_belt_x        1.095818e+00 0.00000000  0.00000000 0.00000000
## magnet_belt_y        0.000000e+00 0.00000000  0.00000000 1.21849220
## magnet_belt_z        5.092630e-01 0.00000000  0.00000000 0.00000000
## roll_arm             0.000000e+00 0.20640733  0.00000000 0.00000000
## pitch_arm            6.004254e-01 0.17778836  0.00000000 0.00000000
## yaw_arm              1.372930e-01 0.12450372  0.00000000 0.01880261
## total_accel_arm      0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_arm_x          0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_arm_y          4.329295e+00 0.00000000  0.00000000 0.00000000
## gyros_arm_z          0.000000e+00 0.00000000 16.60737140 0.00000000
## accel_arm_x          0.000000e+00 0.15985478  0.00000000 0.45821549
## accel_arm_y          0.000000e+00 0.00000000  0.00000000 0.00000000
## accel_arm_z          6.173637e-01 0.00000000  0.00000000 0.00000000
## magnet_arm_x         3.943701e-03 0.00000000  0.00000000 0.00000000
## magnet_arm_y         0.000000e+00 0.24704908  0.00000000 0.00000000
## magnet_arm_z         1.553366e-01 0.44046435  0.00000000 0.02970192
## roll_dumbbell        0.000000e+00 0.00000000  1.13946927 1.17853108
## pitch_dumbbell       0.000000e+00 0.00000000  0.31278419 0.00000000
## yaw_dumbbell         6.225701e-01 0.00000000  0.54257746 0.33655658
## total_accel_dumbbell 0.000000e+00 4.11087984  0.00000000 2.75798910
## gyros_dumbbell_x     0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_dumbbell_y     0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_dumbbell_z     0.000000e+00 0.00000000  0.00000000 0.00000000
## accel_dumbbell_x     0.000000e+00 1.77708447  0.00000000 0.00000000
## accel_dumbbell_y     0.000000e+00 0.29804218  0.47744523 0.00000000
## accel_dumbbell_z     4.679891e-02 0.01633127  0.00000000 0.00000000
## magnet_dumbbell_x    0.000000e+00 0.00000000  0.00000000 0.00000000
## magnet_dumbbell_y    0.000000e+00 0.11971844  0.00000000 0.00000000
## magnet_dumbbell_z    1.433707e+00 0.57070194  0.17463770 0.00000000
## roll_forearm         2.200678e-01 0.00000000  0.03584440 0.00000000
## pitch_forearm        4.554249e+00 0.00000000  0.00000000 2.43038110
## yaw_forearm          0.000000e+00 0.00000000  0.00000000 0.00000000
## total_accel_forearm  5.618260e+00 0.00000000  0.00000000 0.00000000
## gyros_forearm_x      0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_forearm_y      0.000000e+00 0.00000000  0.00000000 0.00000000
## gyros_forearm_z      0.000000e+00 0.00000000  0.00000000 0.00000000
## accel_forearm_x      0.000000e+00 0.00000000  0.00000000 0.47654328
## accel_forearm_y      0.000000e+00 0.05140739  0.00000000 0.07034039
## accel_forearm_z      4.861418e-01 0.04319445  0.00000000 0.00000000
## magnet_forearm_x     0.000000e+00 0.00000000  0.00000000 0.00000000
## magnet_forearm_y     0.000000e+00 0.00000000  0.04550607 0.00000000
## magnet_forearm_z     2.837598e-02 0.00000000  0.16621399 0.00000000
##                                E
## roll_belt             0.99442837
## pitch_belt            1.68097635
## yaw_belt              0.00000000
## total_accel_belt      5.81365375
## gyros_belt_x         33.89016841
## gyros_belt_y          0.00000000
## gyros_belt_z          0.00000000
## accel_belt_x          0.00000000
## accel_belt_y          1.73329879
## accel_belt_z          0.00000000
## magnet_belt_x         0.00000000
## magnet_belt_y         4.30866576
## magnet_belt_z         0.00000000
## roll_arm              0.00000000
## pitch_arm             0.34008175
## yaw_arm               0.00000000
## total_accel_arm       0.00000000
## gyros_arm_x           4.86463719
## gyros_arm_y           0.00000000
## gyros_arm_z           0.00000000
## accel_arm_x           0.00000000
## accel_arm_y           0.00000000
## accel_arm_z           0.00000000
## magnet_arm_x          0.00000000
## magnet_arm_y          0.40023753
## magnet_arm_z          0.00000000
## roll_dumbbell         0.00000000
## pitch_dumbbell        0.00000000
## yaw_dumbbell          0.00000000
## total_accel_dumbbell  7.87607563
## gyros_dumbbell_x      0.00000000
## gyros_dumbbell_y      0.00000000
## gyros_dumbbell_z      0.00000000
## accel_dumbbell_x      0.17351241
## accel_dumbbell_y      0.00000000
## accel_dumbbell_z      0.16804920
## magnet_dumbbell_x     0.00000000
## magnet_dumbbell_y     0.20283209
## magnet_dumbbell_z     0.00000000
## roll_forearm          0.07666691
## pitch_forearm         0.00000000
## yaw_forearm           0.00000000
## total_accel_forearm   2.18606816
## gyros_forearm_x       0.00000000
## gyros_forearm_y       0.00000000
## gyros_forearm_z       0.00000000
## accel_forearm_x       0.00000000
## accel_forearm_y       0.00000000
## accel_forearm_z       0.54511833
## magnet_forearm_x      0.00000000
## magnet_forearm_y      0.05092659
## magnet_forearm_z      0.00000000
## 
## $model
## [1] "glmnet"
## 
## $calledFrom
## [1] "varImp"

Principal Component Analysis

Because there are so many features, only a reduced representation of them should be used for prediction; using all of the features risks overfitting. Below is the investigation into principal components. When PCA preprocessing is used to fit the machine learning models below, I chose the number of principal components so that they explain a little less than 98% of the total variance. This turns out to be about 30 principal components instead of the 52 features contained in the cleaned dataset, as you can see from the analysis below.

pr_comp<-prcomp(x_train, scale=TRUE)

st_dev<-pr_comp$sdev

p_var<-st_dev^2

prop_ex<-p_var/sum(p_var)

summary(pr_comp)
## Importance of components:
##                           PC1   PC2     PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.8940 2.839 2.15826 2.0712 1.92040 1.73879 1.49195
## Proportion of Variance 0.1611 0.155 0.08958 0.0825 0.07092 0.05814 0.04281
## Cumulative Proportion  0.1611 0.316 0.40560 0.4881 0.55902 0.61716 0.65997
##                            PC8     PC9    PC10    PC11    PC12    PC13
## Standard deviation     1.44674 1.31154 1.22948 1.17376 1.06498 0.99335
## Proportion of Variance 0.04025 0.03308 0.02907 0.02649 0.02181 0.01898
## Cumulative Proportion  0.70022 0.73330 0.76237 0.78886 0.81067 0.82965
##                          PC14    PC15    PC16    PC17    PC18    PC19
## Standard deviation     0.9429 0.90748 0.88704 0.82819 0.75526 0.72243
## Proportion of Variance 0.0171 0.01584 0.01513 0.01319 0.01097 0.01004
## Cumulative Proportion  0.8468 0.86258 0.87771 0.89090 0.90187 0.91191
##                           PC20    PC21    PC22    PC23    PC24    PC25
## Standard deviation     0.68825 0.64289 0.62711 0.60947 0.57735 0.54723
## Proportion of Variance 0.00911 0.00795 0.00756 0.00714 0.00641 0.00576
## Cumulative Proportion  0.92102 0.92897 0.93653 0.94368 0.95009 0.95584
##                           PC26    PC27    PC28    PC29    PC30    PC31
## Standard deviation     0.53362 0.50213 0.48136 0.44849 0.41347 0.39012
## Proportion of Variance 0.00548 0.00485 0.00446 0.00387 0.00329 0.00293
## Cumulative Proportion  0.96132 0.96617 0.97062 0.97449 0.97778 0.98071
##                           PC32    PC33    PC34    PC35    PC36    PC37
## Standard deviation     0.36248 0.34510 0.33401 0.30285 0.28111 0.25327
## Proportion of Variance 0.00253 0.00229 0.00215 0.00176 0.00152 0.00123
## Cumulative Proportion  0.98323 0.98552 0.98767 0.98943 0.99095 0.99219
##                           PC38    PC39    PC40    PC41    PC42   PC43
## Standard deviation     0.23593 0.23317 0.20002 0.19187 0.18478 0.1772
## Proportion of Variance 0.00107 0.00105 0.00077 0.00071 0.00066 0.0006
## Cumulative Proportion  0.99326 0.99430 0.99507 0.99578 0.99644 0.9970
##                           PC44    PC45    PC46    PC47    PC48    PC49
## Standard deviation     0.16770 0.16539 0.16313 0.14621 0.14204 0.11230
## Proportion of Variance 0.00054 0.00053 0.00051 0.00041 0.00039 0.00024
## Cumulative Proportion  0.99758 0.99811 0.99862 0.99903 0.99942 0.99966
##                           PC50    PC51    PC52
## Standard deviation     0.09834 0.07666 0.04577
## Proportion of Variance 0.00019 0.00011 0.00004
## Cumulative Proportion  0.99985 0.99996 1.00000
barplot(100*prop_ex[1:30], las=2, xlab='Components 1-30', ylab='% Variance Explained')
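
The component count used below can also be recovered programmatically from the cumulative proportions above; this one-line check is a sketch and was not part of the original analysis.

# Number of components whose cumulative variance stays just under 98% (30 here).
sum(cumsum(prop_ex) < 0.98)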

Training Models

I trained several models, using tuning parameters to get the best results. For each one I used repeated five-fold cross validation, principal components that account for roughly 98% of the variance, and, for the neural network, a grid of standard parameters. I chose a variety of algorithms for this task based on their applicability to the dataset, including the lasso regression discussed above and a stacking algorithm that combines the individual models into an ensemble. The other models are support vector machines, which work well with complex classification boundaries; neural networks, which learn their own internal representation of the features; and random forests, which build many shallower trees rather than one fully grown tree.

Fitting Support Vector Machines

trcontrol <- trainControl(method="repeatedcv", number=5, repeats=3, classProbs=TRUE, summaryFunction=multiClassSummary, preProcOptions=list(pcaComp=30))

# Note: the train() argument is trControl (n.comp is not a train() argument, so the
# number of PCA components is set through preProcOptions in trainControl above).
svm.fit <- train(classe~., data=training, method="svmRadial", metric="Accuracy", preProcess="pca", trControl=trcontrol)
svm.pred <- predict(svm.fit, testing)
svm.con <- confusionMatrix(svm.pred, testing$classe)

Fitting Random Forests

trcontrol <- trainControl(method="repeatedcv", number=5, repeats=3, classProbs=TRUE, summaryFunction=multiClassSummary, preProcOptions=list(pcaComp=30))

# Tune the number of predictors sampled at each split (mtry).
grid.tune <- expand.grid(.mtry=c(1:6))

rf_mod <- train(classe~., data=training, method="rf", metric="Accuracy", tuneGrid=grid.tune, preProcess="pca", trControl=trcontrol)
rf_pred <- predict(rf_mod, testing)
rf.con <- confusionMatrix(rf_pred, testing$classe)

Fitting Neural Networks

trcontrol <- trainControl(method="repeatedcv", number=5, repeats=3, classProbs=TRUE, summaryFunction=multiClassSummary, preProcOptions=list(pcaComp=30))

# Tune the weight decay and the number of hidden units.
grid <- expand.grid(.decay = c(.05, .01), .size = c(13, 15, 17))

nn.mod <- train(classe~., data=training, method="nnet", preProcess="pca", maxit=1000, tuneGrid=grid, trControl=trcontrol)

nn.pred <- predict(nn.mod, testing, type="raw")
nn.con <- confusionMatrix(as.factor(nn.pred), testing$classe)

Fitting Stacking Algorithm

trcontrol <- trainControl(method="repeatedcv", number=5, repeats=3, classProbs=TRUE, summaryFunction=multiClassSummary)

# Stack the three sets of test-set predictions and fit a boosted-tree combiner.
predDF <- data.frame(svm.pred, rf_pred, nn.pred, classe=testing$classe)
modelStack <- train(classe ~ ., data = predDF, method = "gbm", trControl = trcontrol, verbose = FALSE)
combPred <- predict(modelStack, predDF)
stack.con <- confusionMatrix(combPred, predDF$classe)

Visualizations

Below are plots of some of the models above. They show that the neural network works best with a weight decay of .05 and 17 hidden units.

plot(nn.mod, main="Neural Networks")

The support vector machine performs best when the margin around the decision boundary is wider.

plot(svm.fit, main="Radial Support Vector Machines")

The random forest works best with an mtry value (the number of randomly selected predictors tried at each split) below 6.

plot(rf_mod, main="Random Forests")

The boosted model used for stacking works best with fewer boosting iterations and shallower trees.

plot(modelStack, main="Boosting Stacked Model")
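
The tuning values these plots point to can also be read directly from the fitted objects; the lines below are a small sketch of that and were not part of the original output.

nn.mod$bestTune      # weight decay and hidden-unit count selected by cross validation
svm.fit$bestTune     # sigma and cost for the radial SVM
rf_mod$bestTune      # mtry for the random forest
modelStack$bestTune  # boosting iterations, tree depth and shrinkage for the stacked gbm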

Accuracy

Below is the accuracy of each of the models. Notice that neural networks, support vector machines, random forests and the stacked model all did well. In fact, random forests and the stacked model achieved over 97% accuracy, with the stacked model matching the random forest's accuracy. Lasso regression performed the worst, most likely because it fits linear decision boundaries and is usually applied to numeric outcomes rather than classification. I decided to exclude lasso regression from the stacked algorithm because of its poor performance on this data.

cat("Lasso Regression Accuracy: ", lasso.con$overall[1])
## Lasso Regression Accuracy:  0.5921697
cat("Suppor Vector Machine Accuracy: ", svm.con$overall[1])
## Suppor Vector Machine Accuracy:  0.9068108
cat("Random Forest Accuracy: ", rf.con$overall[1])
## Random Forest Accuracy:  0.9743067
cat("Neural Network Accuracy: ", nn.con$overall[1])
## Neural Network Accuracy:  0.8762235
cat("Stached Algorithm Accuracy", stack.con$overall[1])
## Stached Algorithm Accuracy 0.9743067
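
For a side-by-side view, the accuracies can also be collected into a single sorted vector. This is a small sketch using the confusion matrices computed above; it was not part of the original output.

models <- list(Lasso = lasso.con, SVM = svm.con, RandomForest = rf.con, NeuralNetwork = nn.con, Stacked = stack.con)
round(sort(sapply(models, function(m) unname(m$overall["Accuracy"])), decreasing = TRUE), 4)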

Prediction on Test Set

First, I load the test set into R and perform the same cleaning I performed for the training set.

dat<-read.csv(file="pml-testing.csv", header=TRUE)


test.dats <- which(colSums(is.na(dat) |dat=="")>0.9*dim(dat)[1]) 

test.datas <- dat[,-test.dats]


test.datas<-test.datas[,-c(1:7)]
cat("Dimensions of Cleaned Test Set: ",dim(test.datas))
## Dimensions of Cleaned Test Set:  20 53
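
One caveat: the column filter above is recomputed on the 20-row test file, so in principle it could select different columns than it did on the training data (the matching dimensions suggest it selects the same 53 here). A sketch of a safer alternative, reusing the column indices computed on the training set, is shown below; it was not used in the analysis above, and test.same is an illustrative variable name.

# Drop exactly the columns that were removed from the training data.
test.same <- dat[, -train.dat]
test.same <- test.same[, -c(1:7)]
dim(test.same)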

Final Predictions and Conclusion

As you can see, there is some variety in the predictions from the different models. The random forest and stacking predictions are identical, and because they achieved the highest accuracy, I use them for the final predictions. Given the amount of time it took to fit the models, I would suggest carefully choosing a few promising algorithms that are likely to perform well on the given data; the time needed to train and evaluate every model was prohibitive. Random forests are an ensemble method that combines many smaller trees to classify the data, and on its own the random forest performed as well as ensembling all of the algorithms together, so the boosted stacking step was not necessary: stacking did not improve the accuracy significantly.

svm.test<-predict(svm.fit, newdata=test.datas)

rf.test<-predict(rf_mod, newdata=test.datas)

nn.test<-predict(nn.mod, newdata=test.datas)


pred.stack <- data.frame(svm.pred=svm.test, rf_pred=rf.test, nn.pred=nn.test)


com.predict <- predict(modelStack, pred.stack)


final.data<-cbind(pred.stack, com.predict)


names(final.data)<-c("svm", "RandomForest", "NeuralNetworks", "StackedAlgorithm")

final.data
##    svm RandomForest NeuralNetworks StackedAlgorithm
## 1    B            B              B                B
## 2    A            A              A                A
## 3    C            C              C                C
## 4    A            A              A                A
## 5    A            A              A                A
## 6    E            E              E                E
## 7    D            D              D                D
## 8    B            B              B                B
## 9    A            A              A                A
## 10   A            A              A                A
## 11   A            A              B                A
## 12   C            C              C                C
## 13   B            B              E                B
## 14   A            A              A                A
## 15   E            E              E                E
## 16   E            E              B                E
## 17   A            A              A                A
## 18   B            B              B                B
## 19   B            B              B                B
## 20   B            B              B                B