The following report fits a machine learning model to the Weight Lifting Exercise Dataset 1, a dataset created to study how well the Dumbbell Biceps Curl exercise is executed. The exercise was performed by 6 different subjects, each repeating it in 5 different ways: the correct one and 4 incorrect ones. More information is available in reference 2.
The paper "Machine Learning in Human Movement Biomechanics: Best Practices, Common Pitfalls, and New Opportunities" 3 reviews a total of 129 papers that fit machine learning models to study human movement; most of these studies used datasets collected from accelerometers.
Summary of the meta analysis of machine learning models for human movement classification
According to this paper, the most common classification model for human movement is the Support Vector Machine (SVM). Therefore, an SVM will be fitted and tuned, and its results will be compared with those obtained by the originators of the dataset, who fitted a random forest model.
Additionally, the meta analysis paper suggests the following practices:
```r
library(caret)
# Read the training data. stringsAsFactors = TRUE reproduces the default of
# R versions before 4.0, so that `classe` is a factor, as confusionMatrix() requires.
df <- read.csv("pml-training.csv", stringsAsFactors = TRUE)
# Remove near-zero-variance variables.
nzv <- nearZeroVar(df, saveMetrics = TRUE)
df <- df[, !nzv$nzv]
# Remove variables with missing values. table(na_count) shows that missing values
# are consistent throughout the data: 67 variables miss 19216 of the 19622 values
# each, so these variables are removed.
na_count <- colSums(is.na(df))
rdf <- df[, na_count == 0]
# Remove the row index X and the general time variable. They provide no relevant
# information about the movement, yet they are highly correlated with the outcome
# because the data were recorded in order.
rdf <- rdf[, c(-1, -5)]
# Center and scale the numeric predictors and replace them with the principal
# components that retain 99% of their variance.
prepro <- preProcess(rdf, method = c("center", "scale", "pca"), thresh = 0.99)
training <- predict(prepro, rdf)
```
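The data pre-processing consisted of:

- removing near-zero-variance variables;
- removing the 67 variables with missing values, each of which misses 19216 of the 19622 values;
- removing the row index X and the general time variable, whose relationship with the outcome only reflects the recording order;
- centering and scaling the numeric predictors, then replacing them with the principal components that retain 99% of the variance.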
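Three of these steps can be illustrated in base R on the built-in `iris` data, used here only as a stand-in for the weight-lifting data. This is a minimal sketch, not caret's implementation: the helpers `near_zero_var` and `n_components` are hypothetical, and their cut-offs are assumed to mirror the documented defaults of `nearZeroVar()` (`freqCut = 95/5`, `uniqueCut = 10`) and the `thresh` argument of `preProcess()`. The last part shows why a row index is dangerous when the rows are stored in class order.

```r
# Hypothetical helper approximating caret's near-zero-variance rule: flag a
# column when its most common value is far more frequent than the second most
# common one AND it has few distinct values (or when it is constant).
near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= 1) return(TRUE)
  freq_ratio <- counts[[1]] / counts[[2]]
  percent_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && percent_unique <= unique_cut
}

# Hypothetical helper: number of principal components of the centered and
# scaled data needed to retain at least `thresh` of the total variance.
n_components <- function(x, thresh = 0.99) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  which(cum_var >= thresh)[1]
}

near_zero_var(c(rep(0, 98), 1, 2))  # TRUE: frequency ratio 98, 3% distinct values
near_zero_var(1:100)                # FALSE: every value is distinct
n_components(iris[, 1:4])           # 3 of the 4 components for 99% of the variance

# Leakage from recording order: iris is stored sorted by species, so a rule
# that looks only at the row index (like the X column) recovers every label.
row_index <- seq_len(nrow(iris))
guess <- cut(row_index, breaks = c(0, 50, 100, 150),
             labels = levels(iris$Species))
mean(guess == iris$Species)         # 1
```

The same reasoning motivates dropping X and the time variable: a model can score well on them without learning anything about how the curl was performed.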
A quick hyper-parameter search was run over three support vector machine kernels (linear, polynomial and radial), and the polynomial kernel performed best. To keep model building simple, 3-fold cross-validation was used.
Finally, caret's default tuning grid selected the following parameters: degree = 3, scale = 0.1 and C = 1.
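To make the selected parameters concrete: caret's `svmPoly` method fits the SVM with the kernlab package, whose polynomial kernel is k(x, x') = (scale * <x, x'> + offset)^degree. The sketch below assumes caret fixes `offset = 1`, and the function name `poly_kernel` is hypothetical. It evaluates the kernel with the selected degree = 3 and scale = 0.1; the cost `C` penalises misclassified training points and does not enter the kernel itself.

```r
# Hypothetical helper: polynomial kernel in kernlab's parameterisation,
# k(x, y) = (scale * <x, y> + offset)^degree, for every pair of rows of X and Y.
# offset = 1 is an assumption about caret's svmPoly setting.
poly_kernel <- function(X, Y, degree = 3, scale = 0.1, offset = 1) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  (scale * X %*% t(Y) + offset)^degree
}

# Two 2-dimensional points: <x, y> = 1*3 + 2*4 = 11,
# so k = (0.1 * 11 + 1)^3 = 2.1^3 = 9.261.
x <- matrix(c(1, 2), nrow = 1)
y <- matrix(c(3, 4), nrow = 1)
poly_kernel(x, y)
```

With degree = 1, scale = 1 and offset = 0 the same function reduces to the plain dot product used by the linear kernel.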
With cross-validation, the accuracy reported for each parameter combination is the average over the 3 folds, so it estimates out-of-sample performance. The selected model reaches a cross-validated accuracy of 0.9907, i.e. an estimated out-of-sample error of about 0.93%, so it is expected to perform well on the test data.
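The averaging can be sketched in base R. This is an illustration under assumptions, not what caret runs: a hypothetical nearest-centroid classifier on `iris` stands in for the SVM, and the folds are assigned at random, whereas caret's `createFolds()` stratifies them by class.

```r
# Hypothetical stand-in model: one mean vector (centroid) per class.
fit_centroids <- function(X, y) {
  t(sapply(split(as.data.frame(X), y), colMeans))
}

# Assign each row to the class with the nearest centroid (Euclidean distance).
predict_centroids <- function(centroids, X) {
  X <- as.matrix(X)
  d <- apply(centroids, 1, function(m) colSums((t(X) - m)^2))
  d <- matrix(d, nrow = nrow(X))    # keep an n x K shape even when n == 1
  rownames(centroids)[apply(d, 1, which.min)]
}

# k-fold cross-validation: every row is held out exactly once, the model is
# refitted on the remaining folds, and the k accuracies are averaged.
cv_accuracy <- function(X, y, k = 3, seed = 42) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(X)))
  per_fold <- sapply(seq_len(k), function(f) {
    held_out <- folds == f
    model <- fit_centroids(X[!held_out, , drop = FALSE], y[!held_out])
    pred <- predict_centroids(model, X[held_out, , drop = FALSE])
    mean(pred == as.character(y[held_out]))
  })
  list(per_fold = per_fold, mean = mean(per_fold))
}

res <- cv_accuracy(iris[, 1:4], iris$Species, k = 3)
res$per_fold   # one accuracy per held-out fold
res$mean       # the cross-validated accuracy estimate
```

Because each fold is scored by a model that never saw it, the mean is an estimate of accuracy on new data rather than on the training rows.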
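```r
# 3-fold cross-validation; verboseIter prints a log of each resampling iteration.
ctrl <- trainControl(method = "cv", number = 3, verboseIter = TRUE)
# Fit one SVM per kernel over caret's default tuning grid.
modellinear <- train(classe ~ ., data = training, method = "svmLinear", trControl = ctrl)
modelpoly <- train(classe ~ ., data = training, method = "svmPoly", trControl = ctrl)
modelradial <- train(classe ~ ., data = training, method = "svmRadial", trControl = ctrl)
```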
The tuning process of the polynomial model:
```r
modelpoly
```

```
## Support Vector Machines with Polynomial Kernel
##
## 19622 samples
##    39 predictor
##     5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold)
## Summary of sample sizes: 13081, 13081, 13082
## Resampling results across tuning parameters:
##
##   degree  scale  C     Accuracy   Kappa
##   1       0.001  0.25  0.5802162  0.4544048
##   1       0.001  0.50  0.6336767  0.5313720
##   1       0.001  1.00  0.6528389  0.5572044
##   1       0.010  0.25  0.6720521  0.5821410
##   1       0.010  0.50  0.6806138  0.5930609
##   1       0.010  1.00  0.6897361  0.6048222
##   1       0.100  0.25  0.6975333  0.6149026
##   1       0.100  0.50  0.7013554  0.6199117
##   1       0.100  1.00  0.7070633  0.6272600
##   2       0.001  0.25  0.6408116  0.5405979
##   2       0.001  0.50  0.6633884  0.5707413
##   2       0.001  1.00  0.6832640  0.5965506
##   2       0.010  0.25  0.8357457  0.7916625
##   2       0.010  0.50  0.8718276  0.8375975
##   2       0.010  1.00  0.9011824  0.8748059
##   2       0.100  0.25  0.9743657  0.9675572
##   2       0.100  0.50  0.9798696  0.9745286
##   2       0.100  1.00  0.9839467  0.9796896
##   3       0.001  0.25  0.6635413  0.5707229
##   3       0.001  0.50  0.6868824  0.6010764
##   3       0.001  1.00  0.7106822  0.6316587
##   3       0.010  0.25  0.8966466  0.8690820
##   3       0.010  0.50  0.9228418  0.9022733
##   3       0.010  1.00  0.9475590  0.9335866
##   3       0.100  0.25  0.9895526  0.9867844
##   3       0.100  0.50  0.9902661  0.9876869
##   3       0.100  1.00  0.9907247  0.9882672
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.1 and C = 1.
```
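The selection rule stated in the last two lines of the output can be reproduced by hand. For brevity, the sketch below copies only the nine `scale = 0.1` rows of the table; the overall maximum happens to lie in that subset.

```r
# The scale = 0.1 rows of the resampling results above, one per (degree, C).
results <- data.frame(
  degree   = rep(1:3, each = 3),
  scale    = 0.1,
  C        = rep(c(0.25, 0.5, 1), times = 3),
  Accuracy = c(0.6975333, 0.7013554, 0.7070633,   # degree 1
               0.9743657, 0.9798696, 0.9839467,   # degree 2
               0.9895526, 0.9902661, 0.9907247)   # degree 3
)

# Keep the parameter combination with the largest cross-validated accuracy.
best <- results[which.max(results$Accuracy), ]
best                 # degree = 3, scale = 0.1, C = 1
1 - best$Accuracy    # estimated out-of-sample error, about 0.0093
```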
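The confusion matrix is computed on the same training data the model was fitted to, so its accuracy (0.9993) is an in-sample figure and more optimistic than the cross-validated estimate (0.9907):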
```r
confusionMatrix(predict(modelpoly, training), training$classe)
```

```
## Confusion Matrix and Statistics
##
##           Reference
## Prediction    A    B    C    D    E
##          A 5580    1    0    0    0
##          B    0 3796    1    0    0
##          C    0    0 3419    6    0
##          D    0    0    2 3206    0
##          E    0    0    0    4 3607
##
## Overall Statistics
##
##                Accuracy : 0.9993
##                  95% CI : (0.9988, 0.9996)
##     No Information Rate : 0.2844
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.9991
##
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   0.9997   0.9991   0.9969   1.0000
## Specificity            0.9999   0.9999   0.9996   0.9999   0.9998
## Pos Pred Value         0.9998   0.9997   0.9982   0.9994   0.9989
## Neg Pred Value         1.0000   0.9999   0.9998   0.9994   1.0000
## Prevalence             0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Rate         0.2844   0.1935   0.1742   0.1634   0.1838
## Detection Prevalence   0.2844   0.1935   0.1745   0.1635   0.1840
## Balanced Accuracy      1.0000   0.9998   0.9994   0.9984   0.9999
```
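The main summary statistics follow directly from the counts in the confusion matrix. The base-R sketch below copies those counts (rows are predictions, columns are the reference classes, as printed by `confusionMatrix()`) and recomputes Accuracy, Cohen's Kappa, Sensitivity and Pos Pred Value; rounded to four decimals they match the reported figures.

```r
# Counts from the confusion matrix above: rows = Prediction, columns = Reference.
cm <- matrix(c(5580,    1,    0,    0,    0,
                  0, 3796,    1,    0,    0,
                  0,    0, 3419,    6,    0,
                  0,    0,    2, 3206,    0,
                  0,    0,    0,    4, 3607),
             nrow = 5, byrow = TRUE,
             dimnames = list(Prediction = LETTERS[1:5],
                             Reference  = LETTERS[1:5]))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n                     # observed agreement p_o
expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # chance agreement p_e
kappa <- (accuracy - expected) / (1 - expected)   # Cohen's kappa
sensitivity <- diag(cm) / colSums(cm)             # recall per true class
pos_pred_value <- diag(cm) / rowSums(cm)          # precision per predicted class

round(c(accuracy = accuracy, kappa = kappa), 4)
round(sensitivity, 4)
round(pos_pred_value, 4)
```

Kappa stays close to accuracy here because chance agreement is only about 0.21 with five fairly balanced classes.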
```r
# Read the test cases and apply the preprocessing fitted on the training data.
test <- read.csv("pml-testing.csv", stringsAsFactors = TRUE)
testing <- predict(prepro, test)
# Predict the 20 test cases with the radial-kernel SVM.
R <- predict(modelradial, testing)
data.frame(Question = test$problem_id, Solution = R)
```

```
##    Question Solution
## 1         1        B
## 2         2        A
## 3         3        B
## 4         4        A
## 5         5        A
## 6         6        C
## 7         7        D
## 8         8        D
## 9         9        A
## 10       10        A
## 11       11        B
## 12       12        C
## 13       13        B
## 14       14        A
## 15       15        E
## 16       16        E
## 17       17        A
## 18       18        B
## 19       19        B
## 20       20        B
```
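The call `predict(prepro, test)` reuses the centering, scaling and PCA rotation estimated on the training data instead of refitting them on the test set. A base-R analogue with `prcomp()` is sketched below; `iris` and the 100/50 split are stand-ins chosen only for illustration.

```r
# Fit centering, scaling and PCA on a "training" subset only.
set.seed(7)
idx <- sample(nrow(iris), 100)
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]
pca <- prcomp(train_x, center = TRUE, scale. = TRUE)

# Project the held-out rows with the training means, standard deviations and
# rotation; nothing is re-estimated from the test data.
test_scores <- predict(pca, newdata = test_x)

# Equivalent manual computation, showing what predict() does.
manual <- scale(test_x, center = pca$center, scale = pca$scale) %*% pca$rotation
all.equal(unname(test_scores), unname(manual))   # TRUE
```

Refitting the transformation on the 20 test cases would place them in a different coordinate system from the one the SVM was trained in.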
The support vector machine seems to perform well with little tuning. To improve accuracy, the following activities could be undertaken:
1. Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human ’13). Stuttgart, Germany: ACM SIGCHI, 2013.
2. http://web.archive.org/web/20161224072740/http:/groupware.les.inf.puc-rio.br/har
3. Halilaj, E.; Rajagopal, A.; Fiterau, M.; Hicks, J.L.; Hastie, T.J.; Delp, S.L. Machine Learning in Human Movement Biomechanics: Best Practices, Common Pitfalls, and New Opportunities. Journal of Biomechanics, 2018. https://doi.org/10.1016/j.jbiomech.2018.09.009