Human activity recognition research has traditionally focused on discriminating between different activities, i.e. predicting “which” activity was performed at a specific point in time (as with the Daily Living Activities dataset).
The approach we propose for the Weight Lifting Exercises dataset is to investigate “how (well)” an activity was performed by the wearer. The “how (well)” question has received little attention so far, even though it potentially provides useful information for a large variety of applications, such as sports training.
In this work (see the paper) we first define quality of execution and investigate three aspects that pertain to qualitative activity recognition: the problem of specifying correct execution, the automatic and robust detection of execution mistakes, and how to provide feedback on the quality of execution to the user. We tried out an on-body sensing approach (dataset here), but also an “ambient sensing approach” (by using Microsoft Kinect - dataset still unavailable).
Six young healthy participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: exactly according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D) and throwing the hips to the front (Class E).
The purpose of this exercise is to construct a machine learning model that accurately predicts which class of execution (correct, or one of the four mistakes) was performed, using the gyroscope, accelerometer, and magnetometer measurements recorded at the belt, arm, forearm, and dumbbell.
The data for this can be found here:
http://groupware.les.inf.puc-rio.br/static/WLE/WearableComputing_weight_lifting_exercises_biceps_curl_variations.csv
Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiu, R.; Fuks, H. Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements. Proceedings of 21st Brazilian Symposium on Artificial Intelligence. Advances in Artificial Intelligence - SBIA 2012. In: Lecture Notes in Computer Science., pp. 52-61. Curitiba, PR: Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-34458-9. DOI: 10.1007/978-3-642-34459-6_6.
R version 3.3.3 (2017-03-06) - Another Canoe - svn 72310
| Package | Version |
|---|---|
| caret | 6.0-78 |
| doSNOW | 1.0.16 |
| dplyr | 0.7.4 |
| foreach | 1.4.4 |
| gplots | 3.0.1 |
| iterators | 1.0.9 |
| snow | 0.4-2 |
Many variables contain NA values. These are left out of the predictor set.
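One way to see which variables are affected is to compute the share of missing values per column. The sketch below does this on a small made-up data frame; the column `summary_stat` and all values are hypothetical and are not taken from the dataset.

```r
# Toy data frame standing in for the imported data (hypothetical values).
toy <- data.frame(
  roll_belt    = c(1.41, 1.42, 1.48, 1.45),
  summary_stat = c(NA, NA, NA, 5.6),        # hypothetical column, mostly missing
  classe       = factor(c("A", "A", "B", "B"))
)

# Share of missing values in each column.
na.share <- colMeans(is.na(toy))

# Keep only the columns without any missing value.
complete.cols <- names(na.share)[na.share == 0]
toy.complete  <- toy[, complete.cols, drop = FALSE]

print(na.share)
print(complete.cols)
```

The same two lines applied to the imported data frame would list the candidate predictors, which could then be compared with the hand-written `select()` call.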
There are many observations, and thus a 50/50 training and evaluation dataset can be created.
The task is classification, so a random forest is used. Because there are more than two classes, ordinary (binary) logistic regression does not apply directly; a multinomial variant would be needed.
There is also a lot of multicollinearity among the predictors, which makes linear models difficult and time consuming to fit. Replacing the predictors with principal components would address the collinearity, but interpretability would be lost.
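A minimal sketch of how strongly correlated predictor pairs can be listed from a correlation matrix. The toy variables `x1`, `x2`, `x3` and the 0.8 cutoff are assumptions chosen for illustration; they are not taken from the analysis.

```r
set.seed(1)
# Toy predictors (hypothetical): x2 is nearly a copy of x1, x3 is independent.
x1  <- rnorm(200)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(200, sd = 0.1),
                  x3 = rnorm(200))

cor.mat <- cor(toy)
diag(cor.mat) <- 0                     # ignore each variable's correlation with itself
cor.mat[lower.tri(cor.mat)] <- 0       # report each pair only once

# Pairs whose absolute correlation exceeds the illustrative cutoff.
high <- which(abs(cor.mat) > 0.8, arr.ind = TRUE)
high.pairs <- data.frame(var1 = rownames(cor.mat)[high[, "row"]],
                         var2 = colnames(cor.mat)[high[, "col"]],
                         r    = cor.mat[high])
print(high.pairs)
```

Zeroing the diagonal first keeps the trivial self-correlations of 1 out of the result, so only genuine pairs remain.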
The model is trained with 5-fold cross-validation and then tested against the evaluation data.
Due to the intense processor utilisation, and the time it takes to train the model, parallel processing will be used.
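To make the 5-fold scheme concrete, the sketch below assigns rows to folds in base R and reports how many rows each resample trains on. It uses 9812 rows, the training-set size shown in the model summary. Note that caret builds its folds with the class balance in mind, whereas this simplified version assigns folds purely at random, so it illustrates the sizes only.

```r
set.seed(12345)
n <- 9812   # training rows reported in the model summary
k <- 5      # number of folds

# Assign every row to one of k folds at random, keeping fold sizes balanced.
fold.id <- sample(rep(seq_len(k), length.out = n))

# Each resample trains on all rows except the one held-out fold.
held.out   <- as.vector(table(fold.id))
train.size <- n - held.out
print(sort(train.size))
```

Two resamples train on 7849 rows and three on 7850, the same sizes that appear in the "Summary of sample sizes" line of the model output.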
library(dplyr)
library(caret)
library(doSNOW)
library(gplots)
train.import <- read.csv("C:/R/Datasets/pml-training.csv", stringsAsFactors = TRUE)
train2 <- train.import %>%
select(roll_belt, pitch_belt, yaw_belt,
gyros_belt_x, gyros_belt_y, gyros_belt_z,
accel_belt_x, accel_belt_y, accel_belt_z,
magnet_belt_x, magnet_belt_y, magnet_belt_z,
roll_arm, pitch_arm, yaw_arm,
gyros_arm_x, gyros_arm_y, gyros_arm_z,
accel_arm_x, accel_arm_y, accel_arm_z,
magnet_arm_x, magnet_arm_y, magnet_arm_z,
roll_dumbbell, pitch_dumbbell, yaw_dumbbell,
gyros_dumbbell_x, gyros_dumbbell_y, gyros_dumbbell_z,
accel_dumbbell_x, accel_dumbbell_y, accel_dumbbell_z,
magnet_dumbbell_x, magnet_dumbbell_y, magnet_dumbbell_z,
roll_forearm, pitch_forearm, yaw_forearm,
gyros_forearm_x, gyros_forearm_y, gyros_forearm_z,
accel_forearm_x, accel_forearm_y, accel_forearm_z,
magnet_forearm_x, magnet_forearm_y, magnet_forearm_z,
classe)
# Drop observation 5373 before splitting the data.
train2 <- train2[-5373, ]
set.seed(12345)
train.set <- createDataPartition(y = train2$classe, p = 0.5, list = FALSE)
training <- train2[train.set, ]
evaluation <- train2[-train.set, ]
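When the outcome is a factor, `createDataPartition` samples within each class, so both halves keep roughly the same class mix. The base R sketch below illustrates that idea on made-up labels; it is a simplified stand-in, not caret's implementation, and the class counts are hypothetical.

```r
set.seed(12345)
# Toy labels (hypothetical class counts, not the real data).
classe <- factor(rep(c("A", "B", "C"), times = c(60, 30, 10)))

# Sample half of the row indices within each class, then pool them.
idx.by.class <- split(seq_along(classe), classe)
train.idx <- sort(unlist(lapply(idx.by.class, function(i) {
  i[sample.int(length(i), size = ceiling(length(i) / 2))]
}), use.names = FALSE))

print(table(classe[train.idx]))    # class counts in the training half
print(table(classe[-train.idx]))   # class counts in the evaluation half
```

Both tables show the same 6:3:1 ratio as the full label vector, which is the property that makes the evaluation accuracy comparable across classes.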
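# Correlation matrix of the 48 predictors (assumed definition; my.cor.var is not created in the code shown).
my.cor.var <- cor(training[, names(training) != "classe"])
diag(my.cor.var) <- 0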
my.cpu.cl <- makeCluster(2, type="SOCK")
registerDoSNOW(my.cpu.cl)
set.seed(12345)
tr.control <- trainControl(method = "cv", number = 5, verboseIter = TRUE)
rf.mod <- train(classe ~ ., data = training, method = "rf",
                trControl = tr.control,
                tuneLength = 5)
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 13 on full training set
Once training is done, the cluster is stopped and the sequential backend is re-registered, so that later foreach calls do not try to use the stopped workers.
stopCluster(my.cpu.cl)
registerDoSEQ()
View the model diagnostics, in particular the Accuracy and Kappa values.
Both are high: the selected model (mtry = 13) reaches a cross-validated Accuracy of 0.990 and a Kappa of 0.987.
rf.mod
## Random Forest
##
## 9812 samples
## 48 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 7849, 7850, 7850, 7850, 7849
## Resampling results across tuning parameters:
##
##   mtry  Accuracy   Kappa
##    2    0.9868522  0.9833668
##   13    0.9900121  0.9873646
##   25    0.9874643  0.9841409
##   36    0.9861393  0.9824646
##   48    0.9860377  0.9823359
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 13.
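The five `mtry` candidates in the table are consistent with evenly spaced values between 2 and the number of predictors, rounded down. The sketch below reproduces them from `tuneLength = 5` and 48 predictors; that caret uses exactly this rule internally is an assumption here, not something the output confirms.

```r
p           <- 48   # number of predictors in the model summary
tune.length <- 5    # value passed as tuneLength

# Evenly spaced candidates between 2 and p, rounded down.
mtry.grid <- floor(seq(2, p, length.out = tune.length))
print(mtry.grid)
```

The result is 2, 13, 25, 36 and 48, matching the rows of the resampling table.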
Predict on evaluation dataset:
rf.pred <- predict(rf.mod, newdata = evaluation)
rf.error <- data.frame(Evaluation = evaluation$classe, Predicted = rf.pred)
rf.error$test <- rf.error$Evaluation==rf.error$Predicted
table(rf.error$test)[2]/nrow(evaluation)
## TRUE
## 0.9916403
The model predicts 99.16% of the observations in the evaluation dataset correctly.
rf.cmat <- confusionMatrix(data = rf.pred, reference = evaluation$classe)
rf.ooser <- (1 - rf.cmat$overall[["Accuracy"]]) * 100
The estimated out-of-sample error rate is 0.84%.
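The error rate is a point estimate, so it can help to attach an interval to it. The sketch below recomputes the figure from the counts in the confusion matrix reported in this document (9727 correct predictions out of 9809 evaluation rows) and adds an exact binomial 95% interval for the accuracy; the variable names are illustrative.

```r
n.correct <- 9727   # sum of the diagonal of the confusion matrix
n.total   <- 9809   # sum of all cells of the confusion matrix

accuracy   <- n.correct / n.total
error.rate <- (1 - accuracy) * 100   # in percent

# Exact (Clopper-Pearson) 95% interval for the accuracy.
ci <- binom.test(n.correct, n.total)$conf.int

print(round(error.rate, 2))
print(round(ci, 4))
```

The interval assumes the evaluation rows are independent draws, which is a simplification for sensor readings taken in sequence from the same six participants.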
The following confusion matrix compares the predictions (rows) with the actual classes in the evaluation dataset (columns):
conf.matrix <- as.matrix(table(rf.error$Predicted, rf.error$Evaluation))
conf.matrix
##
##        A    B    C    D    E
##   A 2784   14    0    0    0
##   B    5 1883    7    0    0
##   C    0    1 1695   28    7
##   D    0    0    9 1578    9
##   E    0    0    0    2 1787
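The matrix also shows where the errors concentrate. As a sketch, the counts above can be typed in directly to derive per-class recall and precision, and to recompute Kappa by hand; the object names are illustrative, and the evaluation-set Kappa is not reported elsewhere in this document.

```r
classes <- c("A", "B", "C", "D", "E")
# Counts copied from the output above: rows = predicted, columns = actual.
cm <- matrix(c(2784,   14,    0,    0,    0,
                  5, 1883,    7,    0,    0,
                  0,    1, 1695,   28,    7,
                  0,    0,    9, 1578,    9,
                  0,    0,    0,    2, 1787),
             nrow = 5, byrow = TRUE,
             dimnames = list(Predicted = classes, Evaluation = classes))

n         <- sum(cm)
accuracy  <- sum(diag(cm)) / n
recall    <- diag(cm) / colSums(cm)   # share of each actual class that was recovered
precision <- diag(cm) / rowSums(cm)   # share of each predicted class that was right

# Cohen's Kappa: agreement beyond what the marginal totals alone would give.
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen.kappa <- (accuracy - expected) / (1 - expected)

print(round(rbind(recall, precision), 4))
print(round(cohen.kappa, 4))
```

Class D has the lowest recall: 28 of its 1608 evaluation rows were labelled as Class C, the largest single off-diagonal cell.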
hm.cols <- colorRampPalette(c("lightyellow","maroon"))(256)
heatmap.2(conf.matrix, Colv = NA, Rowv = NA,
xlab = "Evaluation",
ylab = "Predicted",
main = "Confusion Matrix Heatmap",
trace = "none",
density.info = "histogram",
cellnote = conf.matrix,
notecex = 1.5,
notecol = "black",
col = hm.cols)