Introduction

Biometric data was collected for “Qualitative Activity Recognition of Weight Lifting Exercises” (Velloso et al., 2013), a human-factors research paper presented at the Augmented Human ’13 conference in cooperation with ACM SIGCHI. The training and test data sets were downloaded on 2018-05-04 16:46:42.

The goal of this research study is to develop a machine learning algorithm that can classify how well a dumbbell curl is being performed. The original study defines 5 classes of performance:

Class A: the correct performance

Class B: incorrectly throwing the elbows to the front

Class C: lifting the dumbbell only halfway up

Class D: lowering the dumbbell only halfway down

Class E: incorrectly throwing the hips forward

All of the above will be classified using movement data, obtained from four inertial measurement units (IMUs) placed on the hand (1), upper arm (2), lower lumbar region (3), and dumbbell (4) of six male participants.

Each IMU provides three-axis accelerometer, gyroscope, and magnetometer readings. Roll, pitch, and yaw (Euler angles) were calculated using a sliding-window technique, and summary features such as the mean, variance, and standard deviation were extracted for each Euler angle.

Data Cleaning

The data was cleaned with the following steps; a sketch of this pipeline in R appears after the list.

1. Removal of time stamps and other variables related to data acquisition

2. Removal of predictors with a majority of missing values

3. Coercion of all predictor variables to class numeric

4. Check for zero- and near-zero-variance predictors (removed if any were found)

5. Removal of highly correlated predictors (|r| > .9)
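
The cleaning code itself is not shown in this report; a minimal sketch of the steps above, assuming the raw file is pml-training.csv, that the first seven columns hold the acquisition metadata, and that “majority missing” means more than half NA, might look like this.

# A minimal sketch of the cleaning steps above (assumed, not the report's actual code)
library(caret)

raw <- read.csv("pml-training.csv", na.strings = c("NA", "", "#DIV/0!"))

# 1. Drop time stamps and other data-acquisition columns (assumed to be the first seven)
train <- raw[, -(1:7)]

# 2. Drop predictors that are mostly missing
train <- train[, colMeans(is.na(train)) < 0.5]

# 3. Coerce every predictor (everything except the outcome, classe) to numeric
pred_cols <- setdiff(names(train), "classe")
train[pred_cols] <- lapply(train[pred_cols], as.numeric)

# 4. Remove zero- and near-zero-variance predictors, if any are found
nzv <- nearZeroVar(train)
if (length(nzv) > 0) train <- train[, -nzv]

# 5. Remove highly correlated predictors (|r| > .9)
pred_cols <- setdiff(names(train), "classe")
high_cor <- pred_cols[findCorrelation(cor(train[, pred_cols]), cutoff = 0.9)]
train <- train[, !(names(train) %in% high_cor)]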

# Correlation plot of the cleaned predictors (column 46, presumably classe, is excluded)
corrplot(cor(train[, -46]), tl.col = "black", tl.cex = .75)

The plot below shows the predictor space that the models will be fit to. This space has been cleaned of highly collinear variables.

Machine Learning Modeling

Non-parametric models do not parameterize, or make explicit assumptions about, the form of the function they are trying to estimate. Decision trees exemplify this flexible approach to predictive modeling.

Parametric models assume a fundamental form for the pattern in the data. The function is parameterized so that its coefficients can be estimated. This more structured approach tends to perform better than non-parametric models when sample sizes are small; however, parametric models are more rigid and can suffer from bias, especially if the assumed form does not closely approximate the true function being estimated.

This report fits four non-parametric models (decision trees and DT variants) and one parametric model (linear discriminant analysis).

Non-Parametric Model Building

Regular Classification Tree

All predictor variables begin in the same group and the decision tree algorithm segments this “predictor space” into J distinct groups.

Groups are formed by selecting the predictor variable, and the cut point within that variable, that results in two maximally homogeneous groups. For classification, the aim is to maximize the proportion of observations belonging to the plurality class at each terminal node.

This predictor splitting continues through a process known as “recursive binary splitting” and ceases once the terminal groups contain too few observations or attain the desired level of purity (as measured by the classification error rate, the Gini index, or cross-entropy).
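
As a quick illustration of one of those purity measures (not part of the report's modeling code), the Gini index of a node is the sum of p_k(1 - p_k) over its class proportions p_k:

# Illustrative only: Gini index of a node given its class proportions
gini <- function(p) sum(p * (1 - p))

gini(c(1, 0, 0, 0, 0))   # a pure node: Gini = 0
gini(rep(0.2, 5))        # a maximally impure five-class node: Gini = 0.8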

# Regular Classification Tree
tree <- train(classe ~ ., data = train, method = "rpart", trControl = tc)

Bagged Classification Tree

Regular decision trees suffer from a high degree of variance. Their predictions are highly dependent on the particular data that they are trained on. One way to reduce this variance is through bootstrap aggregation (or “bagging”).

This method repeatedly samples from the training data and fits a decision tree to each sample. Each of these trees produces a prediction for any given observation. Classes are assigned to observations based on a majority vote of all the predictions. This averaging of predictions across different bootstrapped samples greatly reduces a decision tree’s inherent variance.
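
As a toy illustration of the voting step (hypothetical values, not the report's code), suppose five bagged trees predict the class of a single observation:

# Illustrative only: majority vote across bagged trees for one observation
votes <- c("A", "A", "B", "A", "E")   # hypothetical predictions from five trees
names(which.max(table(votes)))        # the bagged prediction: "A"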

# Bagged Classification Tree
bag_tree <- train(classe ~ ., data = train, method = "treebag", trControl = tc)

Random Forest Classification Tree

Bagging tends to produce highly correlated trees: the same strong predictors dominate the early splits in every bootstrapped sample, so the trees tend to be built in much the same way each time.

Random forests still use bootstrap aggregation; however, they consider only a random subset of predictors as split candidates at each node, whereas the two trees above consider all remaining predictors at every node. This prevents the trees from being built in the same manner each time and thereby decorrelates them, which is more effective at reducing model variance.
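
The size of that random subset is the “mtry” tuning parameter. A small sketch of the conventional default for classification (caret tunes this value by cross-validation instead, as described in the note further down):

# Illustrative only: the usual random-forest default for classification is mtry = floor(sqrt(p))
p <- ncol(train) - 1   # assumes classe is the only non-predictor column left after cleaning
floor(sqrt(p))         # e.g., floor(sqrt(45)) = 6; caret's tuning selected 23 instead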

# Random Forest Classification Tree
rf_tree <- train(classe ~ ., data = train, method = "rf", trControl = tc)

Boosted Classification Tree

Instead of fitting fully grown decision trees to bootstrapped samples and averaging across them, boosting grows an ensemble sequentially: each new, shallow tree is fit to the errors made by the ensemble so far, and its contribution is shrunken by a learning rate before being added. This is also known as “slow learning” with “weak learners”.

# Stochastic Gradient Boosting Tree Classification
boost_tree <- train(classe ~ ., data = train, method = "gbm", verbose = FALSE)
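
The gbm call above relies on caret's defaults; a minimal sketch of an explicit tuning grid for the boosting parameters (assumed values, not the report's actual settings) would look like this.

# Assumed, for illustration: the four gbm parameters caret can tune, including
# shrinkage (the learning rate behind "slow learning")
gbm_grid <- expand.grid(n.trees = c(50, 100, 150),
                        interaction.depth = 1:3,
                        shrinkage = 0.1,
                        n.minobsinnode = 10)
# boost_tree <- train(classe ~ ., data = train, method = "gbm",
#                     trControl = tc, tuneGrid = gbm_grid, verbose = FALSE)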

Parametric Model Building

Linear Discriminant Analysis

LDA considers the distribution of the predictors within each response class. The probability of observing predictor values X within a given response class j can be calculated under the assumption that the predictors follow a Gaussian distribution.

Applying Bayes’ theorem, we can flip this probability and determine the probability of (and therefore predict) response class j given observed values of X. Essentially, boundaries are drawn through the predictor space, with each region representing the class that has the highest posterior probability.
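
In the notation of James et al. (2014), with prior probability pi_j and class-conditional Gaussian density f_j(x) for class j, the posterior being computed is

P(classe = j | X = x) = pi_j f_j(x) / [ pi_1 f_1(x) + ... + pi_5 f_5(x) ],

and each observation is assigned to the class with the largest posterior probability.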

# Linear Discriminant Analysis
lda <- train(classe ~ ., data = train, method = "lda", trControl = tc)

A Note On Cross-Validation

Each of the above models utilizes cross-validation to estimate test sample accuracy and to tune the non-parametric models.

For example, in each round of 5-fold cross-validation my Random Forest model was trained on 80% of the training set (taking 500 bootstrap samples and fitting a decision tree to each one) and its accuracy was assessed by making predictions on the held-out 20% (the remaining fold). This was repeated across all five folds, and caret did so for several candidate values of “mtry” (the number of predictors considered at each split of a tree). The cross-validated accuracies were then compared to determine the optimal value of “mtry”, which turned out to be 23.
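
The shared trControl = tc object referenced in the model calls above is not shown in this report; a minimal sketch of what it presumably contains, assuming 5-fold cross-validation as described:

# Assumed definition of the shared resampling scheme (5-fold cross-validation)
library(caret)
set.seed(1234)   # hypothetical seed, for reproducibility
tc <- trainControl(method = "cv", number = 5)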

Prediction

The Random Forest model performs best with a classification accuracy of 99.5%. This is the model that we will use to make predictions on the test set.
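
The prediction step itself is not shown; a minimal sketch, assuming the test set has been cleaned the same way, is loaded as test, and retains the user_name column:

# Assumed prediction step: apply the chosen random-forest model to the 20 test cases
predictions <- predict(rf_tree, newdata = test)
data.frame(Test.Subjects = test$user_name, Class = predictions)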

Test.Subjects Class
Pedro B
Jeremy A
Jeremy B
Adelmo A
Eurico A
Jeremy E
Jeremy D
Jeremy B
Carlitos A
Charles A
Carlitos B
Jeremy C
Eurico B
Jeremy A
Jeremy E
Eurico E
Pedro A
Carlitos B
Pedro B
Eurico B

Conclusion

Movement data from weight lifters was used to predict how well each lifter performed an exercise: did they use the proper technique?

Four non-parametric decision-tree models were fit along with one parametric linear discriminant analysis. The Random Forest model performed best, with the Bagged Tree and Boosted Tree following closely in predictive power. The regular Decision Tree performed far worse, with an accuracy of roughly 50%.

References

*Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of the 4th International Conference in Cooperation with SIGCHI (Augmented Human ’13). Stuttgart, Germany: ACM SIGCHI, 2013.

*Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.