Using devices such as the Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to inexpensively collect a large amount of data about personal activity. These devices are becoming increasingly popular among individuals who take regular measurements on themselves to improve their health or to find behavioral patterns. One thing people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it.
In this report we explore using classification methods, in combination with data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants performing barbell lifts, to predict whether they are performing the lift correctly and, if not, what type of error they are making.
The data used is the Weight Lifting Exercise Dataset (Velloso et al. 2013). The dataset consists of sensor data from accelerometers on the belt, forearm, arm, and dumbbell of 6 male participants aged 20 to 28 years, with little weight lifting experience. The participants were tasked with performing one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl using five different techniques: according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D), and throwing the hips to the front (Class E).
Class A corresponds to performing the exercise according to the specification, while the other 4 classes correspond to common mistakes. The participants were supervised by an experienced weight lifter to ensure the execution complied with the manner they were tasked with simulating. The dumbbell weighed 1.25 kg so that the participants could perform the exercises in a safe, controlled manner.
The dataset consists of 19622 observations across the 6 subjects, each observation being a single time slice during the 10-repetition exercise. The distribution of time slices per participant and per class is shown below.
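A sketch of how this tabulation can be produced, assuming the raw CSV is named `pml-training.csv` and follows the column naming of the published dataset (`user_name`, `classe`):

```r
# Load the raw data, treating empty strings as NA, and
# cross-tabulate observations by participant and lift class
pml <- read.csv("pml-training.csv", na.strings = c("NA", ""))
pml$classe <- factor(pml$classe)
table(pml$user_name, pml$classe)
```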
| Subject | Class A | Class B | Class C | Class D | Class E |
|---|---|---|---|---|---|
| adelmo | 1165 | 776 | 750 | 515 | 686 |
| carlitos | 834 | 690 | 493 | 486 | 609 |
| charles | 899 | 745 | 539 | 642 | 711 |
| eurico | 865 | 592 | 489 | 582 | 542 |
| jeremy | 1177 | 489 | 652 | 522 | 562 |
| pedro | 640 | 505 | 499 | 469 | 497 |
To prepare the data for learning, we preprocessed it by removing all columns containing at least one NA value, along with any columns containing date stamps or other bookkeeping fields. The processed dataset contains 52 covariates to use for training a classifier.
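A minimal sketch of this step, assuming the bookkeeping columns follow the naming of the raw CSV loaded above:

```r
# Drop columns with any NA (empty strings were read as NA above), then
# drop row indices, subject names, timestamps, and window markers
pml <- pml[, colSums(is.na(pml)) == 0]
pml <- pml[, !grepl("^X$|user_name|timestamp|window", names(pml))]
```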
To provide an accurate estimate of the performance of the classification algorithms on unseen data, we take a 2-step approach. First, we split the data into a training and a test set using a 70%/30% split. Second, we perform feature selection, model tuning, and model selection using 10-fold cross validation.
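One way to set this up with the caret package (the seed and exact calls are illustrative, not the report's verbatim code):

```r
library(caret)
set.seed(1234)  # arbitrary seed, for reproducibility

# 70%/30% split, stratified on the lift class
inTrain  <- createDataPartition(pml$classe, p = 0.7, list = FALSE)
training <- pml[inTrain, ]
testing  <- pml[-inTrain, ]

# 10-fold cross validation, reused for all tuning and model selection below
ctrl <- trainControl(method = "cv", number = 10)
```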
We first trained a CART decision tree using the rpart library. The only tuning parameter in this algorithm is the complexity parameter, a real number on the interval (0, 1] that sets a threshold on the number of splits in the decision tree: a split is retained only if it improves the overall fit by at least that amount. We evaluate 10 complexity parameters on an exponential scale.
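A sketch of the tuning, where the exact grid endpoints are an assumption:

```r
# 10 candidate complexity parameters on an exponential scale
cpGrid <- data.frame(cp = 10^seq(-6, 0, length.out = 10))

treeFit <- train(classe ~ ., data = training,
                 method    = "rpart",
                 trControl = ctrl,
                 tuneGrid  = cpGrid)
```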
This validation chart shows that a lower complexity parameter leads to improved prediction accuracy. The optimal complexity parameter is 1.6935 × 10⁻⁵, yielding a cross-validation accuracy of 0.9223.
A random forest can be thought of as an averaging of many decision trees that are bagged both along the data samples and along the covariates. A random forest generates a pool of decision trees (a forest) from multiple bootstrapped samples of the training data; a randomized feature selection procedure then trains each split in each tree on a random subset of the covariates. When making predictions, the random forest takes the majority vote across the trees (the classification analogue of averaging their predictions).
We use the randomForest package for the random forest model. The tuning parameters in this model are mtry, which controls the number of covariates considered at each split, and ntree, which controls the number of trees to generate.
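A sketch of the search, assuming it is driven through caret; the grids are illustrative. caret tunes mtry directly, while ntree is passed through to randomForest, so we loop over it ourselves:

```r
library(randomForest)

mtryGrid <- data.frame(mtry = c(2, 27, 52))  # assumed candidate values
ntrees   <- seq(10, 110, by = 20)

# One cross-validated fit per candidate ntree
rfFits <- lapply(ntrees, function(nt) {
  train(classe ~ ., data = training,
        method    = "rf",
        trControl = ctrl,
        tuneGrid  = mtryGrid,
        ntree     = nt)
})
```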
This figure shows the cross-validation accuracy of each RF model across the parameters mtry and ntree. Setting mtry to 52 clearly gives the best RF model on cross validation, but as shown in the table below, there is still a decision to be made on the optimal value of ntree.
| ntree | Accuracy | AccuracySD |
|---|---|---|
| 10 | 0.9137 | 0.007850 |
| 30 | 0.9491 | 0.006321 |
| 50 | 0.9551 | 0.007419 |
| 70 | 0.9557 | 0.007262 |
| 90 | 0.9563 | 0.005515 |
| 110 | 0.9554 | 0.004656 |
There is a decision to be made on the optimal number of trees: ntree = 90 gives the best mean accuracy, while ntree = 110 has the smallest deviation between cross-validation results. For the final decision we pick the model that scores highest on Accuracy - AccuracySD; in this case the optimal model has ntree set to 90.
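The selection rule itself is a one-liner over the cross-validation summary above:

```r
# Pick the ntree whose mean CV accuracy minus its SD is largest
cvSummary <- data.frame(
  ntree      = seq(10, 110, by = 20),
  Accuracy   = c(0.9137, 0.9491, 0.9551, 0.9557, 0.9563, 0.9554),
  AccuracySD = c(0.007850, 0.006321, 0.007419, 0.007262, 0.005515, 0.004656)
)
cvSummary$ntree[which.max(cvSummary$Accuracy - cvSummary$AccuracySD)]
## [1] 90
```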
The table below summarizes the accuracy of the chosen Decision Tree and Random Forest models on the full training set, under cross validation, and on the withheld test set.
| Model | Train.Accuracy | CV.Accuracy | Test.Accuracy |
|---|---|---|---|
| Decision Tree | 0.9590 | 0.9223 | 0.9273 |
| Random Forest | 0.9856 | 0.9563 | 0.9662 |
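The test-set figures can be obtained along these lines (a sketch, assuming the fitted objects from the previous sections):

```r
# Evaluate both chosen models on the withheld test set;
# rfFits[[5]] is the ntree = 90 forest
treePred <- predict(treeFit, newdata = testing)
rfPred   <- predict(rfFits[[5]], newdata = testing)

confusionMatrix(treePred, testing$classe)$overall["Accuracy"]
confusionMatrix(rfPred,   testing$classe)$overall["Accuracy"]
```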
We can see that the test results are very close to the expected results from cross validation; in fact, the test results improve on the cross-validation results in both cases. This is expected, since each cross-validation model is trained on only 90% of the training data while the final models are trained on all of it, and it suggests that, given more data, it may be possible to further improve the prediction accuracy of both models.
We have seen that both Random Forests and Decision Trees can achieve an accuracy over 90% on the Weight Lifting Exercise Dataset with very minimal preprocessing. For future research it may be worth investigating the prediction accuracy of these methods when voting over a time series of data points instead of classifying a single time slice; this may help further improve accuracy. Furthermore, it may be worthwhile to build a set of distinct learning algorithms and make a majority-vote decision. Given the structure of the data, one would only need to create 6 independent classifiers to guarantee a majority vote, and it can be shown that voting is guaranteed to increase the lower bound on expected accuracy relative to the weakest learner, assuming the weakest learner has greater than 50% accuracy.
Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. (2013). Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of the 4th International Conference in Cooperation with SIGCHI (Augmented Human '13). Stuttgart, Germany: ACM SIGCHI.