The Weight Lifting Dataset used in this study contains motion metrics of 6 individuals performing a specific weight lifting exercise (barbell lifts) in 5 different ways: 1 correct way according to specification and 4 incorrect.
An initial exploration of the data will narrow down the feature space and test the predictors’ distributions for normality in order to assess the suitability of PCA preprocessing and Discriminant Analysis modeling.
Finally, we’ll predict the 5 different ways this exercise can be done using 2 off-the-shelf classification models: multinomial logistic regression and random forest.
Due to performance constraints, 5-fold cross-validation will be used for both models to tune their parameters and to estimate out-of-sample error; the model with the highest cross-validated accuracy will then be chosen.
The original study used a sliding window that aggregated consecutive sensor readings and generated 96 new features per window from the 52 raw features. Only the raw sensor readings will be used in our modeling; the features generated by the sliding-window technique will be eliminated. Timestamp and window features will be discarded, and to avoid further complexity, username information will also be discarded.
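For reference, a minimal sketch of this selection on the training data is shown below; `train_link` is an assumed variable pointing at the training CSV, and the test-set code at the end of this report mirrors these steps.

train_original <- read.csv(train_link, na.strings = c("NA","#DIV/0!"))  # "#DIV/0!" read as NA; train_link is an assumption
train_set <- train_original[, colSums(is.na(train_original)) == 0]      # drop sliding-window summary features (NA outside window boundaries)
train_set <- train_set[, -(1:7)]                                        # drop row index, username, timestamp and window columns
dim(train_set)                                                          # 52 raw sensor features + the classe outcome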
This initial feature selection leaves us with 53 variables: 52 continuous predictors and 1 categorical outcome. Numerical predictors and a categorical outcome hint at a Discriminant Analysis approach, unless LDA’s assumption about the normality of the predictors’ distributions within each class can’t be met; that is, \(P(x \mid y)\) should be Gaussian for every predictor \(x\) and every outcome class \(y\).
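As an illustration of how this assumption can be checked, the sketch below applies a Shapiro-Wilk test to each predictor within each class; the 5000-row cap and the 0.05 threshold are arbitrary choices for the sketch, not values from the original analysis.

# Illustrative per-class normality check on the 52 predictors (sketch only)
set.seed(1)
normality_pvals <- sapply(names(train_set)[1:52], function(feature) {
  sapply(unique(train_set$classe), function(cl) {
    x <- train_set[train_set$classe == cl, feature]
    x <- sample(x, min(length(x), 5000))                       # shapiro.test() handles at most 5000 values
    tryCatch(shapiro.test(x)$p.value, error = function(e) NA)  # NA if a predictor is constant within a class
  })
})
mean(normality_pvals < 0.05, na.rm = TRUE)                     # share of (class, predictor) pairs rejecting normality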
There’s another consequence of the non-normality we just assessed: PCA preprocessing might miss information carried by higher-order statistics, since PCA only captures variance and does so best when the data are normally distributed. Our 2 models will therefore be applied off-the-shelf with basic centering and scaling of the 52 raw sensor features.
As introduced earlier, both models will be trained and cross-validated on 5 data folds. Different parameter settings will be tuned through this cross-validation, and the final model choice will be based on accuracy:
library(caret)  # provides train() and trainControl()
# 5-fold cross-validation; allowParallel uses a parallel backend if one is registered
fitControl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
# Model 1: multinomial logistic regression (tunes the decay regularization parameter)
set.seed(23)
fit1 <- train(classe~., method="multinom", data=train_set, preProcess=c("center","scale"), trControl = fitControl)
# Model 2: random forest (tunes mtry, the number of predictors tried at each split)
set.seed(7)
fit2 <- train(classe~., method="rf", data=train_set, preProcess=c("center","scale"), trControl = fitControl)
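The per-parameter cross-validation summaries reported below can be read straight from the fitted caret objects; fit1$results and fit2$results hold the accuracy and Kappa estimates for each value of decay and mtry that was tried:

# Cross-validated performance for each tuning parameter value (source of the tables below)
fit1$results[, c("decay", "Accuracy", "Kappa", "AccuracySD", "KappaSD")]
fit2$results[, c("mtry", "Accuracy", "Kappa", "AccuracySD", "KappaSD")]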
Multinomial logistic regression (fit1) tuning results:

| decay | Accuracy | Kappa | AccuracySD | KappaSD |
|---|---|---|---|---|
| 0e+00 | 0.7320365 | 0.6603409 | 0.0103164 | 0.0130881 |
| 1e-04 | 0.7309152 | 0.6589210 | 0.0103954 | 0.0132286 |
| 1e-01 | 0.7316285 | 0.6597812 | 0.0080569 | 0.0103398 |
Random forest (fit2) tuning results:

| mtry | Accuracy | Kappa | AccuracySD | KappaSD |
|---|---|---|---|---|
| 2 | 0.9941903 | 0.9926504 | 0.0012132 | 0.0015351 |
| 27 | 0.9944959 | 0.9930376 | 0.0012302 | 0.0015560 |
| 52 | 0.9888901 | 0.9859447 | 0.0036771 | 0.0046541 |
The accuracy of the Random Forest model stands out: even under performance constraints, the best tuned model reached over 99% cross-validated accuracy. Further visualizations of the results can be found in Appendix 1.
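For a side-by-side look at the two cross-validated fits, caret’s resamples() helper can summarise fold-level accuracy and Kappa for both models; a minimal sketch (the list names are our own):

# Compare fold-level metrics from both fits; since different seeds were used, the folds
# are not paired, so this compares resampling distributions rather than matched folds
cv_results <- resamples(list(multinom = fit1, rf = fit2))
summary(cv_results)  # accuracy and Kappa distributions across the 5 folds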
Applying our final model to the test set involves the following steps:
test_original <- read.csv(test_link, na.strings = c("NA","#DIV/0!")) # "#DIV/0!" included as NA value
test_set <- test_original[,colSums(is.na(train_original))==0] # Keep only columns with no NAs in the training data (drops sliding-window summary features)
test_set <- test_set[,-(1:7)] # Eliminate row index, username, timestamp and window features
predict(fit2, newdata=test_set[,1:52]) # Predict on the 52 raw sensor features
## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
## [1] "Javier Prado"