This study uses personal activity data collected by means of wearable devices such as Jawbone Up, Nike FuelBand, and Fitbit. The data is borrowed from the ‘Human Activity Recognition’ research project conducted by ‘groupware.les’.

The Weight Lifting data was collected from weight lifting enthusiasts who performed exercises both according to pre-defined specifications and with certain variations. Each observation was then labelled with one of five classes, A to E, stored in the ‘classe’ column.

Preprocessing

The first few columns of the data set (shown below) record the user name, timestamps and window information. Since they do not describe the movement itself, they have been excluded from the analysis.

## [1] "user_name"            "raw_timestamp_part_1" "raw_timestamp_part_2"
## [4] "cvtd_timestamp"       "new_window"
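The exclusion step above can be sketched as follows. The original analysis appears to have been done in R, so this Python version is only an illustration and not the author's code. It assumes each row is a dict keyed by column name; the function name drop_metadata and the sample rows are hypothetical.

```python
# Metadata columns printed above: they describe who recorded a row and when,
# not how the exercise was performed, so they are not used as predictors.
EXCLUDED = ["user_name", "raw_timestamp_part_1", "raw_timestamp_part_2",
            "cvtd_timestamp", "new_window"]


def drop_metadata(rows, excluded=EXCLUDED):
    """Return new row dicts with the excluded metadata columns removed."""
    drop = set(excluded)
    return [{key: value for key, value in row.items() if key not in drop}
            for row in rows]


# Hypothetical two-row sample; real rows carry many more sensor columns.
sample = [
    {"user_name": "u1", "raw_timestamp_part_1": 1, "raw_timestamp_part_2": 2,
     "cvtd_timestamp": "t1", "new_window": "no", "roll_belt": 1.4, "classe": "A"},
    {"user_name": "u2", "raw_timestamp_part_1": 3, "raw_timestamp_part_2": 4,
     "cvtd_timestamp": "t2", "new_window": "no", "roll_belt": 8.1, "classe": "B"},
]
cleaned = drop_metadata(sample)
print(cleaned[0])  # {'roll_belt': 1.4, 'classe': 'A'}
```

The input rows are left unchanged, so the raw data remains available for inspection.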

Next, the columns containing NAs were removed, along with the features that had zero variance. The prediction models are built on the 52 remaining features shown below. The last column listed, ‘classe’, is the outcome to be predicted.

##  [1] "roll_belt"            "pitch_belt"           "yaw_belt"            
##  [4] "total_accel_belt"     "gyros_belt_x"         "gyros_belt_y"        
##  [7] "gyros_belt_z"         "accel_belt_x"         "accel_belt_y"        
## [10] "accel_belt_z"         "magnet_belt_x"        "magnet_belt_y"       
## [13] "magnet_belt_z"        "roll_arm"             "pitch_arm"           
## [16] "yaw_arm"              "total_accel_arm"      "gyros_arm_x"         
## [19] "gyros_arm_y"          "gyros_arm_z"          "accel_arm_x"         
## [22] "accel_arm_y"          "accel_arm_z"          "magnet_arm_x"        
## [25] "magnet_arm_y"         "magnet_arm_z"         "roll_dumbbell"       
## [28] "pitch_dumbbell"       "yaw_dumbbell"         "total_accel_dumbbell"
## [31] "gyros_dumbbell_x"     "gyros_dumbbell_y"     "gyros_dumbbell_z"    
## [34] "accel_dumbbell_x"     "accel_dumbbell_y"     "accel_dumbbell_z"    
## [37] "magnet_dumbbell_x"    "magnet_dumbbell_y"    "magnet_dumbbell_z"   
## [40] "roll_forearm"         "pitch_forearm"        "yaw_forearm"         
## [43] "total_accel_forearm"  "gyros_forearm_x"      "gyros_forearm_y"     
## [46] "gyros_forearm_z"      "accel_forearm_x"      "accel_forearm_y"     
## [49] "accel_forearm_z"      "magnet_forearm_x"     "magnet_forearm_y"    
## [52] "magnet_forearm_z"     "classe"
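The NA and zero-variance filtering described above can be sketched like this. It is a minimal version under stated assumptions:

- Missing values are stored as None.
- "Zero variance" means the column takes a single distinct value.
- The outcome column is always kept.

The original analysis may have used a different criterion, for example caret's nearZeroVar, which also flags near-constant columns. The column names in the sample (other than roll_belt and classe) are hypothetical.

```python
def filter_columns(rows, target="classe"):
    """Keep only columns with no missing values and more than one distinct value.

    Missing values are assumed to be stored as None. The outcome column
    (target) is always kept.
    """
    keep = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        if col != target:
            if any(value is None for value in values):
                continue  # the column contains at least one NA
            if len(set(values)) <= 1:
                continue  # constant column: zero variance, no information
        keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]


# Hypothetical sample with one partly missing and one constant column.
rows = [
    {"roll_belt": 1.4, "mostly_missing": None, "constant": 0.0, "classe": "A"},
    {"roll_belt": 8.1, "mostly_missing": 2.3, "constant": 0.0, "classe": "B"},
    {"roll_belt": 1.1, "mostly_missing": None, "constant": 0.0, "classe": "A"},
]
print(list(filter_columns(rows)[0]))  # ['roll_belt', 'classe']
```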

Data Analysis

The data set contains approximately 20,000 instances, which is large enough for half of it to be held out. The data was therefore split 50/50 into a training set and a test set, and the test set is used to estimate the out-of-sample error. A logistic regression model was fit on the training data using bootstrap resampling. Density plots of the model’s four most important features for predicting ‘Class A’ are shown below.

[Figure: per-class density plots of the four most important features for predicting ‘Class A’]
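The 50/50 split and the bootstrap resampling mentioned above can be sketched as below. This is an illustration, not the procedure the analysis actually ran. It makes two simplifying assumptions:

- The split is a plain random shuffle. By contrast, caret's createDataPartition, a common choice in R, samples within each class.
- Each bootstrap resample draws training rows with replacement and leaves "out-of-bag" rows that can be used for scoring.

The function names and the choice of 25 resamples are hypothetical.

```python
import random


def split_half(rows, seed=0):
    """Shuffle a copy of the rows and cut it into (train, test) halves."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]


def bootstrap_indices(n, rng):
    """One bootstrap resample of n rows.

    Returns the in-bag indices (n draws with replacement) and the
    out-of-bag indices that were never drawn; a model can be fit on the
    former and scored on the latter.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


rows = list(range(20))  # stand-in for the ~20,000 observations
train, test = split_half(rows, seed=42)
rng = random.Random(1)
resamples = [bootstrap_indices(len(train), rng) for _ in range(25)]
```

On average, roughly a third of the training rows end up out-of-bag in each resample.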

As the plot shows, the feature distributions have multiple peaks and the classes overlap substantially. A logistic regression model, whose decision boundaries are linear in the features, is not expected to separate such classes well. This is confirmed on the remaining 50% test data: the model’s accuracy is only about 70% (see the confusion matrix below).

[Figure: confusion matrix of the logistic regression model on the test data]
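The accuracy quoted above is the share of test rows that fall on the diagonal of the confusion matrix. A minimal sketch of that computation follows.

- The labels and predictions in the example are hypothetical, not the study's results.
- Rows are true classes here. caret's confusionMatrix prints predictions as rows, but the accuracy is the same either way.

```python
def confusion_matrix(actual, predicted, labels):
    """Count (true class, predicted class) pairs; rows are true classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Share of all predictions that lie on the diagonal (correct class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical test-set labels, not the study's results.
labels = ["A", "B", "C", "D", "E"]
actual = ["A", "A", "B", "B", "C", "D", "E"]
predicted = ["A", "A", "B", "C", "C", "D", "A"]
m = confusion_matrix(actual, predicted, labels)
print(round(accuracy(m), 3))  # 0.714
```

The off-diagonal cells show which classes are confused with each other. Here one B row is predicted as C and one E row as A.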

Therefore, a random forest with 500 trees was fit on the same 50% training split. Although the model takes longer to build, its test accuracy improves to 99% (see the confusion matrix below).

[Figure: confusion matrix of the random forest model on the test data]
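To show why a random forest copes better with multi-peaked, overlapping features, here is a minimal pure-Python sketch of the technique. It is not the R implementation used in the study. Each tree:

- is grown on a bootstrap sample of the training rows;
- considers only a random subset of mtry features at each split;
- votes, with the forest predicting by majority vote.

Because trees split the feature space into many axis-aligned regions, they can follow distributions with several peaks. The Gini criterion, the max_depth cap, the default mtry of about the square root of the feature count, and all function names are assumptions for illustration. n_trees defaults to 500 to match the text. Pure Python would be far too slow for the real data, so the usage example uses toy data.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) pair that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, mtry, rng, max_depth, depth=0):
    """Grow a tree; each split considers only `mtry` randomly chosen features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf
    split = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if split is None:
        return majority  # no candidate split improves purity
    f, t = split
    left_idx = [i for i in range(len(y)) if X[i][f] <= t]
    right_idx = [i for i in range(len(y)) if X[i][f] > t]
    left = build_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                      mtry, rng, max_depth, depth + 1)
    right = build_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                       mtry, rng, max_depth, depth + 1)
    return (f, t, left, right)


def predict_tree(node, x):
    """Follow splits (tuples) down to a leaf (a class label)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=500, mtry=None, max_depth=10, seed=0):
    """Fit n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    mtry = mtry or max(1, int(len(X[0]) ** 0.5))
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 mtry, rng, max_depth))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees' predictions."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# Toy data (hypothetical): two features, class A below 0.5 on feature 0.
X = [[i / 20, i / 20 + 0.01] for i in range(20)]
y = ["A" if i < 10 else "B" for i in range(20)]
forest = random_forest(X, y, n_trees=25, seed=1)
print(predict_forest(forest, [0.0, 0.01]), predict_forest(forest, [0.95, 0.96]))
```

Averaging many decorrelated trees reduces variance compared with a single deep tree. The extra build time mentioned in the text comes from growing hundreds of such trees.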

Summary

The out-of-sample error, estimated as one minus the test-set accuracy, is expected to be about 30% for the logistic regression model and about 1% for the random forest model. Given this substantial improvement, it is concluded that random forests are the better of the two models for predicting the quality of weight lifting exercises.
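The error estimates follow directly from the test-set accuracies. For a rough sense of their precision, treat each held-out row as an independent trial and apply the binomial standard error. This assumes about 10,000 test rows, half of the approximately 20,000 instances; that count is an approximation, not a figure reported by the analysis.

```latex
\widehat{E}_{\text{out}} = 1 - \text{Acc}_{\text{test}}, \qquad
\widehat{E}_{\text{logit}} \approx 1 - 0.70 = 0.30, \qquad
\widehat{E}_{\text{RF}} \approx 1 - 0.99 = 0.01

\operatorname{SE}\bigl(\widehat{E}\bigr) \approx \sqrt{\frac{\widehat{E}\,(1-\widehat{E})}{n_{\text{test}}}}, \qquad
\operatorname{SE}_{\text{logit}} \approx \sqrt{\frac{0.30 \cdot 0.70}{10\,000}} \approx 0.005, \qquad
\operatorname{SE}_{\text{RF}} \approx \sqrt{\frac{0.01 \cdot 0.99}{10\,000}} \approx 0.001
```

Under these assumptions, both estimates are precise to well under one percentage point. The gap between the two models is therefore far larger than the sampling noise in the test set.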

Reference

Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiu, R.; Fuks, H. Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements. Proceedings of the 21st Brazilian Symposium on Artificial Intelligence. Advances in Artificial Intelligence - SBIA 2012. In: Lecture Notes in Computer Science, pp. 52-61. Curitiba, PR: Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-34458-9. DOI: 10.1007/978-3-642-34459-6_6.

Read more: http://groupware.les.inf.puc-rio.br/har