Weightlifting Data Analysis

Introduction

Research abounds on whether people exercise. Studies of how well they exercise are rare. Thankfully, one study, presented at a conference in Stuttgart, Germany (Velloso et al., 2013), did exactly that.

In this study, participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions:

Class A: exactly according to the specification
Class B: throwing the elbows to the front
Class C: lifting the dumbbell only halfway
Class D: lowering the dumbbell only halfway
Class E: throwing the hips to the front

Class A corresponds to the specified execution of the exercise, while the other four classes correspond to common mistakes.

Participants were supervised by an experienced weight lifter to make sure the execution complied with the manner they were supposed to simulate. The exercises were performed by six male participants aged 20 to 28, with little weight lifting experience. The researchers made sure that all participants could easily simulate the mistakes in a safe and controlled manner by using a relatively light dumbbell (1.25 kg).

Our task is to perform some exploratory analysis and build a machine learning model that predicts which of these five classes a given repetition belongs to, that is, how well the exercise was performed.

Getting Started

First, let’s load the packages that we plan to use. The tidyverse provides convenient data-munging tools, corrplot helps with exploratory data analysis, and caret and randomForest assist with modeling.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(corrplot)
## corrplot 0.84 loaded
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin

Then we load the data. A cursory look at the first rows shows that many columns are blank or NA, so we’ll keep only a subset of the columns: the class label, the user name, the window number, and the raw sensor readings.

rawtraining <- read.csv("pml-training.csv", header=TRUE)
rawtesting <- read.csv("pml-testing.csv", header=TRUE, stringsAsFactors = FALSE)
head(rawtraining)
##   X user_name raw_timestamp_part_1 raw_timestamp_part_2   cvtd_timestamp
## 1 1  carlitos           1323084231               788290 05/12/2011 11:23
## 2 2  carlitos           1323084231               808298 05/12/2011 11:23
## 3 3  carlitos           1323084231               820366 05/12/2011 11:23
## 4 4  carlitos           1323084232               120339 05/12/2011 11:23
## 5 5  carlitos           1323084232               196328 05/12/2011 11:23
## 6 6  carlitos           1323084232               304277 05/12/2011 11:23
##   new_window num_window roll_belt pitch_belt yaw_belt total_accel_belt
## 1         no         11      1.41       8.07    -94.4                3
## 2         no         11      1.41       8.07    -94.4                3
## 3         no         11      1.42       8.07    -94.4                3
## 4         no         12      1.48       8.05    -94.4                3
## 5         no         12      1.48       8.07    -94.4                3
## 6         no         12      1.45       8.06    -94.4                3
##   kurtosis_roll_belt kurtosis_picth_belt kurtosis_yaw_belt
## 1                                                         
## 2                                                         
## 3                                                         
## 4                                                         
## 5                                                         
## 6                                                         
##   skewness_roll_belt skewness_roll_belt.1 skewness_yaw_belt max_roll_belt
## 1                                                                      NA
## 2                                                                      NA
## 3                                                                      NA
## 4                                                                      NA
## 5                                                                      NA
## 6                                                                      NA
##   max_picth_belt max_yaw_belt min_roll_belt min_pitch_belt min_yaw_belt
## 1             NA                         NA             NA             
## 2             NA                         NA             NA             
## 3             NA                         NA             NA             
## 4             NA                         NA             NA             
## 5             NA                         NA             NA             
## 6             NA                         NA             NA             
##   amplitude_roll_belt amplitude_pitch_belt amplitude_yaw_belt
## 1                  NA                   NA                   
## 2                  NA                   NA                   
## 3                  NA                   NA                   
## 4                  NA                   NA                   
## 5                  NA                   NA                   
## 6                  NA                   NA                   
##   var_total_accel_belt avg_roll_belt stddev_roll_belt var_roll_belt
## 1                   NA            NA               NA            NA
## 2                   NA            NA               NA            NA
## 3                   NA            NA               NA            NA
## 4                   NA            NA               NA            NA
## 5                   NA            NA               NA            NA
## 6                   NA            NA               NA            NA
##   avg_pitch_belt stddev_pitch_belt var_pitch_belt avg_yaw_belt
## 1             NA                NA             NA           NA
## 2             NA                NA             NA           NA
## 3             NA                NA             NA           NA
## 4             NA                NA             NA           NA
## 5             NA                NA             NA           NA
## 6             NA                NA             NA           NA
##   stddev_yaw_belt var_yaw_belt gyros_belt_x gyros_belt_y gyros_belt_z
## 1              NA           NA         0.00         0.00        -0.02
## 2              NA           NA         0.02         0.00        -0.02
## 3              NA           NA         0.00         0.00        -0.02
## 4              NA           NA         0.02         0.00        -0.03
## 5              NA           NA         0.02         0.02        -0.02
## 6              NA           NA         0.02         0.00        -0.02
##   accel_belt_x accel_belt_y accel_belt_z magnet_belt_x magnet_belt_y
## 1          -21            4           22            -3           599
## 2          -22            4           22            -7           608
## 3          -20            5           23            -2           600
## 4          -22            3           21            -6           604
## 5          -21            2           24            -6           600
## 6          -21            4           21             0           603
##   magnet_belt_z roll_arm pitch_arm yaw_arm total_accel_arm var_accel_arm
## 1          -313     -128      22.5    -161              34            NA
## 2          -311     -128      22.5    -161              34            NA
## 3          -305     -128      22.5    -161              34            NA
## 4          -310     -128      22.1    -161              34            NA
## 5          -302     -128      22.1    -161              34            NA
## 6          -312     -128      22.0    -161              34            NA
##   avg_roll_arm stddev_roll_arm var_roll_arm avg_pitch_arm stddev_pitch_arm
## 1           NA              NA           NA            NA               NA
## 2           NA              NA           NA            NA               NA
## 3           NA              NA           NA            NA               NA
## 4           NA              NA           NA            NA               NA
## 5           NA              NA           NA            NA               NA
## 6           NA              NA           NA            NA               NA
##   var_pitch_arm avg_yaw_arm stddev_yaw_arm var_yaw_arm gyros_arm_x
## 1            NA          NA             NA          NA        0.00
## 2            NA          NA             NA          NA        0.02
## 3            NA          NA             NA          NA        0.02
## 4            NA          NA             NA          NA        0.02
## 5            NA          NA             NA          NA        0.00
## 6            NA          NA             NA          NA        0.02
##   gyros_arm_y gyros_arm_z accel_arm_x accel_arm_y accel_arm_z magnet_arm_x
## 1        0.00       -0.02        -288         109        -123         -368
## 2       -0.02       -0.02        -290         110        -125         -369
## 3       -0.02       -0.02        -289         110        -126         -368
## 4       -0.03        0.02        -289         111        -123         -372
## 5       -0.03        0.00        -289         111        -123         -374
## 6       -0.03        0.00        -289         111        -122         -369
##   magnet_arm_y magnet_arm_z kurtosis_roll_arm kurtosis_picth_arm
## 1          337          516                                     
## 2          337          513                                     
## 3          344          513                                     
## 4          344          512                                     
## 5          337          506                                     
## 6          342          513                                     
##   kurtosis_yaw_arm skewness_roll_arm skewness_pitch_arm skewness_yaw_arm
## 1                                                                       
## 2                                                                       
## 3                                                                       
## 4                                                                       
## 5                                                                       
## 6                                                                       
##   max_roll_arm max_picth_arm max_yaw_arm min_roll_arm min_pitch_arm
## 1           NA            NA          NA           NA            NA
## 2           NA            NA          NA           NA            NA
## 3           NA            NA          NA           NA            NA
## 4           NA            NA          NA           NA            NA
## 5           NA            NA          NA           NA            NA
## 6           NA            NA          NA           NA            NA
##   min_yaw_arm amplitude_roll_arm amplitude_pitch_arm amplitude_yaw_arm
## 1          NA                 NA                  NA                NA
## 2          NA                 NA                  NA                NA
## 3          NA                 NA                  NA                NA
## 4          NA                 NA                  NA                NA
## 5          NA                 NA                  NA                NA
## 6          NA                 NA                  NA                NA
##   roll_dumbbell pitch_dumbbell yaw_dumbbell kurtosis_roll_dumbbell
## 1      13.05217      -70.49400    -84.87394                       
## 2      13.13074      -70.63751    -84.71065                       
## 3      12.85075      -70.27812    -85.14078                       
## 4      13.43120      -70.39379    -84.87363                       
## 5      13.37872      -70.42856    -84.85306                       
## 6      13.38246      -70.81759    -84.46500                       
##   kurtosis_picth_dumbbell kurtosis_yaw_dumbbell skewness_roll_dumbbell
## 1                                                                     
## 2                                                                     
## 3                                                                     
## 4                                                                     
## 5                                                                     
## 6                                                                     
##   skewness_pitch_dumbbell skewness_yaw_dumbbell max_roll_dumbbell
## 1                                                              NA
## 2                                                              NA
## 3                                                              NA
## 4                                                              NA
## 5                                                              NA
## 6                                                              NA
##   max_picth_dumbbell max_yaw_dumbbell min_roll_dumbbell min_pitch_dumbbell
## 1                 NA                                 NA                 NA
## 2                 NA                                 NA                 NA
## 3                 NA                                 NA                 NA
## 4                 NA                                 NA                 NA
## 5                 NA                                 NA                 NA
## 6                 NA                                 NA                 NA
##   min_yaw_dumbbell amplitude_roll_dumbbell amplitude_pitch_dumbbell
## 1                                       NA                       NA
## 2                                       NA                       NA
## 3                                       NA                       NA
## 4                                       NA                       NA
## 5                                       NA                       NA
## 6                                       NA                       NA
##   amplitude_yaw_dumbbell total_accel_dumbbell var_accel_dumbbell
## 1                                          37                 NA
## 2                                          37                 NA
## 3                                          37                 NA
## 4                                          37                 NA
## 5                                          37                 NA
## 6                                          37                 NA
##   avg_roll_dumbbell stddev_roll_dumbbell var_roll_dumbbell
## 1                NA                   NA                NA
## 2                NA                   NA                NA
## 3                NA                   NA                NA
## 4                NA                   NA                NA
## 5                NA                   NA                NA
## 6                NA                   NA                NA
##   avg_pitch_dumbbell stddev_pitch_dumbbell var_pitch_dumbbell
## 1                 NA                    NA                 NA
## 2                 NA                    NA                 NA
## 3                 NA                    NA                 NA
## 4                 NA                    NA                 NA
## 5                 NA                    NA                 NA
## 6                 NA                    NA                 NA
##   avg_yaw_dumbbell stddev_yaw_dumbbell var_yaw_dumbbell gyros_dumbbell_x
## 1               NA                  NA               NA                0
## 2               NA                  NA               NA                0
## 3               NA                  NA               NA                0
## 4               NA                  NA               NA                0
## 5               NA                  NA               NA                0
## 6               NA                  NA               NA                0
##   gyros_dumbbell_y gyros_dumbbell_z accel_dumbbell_x accel_dumbbell_y
## 1            -0.02             0.00             -234               47
## 2            -0.02             0.00             -233               47
## 3            -0.02             0.00             -232               46
## 4            -0.02            -0.02             -232               48
## 5            -0.02             0.00             -233               48
## 6            -0.02             0.00             -234               48
##   accel_dumbbell_z magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
## 1             -271              -559               293               -65
## 2             -269              -555               296               -64
## 3             -270              -561               298               -63
## 4             -269              -552               303               -60
## 5             -270              -554               292               -68
## 6             -269              -558               294               -66
##   roll_forearm pitch_forearm yaw_forearm kurtosis_roll_forearm
## 1         28.4         -63.9        -153                      
## 2         28.3         -63.9        -153                      
## 3         28.3         -63.9        -152                      
## 4         28.1         -63.9        -152                      
## 5         28.0         -63.9        -152                      
## 6         27.9         -63.9        -152                      
##   kurtosis_picth_forearm kurtosis_yaw_forearm skewness_roll_forearm
## 1                                                                  
## 2                                                                  
## 3                                                                  
## 4                                                                  
## 5                                                                  
## 6                                                                  
##   skewness_pitch_forearm skewness_yaw_forearm max_roll_forearm
## 1                                                           NA
## 2                                                           NA
## 3                                                           NA
## 4                                                           NA
## 5                                                           NA
## 6                                                           NA
##   max_picth_forearm max_yaw_forearm min_roll_forearm min_pitch_forearm
## 1                NA                               NA                NA
## 2                NA                               NA                NA
## 3                NA                               NA                NA
## 4                NA                               NA                NA
## 5                NA                               NA                NA
## 6                NA                               NA                NA
##   min_yaw_forearm amplitude_roll_forearm amplitude_pitch_forearm
## 1                                     NA                      NA
## 2                                     NA                      NA
## 3                                     NA                      NA
## 4                                     NA                      NA
## 5                                     NA                      NA
## 6                                     NA                      NA
##   amplitude_yaw_forearm total_accel_forearm var_accel_forearm
## 1                                        36                NA
## 2                                        36                NA
## 3                                        36                NA
## 4                                        36                NA
## 5                                        36                NA
## 6                                        36                NA
##   avg_roll_forearm stddev_roll_forearm var_roll_forearm avg_pitch_forearm
## 1               NA                  NA               NA                NA
## 2               NA                  NA               NA                NA
## 3               NA                  NA               NA                NA
## 4               NA                  NA               NA                NA
## 5               NA                  NA               NA                NA
## 6               NA                  NA               NA                NA
##   stddev_pitch_forearm var_pitch_forearm avg_yaw_forearm
## 1                   NA                NA              NA
## 2                   NA                NA              NA
## 3                   NA                NA              NA
## 4                   NA                NA              NA
## 5                   NA                NA              NA
## 6                   NA                NA              NA
##   stddev_yaw_forearm var_yaw_forearm gyros_forearm_x gyros_forearm_y
## 1                 NA              NA            0.03            0.00
## 2                 NA              NA            0.02            0.00
## 3                 NA              NA            0.03           -0.02
## 4                 NA              NA            0.02           -0.02
## 5                 NA              NA            0.02            0.00
## 6                 NA              NA            0.02           -0.02
##   gyros_forearm_z accel_forearm_x accel_forearm_y accel_forearm_z
## 1           -0.02             192             203            -215
## 2           -0.02             192             203            -216
## 3            0.00             196             204            -213
## 4            0.00             189             206            -214
## 5           -0.02             189             206            -214
## 6           -0.03             193             203            -215
##   magnet_forearm_x magnet_forearm_y magnet_forearm_z classe
## 1              -17              654              476      A
## 2              -18              661              473      A
## 3              -18              658              469      A
## 4              -16              658              469      A
## 5              -17              655              473      A
## 6               -9              660              478      A
trainingsubset <- rawtraining %>% select(classe, user_name, num_window,
               roll_belt,pitch_belt,yaw_belt,
               gyros_belt_x,gyros_belt_y,gyros_belt_z,
               accel_belt_x,accel_belt_y,accel_belt_z,
               magnet_belt_x,magnet_belt_y,magnet_belt_z,
               roll_arm,pitch_arm,yaw_arm,
               gyros_arm_x,gyros_arm_y,gyros_arm_z,
               accel_arm_x,accel_arm_y,accel_arm_z,
               magnet_arm_x,magnet_arm_y,magnet_arm_z,
               gyros_dumbbell_x,gyros_dumbbell_y,gyros_dumbbell_z,
               accel_dumbbell_x,accel_dumbbell_y,accel_dumbbell_z,
               magnet_dumbbell_x,magnet_dumbbell_y,magnet_dumbbell_z,
               roll_forearm,pitch_forearm,yaw_forearm,
               gyros_forearm_x,gyros_forearm_y,gyros_forearm_z,
               accel_forearm_x,accel_forearm_y,accel_forearm_z,
               magnet_forearm_x,magnet_forearm_y,magnet_forearm_z)

testingsubset <- rawtesting %>% select(user_name, num_window,
               roll_belt,pitch_belt,yaw_belt,
               gyros_belt_x,gyros_belt_y,gyros_belt_z,
               accel_belt_x,accel_belt_y,accel_belt_z,
               magnet_belt_x,magnet_belt_y,magnet_belt_z,
               roll_arm,pitch_arm,yaw_arm,
               gyros_arm_x,gyros_arm_y,gyros_arm_z,
               accel_arm_x,accel_arm_y,accel_arm_z,
               magnet_arm_x,magnet_arm_y,magnet_arm_z,
               gyros_dumbbell_x,gyros_dumbbell_y,gyros_dumbbell_z,
               accel_dumbbell_x,accel_dumbbell_y,accel_dumbbell_z,
               magnet_dumbbell_x,magnet_dumbbell_y,magnet_dumbbell_z,
               roll_forearm,pitch_forearm,yaw_forearm,
               gyros_forearm_x,gyros_forearm_y,gyros_forearm_z,
               accel_forearm_x,accel_forearm_y,accel_forearm_z,
               magnet_forearm_x,magnet_forearm_y,magnet_forearm_z)
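
The two column lists above are typed out by hand and have to stay in sync. As a hedged alternative, the sensor columns could be picked by name pattern with dplyr's matches() helper. The sketch below runs on a made-up toy data frame (toy and sensor_pattern are illustrative names, not part of the original analysis). Note that on the real file this pattern would also pick up roll_dumbbell, pitch_dumbbell, and yaw_dumbbell, which the explicit lists above leave out.

```r
library(dplyr)

# Toy stand-in for rawtraining: column names mimic the real file, values are made up.
toy <- data.frame(
  classe           = factor(c("A", "B")),
  user_name        = c("carlitos", "pedro"),
  num_window       = c(11L, 12L),
  roll_belt        = c(1.41, 1.48),
  gyros_arm_x      = c(0.00, 0.02),
  magnet_forearm_z = c(476, 473),
  max_roll_belt    = c(NA, NA),   # summary column, should be dropped
  var_accel_arm    = c(NA, NA),   # summary column, should be dropped
  total_accel_belt = c(3L, 3L)    # not a raw x/y/z or roll/pitch/yaw reading
)

# Raw readings start with one of these prefixes; summary columns
# (max_, var_, total_, ...) do not, so the anchored pattern skips them.
sensor_pattern <- "^(roll|pitch|yaw|gyros|accel|magnet)_"

toy_subset <- toy %>%
  select(classe, user_name, num_window, matches(sensor_pattern))

names(toy_subset)
```

The same pattern, minus classe, could then be reused for the test set so that both subsets are built from a single definition.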

Handling NAs

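In this section, we’ll clean things up a bit by defining a helper function that replaces every NA in a vector with a given value. We then apply it to each column with map, substituting 0 for NA, and convert the result to a tibble to prevent any formatting issues downstream. The warnings shown below come from the factor columns, where 0 is not a valid factor level.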

NAasy <- function(x, y){
  x[is.na(x)] <- y
  x # to return vector after replacement
}

trainingdata <- map(trainingsubset, NAasy, 0)
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated

## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
trainingdata <- as_tibble(trainingdata) # as_tibble() supersedes the deprecated as.tibble()

testingdata <- map(testingsubset, NAasy, 0)
testingdata <- as_tibble(testingdata) # as_tibble() supersedes the deprecated as.tibble()
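
Before replacing anything, it can help to count how many NAs each column actually holds, and to restrict the replacement to numeric columns so that factor columns are left alone. The following is a minimal base R sketch on made-up data; toy and fill_numeric_na are hypothetical names and not part of the original analysis.

```r
# Toy data frame (made-up values) with a factor column and two numeric columns.
toy <- data.frame(
  classe    = factor(c("A", "B", "A")),
  roll_belt = c(1.41, NA, 1.42),
  yaw_belt  = c(-94.4, -94.4, NA)
)

# Count missing values per column before deciding how to treat them.
na_counts <- colSums(is.na(toy))
na_counts

# Replace NAs in numeric vectors only; factors and characters pass through unchanged.
fill_numeric_na <- function(x, value = 0) {
  if (is.numeric(x)) x[is.na(x)] <- value
  x
}

toy_filled <- as.data.frame(lapply(toy, fill_numeric_na))
toy_filled
```

Skipping non-numeric columns means the replacement value never has to be a valid factor level.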

Exploration

Since the data is provided to us pre-partitioned between training and test sets, we are ready for a little exploratory data analysis.

ggplot(trainingdata) + 
        geom_bar(mapping = aes(x = classe, fill = user_name)) + 
        labs(title = "Dumbbell Activity per Class by User")

Next, it may be useful to see how the classes spread across window numbers for each user. We can tell that some classes are not spread evenly among users. For instance, jeremy and pedro have a tendency to throw their hips to the front, while adelmo and charles performed a seemingly large number of exercises to spec.

levels(trainingdata$classe) <- c("Exercise to Specs","Elbow to Front","Dumbbell Lift Halfway","Dumbbell Lower Halfway","Hips to Front")

ggplot(trainingdata, aes(x = user_name, y=num_window, color = classe)) + 
        geom_point() +
        labs(title = "Window Number of Dumbbell Activity by User and Class")
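
How classes are spread among users can also be checked numerically with a cross-tabulation. The sketch below uses a made-up toy data frame; with the real data the inputs would presumably be trainingdata$user_name and trainingdata$classe.

```r
# Toy observations (made-up values), one row per recorded sample.
toy <- data.frame(
  user_name = c("adelmo", "adelmo", "adelmo", "pedro", "pedro", "pedro", "pedro"),
  classe    = c("A", "A", "E", "A", "E", "E", "E")
)

# Raw counts of each class per user.
counts <- table(toy$user_name, toy$classe)

# Row-wise shares: the fraction of each user's samples that falls in each class.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Row-wise shares make users comparable even when they recorded different numbers of samples.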

Lastly, viewing the correlations among the variables we selected helps guide further research. For example, roll_belt and yaw_belt seem to be related. Notably, accel_belt_y and accel_belt_z seem to be negatively correlated. More research could be done as to why.

cormax <- cor(trainingdata[, 4:48])
corrplot(cormax[1:10,1:10])

qplot(roll_belt, accel_belt_y, data=trainingdata, color=classe, main='Plot of roll_belt by accel_belt_y per classe')
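
Reading strong relationships off a correlation plot can be error-prone, so it may help to list the strongly correlated pairs explicitly. This is a minimal base R sketch on simulated data. The 0.8 cutoff and the names toy and strong_pairs are assumptions for illustration, not values from the original analysis.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Simulated stand-in for the sensor columns.
toy <- data.frame(
  a = x,
  b = -x + rnorm(n, sd = 0.1),  # built to be strongly negatively correlated with a
  c = rnorm(n)                  # unrelated noise
)

cor_mat <- cor(toy)

# Blank out the lower triangle and the diagonal so each pair is reported once.
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA

# Positions of the remaining entries whose absolute correlation exceeds the cutoff.
idx <- which(abs(cor_mat) > 0.8, arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
strong_pairs
```

Applied to the real correlation matrix, a table like this would show whether pairs such as accel_belt_y and accel_belt_z pass whatever cutoff is chosen.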

Modeling

After completing some exploratory data analysis, it’s time to train our model. We’ll use a random forest model to maintain interpretability while tolerating the highly correlated predictors.

model <- train(classe ~., data=trainingdata, method="rf")
model$finalModel
## 
## Call:
##  randomForest(x = x, y = y, mtry = param$mtry) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 26
## 
##         OOB estimate of  error rate: 0.13%
## Confusion matrix:
##                        Exercise to Specs Elbow to Front
## Exercise to Specs                   5578              1
## Elbow to Front                         3           3791
## Dumbbell Lift Halfway                  0              5
## Dumbbell Lower Halfway                 0              0
## Hips to Front                          0              0
##                        Dumbbell Lift Halfway Dumbbell Lower Halfway
## Exercise to Specs                          0                      0
## Elbow to Front                             2                      1
## Dumbbell Lift Halfway                   3416                      1
## Dumbbell Lower Halfway                     9                   3206
## Hips to Front                              0                      2
##                        Hips to Front  class.error
## Exercise to Specs                  1 0.0003584229
## Elbow to Front                     0 0.0015801949
## Dumbbell Lift Halfway              0 0.0017533606
## Dumbbell Lower Halfway             1 0.0031094527
## Hips to Front                   3605 0.0005544774
plot(model)
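
The OOB error rate and the per-class errors reported above can be recomputed from the confusion matrix itself. The sketch below copies the counts from the printed output, with rows as actual classes and columns as predicted classes, and uses the original letters A to E as shorthand for the renamed levels.

```r
# Confusion matrix copied from the model$finalModel output
# (rows = actual class, columns = predicted class).
classes <- c("A", "B", "C", "D", "E")
conf <- matrix(
  c(5578,    1,    0,    0,    1,
       3, 3791,    2,    1,    0,
       0,    5, 3416,    1,    0,
       0,    0,    9, 3206,    1,
       0,    0,    0,    2, 3605),
  nrow = 5, byrow = TRUE,
  dimnames = list(actual = classes, predicted = classes)
)

# Per-class error: share of each actual class that was misclassified.
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: share of all samples off the diagonal (26 of 19622).
oob_error <- 1 - sum(diag(conf)) / sum(conf)

round(class_error, 6)
round(100 * oob_error, 2)  # 0.13, matching the reported OOB estimate
```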

Testing

Finally, we run our test data through the model. This is where we determine how well an individual performs an exercise, given a sample of dumbbell lifts.

testingpred <- predict(model, newdata = testingdata)

testingpred
##  [1] Elbow to Front         Exercise to Specs      Elbow to Front        
##  [4] Exercise to Specs      Exercise to Specs      Hips to Front         
##  [7] Dumbbell Lower Halfway Elbow to Front         Exercise to Specs     
## [10] Exercise to Specs      Elbow to Front         Dumbbell Lift Halfway 
## [13] Elbow to Front         Exercise to Specs      Hips to Front         
## [16] Hips to Front          Exercise to Specs      Elbow to Front        
## [19] Elbow to Front         Elbow to Front        
## 5 Levels: Exercise to Specs Elbow to Front ... Hips to Front
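
The test subset selected earlier has no classe column, so these predictions cannot be scored directly. One way to get an independent accuracy estimate is to hold out part of the labeled training data. The sketch below is a hedged illustration, not the original analysis: the built-in iris data stands in for trainingdata, Species plays the role of classe, and the 70/30 split and ntree = 200 are arbitrary assumptions.

```r
library(caret)
library(randomForest)

set.seed(42)

# Stratified 70/30 split on the outcome, so every class appears in both parts.
in_train   <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_part <- iris[in_train, ]
holdout    <- iris[-in_train, ]

# Fit on the training part only.
fit <- randomForest(Species ~ ., data = train_part, ntree = 200)

# Score the held-out rows, which the model has never seen.
holdout_pred <- predict(fit, newdata = holdout)
cm <- confusionMatrix(holdout_pred, holdout$Species)

cm$overall["Accuracy"]
```

Hold-out accuracy complements the OOB estimate because it is computed on rows that played no part in fitting.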

The results suggest a highly accurate model: based on the out-of-bag (OOB) estimate, we can predict whether an exercise was performed to specification or, if not, how it was out of spec, with an error rate of about 0.13%. More research must be done to determine whether we can improve predictiveness. For example, another ensemble method may be more predictive, but we choose not to sacrifice interpretability. It will be much easier to explain a random forest model than other methods to peers and laypeople wanting to build on this research. You might say that the model does the “lifting” for us.

References

Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13). Stuttgart, Germany: ACM SIGCHI, 2013.