Grace Pehl, Ph.D.

Classification of Weightlifting Technique from Personal Activity Trackers

Objective

Build a machine learning algorithm to predict activity quality from personal activity monitors.

Course project for Practical Machine Learning, part of the Johns Hopkins Data Science Specialization taught by Jeff Leek

Data citation:

Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13) . Stuttgart, Germany: ACM SIGCHI, 2013.

Experimental Setup

Experimental Design 10 reps unilateral dumbbell biceps curls

A - Proper technique
B - Throwing the elbows forward
C - Half lifting the dumbbell
D - Half lowering the dumbbell
E - Throwing the hips forward

Reading in the Data

### download data, if necessary
if (!file.exists(trainingfile)) {
    download.file(trainingURL, trainingfile, mode = "w")
}
if (!file.exists(testingfile)) {
    download.file(testingURL, testingfile, mode = "w")
}
### read in files
df <- read.csv(trainingfile, na.strings = c('NA', '#DIV/0!', ''))
validation <- read.csv(testingfile, na.strings = c('NA', '#DIV/0!', ''))
validation_id <- validation$problem_id

160 Features, 19622 Observations

Divide Training Data

suppressMessages(library(caret))
inTrain <- createDataPartition(y = df$classe, p = 0.7, list = FALSE)
training <- df[inTrain, ]
training_labels <- training$classe
testing <- df[-inTrain, ]
testing_labels <- testing$classe

Data Cleaning

# remove columns with more than 80% NA values (100 features)
badfeatures <- training[ , colSums(is.na(training)) >= 0.8 * nrow(training)]
validation <- validation[ , colSums(is.na(training)) < 0.8 * nrow(training)]
testing <- testing[ , colSums(is.na(training)) < 0.8 * nrow(training)]
training <- training[ , colSums(is.na(training)) < 0.8 * nrow(training)]
# remove identification columns (7 features)
validation <- validation[ , 8:ncol(training)]
testing <- testing[ , 8:ncol(training)]
training <- training[ , 8:ncol(training)]

Distribution of Classes

suppressMessages(library(knitr))
kable(class_table)

	Technique	Counts	Proportion
A	Proper Form	3906	0.28
B	Elbows Forward	2658	0.19
C	Halfway Up	2396	0.17
D	Halfway Down	2252	0.16
E	Hips Forward	2525	0.18

PCA - Retain 95% of Variance

# Perform Principal Component Analysis on the training set
preProc <- preProcess(training[ , -53], method = "pca")
# Apply the transform to all 3 datasets
trainPC <- predict(preProc, training[ , -53])
testPC  <- predict(preProc, testing[ , -53])
validationPC <- predict(preProc, validation[ , -53])

Feature Selection

# Use Random Forest with cross validation removing 20% of features
suppressMessages(library(randomForest))
modFit <- rfcv(trainPC, training_labels, step = 0.8, cv.fold = 10)

plot of chunk SelectingFeaturesPlot

Identify Top 11 Principal Components

Error in eval(expr, envir, enclos) : 
  could not find function "randomForest"