Fitness Tracker Deep Learning

Background

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement: a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants, who were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website (see the section on the Weight Lifting Exercise Dataset).

Data Loading

Load the libraries and the data files.

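# caret drives training and evaluation; rpart and ranger supply the two learners
library(tidyverse); library(caret); library(rpart); library(rpart.plot)
library(e1071); library(ranger)

# treat both "NA" and empty strings as missing values
training <- read.csv("pml-training.csv", na.strings = c("NA", ""))
testing <- read.csv("pml-testing.csv", na.strings = c("NA", ""))
# since R 4.0 read.csv() keeps strings as character; confusionMatrix() needs a factor outcome
training$classe <- factor(training$classe)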
dim(training)

## [1] 19622   160

dim(testing)

## [1]  20 160

Data Processing

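The training data has 19,622 rows and 160 variables. The testing data has 20 rows with the same number of columns.

First, remove the columns that contain missing values. The filter below keeps only columns with no NAs at all.

Then, remove the first 7 columns (row index, user name, timestamps, and window identifiers). These are bookkeeping fields rather than sensor measurements, so they should not be used as predictors.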

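# keep only the columns without any missing values, in both data sets
training <- training[, colSums(is.na(training)) == 0]
testing <- testing[, colSums(is.na(testing)) == 0]

# drop the first 7 non-sensor columns
traindata <- training[, -c(1:7)]
testdata <- testing[, -c(1:7)]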
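The two cleaning steps can be tried on a small toy example. This is a minimal sketch that assumes hypothetical data frames (`train_toy`, `test_toy`), not the PML files. The NA filter runs on each set separately, so the sketch ends with a check that every training predictor also survives in the test set.

```r
# Hypothetical toy stand-ins for the training and test sets
train_toy <- data.frame(
  id     = 1:4,                        # bookkeeping column, like the first 7 PML columns
  b      = c(NA, 2, NA, NA),           # column containing missing values
  x1     = c(0.1, 0.4, 0.3, 0.9),      # a "sensor" predictor
  classe = factor(c("A", "B", "A", "B"))
)
test_toy <- data.frame(id = 5:6, b = c(NA, NA), x1 = c(0.2, 0.8), problem_id = 1:2)

# keep columns with no NAs, then drop the leading bookkeeping column
train_clean <- train_toy[, colSums(is.na(train_toy)) == 0][, -1]
test_clean  <- test_toy[, colSums(is.na(test_toy)) == 0][, -1]

names(train_clean)  # "x1" "classe"
names(test_clean)   # "x1" "problem_id"

# every training predictor must also be present in the test set
stopifnot(all(setdiff(names(train_clean), "classe") %in% names(test_clean)))
```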

Next, split the cleaned training data into training and validation sets (about 70/30). The split is stratified by classe.

set.seed(12345)

inTrain <- createDataPartition(traindata$classe, p = 0.7, list = FALSE)
train_d <- traindata[inTrain,]
valid_d <- traindata[-inTrain,]
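For a factor outcome, `createDataPartition()` samples within each class, so the class proportions of the two subsets stay close to those of the full data. The base-R sketch below approximates that behaviour. `stratified_split()` is a hypothetical helper, and caret's exact rounding rules may differ.

```r
# Hypothetical helper approximating a stratified split for a factor outcome
stratified_split <- function(y, p = 0.7) {
  y <- factor(y)
  by_class <- split(seq_along(y), y)  # row indices grouped by class label
  picked <- lapply(by_class, function(i) i[sample.int(length(i), round(p * length(i)))])
  sort(unlist(picked, use.names = FALSE))
}

set.seed(12345)
y <- factor(rep(c("A", "B", "C"), times = c(10, 20, 30)))
in_train <- stratified_split(y, p = 0.7)
table(y[in_train])   # A: 7, B: 14, C: 21 (70% of each class)
table(y[-in_train])  # the remaining rows form the validation set
```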

Learning Algorithms

We compare two learning algorithms: a classification tree (rpart) and a random forest (ranger). Random forests usually perform well on classification problems with many numeric predictors, such as these sensor readings. ranger is a fast implementation of random forests.

First, fit the classification tree (rpart). Both models are tuned with 3-fold cross-validation. Only 3 folds are used to keep running time down, especially for the ranger model. Around 10 folds is more common in practice, at the cost of longer training.

# 3-fold cross-validation, reused for both models
myControl <- trainControl(method = "cv", number = 3)

model_rpart <- train(classe ~ ., data = train_d, method = "rpart",
                     trControl = myControl)

Then, fit the random forest model (ranger).

model_ranger <- train(classe ~ ., data = train_d, method = "ranger",
                      trControl = myControl)
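The setting `trainControl(method = "cv", number = 3)` asks caret to split the training rows into 3 folds. Each model is fitted on 2 folds and scored on the held-out fold, and the 3 accuracies are averaged. caret additionally repeats this for each candidate tuning value (for example, rpart's `cp`). The sketch below shows only the cross-validation loop for a single rpart tree with its default settings, using the built-in `iris` data. `cv_rpart_accuracy()` is a hypothetical helper, not a caret function.

```r
library(rpart)  # ships with standard R installations

# Hypothetical helper: k-fold cross-validated accuracy of an rpart classification tree
cv_rpart_accuracy <- function(formula, data, k = 3, seed = 12345) {
  set.seed(seed)
  outcome <- all.vars(formula)[1]
  # randomly assign each row to exactly one of k folds
  folds <- sample(rep_len(seq_len(k), nrow(data)))
  fold_acc <- numeric(k)
  for (f in seq_len(k)) {
    fit  <- rpart(formula, data = data[folds != f, ], method = "class")  # train on k - 1 folds
    held <- data[folds == f, ]                                           # score on the held-out fold
    pred <- predict(fit, newdata = held, type = "class")
    fold_acc[f] <- mean(pred == held[[outcome]])
  }
  fold_acc
}

acc <- cv_rpart_accuracy(Species ~ ., iris, k = 3)
acc        # one accuracy per fold
mean(acc)  # the cross-validated accuracy estimate
```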

Predict the classes of the validation set with each model, then check the accuracy with a confusion matrix.

result_rpart <- predict(model_rpart, valid_d)
result_ranger <- predict(model_ranger, valid_d)

confusionMatrix(result_rpart, valid_d$classe)$overall['Accuracy']

##  Accuracy 
## 0.4963466

confusionMatrix(result_ranger, valid_d$classe)$overall['Accuracy']

##  Accuracy 
## 0.9920136
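The `Accuracy` value is the share of predictions on the diagonal of the confusion matrix. One minus that share is the estimated out-of-sample error. A minimal base-R sketch with hypothetical toy labels (not the report's data):

```r
# Hypothetical predictions vs. true labels
lv    <- c("A", "B", "C")
truth <- factor(c("A", "A", "B", "B", "C", "C", "A", "B"), levels = lv)
pred  <- factor(c("A", "A", "B", "C", "C", "C", "A", "A"), levels = lv)

cm <- table(Prediction = pred, Reference = truth)  # rows = predicted, columns = reference
accuracy  <- sum(diag(cm)) / sum(cm)  # correct predictions / all predictions
oos_error <- 1 - accuracy             # estimated out-of-sample error
accuracy   # 0.75
oos_error  # 0.25

# Applied to the validation accuracy of model_ranger reported above
1 - 0.9920136  # about 0.008, i.e. 0.8%
```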

As expected, the random forest (ranger) is far more accurate on the validation data (0.992 versus 0.496 for rpart), so we choose model_ranger as the final model. Its expected out-of-sample error, estimated on the held-out validation set, is about 1 - 0.992 = 0.8%.

# collect the cross-validation results of both models for comparison
results <- resamples(list(rpart = model_rpart, ranger = model_ranger))
summary(results)

## 
## Call:
## summary.resamples(object = results)
## 
## Models: rpart, ranger 
## Number of resamples: 3 
## 
## Accuracy 
##             Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## rpart  0.5117904 0.5129944 0.5141983 0.5168525 0.5193835 0.5245687    0
## ranger 0.9877729 0.9884268 0.9890806 0.9888623 0.9894070 0.9897335    0
## 
## Kappa 
##             Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## rpart  0.3628715 0.3630979 0.3633243 0.3713404 0.3755748 0.3878254    0
## ranger 0.9845318 0.9853590 0.9861861 0.9859099 0.9865990 0.9870119    0
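The `Kappa` rows report Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals. Because kappa discounts chance agreement, it drops further below accuracy for the weaker rpart model. A minimal base-R sketch on hypothetical toy labels:

```r
# Hypothetical predictions vs. true labels
lv    <- c("A", "B", "C")
truth <- factor(c("A", "A", "B", "B", "C", "C", "A", "B"), levels = lv)
pred  <- factor(c("A", "A", "B", "C", "C", "C", "A", "A"), levels = lv)

cm <- table(Prediction = pred, Reference = truth)
n  <- sum(cm)
p_obs <- sum(diag(cm)) / n                     # observed agreement (= accuracy)
p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa <- (p_obs - p_exp) / (1 - p_exp)
kappa  # 27/43, about 0.628
```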

bwplot(results)

dotplot(results)

Conclusion

Finally, the final model (model_ranger) is applied to the 20 test cases. These predictions are also the answers reported for the quiz section of this report.

# type = "raw" (the default) returns the predicted class labels
test_pred <- predict(model_ranger, newdata = testdata, type = "raw")
test_pred

##  [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E