Introduction

As part of the course “Practical Machine Learning”, this final assignment analyses the “fit” data.

According to Leek et al., “one thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, your goal will be to use data from accelerometers on the belt, forearm, arm, and dumbell of 6 participants” (Leek, J. et al. 2016, Coursera PML material).

In this report, a model is built to try to determine how well people exercise, in contrast with traditional approaches that focus on the quantitative aspect of how much physical activity people do.

The report describes how the model was built and how cross validation was used, and closes with a comparison and analysis of the expected out-of-sample error, together with the reasoning behind the choices made during data processing and analysis.

Finally, 20 different test cases are predicted with the model built.

The “Fit” data

Accelerometers placed on the belt, forearm, arm, and dumbbell of 6 participants were used to record data on physical activity. Participants were directed to perform barbell lifts correctly and incorrectly in 5 different ways. The source HAR project also provides a posture dataset with 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected over 8 hours of activities.

Read more: http://groupware.les.inf.puc-rio.br/har#dataset#ixzz4PxFL5Tuh

Data originates from the HAR project. This dataset is licensed under the Creative Commons license (CC BY-SA). Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiu, R.; Fuks, H. Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements. Proceedings of 21st Brazilian Symposium on Artificial Intelligence. Advances in Artificial Intelligence - SBIA 2012. In: Lecture Notes in Computer Science, pp. 52-61. Curitiba, PR: Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-34458-9. DOI: 10.1007/978-3-642-34459-6_6. http://groupware.les.inf.puc-rio.br/har#ixzz4PxEjxWYN

library(caret)
## Warning: package 'caret' was built under R version 3.2.5
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.2.5
library(ggplot2)
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.2.5
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(rpart)
## Warning: package 'rpart' was built under R version 3.2.5
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.2.5
library(RColorBrewer) 
library(rattle)
## Warning: package 'rattle' was built under R version 3.2.5
## Rattle: A free graphical interface for data mining with R.
## Version 4.1.0 Copyright (c) 2006-2015 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
set.seed(433)
urlT="http://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
trainFit <- read.csv(url(urlT),na.strings=c("NA","#DIV/0!",""))
str(trainFit)
## 'data.frame':    19622 obs. of  160 variables:
##  $ X                       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ user_name               : Factor w/ 6 levels "adelmo","carlitos",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ raw_timestamp_part_1    : int  1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
##  $ raw_timestamp_part_2    : int  788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
##  $ cvtd_timestamp          : Factor w/ 20 levels "02/12/2011 13:32",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ new_window              : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ num_window              : int  11 11 11 12 12 12 12 12 12 12 ...
##  $ roll_belt               : num  1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
##  $ pitch_belt              : num  8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
##  $ yaw_belt                : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
##  $ total_accel_belt        : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ kurtosis_roll_belt      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_picth_belt     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_yaw_belt       : logi  NA NA NA NA NA NA ...
##  $ skewness_roll_belt      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_roll_belt.1    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_yaw_belt       : logi  NA NA NA NA NA NA ...
##  $ max_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_belt          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_belt          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_belt     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_pitch_belt    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_yaw_belt      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_total_accel_belt    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_roll_belt        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_pitch_belt          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_pitch_belt       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_pitch_belt          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_yaw_belt         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gyros_belt_x            : num  0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
##  $ gyros_belt_y            : num  0 0 0 0 0.02 0 0 0 0 0 ...
##  $ gyros_belt_z            : num  -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
##  $ accel_belt_x            : int  -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
##  $ accel_belt_y            : int  4 4 5 3 2 4 3 4 2 4 ...
##  $ accel_belt_z            : int  22 22 23 21 24 21 21 21 24 22 ...
##  $ magnet_belt_x           : int  -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
##  $ magnet_belt_y           : int  599 608 600 604 600 603 599 603 602 609 ...
##  $ magnet_belt_z           : int  -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
##  $ roll_arm                : num  -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
##  $ pitch_arm               : num  22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
##  $ yaw_arm                 : num  -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
##  $ total_accel_arm         : int  34 34 34 34 34 34 34 34 34 34 ...
##  $ var_accel_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_roll_arm         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_pitch_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_yaw_arm             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_yaw_arm          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_yaw_arm             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gyros_arm_x             : num  0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
##  $ gyros_arm_y             : num  0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
##  $ gyros_arm_z             : num  -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
##  $ accel_arm_x             : int  -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
##  $ accel_arm_y             : int  109 110 110 111 111 111 111 111 109 110 ...
##  $ accel_arm_z             : int  -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
##  $ magnet_arm_x            : int  -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
##  $ magnet_arm_y            : int  337 337 344 344 337 342 336 338 341 334 ...
##  $ magnet_arm_z            : int  516 513 513 512 506 513 509 510 518 516 ...
##  $ kurtosis_roll_arm       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_picth_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_yaw_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_roll_arm       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_pitch_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_yaw_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_arm             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_arm             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_pitch_arm     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_yaw_arm       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ roll_dumbbell           : num  13.1 13.1 12.9 13.4 13.4 ...
##  $ pitch_dumbbell          : num  -70.5 -70.6 -70.3 -70.4 -70.4 ...
##  $ yaw_dumbbell            : num  -84.9 -84.7 -85.1 -84.9 -84.9 ...
##  $ kurtosis_roll_dumbbell  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_picth_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_yaw_dumbbell   : logi  NA NA NA NA NA NA ...
##  $ skewness_roll_dumbbell  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_pitch_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_yaw_dumbbell   : logi  NA NA NA NA NA NA ...
##  $ max_roll_dumbbell       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_dumbbell      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_dumbbell        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_dumbbell       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_dumbbell      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_dumbbell        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##   [list output truncated]
urls <- "http://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
testset <- read.csv(url(urls),na.strings=c("NA","#DIV/0!",""))
dim(testset)
## [1]  20 160

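Data tidying: variables containing “NA” values are removed. Columns 1-7 (row index, user name, timestamps, and window information) are not necessary for prediction and are removed as well.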

trainFit<-trainFit[,colSums(is.na(trainFit)) == 0]
testset <-testset[,colSums(is.na(testset)) == 0]
trainFit <- trainFit[,-c(1:7)]
testset <-testset[,-c(1:7)]
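
The tidying step can be illustrated on a small made-up data frame. The sketch below is only an illustration: `toy_train`, `toy_test`, their column names, and their values are hypothetical and not part of the project data. It keeps the columns without missing values and then reuses the training predictor names to subset the test data, which is one possible way to make sure both sets end up with the same predictors.

```r
# Hypothetical toy data: 'avg_roll' is mostly NA, like the summary columns above.
toy_train <- data.frame(
  roll     = c(1.41, 1.42, 1.48, 1.45),
  avg_roll = c(NA, NA, NA, 1.44),
  pitch    = c(8.07, 8.05, 8.06, 8.09),
  classe   = factor(c("A", "A", "B", "B"))
)
toy_test <- data.frame(
  roll       = c(1.40, 1.50),
  avg_roll   = c(NA, NA),
  pitch      = c(8.00, 8.10),
  problem_id = 1:2          # hypothetical identifier column
)

# Keep only the columns with no missing values in the training data.
complete_cols <- colSums(is.na(toy_train)) == 0
toy_train <- toy_train[, complete_cols]

# Reuse the training predictors (everything except the outcome) for the test data.
predictors <- setdiff(names(toy_train), "classe")
toy_test <- toy_test[, predictors]

names(toy_train)
names(toy_test)
```

Selecting the test columns by name from the training set, rather than filtering each set independently, avoids the case where a column is complete in one file but not in the other.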

The main data set is split into training and testing subsets (70% for training, 30% for testing) using random seed 433.

set.seed(433);
trainIn <- createDataPartition(y=trainFit$classe,p=.70,list=F)
training <- trainFit[trainIn,]
testing <- trainFit[-trainIn,]
str(training);str(testing)
## 'data.frame':    13737 obs. of  53 variables:
##  $ roll_belt           : num  1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 1.45 ...
##  $ pitch_belt          : num  8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 8.18 ...
##  $ yaw_belt            : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
##  $ total_accel_belt    : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ gyros_belt_x        : num  0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 0.03 ...
##  $ gyros_belt_y        : num  0 0 0 0.02 0 0 0 0 0 0 ...
##  $ gyros_belt_z        : num  -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 -0.02 ...
##  $ accel_belt_x        : int  -22 -20 -22 -21 -21 -22 -22 -20 -21 -21 ...
##  $ accel_belt_y        : int  4 5 3 2 4 3 4 2 4 2 ...
##  $ accel_belt_z        : int  22 23 21 24 21 21 21 24 22 23 ...
##  $ magnet_belt_x       : int  -7 -2 -6 -6 0 -4 -2 1 -3 -5 ...
##  $ magnet_belt_y       : int  608 600 604 600 603 599 603 602 609 596 ...
##  $ magnet_belt_z       : int  -311 -305 -310 -302 -312 -311 -313 -312 -308 -317 ...
##  $ roll_arm            : num  -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
##  $ pitch_arm           : num  22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 21.5 ...
##  $ yaw_arm             : num  -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
##  $ total_accel_arm     : int  34 34 34 34 34 34 34 34 34 34 ...
##  $ gyros_arm_x         : num  0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 0.02 ...
##  $ gyros_arm_y         : num  -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 -0.03 ...
##  $ gyros_arm_z         : num  -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 0 ...
##  $ accel_arm_x         : int  -290 -289 -289 -289 -289 -289 -289 -288 -288 -290 ...
##  $ accel_arm_y         : int  110 110 111 111 111 111 111 109 110 110 ...
##  $ accel_arm_z         : int  -125 -126 -123 -123 -122 -125 -124 -122 -124 -123 ...
##  $ magnet_arm_x        : int  -369 -368 -372 -374 -369 -373 -372 -369 -376 -366 ...
##  $ magnet_arm_y        : int  337 344 344 337 342 336 338 341 334 339 ...
##  $ magnet_arm_z        : int  513 513 512 506 513 509 510 518 516 509 ...
##  $ roll_dumbbell       : num  13.1 12.9 13.4 13.4 13.4 ...
##  $ pitch_dumbbell      : num  -70.6 -70.3 -70.4 -70.4 -70.8 ...
##  $ yaw_dumbbell        : num  -84.7 -85.1 -84.9 -84.9 -84.5 ...
##  $ total_accel_dumbbell: int  37 37 37 37 37 37 37 37 37 37 ...
##  $ gyros_dumbbell_x    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ gyros_dumbbell_y    : num  -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
##  $ gyros_dumbbell_z    : num  0 0 -0.02 0 0 0 0 0 0 0 ...
##  $ accel_dumbbell_x    : int  -233 -232 -232 -233 -234 -232 -234 -232 -235 -233 ...
##  $ accel_dumbbell_y    : int  47 46 48 48 48 47 46 47 48 47 ...
##  $ accel_dumbbell_z    : int  -269 -270 -269 -270 -269 -270 -272 -269 -270 -269 ...
##  $ magnet_dumbbell_x   : int  -555 -561 -552 -554 -558 -551 -555 -549 -558 -564 ...
##  $ magnet_dumbbell_y   : int  296 298 303 292 294 295 300 292 291 299 ...
##  $ magnet_dumbbell_z   : num  -64 -63 -60 -68 -66 -70 -74 -65 -69 -64 ...
##  $ roll_forearm        : num  28.3 28.3 28.1 28 27.9 27.9 27.8 27.7 27.7 27.6 ...
##  $ pitch_forearm       : num  -63.9 -63.9 -63.9 -63.9 -63.9 -63.9 -63.8 -63.8 -63.8 -63.8 ...
##  $ yaw_forearm         : num  -153 -152 -152 -152 -152 -152 -152 -152 -152 -152 ...
##  $ total_accel_forearm : int  36 36 36 36 36 36 36 36 36 36 ...
##  $ gyros_forearm_x     : num  0.02 0.03 0.02 0.02 0.02 0.02 0.02 0.03 0.02 0.02 ...
##  $ gyros_forearm_y     : num  0 -0.02 -0.02 0 -0.02 0 -0.02 0 0 -0.02 ...
##  $ gyros_forearm_z     : num  -0.02 0 0 -0.02 -0.03 -0.02 0 -0.02 -0.02 -0.02 ...
##  $ accel_forearm_x     : int  192 196 189 189 193 195 193 193 190 193 ...
##  $ accel_forearm_y     : int  203 204 206 206 203 205 205 204 205 205 ...
##  $ accel_forearm_z     : int  -216 -213 -214 -214 -215 -215 -213 -214 -215 -214 ...
##  $ magnet_forearm_x    : int  -18 -18 -16 -17 -9 -18 -9 -16 -22 -17 ...
##  $ magnet_forearm_y    : num  661 658 658 655 660 659 660 653 656 657 ...
##  $ magnet_forearm_z    : num  473 469 469 473 478 470 474 476 473 465 ...
##  $ classe              : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
## 'data.frame':    5885 obs. of  53 variables:
##  $ roll_belt           : num  1.41 1.43 1.42 1.45 1.6 1.52 1.44 1.41 1.4 1.39 ...
##  $ pitch_belt          : num  8.07 8.18 8.21 8.2 8.1 8.17 8.19 8.18 8.04 8.05 ...
##  $ yaw_belt            : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.3 -94.3 ...
##  $ total_accel_belt    : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ gyros_belt_x        : num  0 0.02 0.02 0 0.02 0.03 0.02 0 0 0 ...
##  $ gyros_belt_y        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ gyros_belt_z        : num  -0.02 -0.02 -0.02 0 -0.02 -0.02 -0.03 -0.03 -0.02 -0.03 ...
##  $ accel_belt_x        : int  -21 -22 -22 -21 -20 -21 -21 -22 -21 -22 ...
##  $ accel_belt_y        : int  4 2 4 2 1 4 5 4 4 1 ...
##  $ accel_belt_z        : int  22 23 21 22 20 21 21 22 21 21 ...
##  $ magnet_belt_x       : int  -3 -2 -8 -1 -10 2 -4 1 -3 1 ...
##  $ magnet_belt_y       : int  599 602 598 597 607 593 603 600 604 606 ...
##  $ magnet_belt_z       : int  -313 -319 -310 -310 -304 -308 -315 -301 -316 -310 ...
##  $ roll_arm            : num  -128 -128 -128 -129 -129 -129 -129 -129 -130 -130 ...
##  $ pitch_arm           : num  22.5 21.5 21.4 21.4 20.9 20.7 20.6 20.5 20.2 20 ...
##  $ yaw_arm             : num  -161 -161 -161 -161 -161 -161 -161 -161 -161 -162 ...
##  $ total_accel_arm     : int  34 34 34 34 34 34 34 34 34 34 ...
##  $ gyros_arm_x         : num  0 0.02 0.02 0.02 0.03 0 0.02 0.02 0 0.02 ...
##  $ gyros_arm_y         : num  0 -0.03 0 0 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
##  $ gyros_arm_z         : num  -0.02 0 -0.03 -0.03 -0.02 0 -0.02 -0.02 -0.02 -0.02 ...
##  $ accel_arm_x         : int  -288 -288 -288 -289 -288 -289 -289 -290 -290 -288 ...
##  $ accel_arm_y         : int  109 111 111 111 111 109 108 111 111 110 ...
##  $ accel_arm_z         : int  -123 -123 -124 -124 -124 -125 -127 -124 -125 -124 ...
##  $ magnet_arm_x        : int  -368 -363 -371 -374 -375 -366 -368 -369 -373 -370 ...
##  $ magnet_arm_y        : int  337 343 331 342 337 349 338 336 336 337 ...
##  $ magnet_arm_z        : int  516 520 523 510 513 523 528 514 507 500 ...
##  $ roll_dumbbell       : num  13.1 13.1 13.4 13.1 13.4 ...
##  $ pitch_dumbbell      : num  -70.5 -70.5 -71 -70.7 -70.8 ...
##  $ yaw_dumbbell        : num  -84.9 -84.9 -84.3 -84.7 -84.5 ...
##  $ total_accel_dumbbell: int  37 37 37 37 37 37 37 37 37 37 ...
##  $ gyros_dumbbell_x    : num  0 0 0.02 0 0 0 0 0 0 0 ...
##  $ gyros_dumbbell_y    : num  -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
##  $ gyros_dumbbell_z    : num  0 0 -0.02 0 0 -0.02 -0.02 -0.02 0 0 ...
##  $ accel_dumbbell_x    : int  -234 -233 -234 -234 -234 -234 -234 -234 -233 -233 ...
##  $ accel_dumbbell_y    : int  47 47 48 47 48 47 47 48 49 47 ...
##  $ accel_dumbbell_z    : int  -271 -270 -268 -270 -269 -270 -269 -270 -270 -269 ...
##  $ magnet_dumbbell_x   : int  -559 -554 -554 -554 -554 -562 -555 -560 -552 -560 ...
##  $ magnet_dumbbell_y   : int  293 291 295 294 299 298 299 293 296 294 ...
##  $ magnet_dumbbell_z   : num  -65 -65 -68 -63 -72 -64 -66 -65 -69 -64 ...
##  $ roll_forearm        : num  28.4 27.5 27.2 27.2 26.9 26.8 26.7 26.6 26.4 26.4 ...
##  $ pitch_forearm       : num  -63.9 -63.8 -63.9 -63.9 -63.9 -63.6 -63.7 -63.7 -63.9 -63.9 ...
##  $ yaw_forearm         : num  -153 -152 -151 -151 -151 -151 -151 -151 -150 -150 ...
##  $ total_accel_forearm : int  36 36 36 36 36 36 36 36 36 36 ...
##  $ gyros_forearm_x     : num  0.03 0.02 0 0 0.03 0.02 0 0 -0.02 0.03 ...
##  $ gyros_forearm_y     : num  0 0.02 -0.02 -0.02 -0.03 -0.02 0 0 -0.02 -0.02 ...
##  $ gyros_forearm_z     : num  -0.02 -0.03 -0.03 -0.02 -0.02 -0.03 0 -0.02 -0.03 0 ...
##  $ accel_forearm_x     : int  192 191 193 192 194 189 192 191 191 193 ...
##  $ accel_forearm_y     : int  203 203 202 201 208 204 206 205 205 208 ...
##  $ accel_forearm_z     : int  -215 -215 -214 -214 -214 -217 -216 -216 -215 -216 ...
##  $ magnet_forearm_x    : int  -17 -11 -14 -16 -11 -4 -19 -15 -18 -14 ...
##  $ magnet_forearm_y    : num  654 657 659 656 654 661 653 651 651 656 ...
##  $ magnet_forearm_z    : num  476 478 478 472 469 479 466 464 469 464 ...
##  $ classe              : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
dim(trainFit);dim(training);dim(testing)
## [1] 19622    53
## [1] 13737    53
## [1] 5885   53
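
`createDataPartition` samples within the levels of the outcome, so each class keeps roughly the same share in both subsets. The sketch below is a rough base-R approximation of that idea on made-up labels, not caret's actual implementation; `stratified_split` and the toy `classe` vector are hypothetical names introduced for illustration.

```r
# Toy labels standing in for trainFit$classe (hypothetical data).
set.seed(433)
classe <- factor(rep(c("A", "B", "C", "D", "E"), times = c(50, 40, 30, 20, 10)))

# Within each class, draw about p of the row indices for training.
stratified_split <- function(y, p = 0.70) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_split(classe)
train_y <- classe[train_idx]
test_y  <- classe[-train_idx]

# Class proportions should be nearly identical in both subsets.
round(prop.table(table(train_y)), 2)
round(prop.table(table(test_y)), 2)
```

Sampling per class matters most when classes are unbalanced; a plain random 70/30 draw could leave a rare class under-represented in one of the subsets.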

The classe variable

The six participants in the study were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: exactly according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D) and throwing the hips to the front (Class E).

The objective is to build a classification (predictive) model that predicts “classe” from the other 52 variables remaining in the tidy dataset.

summary(training$classe);summary(testing$classe)
##    A    B    C    D    E 
## 3906 2658 2396 2252 2525
##    A    B    C    D    E 
## 1674 1139 1026  964 1082

A Single Predictive Model

A decision tree will be used to build a model to predict “classe”. The model initially uses all 52 predictor variables left in the tidy dataset.

dt1 <- rpart(classe ~ . , data=training, method="class")
pd1 <- predict(dt1,testing, type = "class")
rpart.plot(dt1, main="HR Data Set- RPart Predictive Model", extra=102, under=TRUE, faclen=0)

confusionMatrix(pd1, testing$classe)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1535  157   16   49   27
##          B   51  711  111  115  123
##          C   45  147  799  155  130
##          D   21   88   64  613   89
##          E   22   36   36   32  713
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7427          
##                  95% CI : (0.7314, 0.7539)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6739          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9170   0.6242   0.7788   0.6359   0.6590
## Specificity            0.9409   0.9157   0.9018   0.9468   0.9738
## Pos Pred Value         0.8604   0.6400   0.6262   0.7006   0.8498
## Neg Pred Value         0.9661   0.9103   0.9507   0.9299   0.9269
## Prevalence             0.2845   0.1935   0.1743   0.1638   0.1839
## Detection Rate         0.2608   0.1208   0.1358   0.1042   0.1212
## Detection Prevalence   0.3031   0.1888   0.2168   0.1487   0.1426
## Balanced Accuracy      0.9289   0.7700   0.8403   0.7913   0.8164

The accuracy of the rpart decision tree on the testing subset is about 74% (0.7427).
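
The accuracy above rests on a single 70/30 hold-out split. A k-fold cross-validation estimate could complement it. The following is a minimal sketch under stated assumptions: it uses the built-in `iris` data as a stand-in for the exercise data, `k = 5` is an arbitrary choice, and `fold_id`, `fold_accuracy`, and `cv_accuracy` are names introduced here for illustration.

```r
library(rpart)  # recommended package bundled with most R distributions

set.seed(433)
k <- 5
data(iris)  # built-in stand-in data set, not the exercise data

# Assign every row to one of k folds at random (equal fold sizes here).
fold_id <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Fit on k-1 folds, score on the held-out fold, and collect the accuracies.
fold_accuracy <- sapply(seq_len(k), function(i) {
  fit  <- rpart(Species ~ ., data = iris[fold_id != i, ], method = "class")
  pred <- predict(fit, iris[fold_id == i, ], type = "class")
  mean(pred == iris$Species[fold_id == i])
})

# The cross-validated accuracy is the average over the k held-out folds.
cv_accuracy <- mean(fold_accuracy)
cv_accuracy
```

Because every observation is held out exactly once, the averaged estimate depends less on one particular random split than a single hold-out figure does.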

A Combined Predictive Model

Random forest will be used to build a model that combines many classifiers of the same type (decision trees). Again, the variable “classe” in the HAR dataset is predicted from the variables remaining after tidying the data.

library(randomForest)
# randomForest() has no 'method' argument; classification is chosen because classe is a factor
rf1 <- randomForest(classe ~ ., data=training)
pd2=predict(rf1, testing, type ="class")
confusionMatrix(pd2, testing$classe)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1670   11    0    0    0
##          B    3 1124   12    0    0
##          C    0    4 1010    5    1
##          D    0    0    4  958    2
##          E    1    0    0    1 1079
## 
## Overall Statistics
##                                         
##                Accuracy : 0.9925        
##                  95% CI : (0.99, 0.9946)
##     No Information Rate : 0.2845        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.9905        
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9976   0.9868   0.9844   0.9938   0.9972
## Specificity            0.9974   0.9968   0.9979   0.9988   0.9996
## Pos Pred Value         0.9935   0.9868   0.9902   0.9938   0.9981
## Neg Pred Value         0.9990   0.9968   0.9967   0.9988   0.9994
## Prevalence             0.2845   0.1935   0.1743   0.1638   0.1839
## Detection Rate         0.2838   0.1910   0.1716   0.1628   0.1833
## Detection Prevalence   0.2856   0.1935   0.1733   0.1638   0.1837
## Balanced Accuracy      0.9975   0.9918   0.9912   0.9963   0.9984

As expected, a combined classifier such as random forest produces a more accurate model. On the held-out 30% subset, the accuracy of the random forest (99.25%) is substantially higher than that of a single rpart tree (74.27%). The expected out-of-sample error of the random forest is therefore about 0.75% (1 - 0.9925), with a 95% confidence interval of roughly 0.5% to 1.0%, so fewer than 1 in 100 new instances are expected to be misclassified.
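
The accuracy figures printed by `confusionMatrix` can be re-derived by hand from the two confusion matrices shown earlier. The sketch below copies the counts from that output; the helper `accuracy` and the variable names are introduced here for illustration, and the exact binomial interval from `binom.test` is used on the assumption that it matches how the reported 95% CI was obtained.

```r
classes <- c("A", "B", "C", "D", "E")
dn <- list(Prediction = classes, Reference = classes)

# Rows are predictions, columns are reference classes (counts copied from the output above).
cm_tree <- matrix(c(1535, 157,  16,  49,  27,
                      51, 711, 111, 115, 123,
                      45, 147, 799, 155, 130,
                      21,  88,  64, 613,  89,
                      22,  36,  36,  32, 713),
                  nrow = 5, byrow = TRUE, dimnames = dn)
cm_rf <- matrix(c(1670,   11,    0,   0,    0,
                     3, 1124,   12,   0,    0,
                     0,    4, 1010,   5,    1,
                     0,    0,    4, 958,    2,
                     1,    0,    0,   1, 1079),
                nrow = 5, byrow = TRUE, dimnames = dn)

# Accuracy = correctly classified cases (diagonal) / all cases.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

acc_tree <- accuracy(cm_tree)
acc_rf   <- accuracy(cm_rf)
err_rf   <- 1 - acc_rf   # estimated out-of-sample error of the random forest

# Exact binomial 95% interval for the random forest accuracy.
ci_rf <- binom.test(sum(diag(cm_rf)), sum(cm_rf))$conf.int

round(c(tree = acc_tree, rf = acc_rf, rf_error = err_rf), 4)
round(1 - rev(ci_rf), 4)  # the same interval expressed as an error rate
```

The random forest misclassifies 44 of the 5885 held-out cases, which is where the error estimate of about 0.75% comes from.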

Predicting new cases

Using the random forest model, the 20 cases provided in the test set are predicted.

testpred <- predict(rf1, testset, type="class")
testpred
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B 
## Levels: A B C D E