As part of the course “Practical Machine Learning”, this final assignment deals with analysing the “fit” data.
According to Leek et al., “one thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, your goal will be to use data from accelerometers on the belt, forearm, arm, and dumbell of 6 participants” (Leek, J. et al. 2016, Coursera PML material).
In this report, a model is built to determine how well people exercise, in contrast with traditional approaches that focus on the quantitative aspect of how much physical activity people do.
The report describes how the model was built, how cross validation was used, the expected out of sample error, and the reasoning behind the choices made during data processing and analysis.
In addition, 20 different test cases are predicted with the final model.
Accelerometers placed on the belt, forearm, arm, and dumbbell of 6 participants were used to record physical activity data. Individuals were directed to perform barbell lifts correctly and incorrectly in 5 different ways (recorded as classes A to E in the “classe” variable), and data were collected on 8 hours of activities.
Read more: http://groupware.les.inf.puc-rio.br/har#dataset
Data originates from the HAR project. This dataset is licensed under the Creative Commons license (CC BY-SA). Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiu, R.; Fuks, H. Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements. Proceedings of 21st Brazilian Symposium on Artificial Intelligence. Advances in Artificial Intelligence - SBIA 2012. In: Lecture Notes in Computer Science, pp. 52-61. Curitiba, PR: Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-34458-9. DOI: 10.1007/978-3-642-34459-6_6. http://groupware.les.inf.puc-rio.br/har
library(caret)
## Warning: package 'caret' was built under R version 3.2.5
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.2.5
library(ggplot2)
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.2.5
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
library(rpart)
## Warning: package 'rpart' was built under R version 3.2.5
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.2.5
library(RColorBrewer)
library(rattle)
## Warning: package 'rattle' was built under R version 3.2.5
## Rattle: A free graphical interface for data mining with R.
## Version 4.1.0 Copyright (c) 2006-2015 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
set.seed(433)
urlT="http://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
trainFit <- read.csv(url(urlT),na.strings=c("NA","#DIV/0!",""))
str(trainFit)
## 'data.frame': 19622 obs. of 160 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ user_name : Factor w/ 6 levels "adelmo","carlitos",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ raw_timestamp_part_1 : int 1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
## $ raw_timestamp_part_2 : int 788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
## $ cvtd_timestamp : Factor w/ 20 levels "02/12/2011 13:32",..: 9 9 9 9 9 9 9 9 9 9 ...
## $ new_window : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ num_window : int 11 11 11 12 12 12 12 12 12 12 ...
## $ roll_belt : num 1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
## $ pitch_belt : num 8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ kurtosis_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_belt : logi NA NA NA NA NA NA ...
## $ skewness_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_belt.1 : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_belt : logi NA NA NA NA NA NA ...
## $ max_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_total_accel_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_belt_x : num 0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
## $ gyros_belt_y : num 0 0 0 0 0.02 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
## $ accel_belt_x : int -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
## $ accel_belt_y : int 4 4 5 3 2 4 3 4 2 4 ...
## $ accel_belt_z : int 22 22 23 21 24 21 21 21 24 22 ...
## $ magnet_belt_x : int -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
## $ magnet_belt_y : int 599 608 600 604 600 603 599 603 602 609 ...
## $ magnet_belt_z : int -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
## $ pitch_arm : num 22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ var_accel_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_arm_x : num 0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
## $ gyros_arm_y : num 0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
## $ gyros_arm_z : num -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
## $ accel_arm_x : int -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
## $ accel_arm_y : int 109 110 110 111 111 111 111 111 109 110 ...
## $ accel_arm_z : int -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
## $ magnet_arm_x : int -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
## $ magnet_arm_y : int 337 337 344 344 337 342 336 338 341 334 ...
## $ magnet_arm_z : int 516 513 513 512 506 513 509 510 518 516 ...
## $ kurtosis_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ roll_dumbbell : num 13.1 13.1 12.9 13.4 13.4 ...
## $ pitch_dumbbell : num -70.5 -70.6 -70.3 -70.4 -70.4 ...
## $ yaw_dumbbell : num -84.9 -84.7 -85.1 -84.9 -84.9 ...
## $ kurtosis_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_dumbbell : logi NA NA NA NA NA NA ...
## $ skewness_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_dumbbell : logi NA NA NA NA NA NA ...
## $ max_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## [list output truncated]
urls <- "http://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
testset <- read.csv(url(urls),na.strings=c("NA","#DIV/0!",""))
dim(testset)
## [1] 20 160
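Data tidying: variables that contain “NA” values are removed. Columns 1-7 (row index, user name, timestamps, and window identifiers) are not necessary for prediction and are dropped as well.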
trainFit<-trainFit[,colSums(is.na(trainFit)) == 0]
testset <-testset[,colSums(is.na(testset)) == 0]
trainFit <- trainFit[,-c(1:7)]
testset <-testset[,-c(1:7)]
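The two tidying steps above can be tried on a small data frame first. The sketch below is only an illustration: the data frame `toy` and its values are hypothetical, and just one leading bookkeeping column is dropped instead of seven.

```r
# Toy data frame (hypothetical values) with one column that contains NA
toy <- data.frame(
  id     = 1:4,
  roll   = c(1.41, 1.42, 1.48, 1.45),
  kurt   = c(NA, NA, 0.3, NA),
  classe = factor(c("A", "A", "B", "B"))
)

# Count NA values per column, then keep only the complete columns
na_per_col <- colSums(is.na(toy))
tidy <- toy[, na_per_col == 0]

# Drop leading bookkeeping columns by position (here only column 1)
tidy <- tidy[, -1]

print(names(tidy))
```

Because the training and test files are filtered independently, it may be worth comparing `names()` of the two results afterwards to confirm that the same predictors survived in both.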
The main data set is split into training and testing subsets (70% for training, 30% for testing), using random seed 433.
set.seed(433);
trainIn <- createDataPartition(y=trainFit$classe, p=.70, list=FALSE)
training <- trainFit[trainIn,]
testing <- trainFit[-trainIn,]
str(training);str(testing)
## 'data.frame': 13737 obs. of 53 variables:
## $ roll_belt : num 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 1.45 ...
## $ pitch_belt : num 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 8.18 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ gyros_belt_x : num 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 0.03 ...
## $ gyros_belt_y : num 0 0 0 0.02 0 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 -0.02 ...
## $ accel_belt_x : int -22 -20 -22 -21 -21 -22 -22 -20 -21 -21 ...
## $ accel_belt_y : int 4 5 3 2 4 3 4 2 4 2 ...
## $ accel_belt_z : int 22 23 21 24 21 21 21 24 22 23 ...
## $ magnet_belt_x : int -7 -2 -6 -6 0 -4 -2 1 -3 -5 ...
## $ magnet_belt_y : int 608 600 604 600 603 599 603 602 609 596 ...
## $ magnet_belt_z : int -311 -305 -310 -302 -312 -311 -313 -312 -308 -317 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
## $ pitch_arm : num 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 21.5 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ gyros_arm_x : num 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 0.02 ...
## $ gyros_arm_y : num -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 -0.03 ...
## $ gyros_arm_z : num -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 0 ...
## $ accel_arm_x : int -290 -289 -289 -289 -289 -289 -289 -288 -288 -290 ...
## $ accel_arm_y : int 110 110 111 111 111 111 111 109 110 110 ...
## $ accel_arm_z : int -125 -126 -123 -123 -122 -125 -124 -122 -124 -123 ...
## $ magnet_arm_x : int -369 -368 -372 -374 -369 -373 -372 -369 -376 -366 ...
## $ magnet_arm_y : int 337 344 344 337 342 336 338 341 334 339 ...
## $ magnet_arm_z : int 513 513 512 506 513 509 510 518 516 509 ...
## $ roll_dumbbell : num 13.1 12.9 13.4 13.4 13.4 ...
## $ pitch_dumbbell : num -70.6 -70.3 -70.4 -70.4 -70.8 ...
## $ yaw_dumbbell : num -84.7 -85.1 -84.9 -84.9 -84.5 ...
## $ total_accel_dumbbell: int 37 37 37 37 37 37 37 37 37 37 ...
## $ gyros_dumbbell_x : num 0 0 0 0 0 0 0 0 0 0 ...
## $ gyros_dumbbell_y : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
## $ gyros_dumbbell_z : num 0 0 -0.02 0 0 0 0 0 0 0 ...
## $ accel_dumbbell_x : int -233 -232 -232 -233 -234 -232 -234 -232 -235 -233 ...
## $ accel_dumbbell_y : int 47 46 48 48 48 47 46 47 48 47 ...
## $ accel_dumbbell_z : int -269 -270 -269 -270 -269 -270 -272 -269 -270 -269 ...
## $ magnet_dumbbell_x : int -555 -561 -552 -554 -558 -551 -555 -549 -558 -564 ...
## $ magnet_dumbbell_y : int 296 298 303 292 294 295 300 292 291 299 ...
## $ magnet_dumbbell_z : num -64 -63 -60 -68 -66 -70 -74 -65 -69 -64 ...
## $ roll_forearm : num 28.3 28.3 28.1 28 27.9 27.9 27.8 27.7 27.7 27.6 ...
## $ pitch_forearm : num -63.9 -63.9 -63.9 -63.9 -63.9 -63.9 -63.8 -63.8 -63.8 -63.8 ...
## $ yaw_forearm : num -153 -152 -152 -152 -152 -152 -152 -152 -152 -152 ...
## $ total_accel_forearm : int 36 36 36 36 36 36 36 36 36 36 ...
## $ gyros_forearm_x : num 0.02 0.03 0.02 0.02 0.02 0.02 0.02 0.03 0.02 0.02 ...
## $ gyros_forearm_y : num 0 -0.02 -0.02 0 -0.02 0 -0.02 0 0 -0.02 ...
## $ gyros_forearm_z : num -0.02 0 0 -0.02 -0.03 -0.02 0 -0.02 -0.02 -0.02 ...
## $ accel_forearm_x : int 192 196 189 189 193 195 193 193 190 193 ...
## $ accel_forearm_y : int 203 204 206 206 203 205 205 204 205 205 ...
## $ accel_forearm_z : int -216 -213 -214 -214 -215 -215 -213 -214 -215 -214 ...
## $ magnet_forearm_x : int -18 -18 -16 -17 -9 -18 -9 -16 -22 -17 ...
## $ magnet_forearm_y : num 661 658 658 655 660 659 660 653 656 657 ...
## $ magnet_forearm_z : num 473 469 469 473 478 470 474 476 473 465 ...
## $ classe : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
## 'data.frame': 5885 obs. of 53 variables:
## $ roll_belt : num 1.41 1.43 1.42 1.45 1.6 1.52 1.44 1.41 1.4 1.39 ...
## $ pitch_belt : num 8.07 8.18 8.21 8.2 8.1 8.17 8.19 8.18 8.04 8.05 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.3 -94.3 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ gyros_belt_x : num 0 0.02 0.02 0 0.02 0.03 0.02 0 0 0 ...
## $ gyros_belt_y : num 0 0 0 0 0 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.02 0 -0.02 -0.02 -0.03 -0.03 -0.02 -0.03 ...
## $ accel_belt_x : int -21 -22 -22 -21 -20 -21 -21 -22 -21 -22 ...
## $ accel_belt_y : int 4 2 4 2 1 4 5 4 4 1 ...
## $ accel_belt_z : int 22 23 21 22 20 21 21 22 21 21 ...
## $ magnet_belt_x : int -3 -2 -8 -1 -10 2 -4 1 -3 1 ...
## $ magnet_belt_y : int 599 602 598 597 607 593 603 600 604 606 ...
## $ magnet_belt_z : int -313 -319 -310 -310 -304 -308 -315 -301 -316 -310 ...
## $ roll_arm : num -128 -128 -128 -129 -129 -129 -129 -129 -130 -130 ...
## $ pitch_arm : num 22.5 21.5 21.4 21.4 20.9 20.7 20.6 20.5 20.2 20 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -162 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ gyros_arm_x : num 0 0.02 0.02 0.02 0.03 0 0.02 0.02 0 0.02 ...
## $ gyros_arm_y : num 0 -0.03 0 0 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
## $ gyros_arm_z : num -0.02 0 -0.03 -0.03 -0.02 0 -0.02 -0.02 -0.02 -0.02 ...
## $ accel_arm_x : int -288 -288 -288 -289 -288 -289 -289 -290 -290 -288 ...
## $ accel_arm_y : int 109 111 111 111 111 109 108 111 111 110 ...
## $ accel_arm_z : int -123 -123 -124 -124 -124 -125 -127 -124 -125 -124 ...
## $ magnet_arm_x : int -368 -363 -371 -374 -375 -366 -368 -369 -373 -370 ...
## $ magnet_arm_y : int 337 343 331 342 337 349 338 336 336 337 ...
## $ magnet_arm_z : int 516 520 523 510 513 523 528 514 507 500 ...
## $ roll_dumbbell : num 13.1 13.1 13.4 13.1 13.4 ...
## $ pitch_dumbbell : num -70.5 -70.5 -71 -70.7 -70.8 ...
## $ yaw_dumbbell : num -84.9 -84.9 -84.3 -84.7 -84.5 ...
## $ total_accel_dumbbell: int 37 37 37 37 37 37 37 37 37 37 ...
## $ gyros_dumbbell_x : num 0 0 0.02 0 0 0 0 0 0 0 ...
## $ gyros_dumbbell_y : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
## $ gyros_dumbbell_z : num 0 0 -0.02 0 0 -0.02 -0.02 -0.02 0 0 ...
## $ accel_dumbbell_x : int -234 -233 -234 -234 -234 -234 -234 -234 -233 -233 ...
## $ accel_dumbbell_y : int 47 47 48 47 48 47 47 48 49 47 ...
## $ accel_dumbbell_z : int -271 -270 -268 -270 -269 -270 -269 -270 -270 -269 ...
## $ magnet_dumbbell_x : int -559 -554 -554 -554 -554 -562 -555 -560 -552 -560 ...
## $ magnet_dumbbell_y : int 293 291 295 294 299 298 299 293 296 294 ...
## $ magnet_dumbbell_z : num -65 -65 -68 -63 -72 -64 -66 -65 -69 -64 ...
## $ roll_forearm : num 28.4 27.5 27.2 27.2 26.9 26.8 26.7 26.6 26.4 26.4 ...
## $ pitch_forearm : num -63.9 -63.8 -63.9 -63.9 -63.9 -63.6 -63.7 -63.7 -63.9 -63.9 ...
## $ yaw_forearm : num -153 -152 -151 -151 -151 -151 -151 -151 -150 -150 ...
## $ total_accel_forearm : int 36 36 36 36 36 36 36 36 36 36 ...
## $ gyros_forearm_x : num 0.03 0.02 0 0 0.03 0.02 0 0 -0.02 0.03 ...
## $ gyros_forearm_y : num 0 0.02 -0.02 -0.02 -0.03 -0.02 0 0 -0.02 -0.02 ...
## $ gyros_forearm_z : num -0.02 -0.03 -0.03 -0.02 -0.02 -0.03 0 -0.02 -0.03 0 ...
## $ accel_forearm_x : int 192 191 193 192 194 189 192 191 191 193 ...
## $ accel_forearm_y : int 203 203 202 201 208 204 206 205 205 208 ...
## $ accel_forearm_z : int -215 -215 -214 -214 -214 -217 -216 -216 -215 -216 ...
## $ magnet_forearm_x : int -17 -11 -14 -16 -11 -4 -19 -15 -18 -14 ...
## $ magnet_forearm_y : num 654 657 659 656 654 661 653 651 651 656 ...
## $ magnet_forearm_z : num 476 478 478 472 469 479 466 464 469 464 ...
## $ classe : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
dim(trainFit);dim(training);dim(testing)
## [1] 19622 53
## [1] 13737 53
## [1] 5885 53
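For a factor outcome, `createDataPartition` samples within each class, which is why the class proportions in the two subsets stay close to each other. The base R sketch below illustrates that idea only; it is not caret's actual implementation. The built-in `iris` data stands in for the accelerometer data, and the names `train_toy` and `test_toy` are hypothetical.

```r
set.seed(433)

# Built-in iris data stands in for the accelerometer data (illustration only)
p <- 0.70

# Group row indices by class, then sample 70% of the rows inside each class
by_class <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(by_class, function(idx) {
  sample(idx, size = round(p * length(idx)))
}), use.names = FALSE)

train_toy <- iris[train_idx, ]
test_toy  <- iris[-train_idx, ]

# Each class contributes the same share to both subsets
print(table(train_toy$Species))
print(table(test_toy$Species))
```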
The six participants in the study were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: exactly according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D), and throwing the hips to the front (Class E).
The objective is to build a classification (predictive) model that predicts “classe” from the other 52 variables remaining in the tidy dataset.
summary(training$classe);summary(testing$classe)
## A B C D E
## 3906 2658 2396 2252 2525
## A B C D E
## 1674 1139 1026 964 1082
A decision tree will be used to build a model to predict “classe”. The model initially uses all predictor variables left in the tidy dataset (52).
dt1 <- rpart(classe ~ . , data=training, method="class")
pd1 <- predict(dt1,testing, type = "class")
rpart.plot(dt1, main="HR Data Set- RPart Predictive Model", extra=102, under=TRUE, faclen=0)
confusionMatrix(pd1, testing$classe)
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1535 157 16 49 27
## B 51 711 111 115 123
## C 45 147 799 155 130
## D 21 88 64 613 89
## E 22 36 36 32 713
##
## Overall Statistics
##
## Accuracy : 0.7427
## 95% CI : (0.7314, 0.7539)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6739
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9170 0.6242 0.7788 0.6359 0.6590
## Specificity 0.9409 0.9157 0.9018 0.9468 0.9738
## Pos Pred Value 0.8604 0.6400 0.6262 0.7006 0.8498
## Neg Pred Value 0.9661 0.9103 0.9507 0.9299 0.9269
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2608 0.1208 0.1358 0.1042 0.1212
## Detection Prevalence 0.3031 0.1888 0.2168 0.1487 0.1426
## Balanced Accuracy 0.9289 0.7700 0.8403 0.7913 0.8164
The accuracy of the decision tree (rpart) on the testing subset is 74.3%.
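The accuracy and per-class sensitivity reported by `confusionMatrix` can be recomputed by hand from the table of counts. The sketch below copies the counts from the rpart output above into a matrix; the variable names are chosen here for illustration.

```r
# Counts copied from the rpart confusion matrix above
# (rows = predicted class, columns = reference class)
cm <- matrix(c(1535, 157,  16,  49,  27,
                 51, 711, 111, 115, 123,
                 45, 147, 799, 155, 130,
                 21,  88,  64, 613,  89,
                 22,  36,  36,  32, 713),
             nrow = 5, byrow = TRUE,
             dimnames = list(Prediction = LETTERS[1:5],
                             Reference  = LETTERS[1:5]))

# Accuracy: correctly classified cases (diagonal) over all cases
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over the true count of that class
sensitivity <- diag(cm) / colSums(cm)

print(round(accuracy, 4))
print(round(sensitivity, 4))
```

The diagonal sums to 4371 of 5885 cases, which matches the reported accuracy of 0.7427.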
Random forest will be used to build a model that combines many classifiers of the same type (decision trees). Again, the variable “classe” in the HAR dataset will be predicted from the variables remaining after tidying the data.
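library(randomForest)
# randomForest() has no "method" argument; it fits a classifier because classe is a factor
rf1 <- randomForest(classe ~ ., data=training)
# predict class labels for the held-out testing subset
pd2 <- predict(rf1, testing, type="class")
confusionMatrix(pd2, testing$classe)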
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1670 11 0 0 0
## B 3 1124 12 0 0
## C 0 4 1010 5 1
## D 0 0 4 958 2
## E 1 0 0 1 1079
##
## Overall Statistics
##
## Accuracy : 0.9925
## 95% CI : (0.99, 0.9946)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9905
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9976 0.9868 0.9844 0.9938 0.9972
## Specificity 0.9974 0.9968 0.9979 0.9988 0.9996
## Pos Pred Value 0.9935 0.9868 0.9902 0.9938 0.9981
## Neg Pred Value 0.9990 0.9968 0.9967 0.9988 0.9994
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2838 0.1910 0.1716 0.1628 0.1833
## Detection Prevalence 0.2856 0.1935 0.1733 0.1638 0.1837
## Balanced Accuracy 0.9975 0.9918 0.9912 0.9963 0.9984
As expected, the use of a combined classifier, such as random forest, produces a model with a higher level of accuracy. The accuracy of random forest (99.3%) is significantly higher than the accuracy of a single model like rpart (74.3%). Using random forest, about 0.75% of instances are expected to be misclassified (expected out of sample error).
Using random forest, the 20 cases provided in the test set will be predicted.
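The expected out of sample error follows directly from the random forest confusion matrix: it is one minus the accuracy on the held-out testing subset. The sketch below recomputes it from the diagonal counts. Using an exact binomial interval (`binom.test`) to express the uncertainty is an assumption made here for illustration, and the variable names are hypothetical.

```r
# Counts read from the random forest confusion matrix above
correct <- 1670 + 1124 + 1010 + 958 + 1079   # diagonal entries
n_test  <- 5885                              # rows in the testing subset

accuracy  <- correct / n_test
oos_error <- 1 - accuracy

# Exact binomial 95% interval for accuracy, flipped into an interval for the error
ci_acc <- binom.test(correct, n_test)$conf.int
ci_err <- rev(1 - ci_acc)

print(round(100 * oos_error, 2))   # error rate in percent
print(round(100 * ci_err, 2))      # lower and upper bound in percent
```

With 44 of 5885 cases misclassified, the error estimate is about 0.75%.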
testpred <- predict(rf1, testset, type="class")
testpred
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E