Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement - a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks.
One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, our goal will be to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways.
For more information, refer to http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).
The goal of this project is to predict the manner in which the participants did the exercise. This is the “classe” variable in the training set. This report describes how we built our model, how we used cross validation, and why we made the choices we did. Finally, we will also use our prediction model to predict 20 different test cases.
To create our prediction model we will carry out the following steps:
* Load the training data.
* Remove near-zero covariates and the columns with >= 80% missing values.
* Partition data - Training Data Set - 60% | Checking Data Set - 40%.
* Calculate correlations between each remaining feature and the response, classe.
* Utilizing the activity monitor device data, generate a machine learning model using a training data set with class labels representing the 5 ways of performing the barbell lifts (supervised learning). Basically we create tree-based models (random forests and boosting) to predict classe from all other predictors, using
– Boosting Algorithm
– Default Method
– Custom Algorithm
* Plot accuracy of the model on the scale [0.9, 1].
* Create a prediction model to predict how well 6 different people performed barbell lifts utilizing data collected from activity monitoring devices.
* Once the models are built, assess the performance of each model using the Checking Data Set.
* The best model is to be applied to a new set of Testing data to make predictions.
* These predictions are to be submitted for automated grading in a second part of the assignment.
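The accuracy-plot step in the list above could look roughly like the following base R sketch. The two values are the accuracies quoted in the Observation notes of this report for the boosting and default models; the name `model.acc` is only illustrative and does not appear in the original code.

```r
## Accuracies as quoted in the Observation notes of this report
## (model.acc is an illustrative name, not part of the original code)
model.acc <- c(Boosting = 0.9586, Default = 0.9916)

## Bar chart restricted to the [0.9, 1] accuracy scale;
## xpd = FALSE clips the bars to the plotting region
barplot(model.acc, ylim = c(0.9, 1), xpd = FALSE,
        ylab = "Accuracy", main = "Model accuracy")
```

Restricting the axis to [0.9, 1] makes small differences between strong models visible, at the cost of exaggerating them visually.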
Models
When we create prediction models on the training data, we’ll use cross validation with trainControl to help optimize the model parameters. We’ll do 10-fold cross validation.
cvCtrl <- trainControl(method="cv", number=10, allowParallel=T)
csCtrl <- trainControl(method="oob", number=10, allowParallel=T)
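To make the 10-fold idea concrete, here is a minimal base R sketch of how rows can be assigned to folds. It only illustrates the concept; caret builds its own folds internally when `method="cv"` is used, and the names `n.obs`, `k`, and `fold.id` are assumptions made for this example.

```r
set.seed(707)
n.obs <- 100   # toy number of training rows
k <- 10        # number of folds, as in trainControl(number=10)

## shuffle the fold labels 1..k so each row lands in exactly one fold
fold.id <- sample(rep(seq_len(k), length.out = n.obs))

## for each fold: hold it out and train on the remaining rows
for (i in seq_len(k)) {
  held.out <- which(fold.id == i)
  train.on <- which(fold.id != i)
  ## a model would be fitted on train.on and scored on held.out here
}
table(fold.id)
```

Every row is held out exactly once, so the averaged score uses all of the training partition while never scoring a model on rows it was fitted to.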
We will use three different models that use different approaches
## Boost Method
m1 <- train(classe~., data=PartTrain, method="gbm", verbose=F, trControl=cvCtrl)
## Default Method
m2 <- train(classe~., data=PartTrain, method="rf", verbose=F, trControl=cvCtrl)
## Custom Algorithm ... method is omitted here, so train() uses its default ("rf");
## the trControl is different (out-of-bag) and a tuning parameter (mtry) is fixed
m3 <- train(classe~., data=PartTrain, tuneGrid=data.frame(mtry=10), trControl=csCtrl)
We could also try one of these if required
## Support Vector Machines Model
o1 <- train(classe~., data=PartTrain, method="svmRadial", verbose=F, trControl=cvCtrl)
## KNN Model
o2 <- train(classe~., data=PartTrain, method="knn", verbose=F, trControl=cvCtrl)
## Bagged Model (bagged trees; caret's generic "bag" method also needs a bagControl argument)
o3 <- train(classe~., data=PartTrain, method="treebag", verbose=F, trControl=cvCtrl)
Data Location
The data called “Weight Lifting Exercises Dataset (WLE)” for this project come from this source: http://groupware.les.inf.puc-rio.br/har. Use of WLE dataset from the aforementioned site is acknowledged.
The training data for this project are available at:
https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv
The test data are available at:
https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv
Description
Six young healthy participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: exactly according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D) and throwing the hips to the front (Class E).
Class A corresponds to the specified execution of the exercise, while the other 4 classes correspond to common mistakes. Participants were supervised by an experienced weight lifter to make sure the execution complied to the manner they were supposed to simulate. The exercises were performed by six male participants aged between 20-28 years, with little weight lifting experience. We made sure that all participants could easily simulate the mistakes in a safe and controlled manner by using a relatively light dumbbell (1.25kg).
Read more: http://groupware.les.inf.puc-rio.br/har
Pre-Requisites
Before you start execution of this Rmd file,
1. Please set working dir to your repository
2. Please download the training & test dataset and copy them to your repository
## replace the placeholder with the path to your repository
setwd("<your_assignment_repository>")
training.file <- 'pml-training.csv'
testing.file <- 'pml-testing.csv'
training.url <- 'http://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv'
testing.url <- 'http://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv'
download.file(training.url, training.file)
download.file(testing.url, testing.file)
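Since the report is likely to be knitted more than once, it may be worth skipping the download when the files are already present. The sketch below shows one way to do that; `fetch_if_missing` is a hypothetical helper introduced for illustration and is not part of the original report.

```r
## Download a file only when it is not already present locally.
## fetch_if_missing is a hypothetical helper, not part of the original report.
fetch_if_missing <- function(url, path) {
  if (file.exists(path)) {
    return(FALSE)            # nothing to do, reuse the local copy
  }
  download.file(url, path, mode = "wb")
  TRUE                       # a download was performed
}

## usage: a file that already exists is never fetched again
tmp <- tempfile(fileext = ".csv")
writeLines("a,b", tmp)
fetch_if_missing("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv", tmp)
```

In the usage lines the call returns FALSE because the temporary file already exists, so no network access takes place.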
knitr Global Options
knitr::opts_chunk$set(tidy=FALSE, fig.path='figures/')
Load Libraries
library(ggplot2)
library(caret)
## Warning: package 'caret' was built under R version 3.1.2
## Loading required package: lattice
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.1.2
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
library(gbm)
## Warning: package 'gbm' was built under R version 3.1.2
## Loading required package: survival
## Loading required package: splines
##
## Attaching package: 'survival'
##
## The following object is masked from 'package:caret':
##
## cluster
##
## Loading required package: parallel
## Loaded gbm 2.1
library(survival)
library(splines)
library(parallel)
library(plyr)
#library(ipred)
Load Data
## load data
## treat "NA", "#DIV/0!" and "" (empty fields) as NA
DataTrain <- read.csv("pml-training.csv", row.names=1, na.strings=c("NA","#DIV/0!", ""))
DataCases <- read.csv("pml-testing.csv", row.names=1, na.strings=c("NA","#DIV/0!", ""))
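The effect of the `na.strings` argument can be checked on a tiny inline CSV. This is a sketch on made-up data (the `csv.text` and `toy` names are assumptions), not on the WLE files themselves.

```r
## toy CSV with the three kinds of missing markers handled above
csv.text <- 'id,x,y
1,1.5,NA
2,#DIV/0!,3
3,,4'

toy <- read.csv(text = csv.text, row.names = 1,
                na.strings = c("NA", "#DIV/0!", ""),
                comment.char = "")

## all three markers are read as NA, so x stays numeric
str(toy)
colSums(is.na(toy))
```

Without `"#DIV/0!"` in `na.strings`, the column `x` would be read as text rather than as a number, which is why the markers are declared when loading the data.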
Clean Data
## keep only the columns that contain no missing (NA) values
DataTrain <- DataTrain[,colSums(is.na(DataTrain)) == 0]
DataCases <- DataCases[,colSums(is.na(DataCases)) == 0]
## the following features (columns) are irrelevant, so removed
## user_name, raw_timestamp_part_1, raw_timestamp_part_2,
## cvtd_timestamp, new_window, num_window
DataTrain <- DataTrain[,-c(1:6)]
DataCases <- DataCases[,-c(1:6)]
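The step list at the top of this report mentions an 80% missing-value threshold, whereas the code above drops every column that contains any NA. The following sketch compares the two rules on toy data; the `toy` data frame and the `na.cutoff` name are assumptions made for illustration.

```r
## toy data: column b is 90% missing, column c has a single NA
toy <- data.frame(a = 1:10,
                  b = c(rep(NA, 9), 1),
                  c = c(1:4, NA, 6:10))

na.cutoff <- 0.80                   # assumed threshold from the step list
na.share  <- colMeans(is.na(toy))   # fraction of missing values per column

strict  <- toy[, colSums(is.na(toy)) == 0, drop = FALSE]  # rule used in this report
relaxed <- toy[, na.share < na.cutoff, drop = FALSE]      # threshold rule

names(strict)    # only "a" survives
names(relaxed)   # "a" and "c" survive
```

The two rules give the same result whenever every column is either complete or almost entirely missing; they differ only for columns with a small share of NAs, which the strict rule discards.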
Observation
After cleaning the data, it is seen that the training data set has
1. 19622 samples
2. 52 possible predictors
After cleaning the data, it is seen that the testcases data set has
1. 20 samples
2. 52 possible predictors
Correlations
## find absolute Spearman correlations between each predictor and classe
TrainCors <- abs(sapply(colnames(DataTrain[, -ncol(DataTrain)]),
                        function(x) cor(as.numeric(DataTrain[, x]),
                                        as.numeric(DataTrain$classe),
                                        method = "spearman")))
summary(TrainCors)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0015 0.0147 0.0526 0.0875 0.1380 0.3170
TrainCors
## roll_belt pitch_belt yaw_belt
## 0.128825 0.044525 0.073104
## total_accel_belt gyros_belt_x gyros_belt_y
## 0.086895 0.008433 0.003178
## gyros_belt_z accel_belt_x accel_belt_y
## 0.002042 0.041222 0.015762
## accel_belt_z magnet_belt_x magnet_belt_y
## 0.137358 0.001536 0.198570
## magnet_belt_z roll_arm pitch_arm
## 0.139846 0.052367 0.184447
## yaw_arm total_accel_arm gyros_arm_x
## 0.027926 0.154739 0.023321
## gyros_arm_y gyros_arm_z accel_arm_x
## 0.030942 0.014624 0.256517
## accel_arm_y accel_arm_z magnet_arm_x
## 0.082298 0.100899 0.279462
## magnet_arm_y magnet_arm_z roll_dumbbell
## 0.264919 0.158707 0.087774
## pitch_dumbbell yaw_dumbbell total_accel_dumbbell
## 0.099861 0.007714 0.014720
## gyros_dumbbell_x gyros_dumbbell_y gyros_dumbbell_z
## 0.012329 0.020075 0.014287
## accel_dumbbell_x accel_dumbbell_y accel_dumbbell_z
## 0.128826 0.014719 0.081678
## magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
## 0.151640 0.048651 0.202476
## roll_forearm pitch_forearm yaw_forearm
## 0.052895 0.317281 0.048525
## total_accel_forearm gyros_forearm_x gyros_forearm_y
## 0.123123 0.013357 0.005585
## gyros_forearm_z accel_forearm_x accel_forearm_y
## 0.001525 0.205093 0.020265
## accel_forearm_z magnet_forearm_x magnet_forearm_y
## 0.006820 0.194284 0.112262
## magnet_forearm_z
## 0.050460
Observation
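1. No single predictor is strongly correlated with classe (the largest absolute Spearman correlation is about 0.32, for pitch_forearm), so no conclusion can be drawn from the correlation summary alone.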
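The correlation screen can be illustrated on synthetic data where the answer is known. This is only a sketch: the `toy` data frame is made up, with `x1` built to follow the numerically coded class and `x2` as pure noise. Note that coding a nominal class as 1 to 5 imposes an order on the labels, which is one reason low values here do not rule out a useful predictor.

```r
set.seed(707)
## toy data: x1 tracks the (numerically coded) class, x2 is pure noise
toy <- data.frame(x1 = rep(1:5, each = 20) + rnorm(100, sd = 0.5),
                  x2 = rnorm(100),
                  classe = factor(rep(LETTERS[1:5], each = 20)))

predictors <- setdiff(names(toy), "classe")
toy.cors <- sapply(predictors, function(p) {
  abs(cor(toy[[p]], as.numeric(toy$classe), method = "spearman"))
})

## strongest association first
sort(toy.cors, decreasing = TRUE)
```

A predictor that separates the classes in a non-monotonic way would score low on this measure and still be valuable to a tree-based model.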
Plot Predictors
## plot the two predictors with the highest correlation to classe
TopVar1 <- names(which.max(TrainCors))
TopVar2 <- names(which.max(TrainCors[-which.max(TrainCors)]))
plot(DataTrain[, TopVar1], DataTrain[, TopVar2], col=DataTrain$classe, pch=19, cex=0.1, xlab=TopVar1, ylab=TopVar2)
Observation
1. No single predictor appears to correlate strongly with classe, so a linear regression model is probably not a suitable option for the WLE data.
2. Random forest algorithms are likely to be better suited for robust predictions on the WLE data.
Partition Data
PartInfos <- createDataPartition(DataTrain$classe, p=0.60, list=FALSE)
PartTrain <- DataTrain[PartInfos, ]
PartCheck <- DataTrain[-PartInfos, ]
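When the outcome is a factor, createDataPartition samples within each class, so both parts keep roughly the same class mix. The base R sketch below mimics that idea on a toy outcome; it is an illustration under that assumption, not caret's actual implementation, and the variable names are made up.

```r
set.seed(707)
## toy outcome with unequal class sizes
classe <- factor(rep(LETTERS[1:5], times = c(50, 30, 30, 20, 20)))

## sample 60% of the row indices within each class
## (every class has many rows, so sample() receives a proper index vector)
rows.by.class <- split(seq_along(classe), classe)
train.idx <- sort(unlist(lapply(rows.by.class, function(idx) {
  sample(idx, size = round(0.60 * length(idx)))
}), use.names = FALSE))

prop.table(table(classe))             # class mix in the full data
prop.table(table(classe[train.idx]))  # same mix in the 60% part
```

Calling `set.seed()` before the split, as in this sketch, makes the partition reproducible from one run to the next.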
In this section,
1. We will create a model based on the Boosting Algorithm (caret method "gbm") with 10-Fold Cross Validation
2. View summary of the model
3. Plot Predictive / Important Variables
4. Evaluate this model on the other (check) part of the training data
Create Model
## set seed
set.seed(707)
## create model
cvCtrl <- trainControl(method="cv", number=10, allowParallel=T)
RfBoModel <- train(classe~.,
method="gbm",
data=PartTrain,
trControl=cvCtrl
)
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1274
## 2 1.5245 nan 0.1000 0.0868
## 3 1.4671 nan 0.1000 0.0683
## 4 1.4219 nan 0.1000 0.0530
## 5 1.3863 nan 0.1000 0.0458
## 6 1.3569 nan 0.1000 0.0462
## 7 1.3272 nan 0.1000 0.0371
## 8 1.3029 nan 0.1000 0.0344
## 9 1.2812 nan 0.1000 0.0336
## 10 1.2590 nan 0.1000 0.0293
## 20 1.1037 nan 0.1000 0.0191
## 40 0.9292 nan 0.1000 0.0102
## 60 0.8217 nan 0.1000 0.0064
## 80 0.7418 nan 0.1000 0.0045
## 100 0.6806 nan 0.1000 0.0051
## 120 0.6268 nan 0.1000 0.0015
## 140 0.5834 nan 0.1000 0.0025
## 150 0.5650 nan 0.1000 0.0027
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1851
## 2 1.4902 nan 0.1000 0.1355
## 3 1.4053 nan 0.1000 0.1046
## 4 1.3381 nan 0.1000 0.0848
## 5 1.2831 nan 0.1000 0.0750
## 6 1.2356 nan 0.1000 0.0725
## 7 1.1909 nan 0.1000 0.0571
## 8 1.1553 nan 0.1000 0.0513
## 9 1.1232 nan 0.1000 0.0494
## 10 1.0915 nan 0.1000 0.0441
## 20 0.8850 nan 0.1000 0.0266
## 40 0.6685 nan 0.1000 0.0102
## 60 0.5450 nan 0.1000 0.0084
## 80 0.4524 nan 0.1000 0.0053
## 100 0.3902 nan 0.1000 0.0042
## 120 0.3397 nan 0.1000 0.0036
## 140 0.3000 nan 0.1000 0.0019
## 150 0.2817 nan 0.1000 0.0014
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2272
## 2 1.4660 nan 0.1000 0.1652
## 3 1.3617 nan 0.1000 0.1257
## 4 1.2826 nan 0.1000 0.1131
## 5 1.2122 nan 0.1000 0.0909
## 6 1.1540 nan 0.1000 0.0771
## 7 1.1055 nan 0.1000 0.0633
## 8 1.0643 nan 0.1000 0.0658
## 9 1.0238 nan 0.1000 0.0468
## 10 0.9939 nan 0.1000 0.0472
## 20 0.7618 nan 0.1000 0.0274
## 40 0.5344 nan 0.1000 0.0140
## 60 0.4101 nan 0.1000 0.0081
## 80 0.3263 nan 0.1000 0.0041
## 100 0.2682 nan 0.1000 0.0037
## 120 0.2225 nan 0.1000 0.0025
## 140 0.1877 nan 0.1000 0.0018
## 150 0.1749 nan 0.1000 0.0013
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1342
## 2 1.5226 nan 0.1000 0.0884
## 3 1.4646 nan 0.1000 0.0695
## 4 1.4192 nan 0.1000 0.0539
## 5 1.3841 nan 0.1000 0.0461
## 6 1.3545 nan 0.1000 0.0430
## 7 1.3265 nan 0.1000 0.0418
## 8 1.3004 nan 0.1000 0.0370
## 9 1.2772 nan 0.1000 0.0377
## 10 1.2530 nan 0.1000 0.0294
## 20 1.0963 nan 0.1000 0.0191
## 40 0.9243 nan 0.1000 0.0095
## 60 0.8185 nan 0.1000 0.0068
## 80 0.7405 nan 0.1000 0.0042
## 100 0.6749 nan 0.1000 0.0049
## 120 0.6214 nan 0.1000 0.0024
## 140 0.5788 nan 0.1000 0.0028
## 150 0.5597 nan 0.1000 0.0030
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1858
## 2 1.4880 nan 0.1000 0.1362
## 3 1.4028 nan 0.1000 0.1094
## 4 1.3335 nan 0.1000 0.0823
## 5 1.2804 nan 0.1000 0.0755
## 6 1.2325 nan 0.1000 0.0644
## 7 1.1907 nan 0.1000 0.0598
## 8 1.1514 nan 0.1000 0.0492
## 9 1.1193 nan 0.1000 0.0492
## 10 1.0872 nan 0.1000 0.0397
## 20 0.8883 nan 0.1000 0.0249
## 40 0.6720 nan 0.1000 0.0115
## 60 0.5489 nan 0.1000 0.0070
## 80 0.4641 nan 0.1000 0.0040
## 100 0.3986 nan 0.1000 0.0034
## 120 0.3492 nan 0.1000 0.0047
## 140 0.3048 nan 0.1000 0.0020
## 150 0.2862 nan 0.1000 0.0017
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2308
## 2 1.4619 nan 0.1000 0.1596
## 3 1.3603 nan 0.1000 0.1232
## 4 1.2808 nan 0.1000 0.1052
## 5 1.2137 nan 0.1000 0.0978
## 6 1.1530 nan 0.1000 0.0719
## 7 1.1058 nan 0.1000 0.0603
## 8 1.0659 nan 0.1000 0.0513
## 9 1.0317 nan 0.1000 0.0703
## 10 0.9870 nan 0.1000 0.0464
## 20 0.7541 nan 0.1000 0.0251
## 40 0.5281 nan 0.1000 0.0140
## 60 0.4037 nan 0.1000 0.0055
## 80 0.3225 nan 0.1000 0.0050
## 100 0.2651 nan 0.1000 0.0023
## 120 0.2226 nan 0.1000 0.0023
## 140 0.1891 nan 0.1000 0.0011
## 150 0.1748 nan 0.1000 0.0008
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1296
## 2 1.5240 nan 0.1000 0.0859
## 3 1.4670 nan 0.1000 0.0669
## 4 1.4230 nan 0.1000 0.0509
## 5 1.3882 nan 0.1000 0.0452
## 6 1.3581 nan 0.1000 0.0445
## 7 1.3290 nan 0.1000 0.0384
## 8 1.3043 nan 0.1000 0.0354
## 9 1.2815 nan 0.1000 0.0330
## 10 1.2586 nan 0.1000 0.0286
## 20 1.1032 nan 0.1000 0.0171
## 40 0.9309 nan 0.1000 0.0101
## 60 0.8216 nan 0.1000 0.0066
## 80 0.7443 nan 0.1000 0.0056
## 100 0.6798 nan 0.1000 0.0034
## 120 0.6314 nan 0.1000 0.0025
## 140 0.5866 nan 0.1000 0.0014
## 150 0.5673 nan 0.1000 0.0024
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1787
## 2 1.4903 nan 0.1000 0.1304
## 3 1.4064 nan 0.1000 0.1013
## 4 1.3407 nan 0.1000 0.0876
## 5 1.2852 nan 0.1000 0.0686
## 6 1.2411 nan 0.1000 0.0589
## 7 1.2033 nan 0.1000 0.0598
## 8 1.1655 nan 0.1000 0.0501
## 9 1.1342 nan 0.1000 0.0580
## 10 1.0991 nan 0.1000 0.0427
## 20 0.8919 nan 0.1000 0.0206
## 40 0.6768 nan 0.1000 0.0118
## 60 0.5516 nan 0.1000 0.0081
## 80 0.4660 nan 0.1000 0.0028
## 100 0.3985 nan 0.1000 0.0036
## 120 0.3466 nan 0.1000 0.0023
## 140 0.3065 nan 0.1000 0.0015
## 150 0.2879 nan 0.1000 0.0023
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2329
## 2 1.4626 nan 0.1000 0.1607
## 3 1.3611 nan 0.1000 0.1256
## 4 1.2803 nan 0.1000 0.1110
## 5 1.2107 nan 0.1000 0.0836
## 6 1.1574 nan 0.1000 0.0767
## 7 1.1088 nan 0.1000 0.0741
## 8 1.0620 nan 0.1000 0.0606
## 9 1.0237 nan 0.1000 0.0481
## 10 0.9937 nan 0.1000 0.0460
## 20 0.7543 nan 0.1000 0.0212
## 40 0.5302 nan 0.1000 0.0092
## 60 0.4030 nan 0.1000 0.0044
## 80 0.3208 nan 0.1000 0.0046
## 100 0.2646 nan 0.1000 0.0043
## 120 0.2214 nan 0.1000 0.0033
## 140 0.1871 nan 0.1000 0.0024
## 150 0.1736 nan 0.1000 0.0012
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1322
## 2 1.5228 nan 0.1000 0.0893
## 3 1.4656 nan 0.1000 0.0665
## 4 1.4213 nan 0.1000 0.0524
## 5 1.3872 nan 0.1000 0.0498
## 6 1.3545 nan 0.1000 0.0393
## 7 1.3290 nan 0.1000 0.0420
## 8 1.3035 nan 0.1000 0.0334
## 9 1.2820 nan 0.1000 0.0353
## 10 1.2582 nan 0.1000 0.0308
## 20 1.0983 nan 0.1000 0.0180
## 40 0.9251 nan 0.1000 0.0102
## 60 0.8169 nan 0.1000 0.0056
## 80 0.7374 nan 0.1000 0.0055
## 100 0.6766 nan 0.1000 0.0036
## 120 0.6234 nan 0.1000 0.0038
## 140 0.5805 nan 0.1000 0.0031
## 150 0.5622 nan 0.1000 0.0020
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1877
## 2 1.4890 nan 0.1000 0.1319
## 3 1.4041 nan 0.1000 0.1052
## 4 1.3368 nan 0.1000 0.0847
## 5 1.2829 nan 0.1000 0.0752
## 6 1.2356 nan 0.1000 0.0674
## 7 1.1940 nan 0.1000 0.0635
## 8 1.1547 nan 0.1000 0.0504
## 9 1.1229 nan 0.1000 0.0476
## 10 1.0936 nan 0.1000 0.0374
## 20 0.8948 nan 0.1000 0.0172
## 40 0.6832 nan 0.1000 0.0080
## 60 0.5530 nan 0.1000 0.0061
## 80 0.4675 nan 0.1000 0.0056
## 100 0.4020 nan 0.1000 0.0029
## 120 0.3474 nan 0.1000 0.0031
## 140 0.3077 nan 0.1000 0.0011
## 150 0.2915 nan 0.1000 0.0020
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2363
## 2 1.4585 nan 0.1000 0.1623
## 3 1.3570 nan 0.1000 0.1230
## 4 1.2778 nan 0.1000 0.1090
## 5 1.2094 nan 0.1000 0.0875
## 6 1.1546 nan 0.1000 0.0730
## 7 1.1069 nan 0.1000 0.0668
## 8 1.0635 nan 0.1000 0.0661
## 9 1.0215 nan 0.1000 0.0549
## 10 0.9861 nan 0.1000 0.0440
## 20 0.7533 nan 0.1000 0.0248
## 40 0.5282 nan 0.1000 0.0102
## 60 0.4033 nan 0.1000 0.0078
## 80 0.3227 nan 0.1000 0.0052
## 100 0.2625 nan 0.1000 0.0033
## 120 0.2173 nan 0.1000 0.0023
## 140 0.1845 nan 0.1000 0.0013
## 150 0.1715 nan 0.1000 0.0015
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1305
## 2 1.5235 nan 0.1000 0.0885
## 3 1.4654 nan 0.1000 0.0683
## 4 1.4213 nan 0.1000 0.0510
## 5 1.3864 nan 0.1000 0.0512
## 6 1.3532 nan 0.1000 0.0455
## 7 1.3239 nan 0.1000 0.0372
## 8 1.2994 nan 0.1000 0.0370
## 9 1.2748 nan 0.1000 0.0317
## 10 1.2538 nan 0.1000 0.0286
## 20 1.0984 nan 0.1000 0.0180
## 40 0.9265 nan 0.1000 0.0071
## 60 0.8186 nan 0.1000 0.0067
## 80 0.7397 nan 0.1000 0.0046
## 100 0.6765 nan 0.1000 0.0046
## 120 0.6249 nan 0.1000 0.0036
## 140 0.5811 nan 0.1000 0.0023
## 150 0.5620 nan 0.1000 0.0028
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1845
## 2 1.4883 nan 0.1000 0.1307
## 3 1.4058 nan 0.1000 0.1120
## 4 1.3340 nan 0.1000 0.0887
## 5 1.2776 nan 0.1000 0.0722
## 6 1.2313 nan 0.1000 0.0653
## 7 1.1893 nan 0.1000 0.0554
## 8 1.1543 nan 0.1000 0.0484
## 9 1.1233 nan 0.1000 0.0491
## 10 1.0928 nan 0.1000 0.0499
## 20 0.8914 nan 0.1000 0.0232
## 40 0.6779 nan 0.1000 0.0146
## 60 0.5481 nan 0.1000 0.0090
## 80 0.4625 nan 0.1000 0.0062
## 100 0.3923 nan 0.1000 0.0036
## 120 0.3396 nan 0.1000 0.0014
## 140 0.3011 nan 0.1000 0.0022
## 150 0.2848 nan 0.1000 0.0013
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2374
## 2 1.4579 nan 0.1000 0.1612
## 3 1.3537 nan 0.1000 0.1280
## 4 1.2720 nan 0.1000 0.1020
## 5 1.2067 nan 0.1000 0.0934
## 6 1.1466 nan 0.1000 0.0702
## 7 1.1011 nan 0.1000 0.0656
## 8 1.0590 nan 0.1000 0.0638
## 9 1.0173 nan 0.1000 0.0649
## 10 0.9776 nan 0.1000 0.0454
## 20 0.7495 nan 0.1000 0.0256
## 40 0.5275 nan 0.1000 0.0122
## 60 0.4037 nan 0.1000 0.0063
## 80 0.3240 nan 0.1000 0.0033
## 100 0.2665 nan 0.1000 0.0027
## 120 0.2249 nan 0.1000 0.0018
## 140 0.1902 nan 0.1000 0.0015
## 150 0.1747 nan 0.1000 0.0009
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1283
## 2 1.5225 nan 0.1000 0.0859
## 3 1.4651 nan 0.1000 0.0686
## 4 1.4207 nan 0.1000 0.0553
## 5 1.3846 nan 0.1000 0.0432
## 6 1.3557 nan 0.1000 0.0481
## 7 1.3266 nan 0.1000 0.0390
## 8 1.3018 nan 0.1000 0.0316
## 9 1.2805 nan 0.1000 0.0365
## 10 1.2567 nan 0.1000 0.0289
## 20 1.0976 nan 0.1000 0.0160
## 40 0.9242 nan 0.1000 0.0075
## 60 0.8171 nan 0.1000 0.0049
## 80 0.7375 nan 0.1000 0.0054
## 100 0.6771 nan 0.1000 0.0030
## 120 0.6226 nan 0.1000 0.0048
## 140 0.5779 nan 0.1000 0.0023
## 150 0.5591 nan 0.1000 0.0019
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1833
## 2 1.4890 nan 0.1000 0.1361
## 3 1.4017 nan 0.1000 0.1037
## 4 1.3341 nan 0.1000 0.0812
## 5 1.2810 nan 0.1000 0.0665
## 6 1.2378 nan 0.1000 0.0714
## 7 1.1924 nan 0.1000 0.0678
## 8 1.1496 nan 0.1000 0.0482
## 9 1.1185 nan 0.1000 0.0524
## 10 1.0860 nan 0.1000 0.0438
## 20 0.8892 nan 0.1000 0.0164
## 40 0.6734 nan 0.1000 0.0116
## 60 0.5477 nan 0.1000 0.0097
## 80 0.4608 nan 0.1000 0.0059
## 100 0.3943 nan 0.1000 0.0051
## 120 0.3436 nan 0.1000 0.0018
## 140 0.3019 nan 0.1000 0.0020
## 150 0.2842 nan 0.1000 0.0011
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2345
## 2 1.4593 nan 0.1000 0.1546
## 3 1.3581 nan 0.1000 0.1202
## 4 1.2803 nan 0.1000 0.1017
## 5 1.2147 nan 0.1000 0.0924
## 6 1.1551 nan 0.1000 0.0804
## 7 1.1046 nan 0.1000 0.0736
## 8 1.0583 nan 0.1000 0.0628
## 9 1.0196 nan 0.1000 0.0569
## 10 0.9832 nan 0.1000 0.0470
## 20 0.7487 nan 0.1000 0.0262
## 40 0.5293 nan 0.1000 0.0067
## 60 0.4049 nan 0.1000 0.0086
## 80 0.3246 nan 0.1000 0.0035
## 100 0.2642 nan 0.1000 0.0021
## 120 0.2219 nan 0.1000 0.0012
## 140 0.1897 nan 0.1000 0.0009
## 150 0.1753 nan 0.1000 0.0019
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1310
## 2 1.5229 nan 0.1000 0.0867
## 3 1.4642 nan 0.1000 0.0682
## 4 1.4197 nan 0.1000 0.0539
## 5 1.3842 nan 0.1000 0.0527
## 6 1.3515 nan 0.1000 0.0431
## 7 1.3236 nan 0.1000 0.0346
## 8 1.3008 nan 0.1000 0.0324
## 9 1.2791 nan 0.1000 0.0368
## 10 1.2557 nan 0.1000 0.0265
## 20 1.1014 nan 0.1000 0.0150
## 40 0.9289 nan 0.1000 0.0086
## 60 0.8225 nan 0.1000 0.0055
## 80 0.7422 nan 0.1000 0.0045
## 100 0.6776 nan 0.1000 0.0030
## 120 0.6258 nan 0.1000 0.0024
## 140 0.5813 nan 0.1000 0.0026
## 150 0.5614 nan 0.1000 0.0017
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1846
## 2 1.4887 nan 0.1000 0.1280
## 3 1.4057 nan 0.1000 0.1075
## 4 1.3366 nan 0.1000 0.0810
## 5 1.2839 nan 0.1000 0.0744
## 6 1.2366 nan 0.1000 0.0784
## 7 1.1889 nan 0.1000 0.0586
## 8 1.1514 nan 0.1000 0.0486
## 9 1.1203 nan 0.1000 0.0492
## 10 1.0901 nan 0.1000 0.0428
## 20 0.8918 nan 0.1000 0.0262
## 40 0.6811 nan 0.1000 0.0116
## 60 0.5543 nan 0.1000 0.0063
## 80 0.4687 nan 0.1000 0.0062
## 100 0.4001 nan 0.1000 0.0035
## 120 0.3510 nan 0.1000 0.0032
## 140 0.3079 nan 0.1000 0.0017
## 150 0.2895 nan 0.1000 0.0025
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2345
## 2 1.4599 nan 0.1000 0.1615
## 3 1.3565 nan 0.1000 0.1191
## 4 1.2787 nan 0.1000 0.1152
## 5 1.2078 nan 0.1000 0.0887
## 6 1.1515 nan 0.1000 0.0772
## 7 1.1022 nan 0.1000 0.0667
## 8 1.0596 nan 0.1000 0.0517
## 9 1.0263 nan 0.1000 0.0577
## 10 0.9890 nan 0.1000 0.0564
## 20 0.7545 nan 0.1000 0.0233
## 40 0.5291 nan 0.1000 0.0148
## 60 0.3987 nan 0.1000 0.0061
## 80 0.3163 nan 0.1000 0.0044
## 100 0.2600 nan 0.1000 0.0041
## 120 0.2211 nan 0.1000 0.0021
## 140 0.1857 nan 0.1000 0.0011
## 150 0.1722 nan 0.1000 0.0010
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1255
## 2 1.5235 nan 0.1000 0.0858
## 3 1.4672 nan 0.1000 0.0652
## 4 1.4236 nan 0.1000 0.0506
## 5 1.3895 nan 0.1000 0.0467
## 6 1.3589 nan 0.1000 0.0465
## 7 1.3292 nan 0.1000 0.0384
## 8 1.3050 nan 0.1000 0.0369
## 9 1.2814 nan 0.1000 0.0360
## 10 1.2590 nan 0.1000 0.0306
## 20 1.1036 nan 0.1000 0.0164
## 40 0.9309 nan 0.1000 0.0088
## 60 0.8220 nan 0.1000 0.0054
## 80 0.7420 nan 0.1000 0.0032
## 100 0.6777 nan 0.1000 0.0033
## 120 0.6250 nan 0.1000 0.0025
## 140 0.5834 nan 0.1000 0.0023
## 150 0.5633 nan 0.1000 0.0023
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1819
## 2 1.4906 nan 0.1000 0.1290
## 3 1.4070 nan 0.1000 0.1054
## 4 1.3378 nan 0.1000 0.0796
## 5 1.2857 nan 0.1000 0.0723
## 6 1.2395 nan 0.1000 0.0636
## 7 1.2003 nan 0.1000 0.0620
## 8 1.1602 nan 0.1000 0.0542
## 9 1.1260 nan 0.1000 0.0446
## 10 1.0978 nan 0.1000 0.0503
## 20 0.8944 nan 0.1000 0.0224
## 40 0.6775 nan 0.1000 0.0112
## 60 0.5524 nan 0.1000 0.0080
## 80 0.4660 nan 0.1000 0.0039
## 100 0.3966 nan 0.1000 0.0040
## 120 0.3455 nan 0.1000 0.0022
## 140 0.3020 nan 0.1000 0.0021
## 150 0.2843 nan 0.1000 0.0024
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2344
## 2 1.4611 nan 0.1000 0.1634
## 3 1.3578 nan 0.1000 0.1241
## 4 1.2795 nan 0.1000 0.0957
## 5 1.2175 nan 0.1000 0.0922
## 6 1.1593 nan 0.1000 0.0752
## 7 1.1116 nan 0.1000 0.0725
## 8 1.0649 nan 0.1000 0.0610
## 9 1.0265 nan 0.1000 0.0576
## 10 0.9909 nan 0.1000 0.0463
## 20 0.7571 nan 0.1000 0.0255
## 40 0.5325 nan 0.1000 0.0091
## 60 0.4077 nan 0.1000 0.0096
## 80 0.3254 nan 0.1000 0.0038
## 100 0.2656 nan 0.1000 0.0029
## 120 0.2218 nan 0.1000 0.0020
## 140 0.1878 nan 0.1000 0.0011
## 150 0.1736 nan 0.1000 0.0011
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1277
## 2 1.5240 nan 0.1000 0.0906
## 3 1.4661 nan 0.1000 0.0676
## 4 1.4215 nan 0.1000 0.0526
## 5 1.3863 nan 0.1000 0.0445
## 6 1.3561 nan 0.1000 0.0464
## 7 1.3271 nan 0.1000 0.0378
## 8 1.3024 nan 0.1000 0.0343
## 9 1.2802 nan 0.1000 0.0335
## 10 1.2588 nan 0.1000 0.0359
## 20 1.0999 nan 0.1000 0.0147
## 40 0.9266 nan 0.1000 0.0095
## 60 0.8209 nan 0.1000 0.0059
## 80 0.7418 nan 0.1000 0.0038
## 100 0.6798 nan 0.1000 0.0044
## 120 0.6256 nan 0.1000 0.0038
## 140 0.5818 nan 0.1000 0.0024
## 150 0.5611 nan 0.1000 0.0018
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1850
## 2 1.4882 nan 0.1000 0.1309
## 3 1.4043 nan 0.1000 0.1018
## 4 1.3368 nan 0.1000 0.0855
## 5 1.2820 nan 0.1000 0.0754
## 6 1.2342 nan 0.1000 0.0659
## 7 1.1923 nan 0.1000 0.0524
## 8 1.1583 nan 0.1000 0.0568
## 9 1.1228 nan 0.1000 0.0552
## 10 1.0888 nan 0.1000 0.0355
## 20 0.8924 nan 0.1000 0.0201
## 40 0.6750 nan 0.1000 0.0148
## 60 0.5501 nan 0.1000 0.0050
## 80 0.4620 nan 0.1000 0.0051
## 100 0.4021 nan 0.1000 0.0050
## 120 0.3494 nan 0.1000 0.0020
## 140 0.3064 nan 0.1000 0.0022
## 150 0.2888 nan 0.1000 0.0019
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2330
## 2 1.4637 nan 0.1000 0.1675
## 3 1.3572 nan 0.1000 0.1219
## 4 1.2805 nan 0.1000 0.1046
## 5 1.2147 nan 0.1000 0.0844
## 6 1.1608 nan 0.1000 0.0904
## 7 1.1056 nan 0.1000 0.0708
## 8 1.0598 nan 0.1000 0.0568
## 9 1.0239 nan 0.1000 0.0617
## 10 0.9847 nan 0.1000 0.0562
## 20 0.7532 nan 0.1000 0.0221
## 40 0.5260 nan 0.1000 0.0101
## 60 0.4033 nan 0.1000 0.0055
## 80 0.3209 nan 0.1000 0.0038
## 100 0.2630 nan 0.1000 0.0032
## 120 0.2185 nan 0.1000 0.0014
## 140 0.1867 nan 0.1000 0.0013
## 150 0.1733 nan 0.1000 0.0015
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1282
## 2 1.5249 nan 0.1000 0.0886
## 3 1.4680 nan 0.1000 0.0671
## 4 1.4236 nan 0.1000 0.0530
## 5 1.3888 nan 0.1000 0.0503
## 6 1.3562 nan 0.1000 0.0388
## 7 1.3302 nan 0.1000 0.0415
## 8 1.3042 nan 0.1000 0.0360
## 9 1.2805 nan 0.1000 0.0343
## 10 1.2578 nan 0.1000 0.0301
## 20 1.0999 nan 0.1000 0.0172
## 40 0.9262 nan 0.1000 0.0082
## 60 0.8171 nan 0.1000 0.0048
## 80 0.7388 nan 0.1000 0.0050
## 100 0.6739 nan 0.1000 0.0042
## 120 0.6236 nan 0.1000 0.0031
## 140 0.5798 nan 0.1000 0.0017
## 150 0.5602 nan 0.1000 0.0021
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1852
## 2 1.4910 nan 0.1000 0.1295
## 3 1.4082 nan 0.1000 0.1100
## 4 1.3378 nan 0.1000 0.0837
## 5 1.2848 nan 0.1000 0.0725
## 6 1.2384 nan 0.1000 0.0701
## 7 1.1932 nan 0.1000 0.0548
## 8 1.1581 nan 0.1000 0.0559
## 9 1.1231 nan 0.1000 0.0465
## 10 1.0933 nan 0.1000 0.0396
## 20 0.8880 nan 0.1000 0.0222
## 40 0.6776 nan 0.1000 0.0145
## 60 0.5518 nan 0.1000 0.0068
## 80 0.4639 nan 0.1000 0.0042
## 100 0.3994 nan 0.1000 0.0029
## 120 0.3482 nan 0.1000 0.0020
## 140 0.3082 nan 0.1000 0.0018
## 150 0.2898 nan 0.1000 0.0017
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2292
## 2 1.4648 nan 0.1000 0.1585
## 3 1.3650 nan 0.1000 0.1214
## 4 1.2860 nan 0.1000 0.1077
## 5 1.2180 nan 0.1000 0.0865
## 6 1.1630 nan 0.1000 0.0820
## 7 1.1104 nan 0.1000 0.0590
## 8 1.0718 nan 0.1000 0.0685
## 9 1.0291 nan 0.1000 0.0652
## 10 0.9887 nan 0.1000 0.0472
## 20 0.7529 nan 0.1000 0.0313
## 40 0.5208 nan 0.1000 0.0090
## 60 0.4032 nan 0.1000 0.0061
## 80 0.3222 nan 0.1000 0.0039
## 100 0.2623 nan 0.1000 0.0031
## 120 0.2207 nan 0.1000 0.0016
## 140 0.1848 nan 0.1000 0.0007
## 150 0.1706 nan 0.1000 0.0018
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2375
## 2 1.4607 nan 0.1000 0.1616
## 3 1.3568 nan 0.1000 0.1207
## 4 1.2795 nan 0.1000 0.1089
## 5 1.2123 nan 0.1000 0.0894
## 6 1.1554 nan 0.1000 0.0766
## 7 1.1063 nan 0.1000 0.0614
## 8 1.0668 nan 0.1000 0.0697
## 9 1.0242 nan 0.1000 0.0584
## 10 0.9867 nan 0.1000 0.0508
## 20 0.7508 nan 0.1000 0.0258
## 40 0.5314 nan 0.1000 0.0102
## 60 0.4011 nan 0.1000 0.0059
## 80 0.3214 nan 0.1000 0.0056
## 100 0.2670 nan 0.1000 0.0020
## 120 0.2254 nan 0.1000 0.0019
## 140 0.1904 nan 0.1000 0.0023
## 150 0.1771 nan 0.1000 0.0010
## view model summary
summary(RfBoModel)
## var rel.inf
## roll_belt roll_belt 21.09372
## pitch_forearm pitch_forearm 10.93921
## yaw_belt yaw_belt 7.61584
## magnet_dumbbell_z magnet_dumbbell_z 6.71867
## roll_forearm roll_forearm 5.60517
## magnet_dumbbell_y magnet_dumbbell_y 5.23373
## magnet_belt_z magnet_belt_z 4.24058
## gyros_belt_z gyros_belt_z 3.68570
## accel_forearm_x accel_forearm_x 2.70499
## accel_dumbbell_y accel_dumbbell_y 2.70107
## pitch_belt pitch_belt 2.67203
## gyros_dumbbell_y gyros_dumbbell_y 2.52905
## roll_dumbbell roll_dumbbell 2.24721
## accel_dumbbell_x accel_dumbbell_x 1.99300
## accel_forearm_z accel_forearm_z 1.94759
## magnet_belt_y magnet_belt_y 1.82710
## yaw_arm yaw_arm 1.80176
## magnet_dumbbell_x magnet_dumbbell_x 1.55326
## magnet_arm_z magnet_arm_z 1.52196
## magnet_forearm_z magnet_forearm_z 1.49438
## magnet_belt_x magnet_belt_x 0.91883
## magnet_forearm_x magnet_forearm_x 0.85223
## total_accel_forearm total_accel_forearm 0.81792
## accel_belt_z accel_belt_z 0.80910
## magnet_arm_x magnet_arm_x 0.80029
## accel_dumbbell_z accel_dumbbell_z 0.77048
## total_accel_dumbbell total_accel_dumbbell 0.73065
## magnet_arm_y magnet_arm_y 0.67431
## roll_arm roll_arm 0.56302
## gyros_belt_y gyros_belt_y 0.55962
## gyros_arm_y gyros_arm_y 0.49502
## magnet_forearm_y magnet_forearm_y 0.39396
## gyros_dumbbell_x gyros_dumbbell_x 0.32784
## accel_arm_x accel_arm_x 0.32184
## gyros_forearm_z gyros_forearm_z 0.22067
## gyros_dumbbell_z gyros_dumbbell_z 0.17740
## accel_arm_y accel_arm_y 0.12884
## total_accel_arm total_accel_arm 0.11874
## yaw_dumbbell yaw_dumbbell 0.08060
## pitch_arm pitch_arm 0.06573
## gyros_forearm_x gyros_forearm_x 0.04691
## total_accel_belt total_accel_belt 0.00000
## gyros_belt_x gyros_belt_x 0.00000
## accel_belt_x accel_belt_x 0.00000
## accel_belt_y accel_belt_y 0.00000
## gyros_arm_x gyros_arm_x 0.00000
## gyros_arm_z gyros_arm_z 0.00000
## accel_arm_z accel_arm_z 0.00000
## pitch_dumbbell pitch_dumbbell 0.00000
## yaw_forearm yaw_forearm 0.00000
## gyros_forearm_y gyros_forearm_y 0.00000
## accel_forearm_y accel_forearm_y 0.00000
## plot model important variables
plot(varImp(RfBoModel))
Observation
The Boosting Algorithm with 10-Fold Cross Validation generated a good model as shown below
Accuracy : 0.9586
Kappa : 0.9477
Check Data / Test Data for Prediction
RfBoPredictn <- predict(RfBoModel, newdata=PartCheck)
RfBoConfMtrx <- confusionMatrix(RfBoPredictn, PartCheck$classe)
RfBoConfMtrx
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 2205 46 0 4 8
## B 20 1433 32 9 12
## C 2 36 1314 41 10
## D 3 1 18 1221 19
## E 2 2 4 11 1393
##
## Overall Statistics
##
## Accuracy : 0.964
## 95% CI : (0.96, 0.968)
## No Information Rate : 0.284
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.955
## Mcnemar's Test P-Value : 4.16e-06
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.988 0.944 0.961 0.949 0.966
## Specificity 0.990 0.988 0.986 0.994 0.997
## Pos Pred Value 0.974 0.952 0.937 0.968 0.987
## Neg Pred Value 0.995 0.987 0.992 0.990 0.992
## Prevalence 0.284 0.193 0.174 0.164 0.184
## Detection Rate 0.281 0.183 0.167 0.156 0.178
## Detection Prevalence 0.288 0.192 0.179 0.161 0.180
## Balanced Accuracy 0.989 0.966 0.973 0.972 0.982
Observation
The Boosting Algorithm with 10-Fold Cross Validation, when checked against the Check Data / Test Data (the held-out 40% of the original training dataset), shows the following
Overall Accuracy : 0.9643
Kappa : 0.9548
Out-of-Sample Error: 0.0357 (1 - Overall Accuracy)
95% ConfInt Lower : 0.96
95% ConfInt Upper : 0.9683
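The overall accuracy can be recovered from the confusion matrix itself: it is the share of check cases that fall on the diagonal, and the out-of-sample error is its complement. Below is a minimal base R sketch with the counts copied from the Boosting confusion matrix above. The names `conf_counts`, `accuracy` and `oos_error` are illustrative and not part of the original analysis.

```r
## Counts copied from the Boosting confusion matrix above
## (rows = predicted class, columns = reference class).
classes <- c("A", "B", "C", "D", "E")
conf_counts <- matrix(
  c(2205,   46,    0,    4,    8,
      20, 1433,   32,    9,   12,
       2,   36, 1314,   41,   10,
       3,    1,   18, 1221,   19,
       2,    2,    4,   11, 1393),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

## Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(conf_counts)) / sum(conf_counts)

## Out-of-sample error is the share of misclassified check cases.
oos_error <- 1 - accuracy

round(c(accuracy = accuracy, oos_error = oos_error), 4)
```

With a fitted caret object, the same accuracy should be available as `RfBoConfMtrx$overall["Accuracy"]`.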
In this section,
1. We will create a model based on Random Forests - Default Method With 10-Fold Cross Validation
2. View summary of the model
3. Plot Predictive / Important Variables
4. Evaluate this model on the other (check) part of the training data
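For orientation, 10-fold cross validation splits the training rows into ten folds and holds each fold out once while the model is fitted on the other nine. The sketch below shows only the fold bookkeeping in base R. caret creates its own folds, so this is an illustration of the idea and not a reproduction of what `trainControl` does internally. `n_rows`, `fold_id` and `holdout_sizes` are illustrative names, and the row count 11776 is taken from the model summary in this report.

```r
set.seed(707)
n_rows <- 11776   # rows in the training partition, per the model summary
k <- 10

## Assign each row to one of k folds at random, keeping fold sizes balanced.
fold_id <- sample(rep(seq_len(k), length.out = n_rows))

## For each fold, the held-out rows are those with that fold id;
## the remaining rows would be used to fit the model.
holdout_sizes <- vapply(seq_len(k), function(f) sum(fold_id == f), integer(1))
train_sizes <- n_rows - holdout_sizes

data.frame(fold = seq_len(k), train = train_sizes, holdout = holdout_sizes)
```

Every row is held out exactly once, so the ten hold-out accuracies together cover the whole training partition.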
## set seed
set.seed(707)
## generate model
cvCtrl <- trainControl(method="cv", number=10, allowParallel=TRUE)
RfDmModel <- train(classe ~ .,
                   method="rf",
                   data=PartTrain,
                   verbose=FALSE,
                   importance=TRUE,
                   trControl=cvCtrl
)
## view model summary
summary(RfDmModel)
## Length Class Mode
## call 6 -none- call
## type 1 -none- character
## predicted 11776 factor numeric
## err.rate 3000 -none- numeric
## confusion 30 -none- numeric
## votes 58880 matrix numeric
## oob.times 11776 -none- numeric
## classes 5 -none- character
## importance 364 -none- numeric
## importanceSD 312 -none- numeric
## localImportance 0 -none- NULL
## proximity 0 -none- NULL
## ntree 1 -none- numeric
## mtry 1 -none- numeric
## forest 14 -none- list
## y 11776 factor numeric
## test 0 -none- NULL
## inbag 0 -none- NULL
## xNames 52 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 5 -none- character
## plot model important variables
plot(varImp(RfDmModel))
Observation
The Default Method with 10-Fold Cross Validation generated a very accurate model as shown below
Accuracy : 0.9916
Kappa : 0.9894
Check Data / Test Data for Prediction
RfDmPredictn <- predict(RfDmModel, newdata=PartCheck)
RfDmConfMtrx <- confusionMatrix(RfDmPredictn, PartCheck$classe)
RfDmConfMtrx
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 2230 22 0 1 0
## B 2 1491 3 0 1
## C 0 5 1357 18 2
## D 0 0 8 1267 5
## E 0 0 0 0 1434
##
## Overall Statistics
##
## Accuracy : 0.991
## 95% CI : (0.989, 0.993)
## No Information Rate : 0.284
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.989
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.999 0.982 0.992 0.985 0.994
## Specificity 0.996 0.999 0.996 0.998 1.000
## Pos Pred Value 0.990 0.996 0.982 0.990 1.000
## Neg Pred Value 1.000 0.996 0.998 0.997 0.999
## Prevalence 0.284 0.193 0.174 0.164 0.184
## Detection Rate 0.284 0.190 0.173 0.161 0.183
## Detection Prevalence 0.287 0.191 0.176 0.163 0.183
## Balanced Accuracy 0.998 0.991 0.994 0.992 0.997
Observation
The Default Method with 10-Fold Cross Validation, when checked against the Check Data / Test Data (the held-out 40% of the original training dataset), shows the following
Overall Accuracy : 0.9915
Kappa : 0.9892
Out-of-Sample Error: 0.0085 (1 - Overall Accuracy)
95% ConfInt Lower : 0.9892
95% ConfInt Upper : 0.9934
In this section,
1. We will create a model based on Random Forests - Custom Algorithm (No Method & Tuning) with out-of-bag (OOB) error estimation; the control object sets method="oob", not 10-fold cross validation
2. View summary of the model
3. Plot Predictive / Important Variables
4. Evaluate this model on the other (check) part of the training data
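The control object in this section uses `method="oob"`, so accuracy is estimated from out-of-bag rows instead of folds: each tree is grown on a bootstrap sample and scored on the rows that sample left out. The base R sketch below only illustrates how many rows a single bootstrap sample leaves out. `n_rows`, `in_bag` and `oob_share` are illustrative names, and the row count comes from the model summary in this report.

```r
set.seed(707)
n_rows <- 11776   # rows in the training partition, per the model summary

## One bootstrap sample: draw n_rows row indices with replacement.
in_bag <- sample.int(n_rows, size = n_rows, replace = TRUE)

## Rows never drawn are "out of bag" for this tree.
oob_rows <- setdiff(seq_len(n_rows), in_bag)
oob_share <- length(oob_rows) / n_rows

## The expected share is (1 - 1/n)^n, which approaches exp(-1) for large n.
c(observed = oob_share, expected = (1 - 1 / n_rows)^n_rows)
```

Roughly a third of the rows are left out of each tree, which is why a random forest can estimate its own error without a separate resampling loop.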
## set seed
set.seed(707)
## generate model
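## out-of-bag estimate; number is only used by resampling methods such as "cv"
csCtrl <- trainControl(method="oob", number=10, allowParallel=TRUE)
## no method given, so train() falls back to its default, "rf";
## the response is named directly so that "." cannot pick it up as a predictor
RfCaModel <- train(classe ~ .,
                   data=PartTrain,
                   tuneGrid=data.frame(mtry=10),
                   trControl=csCtrl
)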
## view model summary
summary(RfCaModel)
## Length Class Mode
## call 4 -none- call
## type 1 -none- character
## predicted 11776 factor numeric
## err.rate 3000 -none- numeric
## confusion 30 -none- numeric
## votes 58880 matrix numeric
## oob.times 11776 -none- numeric
## classes 5 -none- character
## importance 52 -none- numeric
## importanceSD 0 -none- NULL
## localImportance 0 -none- NULL
## proximity 0 -none- NULL
## ntree 1 -none- numeric
## mtry 1 -none- numeric
## forest 14 -none- list
## y 11776 factor numeric
## test 0 -none- NULL
## inbag 0 -none- NULL
## xNames 52 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 5 -none- character
## plot model important variables
plot(varImp(RfCaModel))
Observation
The Custom Algorithm (No Method & Tuning) with out-of-bag (OOB) error estimation generated a very accurate model as shown below
Accuracy : 0.9939
Kappa : 0.9923
Check Data / Test Data for Prediction
RfCaPredictn <- predict(RfCaModel, newdata=PartCheck)
RfCaConfMtrx <- confusionMatrix(RfCaPredictn, PartCheck$classe)
RfCaConfMtrx
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 2230 8 0 0 0
## B 2 1504 5 0 0
## C 0 6 1359 19 0
## D 0 0 4 1267 5
## E 0 0 0 0 1437
##
## Overall Statistics
##
## Accuracy : 0.994
## 95% CI : (0.992, 0.995)
## No Information Rate : 0.284
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.992
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.999 0.991 0.993 0.985 0.997
## Specificity 0.999 0.999 0.996 0.999 1.000
## Pos Pred Value 0.996 0.995 0.982 0.993 1.000
## Neg Pred Value 1.000 0.998 0.999 0.997 0.999
## Prevalence 0.284 0.193 0.174 0.164 0.184
## Detection Rate 0.284 0.192 0.173 0.161 0.183
## Detection Prevalence 0.285 0.193 0.176 0.163 0.183
## Balanced Accuracy 0.999 0.995 0.995 0.992 0.998
Observation
The Custom Algorithm (No Method & Tuning) with out-of-bag (OOB) error estimation, when checked against the Check Data / Test Data (the held-out 40% of the original training dataset), shows the following
Overall Accuracy : 0.9938
Kappa : 0.9921
Out-of-Sample Error: 0.0062 (1 - Overall Accuracy)
95% ConfInt Lower : 0.9918
95% ConfInt Upper : 0.9954
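Kappa measures how far the observed agreement exceeds the agreement expected by chance from the row and column totals. Below is a minimal base R sketch using the counts from the Custom Algorithm confusion matrix above. The variable names are illustrative, and the result should agree with the Kappa of 0.992 printed in that output.

```r
## Counts copied from the Custom Algorithm confusion matrix above
## (rows = predicted class, columns = reference class).
classes <- c("A", "B", "C", "D", "E")
conf_counts <- matrix(
  c(2230,    8,    0,    0,    0,
       2, 1504,    5,    0,    0,
       0,    6, 1359,   19,    0,
       0,    0,    4, 1267,    5,
       0,    0,    0,    0, 1437),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

n <- sum(conf_counts)

## Observed agreement: share of cases on the diagonal (the accuracy).
p_observed <- sum(diag(conf_counts)) / n

## Chance agreement: product of matching row and column shares, summed.
p_chance <- sum(rowSums(conf_counts) * colSums(conf_counts)) / n^2

kappa_stat <- (p_observed - p_chance) / (1 - p_chance)

round(c(accuracy = p_observed, kappa = kappa_stat, error = 1 - p_observed), 4)
```

Kappa and accuracy are both "higher is better" scores, whereas the out-of-sample error is the complement of accuracy.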
From the above Random Forests models we see the following
## Model.Type Overall.Accuracy OutOfSample.Error
## 1 Boosting Algorithm 0.9643 0.9548
## 2 Default Method 0.9915 0.9892
## 3 Custom Algorithm 0.9938 0.9921
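Observation
From the above table it is clear that both the Default Method and the Custom Algorithm are far more accurate than the Boosting Algorithm. Note that the values in the OutOfSample.Error column match the Kappa statistic of each confusion matrix; the out-of-sample error, computed as 1 - Overall.Accuracy, is 0.0357, 0.0085 and 0.0062 respectively.
We will use both of these models to predict the Test Data Cases.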
Custom Algorithm
DcCaPredict <- predict(RfCaModel, newdata=DataCases)
DcCaPredict
## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
Default Method
DcDmPredict <- predict(RfDmModel, newdata=DataCases)
DcDmPredict
## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
Observation
The prediction vectors generated by both models are identical.
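The agreement can also be checked programmatically instead of by eye. The sketch below rebuilds the two printed vectors as factors so that it runs on its own; `custom_pred` and `default_pred` are illustrative stand-ins for `DcCaPredict` and `DcDmPredict`.

```r
classes <- c("A", "B", "C", "D", "E")

## The two prediction vectors as printed above.
custom_pred <- factor(
  c("B", "A", "B", "A", "A", "E", "D", "B", "A", "A",
    "B", "C", "B", "A", "E", "E", "A", "B", "B", "B"),
  levels = classes
)
default_pred <- factor(
  c("B", "A", "B", "A", "A", "E", "D", "B", "A", "A",
    "B", "C", "B", "A", "E", "E", "A", "B", "B", "B"),
  levels = classes
)

## TRUE only if values, levels and length all match.
same_predictions <- identical(custom_pred, default_pred)

## Positions where the models disagree (empty when they agree everywhere).
disagreements <- which(custom_pred != default_pred)

same_predictions
```

In the analysis itself, `identical(DcCaPredict, DcDmPredict)` would perform the same check on the real objects.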
For each test case we need to submit a text file with a single capital letter (A, B, C, D, or E) corresponding to our prediction for the corresponding problem in the test data set.
Create a folder where the files should be written, set it as the working directory, and run the script. The function given below will create one file for each submission.
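pml_write_files = function(x){
  n = length(x)
  ## seq_len(n) is empty when n is 0, unlike 1:n
  for(i in seq_len(n)){
    ## one file per test case: problem_id_1.txt, problem_id_2.txt, ...
    filename = paste0("problem_id_",i,".txt")
    write.table(x[i],file=filename,quote=FALSE,row.names=FALSE,col.names=FALSE)
  }
}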
pml_write_files(DcCaPredict)
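Before submitting, it can be worth reading the files back to confirm that each one holds a single letter. The sketch below is a variant of the writer above with an explicit output directory; the `out_dir` argument, the `write_predictions` name and the three sample predictions are additions for this illustration, not part of the original script.

```r
## Variant of the writer with an explicit output directory
## (out_dir is an assumption added for this sketch).
write_predictions <- function(x, out_dir) {
  for (i in seq_along(x)) {
    filename <- file.path(out_dir, paste0("problem_id_", i, ".txt"))
    write.table(x[i], file = filename, quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
}

## Write into a temporary folder so the working directory is untouched.
out_dir <- file.path(tempdir(), "pml_submission")
dir.create(out_dir, showWarnings = FALSE)

## Three sample predictions, used only to exercise the function.
preds <- factor(c("B", "A", "E"), levels = c("A", "B", "C", "D", "E"))
write_predictions(preds, out_dir)

## Read the files back in numeric order and compare with the input.
read_back <- vapply(seq_along(preds), function(i) {
  readLines(file.path(out_dir, paste0("problem_id_", i, ".txt")))
}, character(1))

all(read_back == as.character(preds))
```

Passing the directory as an argument avoids changing the working directory with `setwd()`.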
End Of Report