Practical Machine Learning Project

Background

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks.

People regularly quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, our goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants, who were asked to perform barbell lifts correctly and incorrectly in 5 different ways.

For more information, see http://groupware.les.inf.puc-rio.br/har (the section on the Weight Lifting Exercise Dataset).

The goal of this project is to predict the manner in which the participants did the exercise. This is the “classe” variable in the training set. This report describes how we built our model, how we used cross validation, and why we made the choices we did. Finally, we will also use our prediction model to predict 20 different test cases.

Overview

To create our prediction model we will carry out the following steps:
* Load the training data.
* Remove near zero covariates and the columns with >= 80% missing values.
* Partition data - Training Data Set - 60% | Checking Data Set - 40%.
* Calculate correlations between each remaining feature and the response, classe.
* Utilizing the activity monitor device data, generate a machine learning model from a training data set with class labels representing the 5 ways of performing the barbell lifts (supervised learning). Basically we create a model with random forests algorithm to predict classe with all other predictors using
  – Boosting Algorithm
  – Default Method
  – Custom Algorithm
* Plot accuracy of the model on the scale [0.9, 1].
* Create a prediction model to predict how well 6 different people performed barbell lifts utilizing data collected from activity monitoring devices.
* Once the models are built, assess the performance of each model using the Checking Data Set.
* Apply the best model to a new set of Testing data to make predictions.
* Submit these predictions for automated grading in a second part of the assignment.

Models

When we create prediction models on the training data, we’ll use cross validation with trainControl to help optimize the model parameters. We’ll do 10-fold cross validation.

cvCtrl <- trainControl(method="cv",  number=10, allowParallel=T)
csCtrl <- trainControl(method="oob", number=10, allowParallel=T)
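
As a rough illustration of what method="cv" with number=10 means, the sketch below assigns row indices to 10 folds in base R. The helper name make_folds and the row count of 105 are made up for this example, and this is not caret's internal implementation; caret's own fold construction may differ (for example by balancing classes across folds).

```r
# Hypothetical helper (not part of caret): assign n rows to k roughly equal folds.
make_folds <- function(n, k = 10, seed = 707) {
  set.seed(seed)
  # Repeat the fold labels 1..k up to length n, then shuffle them
  sample(rep(seq_len(k), length.out = n))
}

folds <- make_folds(n = 105, k = 10)

# For fold 1: rows labelled 1 are held out, all other rows are used for fitting.
held_out <- which(folds == 1)
fit_rows <- which(folds != 1)

# Fold sizes differ by at most one row
table(folds)
```

Each of the 10 folds is held out once in this way, and the 10 held-out accuracy estimates are averaged. With method="oob" the random forest's out-of-bag samples are used instead, so no folds are built.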

We will use three different models that use different approaches

## Boost Method
m1 <- train(classe~., data=PartTrain, method="gbm", verbose=F, trControl=cvCtrl)
## Default Method
m2 <- train(classe~., data=PartTrain, method="rf",  verbose=F, trControl=cvCtrl)
## Custom Algorithm ... method is omitted here, so train() uses its default ("rf");
## the trControl is different (out-of-bag) ... tuning param mtry is fixed
m3 <- train(classe~., data=PartTrain, tuneGrid=data.frame(mtry=10), trControl=csCtrl)

We could also try one of these if required

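## Support Vector Machines Model (caret has no "svm" method; "svmRadial" is one of its SVM variants)
## verbose is a gbm option, so it is omitted for these models
o1 <- train(classe~., data=PartTrain, method="svmRadial", trControl=cvCtrl)
## KNN Model
o2 <- train(classe~., data=PartTrain, method="knn", trControl=cvCtrl)
## Bagged Model ("treebag" bags classification trees; "bag" would need an extra bagControl argument)
o3 <- train(classe~., data=PartTrain, method="treebag", trControl=cvCtrl)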

Data

Data Location

The data called “Weight Lifting Exercises Dataset (WLE)” for this project come from this source: http://groupware.les.inf.puc-rio.br/har. Use of WLE dataset from the aforementioned site is acknowledged.

The training data for this project are available at:
https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv

The test data are available at:
https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv

Description

Six young healthy participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: exactly according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D) and throwing the hips to the front (Class E).

Class A corresponds to the specified execution of the exercise, while the other 4 classes correspond to common mistakes. Participants were supervised by an experienced weight lifter to make sure the execution complied with the manner they were supposed to simulate. The exercises were performed by six male participants aged 20 to 28 years, with little weight lifting experience. The dataset authors made sure that all participants could easily simulate the mistakes in a safe and controlled manner by using a relatively light dumbbell (1.25kg).

Pre Process

Pre-Requisites

Before you start execution of this Rmd file,
1. Please set working dir to your repository
2. Please download the training & test dataset and copy them to your repository

setwd("<your_assignment_repository>")  # replace with the path to your repository
training.file <- 'pml-training.csv'
testing.file <- 'pml-testing.csv'
training.url <- 'http://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv'
testing.url <- 'http://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv'
download.file(training.url, training.file)
download.file(testing.url,testing.file )
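
Re-running the report downloads both files again each time. One possible guard is sketched below; the helper name fetch_if_missing and its fetch argument are hypothetical, and the demo uses a stub in place of download.file and a placeholder URL so that it runs without network access.

```r
# Hypothetical helper: download only when the destination file does not exist yet.
# `fetch` defaults to download.file but can be swapped for a stub.
fetch_if_missing <- function(url, dest, fetch = download.file) {
  if (!file.exists(dest)) {
    fetch(url, dest)
  }
  invisible(dest)
}

# Offline demo: the stub counts its calls and writes a placeholder file.
calls <- 0
stub_fetch <- function(url, dest) {
  calls <<- calls + 1
  writeLines("stub", dest)
}

dest <- tempfile(fileext = ".csv")
fetch_if_missing("http://example.invalid/file.csv", dest, fetch = stub_fetch)
fetch_if_missing("http://example.invalid/file.csv", dest, fetch = stub_fetch)

# The second call finds the file and skips the download
calls
```

In the report itself the calls would look like fetch_if_missing(training.url, training.file), with the default download.file doing the work.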

knitr Global Options

knitr::opts_chunk$set(tidy=FALSE, fig.path='figures/')

Load Libraries

library(ggplot2)
library(caret)

## Warning: package 'caret' was built under R version 3.1.2

## Loading required package: lattice

library(randomForest)

## Warning: package 'randomForest' was built under R version 3.1.2

## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.

library(gbm)

## Warning: package 'gbm' was built under R version 3.1.2

## Loading required package: survival
## Loading required package: splines
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:caret':
## 
##     cluster
## 
## Loading required package: parallel
## Loaded gbm 2.1

library(survival)
library(splines)
library(parallel)
library(plyr)
#library(ipred)

Load Data

## load data
## treat "NA", "#DIV/0!" and "" (empty) as NA
DataTrain <- read.csv("pml-training.csv", row.names=1, na.strings=c("NA","#DIV/0!", ""))
DataCases <- read.csv("pml-testing.csv", row.names=1, na.strings=c("NA","#DIV/0!", ""))
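
To see what the na.strings argument does, here is a small sketch on made-up CSV text (the values and the three-row layout are invented; only the missing-value markers mirror the real files).

```r
# Toy CSV text with the three kinds of missing-value markers
csv_text <- 'id,roll_belt,kurtosis_roll_belt,classe
1,1.41,NA,A
2,1.48,#DIV/0!,A
3,1.42,,B'

toy <- read.csv(text = csv_text, row.names = 1,
                na.strings = c("NA", "#DIV/0!", ""))

# Count missing values per column; all three markers are read as NA
colSums(is.na(toy))
```

Without "#DIV/0!" in na.strings, the second column would be read as text rather than as a numeric column with missing values.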

Exploratory Data Analysis

Clean Data

## keep only the columns that contain no missing (NA) values
DataTrain <- DataTrain[,colSums(is.na(DataTrain)) == 0]
DataCases <- DataCases[,colSums(is.na(DataCases)) == 0]
## the following features (columns) are irrelevant, so removed
## user_name, raw_timestamp_part_1, raw_timestamp_part_2, 
## cvtd_timestamp, new_window, num_window 
DataTrain <- DataTrain[,-c(1:6)]
DataCases <- DataCases[,-c(1:6)]
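
The code above keeps only columns with no missing values at all, while the overview mentions an 80% threshold. The sketch below puts the two rules side by side on a toy data frame; the column values are made up and the variable names na_share, strict and relaxed are only for illustration.

```r
# Toy data: one column is 80% missing, one is 20% missing
toy <- data.frame(
  roll_belt  = c(1.41, 1.48, 1.42, 1.45, 1.43),
  max_roll   = c(NA, NA, NA, NA, 2.1),
  pitch_belt = c(8.07, NA, 8.05, 8.06, 8.09),
  classe     = c("A", "A", "B", "B", "C")
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))

# Rule used in the code above: drop every column that has any NA
strict <- toy[, na_share == 0, drop = FALSE]

# Threshold rule: drop only columns with >= 80% missing values
relaxed <- toy[, na_share < 0.8, drop = FALSE]

names(strict)
names(relaxed)
```

The stricter rule is simpler but also discards columns with only a few gaps; which rule is preferable depends on how the missing values are distributed in the real data.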

Observation
After cleaning the data, it is seen that the training data set has
1. 19622 samples
2. 52 possible predictors
After cleaning the data, it is seen that the testcases data set has
1. 20 samples
2. 52 possible predictors

Correlations

## find correlations
TrainCors <- abs(sapply(colnames(DataTrain[, -ncol(DataTrain)]), function(x) cor(as.numeric(DataTrain[, x]), as.numeric(DataTrain$classe), method = "spearman")))
summary(TrainCors)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0015  0.0147  0.0526  0.0875  0.1380  0.3170

TrainCors

##            roll_belt           pitch_belt             yaw_belt 
##             0.128825             0.044525             0.073104 
##     total_accel_belt         gyros_belt_x         gyros_belt_y 
##             0.086895             0.008433             0.003178 
##         gyros_belt_z         accel_belt_x         accel_belt_y 
##             0.002042             0.041222             0.015762 
##         accel_belt_z        magnet_belt_x        magnet_belt_y 
##             0.137358             0.001536             0.198570 
##        magnet_belt_z             roll_arm            pitch_arm 
##             0.139846             0.052367             0.184447 
##              yaw_arm      total_accel_arm          gyros_arm_x 
##             0.027926             0.154739             0.023321 
##          gyros_arm_y          gyros_arm_z          accel_arm_x 
##             0.030942             0.014624             0.256517 
##          accel_arm_y          accel_arm_z         magnet_arm_x 
##             0.082298             0.100899             0.279462 
##         magnet_arm_y         magnet_arm_z        roll_dumbbell 
##             0.264919             0.158707             0.087774 
##       pitch_dumbbell         yaw_dumbbell total_accel_dumbbell 
##             0.099861             0.007714             0.014720 
##     gyros_dumbbell_x     gyros_dumbbell_y     gyros_dumbbell_z 
##             0.012329             0.020075             0.014287 
##     accel_dumbbell_x     accel_dumbbell_y     accel_dumbbell_z 
##             0.128826             0.014719             0.081678 
##    magnet_dumbbell_x    magnet_dumbbell_y    magnet_dumbbell_z 
##             0.151640             0.048651             0.202476 
##         roll_forearm        pitch_forearm          yaw_forearm 
##             0.052895             0.317281             0.048525 
##  total_accel_forearm      gyros_forearm_x      gyros_forearm_y 
##             0.123123             0.013357             0.005585 
##      gyros_forearm_z      accel_forearm_x      accel_forearm_y 
##             0.001525             0.205093             0.020265 
##      accel_forearm_z     magnet_forearm_x     magnet_forearm_y 
##             0.006820             0.194284             0.112262 
##     magnet_forearm_z 
##             0.050460

Observation
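1. No strong correlation with classe is found: the largest absolute Spearman correlation is about 0.32 (pitch_forearm), so no conclusion can be drawn from the correlation summary alone.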
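
The correlation code relies on classe being a factor, so that as.numeric() returns level codes. In recent R versions read.csv keeps text columns as character by default, in which case as.numeric() would return NA. A minimal sketch of a more explicit variant follows, on made-up toy values (the data frame and the names class_code and toy_cors are assumptions for this example).

```r
# Toy data with classe stored as character
toy <- data.frame(
  pitch_forearm = c(-10, -5, 0, 5, 10, 15),
  gyros_belt_x  = c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1),
  classe        = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Convert to a factor first, then take the integer level codes (A=1, B=2, C=3)
class_code <- as.numeric(factor(toy$classe))

# Absolute Spearman correlation of each predictor with the class code
predictors <- setdiff(names(toy), "classe")
toy_cors <- abs(sapply(predictors, function(x) {
  cor(toy[[x]], class_code, method = "spearman")
}))

toy_cors
```

Coding the classes as 1, 2, 3, ... imposes an order that the classes do not naturally have, so these correlations are at best a crude screen.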

Plot Predictors

## plot predictors 
plot(DataTrain[, names(which.max(TrainCors))], DataTrain[, names(which.max(TrainCors[-which.max(TrainCors)]))], col=DataTrain$classe, pch=19, cex=0.1, xlab=names(which.max(TrainCors)), ylab=names(which.max(TrainCors[-which.max(TrainCors)])))

[Figure: plot of chunk plot_predictors, the two predictors most correlated with classe, colored by classe]

Observation
1. No single predictor correlates strongly with classe, so a linear regression model is probably not a suitable option for WLE data.
2. Random forest algorithms are likely to be better suited for robust predictions on WLE data.

Partition Data

PartInfos <- createDataPartition(DataTrain$classe, p=0.60, list=FALSE)
PartTrain <- DataTrain[PartInfos, ]
PartCheck <- DataTrain[-PartInfos, ]
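
createDataPartition samples within the levels of classe, so the 60% partition keeps roughly the class proportions of the full data. The base R sketch below imitates that idea on made-up class counts; it is an illustration only, not caret's implementation.

```r
set.seed(707)

# Toy outcome with unequal class sizes (200 rows in total)
classe <- factor(rep(c("A", "B", "C", "D", "E"), times = c(50, 40, 30, 30, 50)))

# Sample 60% of the row indices within each class
train_idx <- unlist(lapply(split(seq_along(classe), classe), function(rows) {
  sample(rows, size = round(0.6 * length(rows)))
}))

# Class shares in the 60% partition and in the full data
train_share <- prop.table(table(classe[train_idx]))
full_share  <- prop.table(table(classe))

rbind(train_share, full_share)
```

A similar check on the real partition would compare prop.table(table(PartTrain$classe)) with prop.table(table(DataTrain$classe)).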

Random Forests - Boosting Algorithm With 10-Fold Cross Validation

In this section, we will
1. Create a model based on Random Forests - Boosting Algorithm (caret method "gbm") with 10-Fold Cross Validation
2. View a summary of the model
3. Plot the predictive / important variables
4. Evaluate this model on the other (check) part of the training data
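
Step 4 comes down to comparing predicted and actual classes on the check set. The sketch below shows the arithmetic with made-up labels; in the real workflow the predictions would come from predict() applied to the fitted model and PartCheck, and caret's confusionMatrix() reports the same table along with further statistics.

```r
# Made-up actual and predicted classes for 8 check-set rows
truth <- factor(c("A", "A", "B", "B", "C", "C", "D", "E"), levels = LETTERS[1:5])
pred  <- factor(c("A", "A", "B", "C", "C", "C", "D", "E"), levels = LETTERS[1:5])

# Rows are predicted classes, columns are actual classes
conf_tab <- table(Predicted = pred, Actual = truth)

# Accuracy is the share of rows on the diagonal (7 of 8 here)
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)

conf_tab
accuracy
```

One minus this accuracy is an estimate of the out-of-sample error, since the check set was not used to fit the model.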

Create Model

## set seed
set.seed(707)
## create model
cvCtrl <- trainControl(method="cv",  number=10, allowParallel=T)
RfBoModel <- train(classe~., 
                  method="gbm", 
                  data=PartTrain, 
                  trControl=cvCtrl
                  )

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1274
##      2        1.5245             nan     0.1000    0.0868
##      3        1.4671             nan     0.1000    0.0683
##      4        1.4219             nan     0.1000    0.0530
##      5        1.3863             nan     0.1000    0.0458
##      6        1.3569             nan     0.1000    0.0462
##      7        1.3272             nan     0.1000    0.0371
##      8        1.3029             nan     0.1000    0.0344
##      9        1.2812             nan     0.1000    0.0336
##     10        1.2590             nan     0.1000    0.0293
##     20        1.1037             nan     0.1000    0.0191
##     40        0.9292             nan     0.1000    0.0102
##     60        0.8217             nan     0.1000    0.0064
##     80        0.7418             nan     0.1000    0.0045
##    100        0.6806             nan     0.1000    0.0051
##    120        0.6268             nan     0.1000    0.0015
##    140        0.5834             nan     0.1000    0.0025
##    150        0.5650             nan     0.1000    0.0027
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1851
##      2        1.4902             nan     0.1000    0.1355
##      3        1.4053             nan     0.1000    0.1046
##      4        1.3381             nan     0.1000    0.0848
##      5        1.2831             nan     0.1000    0.0750
##      6        1.2356             nan     0.1000    0.0725
##      7        1.1909             nan     0.1000    0.0571
##      8        1.1553             nan     0.1000    0.0513
##      9        1.1232             nan     0.1000    0.0494
##     10        1.0915             nan     0.1000    0.0441
##     20        0.8850             nan     0.1000    0.0266
##     40        0.6685             nan     0.1000    0.0102
##     60        0.5450             nan     0.1000    0.0084
##     80        0.4524             nan     0.1000    0.0053
##    100        0.3902             nan     0.1000    0.0042
##    120        0.3397             nan     0.1000    0.0036
##    140        0.3000             nan     0.1000    0.0019
##    150        0.2817             nan     0.1000    0.0014
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2272
##      2        1.4660             nan     0.1000    0.1652
##      3        1.3617             nan     0.1000    0.1257
##      4        1.2826             nan     0.1000    0.1131
##      5        1.2122             nan     0.1000    0.0909
##      6        1.1540             nan     0.1000    0.0771
##      7        1.1055             nan     0.1000    0.0633
##      8        1.0643             nan     0.1000    0.0658
##      9        1.0238             nan     0.1000    0.0468
##     10        0.9939             nan     0.1000    0.0472
##     20        0.7618             nan     0.1000    0.0274
##     40        0.5344             nan     0.1000    0.0140
##     60        0.4101             nan     0.1000    0.0081
##     80        0.3263             nan     0.1000    0.0041
##    100        0.2682             nan     0.1000    0.0037
##    120        0.2225             nan     0.1000    0.0025
##    140        0.1877             nan     0.1000    0.0018
##    150        0.1749             nan     0.1000    0.0013
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1342
##      2        1.5226             nan     0.1000    0.0884
##      3        1.4646             nan     0.1000    0.0695
##      4        1.4192             nan     0.1000    0.0539
##      5        1.3841             nan     0.1000    0.0461
##      6        1.3545             nan     0.1000    0.0430
##      7        1.3265             nan     0.1000    0.0418
##      8        1.3004             nan     0.1000    0.0370
##      9        1.2772             nan     0.1000    0.0377
##     10        1.2530             nan     0.1000    0.0294
##     20        1.0963             nan     0.1000    0.0191
##     40        0.9243             nan     0.1000    0.0095
##     60        0.8185             nan     0.1000    0.0068
##     80        0.7405             nan     0.1000    0.0042
##    100        0.6749             nan     0.1000    0.0049
##    120        0.6214             nan     0.1000    0.0024
##    140        0.5788             nan     0.1000    0.0028
##    150        0.5597             nan     0.1000    0.0030
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1858
##      2        1.4880             nan     0.1000    0.1362
##      3        1.4028             nan     0.1000    0.1094
##      4        1.3335             nan     0.1000    0.0823
##      5        1.2804             nan     0.1000    0.0755
##      6        1.2325             nan     0.1000    0.0644
##      7        1.1907             nan     0.1000    0.0598
##      8        1.1514             nan     0.1000    0.0492
##      9        1.1193             nan     0.1000    0.0492
##     10        1.0872             nan     0.1000    0.0397
##     20        0.8883             nan     0.1000    0.0249
##     40        0.6720             nan     0.1000    0.0115
##     60        0.5489             nan     0.1000    0.0070
##     80        0.4641             nan     0.1000    0.0040
##    100        0.3986             nan     0.1000    0.0034
##    120        0.3492             nan     0.1000    0.0047
##    140        0.3048             nan     0.1000    0.0020
##    150        0.2862             nan     0.1000    0.0017
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2308
##      2        1.4619             nan     0.1000    0.1596
##      3        1.3603             nan     0.1000    0.1232
##      4        1.2808             nan     0.1000    0.1052
##      5        1.2137             nan     0.1000    0.0978
##      6        1.1530             nan     0.1000    0.0719
##      7        1.1058             nan     0.1000    0.0603
##      8        1.0659             nan     0.1000    0.0513
##      9        1.0317             nan     0.1000    0.0703
##     10        0.9870             nan     0.1000    0.0464
##     20        0.7541             nan     0.1000    0.0251
##     40        0.5281             nan     0.1000    0.0140
##     60        0.4037             nan     0.1000    0.0055
##     80        0.3225             nan     0.1000    0.0050
##    100        0.2651             nan     0.1000    0.0023
##    120        0.2226             nan     0.1000    0.0023
##    140        0.1891             nan     0.1000    0.0011
##    150        0.1748             nan     0.1000    0.0008
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1296
##      2        1.5240             nan     0.1000    0.0859
##      3        1.4670             nan     0.1000    0.0669
##      4        1.4230             nan     0.1000    0.0509
##      5        1.3882             nan     0.1000    0.0452
##      6        1.3581             nan     0.1000    0.0445
##      7        1.3290             nan     0.1000    0.0384
##      8        1.3043             nan     0.1000    0.0354
##      9        1.2815             nan     0.1000    0.0330
##     10        1.2586             nan     0.1000    0.0286
##     20        1.1032             nan     0.1000    0.0171
##     40        0.9309             nan     0.1000    0.0101
##     60        0.8216             nan     0.1000    0.0066
##     80        0.7443             nan     0.1000    0.0056
##    100        0.6798             nan     0.1000    0.0034
##    120        0.6314             nan     0.1000    0.0025
##    140        0.5866             nan     0.1000    0.0014
##    150        0.5673             nan     0.1000    0.0024
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1787
##      2        1.4903             nan     0.1000    0.1304
##      3        1.4064             nan     0.1000    0.1013
##      4        1.3407             nan     0.1000    0.0876
##      5        1.2852             nan     0.1000    0.0686
##      6        1.2411             nan     0.1000    0.0589
##      7        1.2033             nan     0.1000    0.0598
##      8        1.1655             nan     0.1000    0.0501
##      9        1.1342             nan     0.1000    0.0580
##     10        1.0991             nan     0.1000    0.0427
##     20        0.8919             nan     0.1000    0.0206
##     40        0.6768             nan     0.1000    0.0118
##     60        0.5516             nan     0.1000    0.0081
##     80        0.4660             nan     0.1000    0.0028
##    100        0.3985             nan     0.1000    0.0036
##    120        0.3466             nan     0.1000    0.0023
##    140        0.3065             nan     0.1000    0.0015
##    150        0.2879             nan     0.1000    0.0023
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2329
##      2        1.4626             nan     0.1000    0.1607
##      3        1.3611             nan     0.1000    0.1256
##      4        1.2803             nan     0.1000    0.1110
##      5        1.2107             nan     0.1000    0.0836
##      6        1.1574             nan     0.1000    0.0767
##      7        1.1088             nan     0.1000    0.0741
##      8        1.0620             nan     0.1000    0.0606
##      9        1.0237             nan     0.1000    0.0481
##     10        0.9937             nan     0.1000    0.0460
##     20        0.7543             nan     0.1000    0.0212
##     40        0.5302             nan     0.1000    0.0092
##     60        0.4030             nan     0.1000    0.0044
##     80        0.3208             nan     0.1000    0.0046
##    100        0.2646             nan     0.1000    0.0043
##    120        0.2214             nan     0.1000    0.0033
##    140        0.1871             nan     0.1000    0.0024
##    150        0.1736             nan     0.1000    0.0012
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1322
##      2        1.5228             nan     0.1000    0.0893
##      3        1.4656             nan     0.1000    0.0665
##      4        1.4213             nan     0.1000    0.0524
##      5        1.3872             nan     0.1000    0.0498
##      6        1.3545             nan     0.1000    0.0393
##      7        1.3290             nan     0.1000    0.0420
##      8        1.3035             nan     0.1000    0.0334
##      9        1.2820             nan     0.1000    0.0353
##     10        1.2582             nan     0.1000    0.0308
##     20        1.0983             nan     0.1000    0.0180
##     40        0.9251             nan     0.1000    0.0102
##     60        0.8169             nan     0.1000    0.0056
##     80        0.7374             nan     0.1000    0.0055
##    100        0.6766             nan     0.1000    0.0036
##    120        0.6234             nan     0.1000    0.0038
##    140        0.5805             nan     0.1000    0.0031
##    150        0.5622             nan     0.1000    0.0020
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1877
##      2        1.4890             nan     0.1000    0.1319
##      3        1.4041             nan     0.1000    0.1052
##      4        1.3368             nan     0.1000    0.0847
##      5        1.2829             nan     0.1000    0.0752
##      6        1.2356             nan     0.1000    0.0674
##      7        1.1940             nan     0.1000    0.0635
##      8        1.1547             nan     0.1000    0.0504
##      9        1.1229             nan     0.1000    0.0476
##     10        1.0936             nan     0.1000    0.0374
##     20        0.8948             nan     0.1000    0.0172
##     40        0.6832             nan     0.1000    0.0080
##     60        0.5530             nan     0.1000    0.0061
##     80        0.4675             nan     0.1000    0.0056
##    100        0.4020             nan     0.1000    0.0029
##    120        0.3474             nan     0.1000    0.0031
##    140        0.3077             nan     0.1000    0.0011
##    150        0.2915             nan     0.1000    0.0020
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2363
##      2        1.4585             nan     0.1000    0.1623
##      3        1.3570             nan     0.1000    0.1230
##      4        1.2778             nan     0.1000    0.1090
##      5        1.2094             nan     0.1000    0.0875
##      6        1.1546             nan     0.1000    0.0730
##      7        1.1069             nan     0.1000    0.0668
##      8        1.0635             nan     0.1000    0.0661
##      9        1.0215             nan     0.1000    0.0549
##     10        0.9861             nan     0.1000    0.0440
##     20        0.7533             nan     0.1000    0.0248
##     40        0.5282             nan     0.1000    0.0102
##     60        0.4033             nan     0.1000    0.0078
##     80        0.3227             nan     0.1000    0.0052
##    100        0.2625             nan     0.1000    0.0033
##    120        0.2173             nan     0.1000    0.0023
##    140        0.1845             nan     0.1000    0.0013
##    150        0.1715             nan     0.1000    0.0015
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1305
##      2        1.5235             nan     0.1000    0.0885
##      3        1.4654             nan     0.1000    0.0683
##      4        1.4213             nan     0.1000    0.0510
##      5        1.3864             nan     0.1000    0.0512
##      6        1.3532             nan     0.1000    0.0455
##      7        1.3239             nan     0.1000    0.0372
##      8        1.2994             nan     0.1000    0.0370
##      9        1.2748             nan     0.1000    0.0317
##     10        1.2538             nan     0.1000    0.0286
##     20        1.0984             nan     0.1000    0.0180
##     40        0.9265             nan     0.1000    0.0071
##     60        0.8186             nan     0.1000    0.0067
##     80        0.7397             nan     0.1000    0.0046
##    100        0.6765             nan     0.1000    0.0046
##    120        0.6249             nan     0.1000    0.0036
##    140        0.5811             nan     0.1000    0.0023
##    150        0.5620             nan     0.1000    0.0028
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1845
##      2        1.4883             nan     0.1000    0.1307
##      3        1.4058             nan     0.1000    0.1120
##      4        1.3340             nan     0.1000    0.0887
##      5        1.2776             nan     0.1000    0.0722
##      6        1.2313             nan     0.1000    0.0653
##      7        1.1893             nan     0.1000    0.0554
##      8        1.1543             nan     0.1000    0.0484
##      9        1.1233             nan     0.1000    0.0491
##     10        1.0928             nan     0.1000    0.0499
##     20        0.8914             nan     0.1000    0.0232
##     40        0.6779             nan     0.1000    0.0146
##     60        0.5481             nan     0.1000    0.0090
##     80        0.4625             nan     0.1000    0.0062
##    100        0.3923             nan     0.1000    0.0036
##    120        0.3396             nan     0.1000    0.0014
##    140        0.3011             nan     0.1000    0.0022
##    150        0.2848             nan     0.1000    0.0013
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2374
##      2        1.4579             nan     0.1000    0.1612
##      3        1.3537             nan     0.1000    0.1280
##      4        1.2720             nan     0.1000    0.1020
##      5        1.2067             nan     0.1000    0.0934
##      6        1.1466             nan     0.1000    0.0702
##      7        1.1011             nan     0.1000    0.0656
##      8        1.0590             nan     0.1000    0.0638
##      9        1.0173             nan     0.1000    0.0649
##     10        0.9776             nan     0.1000    0.0454
##     20        0.7495             nan     0.1000    0.0256
##     40        0.5275             nan     0.1000    0.0122
##     60        0.4037             nan     0.1000    0.0063
##     80        0.3240             nan     0.1000    0.0033
##    100        0.2665             nan     0.1000    0.0027
##    120        0.2249             nan     0.1000    0.0018
##    140        0.1902             nan     0.1000    0.0015
##    150        0.1747             nan     0.1000    0.0009
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1283
##      2        1.5225             nan     0.1000    0.0859
##      3        1.4651             nan     0.1000    0.0686
##      4        1.4207             nan     0.1000    0.0553
##      5        1.3846             nan     0.1000    0.0432
##      6        1.3557             nan     0.1000    0.0481
##      7        1.3266             nan     0.1000    0.0390
##      8        1.3018             nan     0.1000    0.0316
##      9        1.2805             nan     0.1000    0.0365
##     10        1.2567             nan     0.1000    0.0289
##     20        1.0976             nan     0.1000    0.0160
##     40        0.9242             nan     0.1000    0.0075
##     60        0.8171             nan     0.1000    0.0049
##     80        0.7375             nan     0.1000    0.0054
##    100        0.6771             nan     0.1000    0.0030
##    120        0.6226             nan     0.1000    0.0048
##    140        0.5779             nan     0.1000    0.0023
##    150        0.5591             nan     0.1000    0.0019
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1833
##      2        1.4890             nan     0.1000    0.1361
##      3        1.4017             nan     0.1000    0.1037
##      4        1.3341             nan     0.1000    0.0812
##      5        1.2810             nan     0.1000    0.0665
##      6        1.2378             nan     0.1000    0.0714
##      7        1.1924             nan     0.1000    0.0678
##      8        1.1496             nan     0.1000    0.0482
##      9        1.1185             nan     0.1000    0.0524
##     10        1.0860             nan     0.1000    0.0438
##     20        0.8892             nan     0.1000    0.0164
##     40        0.6734             nan     0.1000    0.0116
##     60        0.5477             nan     0.1000    0.0097
##     80        0.4608             nan     0.1000    0.0059
##    100        0.3943             nan     0.1000    0.0051
##    120        0.3436             nan     0.1000    0.0018
##    140        0.3019             nan     0.1000    0.0020
##    150        0.2842             nan     0.1000    0.0011
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2345
##      2        1.4593             nan     0.1000    0.1546
##      3        1.3581             nan     0.1000    0.1202
##      4        1.2803             nan     0.1000    0.1017
##      5        1.2147             nan     0.1000    0.0924
##      6        1.1551             nan     0.1000    0.0804
##      7        1.1046             nan     0.1000    0.0736
##      8        1.0583             nan     0.1000    0.0628
##      9        1.0196             nan     0.1000    0.0569
##     10        0.9832             nan     0.1000    0.0470
##     20        0.7487             nan     0.1000    0.0262
##     40        0.5293             nan     0.1000    0.0067
##     60        0.4049             nan     0.1000    0.0086
##     80        0.3246             nan     0.1000    0.0035
##    100        0.2642             nan     0.1000    0.0021
##    120        0.2219             nan     0.1000    0.0012
##    140        0.1897             nan     0.1000    0.0009
##    150        0.1753             nan     0.1000    0.0019
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1310
##      2        1.5229             nan     0.1000    0.0867
##      3        1.4642             nan     0.1000    0.0682
##      4        1.4197             nan     0.1000    0.0539
##      5        1.3842             nan     0.1000    0.0527
##      6        1.3515             nan     0.1000    0.0431
##      7        1.3236             nan     0.1000    0.0346
##      8        1.3008             nan     0.1000    0.0324
##      9        1.2791             nan     0.1000    0.0368
##     10        1.2557             nan     0.1000    0.0265
##     20        1.1014             nan     0.1000    0.0150
##     40        0.9289             nan     0.1000    0.0086
##     60        0.8225             nan     0.1000    0.0055
##     80        0.7422             nan     0.1000    0.0045
##    100        0.6776             nan     0.1000    0.0030
##    120        0.6258             nan     0.1000    0.0024
##    140        0.5813             nan     0.1000    0.0026
##    150        0.5614             nan     0.1000    0.0017
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1846
##      2        1.4887             nan     0.1000    0.1280
##      3        1.4057             nan     0.1000    0.1075
##      4        1.3366             nan     0.1000    0.0810
##      5        1.2839             nan     0.1000    0.0744
##      6        1.2366             nan     0.1000    0.0784
##      7        1.1889             nan     0.1000    0.0586
##      8        1.1514             nan     0.1000    0.0486
##      9        1.1203             nan     0.1000    0.0492
##     10        1.0901             nan     0.1000    0.0428
##     20        0.8918             nan     0.1000    0.0262
##     40        0.6811             nan     0.1000    0.0116
##     60        0.5543             nan     0.1000    0.0063
##     80        0.4687             nan     0.1000    0.0062
##    100        0.4001             nan     0.1000    0.0035
##    120        0.3510             nan     0.1000    0.0032
##    140        0.3079             nan     0.1000    0.0017
##    150        0.2895             nan     0.1000    0.0025
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2345
##      2        1.4599             nan     0.1000    0.1615
##      3        1.3565             nan     0.1000    0.1191
##      4        1.2787             nan     0.1000    0.1152
##      5        1.2078             nan     0.1000    0.0887
##      6        1.1515             nan     0.1000    0.0772
##      7        1.1022             nan     0.1000    0.0667
##      8        1.0596             nan     0.1000    0.0517
##      9        1.0263             nan     0.1000    0.0577
##     10        0.9890             nan     0.1000    0.0564
##     20        0.7545             nan     0.1000    0.0233
##     40        0.5291             nan     0.1000    0.0148
##     60        0.3987             nan     0.1000    0.0061
##     80        0.3163             nan     0.1000    0.0044
##    100        0.2600             nan     0.1000    0.0041
##    120        0.2211             nan     0.1000    0.0021
##    140        0.1857             nan     0.1000    0.0011
##    150        0.1722             nan     0.1000    0.0010
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1255
##      2        1.5235             nan     0.1000    0.0858
##      3        1.4672             nan     0.1000    0.0652
##      4        1.4236             nan     0.1000    0.0506
##      5        1.3895             nan     0.1000    0.0467
##      6        1.3589             nan     0.1000    0.0465
##      7        1.3292             nan     0.1000    0.0384
##      8        1.3050             nan     0.1000    0.0369
##      9        1.2814             nan     0.1000    0.0360
##     10        1.2590             nan     0.1000    0.0306
##     20        1.1036             nan     0.1000    0.0164
##     40        0.9309             nan     0.1000    0.0088
##     60        0.8220             nan     0.1000    0.0054
##     80        0.7420             nan     0.1000    0.0032
##    100        0.6777             nan     0.1000    0.0033
##    120        0.6250             nan     0.1000    0.0025
##    140        0.5834             nan     0.1000    0.0023
##    150        0.5633             nan     0.1000    0.0023
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1819
##      2        1.4906             nan     0.1000    0.1290
##      3        1.4070             nan     0.1000    0.1054
##      4        1.3378             nan     0.1000    0.0796
##      5        1.2857             nan     0.1000    0.0723
##      6        1.2395             nan     0.1000    0.0636
##      7        1.2003             nan     0.1000    0.0620
##      8        1.1602             nan     0.1000    0.0542
##      9        1.1260             nan     0.1000    0.0446
##     10        1.0978             nan     0.1000    0.0503
##     20        0.8944             nan     0.1000    0.0224
##     40        0.6775             nan     0.1000    0.0112
##     60        0.5524             nan     0.1000    0.0080
##     80        0.4660             nan     0.1000    0.0039
##    100        0.3966             nan     0.1000    0.0040
##    120        0.3455             nan     0.1000    0.0022
##    140        0.3020             nan     0.1000    0.0021
##    150        0.2843             nan     0.1000    0.0024
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2344
##      2        1.4611             nan     0.1000    0.1634
##      3        1.3578             nan     0.1000    0.1241
##      4        1.2795             nan     0.1000    0.0957
##      5        1.2175             nan     0.1000    0.0922
##      6        1.1593             nan     0.1000    0.0752
##      7        1.1116             nan     0.1000    0.0725
##      8        1.0649             nan     0.1000    0.0610
##      9        1.0265             nan     0.1000    0.0576
##     10        0.9909             nan     0.1000    0.0463
##     20        0.7571             nan     0.1000    0.0255
##     40        0.5325             nan     0.1000    0.0091
##     60        0.4077             nan     0.1000    0.0096
##     80        0.3254             nan     0.1000    0.0038
##    100        0.2656             nan     0.1000    0.0029
##    120        0.2218             nan     0.1000    0.0020
##    140        0.1878             nan     0.1000    0.0011
##    150        0.1736             nan     0.1000    0.0011
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1277
##      2        1.5240             nan     0.1000    0.0906
##      3        1.4661             nan     0.1000    0.0676
##      4        1.4215             nan     0.1000    0.0526
##      5        1.3863             nan     0.1000    0.0445
##      6        1.3561             nan     0.1000    0.0464
##      7        1.3271             nan     0.1000    0.0378
##      8        1.3024             nan     0.1000    0.0343
##      9        1.2802             nan     0.1000    0.0335
##     10        1.2588             nan     0.1000    0.0359
##     20        1.0999             nan     0.1000    0.0147
##     40        0.9266             nan     0.1000    0.0095
##     60        0.8209             nan     0.1000    0.0059
##     80        0.7418             nan     0.1000    0.0038
##    100        0.6798             nan     0.1000    0.0044
##    120        0.6256             nan     0.1000    0.0038
##    140        0.5818             nan     0.1000    0.0024
##    150        0.5611             nan     0.1000    0.0018
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1850
##      2        1.4882             nan     0.1000    0.1309
##      3        1.4043             nan     0.1000    0.1018
##      4        1.3368             nan     0.1000    0.0855
##      5        1.2820             nan     0.1000    0.0754
##      6        1.2342             nan     0.1000    0.0659
##      7        1.1923             nan     0.1000    0.0524
##      8        1.1583             nan     0.1000    0.0568
##      9        1.1228             nan     0.1000    0.0552
##     10        1.0888             nan     0.1000    0.0355
##     20        0.8924             nan     0.1000    0.0201
##     40        0.6750             nan     0.1000    0.0148
##     60        0.5501             nan     0.1000    0.0050
##     80        0.4620             nan     0.1000    0.0051
##    100        0.4021             nan     0.1000    0.0050
##    120        0.3494             nan     0.1000    0.0020
##    140        0.3064             nan     0.1000    0.0022
##    150        0.2888             nan     0.1000    0.0019
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2330
##      2        1.4637             nan     0.1000    0.1675
##      3        1.3572             nan     0.1000    0.1219
##      4        1.2805             nan     0.1000    0.1046
##      5        1.2147             nan     0.1000    0.0844
##      6        1.1608             nan     0.1000    0.0904
##      7        1.1056             nan     0.1000    0.0708
##      8        1.0598             nan     0.1000    0.0568
##      9        1.0239             nan     0.1000    0.0617
##     10        0.9847             nan     0.1000    0.0562
##     20        0.7532             nan     0.1000    0.0221
##     40        0.5260             nan     0.1000    0.0101
##     60        0.4033             nan     0.1000    0.0055
##     80        0.3209             nan     0.1000    0.0038
##    100        0.2630             nan     0.1000    0.0032
##    120        0.2185             nan     0.1000    0.0014
##    140        0.1867             nan     0.1000    0.0013
##    150        0.1733             nan     0.1000    0.0015
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1282
##      2        1.5249             nan     0.1000    0.0886
##      3        1.4680             nan     0.1000    0.0671
##      4        1.4236             nan     0.1000    0.0530
##      5        1.3888             nan     0.1000    0.0503
##      6        1.3562             nan     0.1000    0.0388
##      7        1.3302             nan     0.1000    0.0415
##      8        1.3042             nan     0.1000    0.0360
##      9        1.2805             nan     0.1000    0.0343
##     10        1.2578             nan     0.1000    0.0301
##     20        1.0999             nan     0.1000    0.0172
##     40        0.9262             nan     0.1000    0.0082
##     60        0.8171             nan     0.1000    0.0048
##     80        0.7388             nan     0.1000    0.0050
##    100        0.6739             nan     0.1000    0.0042
##    120        0.6236             nan     0.1000    0.0031
##    140        0.5798             nan     0.1000    0.0017
##    150        0.5602             nan     0.1000    0.0021
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.1852
##      2        1.4910             nan     0.1000    0.1295
##      3        1.4082             nan     0.1000    0.1100
##      4        1.3378             nan     0.1000    0.0837
##      5        1.2848             nan     0.1000    0.0725
##      6        1.2384             nan     0.1000    0.0701
##      7        1.1932             nan     0.1000    0.0548
##      8        1.1581             nan     0.1000    0.0559
##      9        1.1231             nan     0.1000    0.0465
##     10        1.0933             nan     0.1000    0.0396
##     20        0.8880             nan     0.1000    0.0222
##     40        0.6776             nan     0.1000    0.0145
##     60        0.5518             nan     0.1000    0.0068
##     80        0.4639             nan     0.1000    0.0042
##    100        0.3994             nan     0.1000    0.0029
##    120        0.3482             nan     0.1000    0.0020
##    140        0.3082             nan     0.1000    0.0018
##    150        0.2898             nan     0.1000    0.0017
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2292
##      2        1.4648             nan     0.1000    0.1585
##      3        1.3650             nan     0.1000    0.1214
##      4        1.2860             nan     0.1000    0.1077
##      5        1.2180             nan     0.1000    0.0865
##      6        1.1630             nan     0.1000    0.0820
##      7        1.1104             nan     0.1000    0.0590
##      8        1.0718             nan     0.1000    0.0685
##      9        1.0291             nan     0.1000    0.0652
##     10        0.9887             nan     0.1000    0.0472
##     20        0.7529             nan     0.1000    0.0313
##     40        0.5208             nan     0.1000    0.0090
##     60        0.4032             nan     0.1000    0.0061
##     80        0.3222             nan     0.1000    0.0039
##    100        0.2623             nan     0.1000    0.0031
##    120        0.2207             nan     0.1000    0.0016
##    140        0.1848             nan     0.1000    0.0007
##    150        0.1706             nan     0.1000    0.0018
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.6094             nan     0.1000    0.2375
##      2        1.4607             nan     0.1000    0.1616
##      3        1.3568             nan     0.1000    0.1207
##      4        1.2795             nan     0.1000    0.1089
##      5        1.2123             nan     0.1000    0.0894
##      6        1.1554             nan     0.1000    0.0766
##      7        1.1063             nan     0.1000    0.0614
##      8        1.0668             nan     0.1000    0.0697
##      9        1.0242             nan     0.1000    0.0584
##     10        0.9867             nan     0.1000    0.0508
##     20        0.7508             nan     0.1000    0.0258
##     40        0.5314             nan     0.1000    0.0102
##     60        0.4011             nan     0.1000    0.0059
##     80        0.3214             nan     0.1000    0.0056
##    100        0.2670             nan     0.1000    0.0020
##    120        0.2254             nan     0.1000    0.0019
##    140        0.1904             nan     0.1000    0.0023
##    150        0.1771             nan     0.1000    0.0010
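
Every run in the log above starts with a TrainDeviance of 1.6094. As a quick sanity check, and assuming the reported value is the mean multinomial deviance of a model that initially gives each of the 5 classes the same probability, that starting value should equal the natural log of 5. A minimal sketch in base R (the variable names are illustrative and not part of the original analysis):

```r
## assumption: before any tree is added, each of the 5 classes has probability 1/5
n_classes <- 5
initial_deviance <- -log(1 / n_classes)
round(initial_deviance, 4)
```

Under that assumption, the later rows of the log show how far each boosting iteration moves the model away from the uninformed starting point.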

## view model summary
summary(RfBoModel)

plot of chunk info_boost

##                                       var  rel.inf
## roll_belt                       roll_belt 21.09372
## pitch_forearm               pitch_forearm 10.93921
## yaw_belt                         yaw_belt  7.61584
## magnet_dumbbell_z       magnet_dumbbell_z  6.71867
## roll_forearm                 roll_forearm  5.60517
## magnet_dumbbell_y       magnet_dumbbell_y  5.23373
## magnet_belt_z               magnet_belt_z  4.24058
## gyros_belt_z                 gyros_belt_z  3.68570
## accel_forearm_x           accel_forearm_x  2.70499
## accel_dumbbell_y         accel_dumbbell_y  2.70107
## pitch_belt                     pitch_belt  2.67203
## gyros_dumbbell_y         gyros_dumbbell_y  2.52905
## roll_dumbbell               roll_dumbbell  2.24721
## accel_dumbbell_x         accel_dumbbell_x  1.99300
## accel_forearm_z           accel_forearm_z  1.94759
## magnet_belt_y               magnet_belt_y  1.82710
## yaw_arm                           yaw_arm  1.80176
## magnet_dumbbell_x       magnet_dumbbell_x  1.55326
## magnet_arm_z                 magnet_arm_z  1.52196
## magnet_forearm_z         magnet_forearm_z  1.49438
## magnet_belt_x               magnet_belt_x  0.91883
## magnet_forearm_x         magnet_forearm_x  0.85223
## total_accel_forearm   total_accel_forearm  0.81792
## accel_belt_z                 accel_belt_z  0.80910
## magnet_arm_x                 magnet_arm_x  0.80029
## accel_dumbbell_z         accel_dumbbell_z  0.77048
## total_accel_dumbbell total_accel_dumbbell  0.73065
## magnet_arm_y                 magnet_arm_y  0.67431
## roll_arm                         roll_arm  0.56302
## gyros_belt_y                 gyros_belt_y  0.55962
## gyros_arm_y                   gyros_arm_y  0.49502
## magnet_forearm_y         magnet_forearm_y  0.39396
## gyros_dumbbell_x         gyros_dumbbell_x  0.32784
## accel_arm_x                   accel_arm_x  0.32184
## gyros_forearm_z           gyros_forearm_z  0.22067
## gyros_dumbbell_z         gyros_dumbbell_z  0.17740
## accel_arm_y                   accel_arm_y  0.12884
## total_accel_arm           total_accel_arm  0.11874
## yaw_dumbbell                 yaw_dumbbell  0.08060
## pitch_arm                       pitch_arm  0.06573
## gyros_forearm_x           gyros_forearm_x  0.04691
## total_accel_belt         total_accel_belt  0.00000
## gyros_belt_x                 gyros_belt_x  0.00000
## accel_belt_x                 accel_belt_x  0.00000
## accel_belt_y                 accel_belt_y  0.00000
## gyros_arm_x                   gyros_arm_x  0.00000
## gyros_arm_z                   gyros_arm_z  0.00000
## accel_arm_z                   accel_arm_z  0.00000
## pitch_dumbbell             pitch_dumbbell  0.00000
## yaw_forearm                   yaw_forearm  0.00000
## gyros_forearm_y           gyros_forearm_y  0.00000
## accel_forearm_y           accel_forearm_y  0.00000

## plot model important variables
plot(varImp(RfBoModel))

plot of chunk info_boost

Observation
The Boosting Algorithm with 10-Fold Cross Validation generated a good model as shown below
Accuracy : 0.9586
Kappa : 0.9477

Check Data / Test Data for Prediction

RfBoPredictn <- predict(RfBoModel, newdata=PartCheck)
RfBoConfMtrx <- confusionMatrix(RfBoPredictn, PartCheck$classe)
RfBoConfMtrx

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 2205   46    0    4    8
##          B   20 1433   32    9   12
##          C    2   36 1314   41   10
##          D    3    1   18 1221   19
##          E    2    2    4   11 1393
## 
## Overall Statistics
##                                        
##                Accuracy : 0.964        
##                  95% CI : (0.96, 0.968)
##     No Information Rate : 0.284        
##     P-Value [Acc > NIR] : < 2e-16      
##                                        
##                   Kappa : 0.955        
##  Mcnemar's Test P-Value : 4.16e-06     
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity             0.988    0.944    0.961    0.949    0.966
## Specificity             0.990    0.988    0.986    0.994    0.997
## Pos Pred Value          0.974    0.952    0.937    0.968    0.987
## Neg Pred Value          0.995    0.987    0.992    0.990    0.992
## Prevalence              0.284    0.193    0.174    0.164    0.184
## Detection Rate          0.281    0.183    0.167    0.156    0.178
## Detection Prevalence    0.288    0.192    0.179    0.161    0.180
## Balanced Accuracy       0.989    0.966    0.973    0.972    0.982

Observation
The Boosting Algorithm with 10-Fold Cross Validation, when checked against the Check Data / Test Data (the held-out part of the original training dataset), shows the following
Overall Accuracy : 0.9643
Kappa : 0.9548
Out-of-Sample Error (1 - Accuracy) : 0.0357
95% ConfInt Lower : 0.96
95% ConfInt Upper : 0.9683
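
The out-of-sample error is the share of check-data cases that the model misclassifies, that is, one minus the accuracy. A minimal sketch in base R, re-entering the Boosting confusion matrix printed above by hand (the object name boost_cm is illustrative and not part of the original analysis):

```r
## check-data confusion matrix (rows = prediction, columns = reference)
boost_cm <- matrix(c(2205,   46,    0,    4,    8,
                       20, 1433,   32,    9,   12,
                        2,   36, 1314,   41,   10,
                        3,    1,   18, 1221,   19,
                        2,    2,    4,   11, 1393),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(Prediction = LETTERS[1:5],
                                   Reference  = LETTERS[1:5]))
accuracy  <- sum(diag(boost_cm)) / sum(boost_cm)  # correct cases / all cases
oos_error <- 1 - accuracy                         # misclassified share
round(c(accuracy = accuracy, oos_error = oos_error), 4)
```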

Random Forests - Default Method With 10-Fold Cross Validation

In this section,
1. We will create a model based on Random Forests - Default Method With 10-Fold Cross Validation
2. View summary of the model
3. Plot Predictive / Important Variables
4. Compare this model using other (check) part of the training data

## set seed
set.seed(707)
## generate model: random forest tuned with 10-fold cross validation
cvCtrl <- trainControl(method="cv", number=10, allowParallel=TRUE)
RfDmModel <- train(classe ~ ., 
                  method="rf", 
                  data=PartTrain, 
                  verbose=FALSE,
                  importance=TRUE,
                  trControl=cvCtrl
                  )
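
To illustrate what method="cv" with number=10 asks for, the following base R sketch splits row indices into 10 folds of near-equal size; each fold is held out once while a model is fit on the other nine. This only illustrates the idea and is not caret's internal code, which builds its own folds. The row count of 11776 is the number of training rows reported in the model summary output; fold_id and the other names are hypothetical.

```r
set.seed(707)
n_rows <- 11776                      # training rows, per the model summary
k <- 10
## random fold label (1..10) for every row
fold_id <- sample(rep(seq_len(k), length.out = n_rows))
fold_sizes <- table(fold_id)
fold_sizes
## rows held out in fold 1; the remaining rows would be used for fitting
holdout_1  <- which(fold_id == 1)
fit_rows_1 <- which(fold_id != 1)
```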

## view model summary
summary(RfDmModel)

##                 Length Class      Mode     
## call                6  -none-     call     
## type                1  -none-     character
## predicted       11776  factor     numeric  
## err.rate         3000  -none-     numeric  
## confusion          30  -none-     numeric  
## votes           58880  matrix     numeric  
## oob.times       11776  -none-     numeric  
## classes             5  -none-     character
## importance        364  -none-     numeric  
## importanceSD      312  -none-     numeric  
## localImportance     0  -none-     NULL     
## proximity           0  -none-     NULL     
## ntree               1  -none-     numeric  
## mtry                1  -none-     numeric  
## forest             14  -none-     list     
## y               11776  factor     numeric  
## test                0  -none-     NULL     
## inbag               0  -none-     NULL     
## xNames             52  -none-     character
## problemType         1  -none-     character
## tuneValue           1  data.frame list     
## obsLevels           5  -none-     character

## plot model important variables
plot(varImp(RfDmModel))

plot of chunk view_default

Observation
The Default Method with 10-Fold Cross Validation generated a very accurate model as shown below
Accuracy : 0.9916
Kappa : 0.9894

Check Data / Test Data for Prediction

RfDmPredictn <- predict(RfDmModel, newdata=PartCheck)
RfDmConfMtrx <- confusionMatrix(RfDmPredictn, PartCheck$classe)
RfDmConfMtrx

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 2230   22    0    1    0
##          B    2 1491    3    0    1
##          C    0    5 1357   18    2
##          D    0    0    8 1267    5
##          E    0    0    0    0 1434
## 
## Overall Statistics
##                                         
##                Accuracy : 0.991         
##                  95% CI : (0.989, 0.993)
##     No Information Rate : 0.284         
##     P-Value [Acc > NIR] : <2e-16        
##                                         
##                   Kappa : 0.989         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity             0.999    0.982    0.992    0.985    0.994
## Specificity             0.996    0.999    0.996    0.998    1.000
## Pos Pred Value          0.990    0.996    0.982    0.990    1.000
## Neg Pred Value          1.000    0.996    0.998    0.997    0.999
## Prevalence              0.284    0.193    0.174    0.164    0.184
## Detection Rate          0.284    0.190    0.173    0.161    0.183
## Detection Prevalence    0.287    0.191    0.176    0.163    0.183
## Balanced Accuracy       0.998    0.991    0.994    0.992    0.997
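
The per-class rows above follow directly from the confusion matrix. As a sketch in base R, sensitivity divides each diagonal count by its column (reference) total, and the positive predictive value divides it by its row (prediction) total. The matrix is re-entered by hand and rf_cm is an illustrative name.

```r
## Default Method check-data confusion matrix (rows = prediction, columns = reference)
rf_cm <- matrix(c(2230,   22,    0,    1,    0,
                     2, 1491,    3,    0,    1,
                     0,    5, 1357,   18,    2,
                     0,    0,    8, 1267,    5,
                     0,    0,    0,    0, 1434),
                nrow = 5, byrow = TRUE,
                dimnames = list(Prediction = LETTERS[1:5],
                                Reference  = LETTERS[1:5]))
sensitivity <- diag(rf_cm) / colSums(rf_cm)  # correct / cases truly in the class
pos_pred    <- diag(rf_cm) / rowSums(rf_cm)  # correct / cases predicted as the class
round(rbind(sensitivity, pos_pred), 3)
```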

Observation
The Default Method with 10-Fold Cross Validation, when checked against the Check Data / Test Data (the held-out part of the original training dataset), shows the following
Overall Accuracy : 0.9915
Kappa : 0.9892
Out-of-Sample Error (1 - Accuracy) : 0.0085
95% ConfInt Lower : 0.9892
95% ConfInt Upper : 0.9934

Random Forests - Custom Algorithm (No Method & Tuning) With 10-Fold Cross Validation

In this section,
1. We will create a model based on Random Forests - Custom Algorithm (No Method & Tuning). Note that the control object used here sets method="oob", so resampling performance is estimated from out-of-bag samples rather than from 10 cross validation folds
2. View summary of the model
3. Plot Predictive / Important Variables
4. Compare this model using other (check) part of the training data

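## set seed
set.seed(707)
## generate model
## method="oob" requests the out-of-bag error estimate instead of k-fold cross validation
## no method is given, so train() uses its default ("rf"); mtry is fixed at 10
csCtrl <- trainControl(method="oob", number=10, allowParallel=TRUE)
RfCaModel <- train(classe ~ ., 
                   data=PartTrain, 
                   tuneGrid=data.frame(mtry=10),
                   trControl=csCtrl
                   )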
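
Because the control object uses method="oob", performance is estimated from out-of-bag rows: each tree in a random forest is grown on a bootstrap sample of the training rows, and the rows that were not drawn act as a built-in test set for that tree. A minimal base R sketch of a single bootstrap draw, assuming the 11776 training rows reported in the model summary output (all variable names are illustrative):

```r
set.seed(707)
n_rows <- 11776
## bootstrap sample for one tree: n_rows draws with replacement
in_bag <- sample(n_rows, size = n_rows, replace = TRUE)
## rows this tree never saw
oob <- setdiff(seq_len(n_rows), in_bag)
oob_fraction <- length(oob) / n_rows
oob_fraction   # expected to be close to exp(-1), roughly 0.37
```

With a few hundred trees, nearly every row is out-of-bag for some of them, which is why this estimate needs no separate fold loop.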

## view model summary
summary(RfCaModel)

##                 Length Class      Mode     
## call                4  -none-     call     
## type                1  -none-     character
## predicted       11776  factor     numeric  
## err.rate         3000  -none-     numeric  
## confusion          30  -none-     numeric  
## votes           58880  matrix     numeric  
## oob.times       11776  -none-     numeric  
## classes             5  -none-     character
## importance         52  -none-     numeric  
## importanceSD        0  -none-     NULL     
## localImportance     0  -none-     NULL     
## proximity           0  -none-     NULL     
## ntree               1  -none-     numeric  
## mtry                1  -none-     numeric  
## forest             14  -none-     list     
## y               11776  factor     numeric  
## test                0  -none-     NULL     
## inbag               0  -none-     NULL     
## xNames             52  -none-     character
## problemType         1  -none-     character
## tuneValue           1  data.frame list     
## obsLevels           5  -none-     character

## plot model important variables
plot(varImp(RfCaModel))

plot of chunk info_custom

Observation
The Custom Algorithm (No Method & Tuning) with 10-Fold Cross Validation generated a decent model as shown below
Accuracy : 0.9939
Kappa : 0.9923

Check Data / Test Data for Prediction

RfCaPredictn <- predict(RfCaModel, newdata=PartCheck)
RfCaConfMtrx <- confusionMatrix(RfCaPredictn, PartCheck$classe)
RfCaConfMtrx

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 2230    8    0    0    0
##          B    2 1504    5    0    0
##          C    0    6 1359   19    0
##          D    0    0    4 1267    5
##          E    0    0    0    0 1437
## 
## Overall Statistics
##                                         
##                Accuracy : 0.994         
##                  95% CI : (0.992, 0.995)
##     No Information Rate : 0.284         
##     P-Value [Acc > NIR] : <2e-16        
##                                         
##                   Kappa : 0.992         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity             0.999    0.991    0.993    0.985    0.997
## Specificity             0.999    0.999    0.996    0.999    1.000
## Pos Pred Value          0.996    0.995    0.982    0.993    1.000
## Neg Pred Value          1.000    0.998    0.999    0.997    0.999
## Prevalence              0.284    0.193    0.174    0.164    0.184
## Detection Rate          0.284    0.192    0.173    0.161    0.183
## Detection Prevalence    0.285    0.193    0.176    0.163    0.183
## Balanced Accuracy       0.999    0.995    0.995    0.992    0.998

Observation
The Custom Algorithm (No Method & Tuning) with 10-Fold Cross Validation, when checked against the Check Data / Test Data (the held-out part of the original training dataset), shows the following
Overall Accuracy : 0.9938
Kappa : 0.9921
Out-of-Sample Error (1 - Accuracy) : 0.0062
95% ConfInt Lower : 0.9918
95% ConfInt Upper : 0.9954
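
The 95% interval printed with the confusion matrix is a binomial confidence interval for the accuracy. As a sketch, and assuming an exact binomial interval is what is reported, it can be reproduced in base R with binom.test from the counts in the Custom Algorithm matrix above: the diagonal sum as successes, out of 7846 check cases.

```r
correct <- 2230 + 1504 + 1359 + 1267 + 1437   # diagonal of the Custom Algorithm matrix
total   <- 7846                               # all check-data cases
ci <- binom.test(correct, total, conf.level = 0.95)$conf.int
round(c(accuracy = correct / total, lower = ci[1], upper = ci[2]), 4)
```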

Conclusions

From the three models above we see the following results on the check data. The out-of-sample error is 1 - accuracy.

##           Model.Type Overall.Accuracy  Kappa OutOfSample.Error
## 1 Boosting Algorithm           0.9643 0.9548            0.0357
## 2     Default Method           0.9915 0.9892            0.0085
## 3   Custom Algorithm           0.9938 0.9921            0.0062

Observation
From the above table it is clear that both the Default Method and the Custom Algorithm are more accurate than the Boosting Algorithm (about 99% versus 96% accuracy on the check data).
We will use both of these models to predict the Test Data Cases.
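
Cohen's Kappa compares the observed accuracy with the agreement expected by chance from the row and column totals. A minimal base R sketch for the Custom Algorithm's check-data confusion matrix, re-entered by hand (cm_custom and the other names are illustrative); it evaluates to a Kappa of about 0.9921 and an error rate of about 0.0062.

```r
## Custom Algorithm check-data confusion matrix (rows = prediction, columns = reference)
cm_custom <- matrix(c(2230,    8,    0,    0,    0,
                         2, 1504,    5,    0,    0,
                         0,    6, 1359,   19,    0,
                         0,    0,    4, 1267,    5,
                         0,    0,    0,    0, 1437),
                    nrow = 5, byrow = TRUE)
n  <- sum(cm_custom)
po <- sum(diag(cm_custom)) / n                             # observed agreement (accuracy)
pe <- sum(rowSums(cm_custom) * colSums(cm_custom)) / n^2   # agreement expected by chance
kappa <- (po - pe) / (1 - pe)
round(c(accuracy = po, kappa = kappa, oos_error = 1 - po), 4)
```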

Performance With Test Data Cases

Custom Algorithm

DcCaPredict <- predict(RfCaModel, newdata=DataCases)
DcCaPredict

##  [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E

Default Method

DcDmPredict <- predict(RfDmModel, newdata=DataCases)
DcDmPredict

##  [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E

Observation
The prediction vectors generated by both models are identical.
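
Rather than comparing the two printouts by eye, the agreement can be checked in code. A minimal sketch in base R, re-entering the two printed vectors by hand (pred_custom and pred_default are illustrative stand-ins for DcCaPredict and DcDmPredict):

```r
lv <- c("A", "B", "C", "D", "E")
pred_custom  <- factor(strsplit("B A B A A E D B A A B C B A E E A B B B", " ")[[1]],
                       levels = lv)
pred_default <- factor(strsplit("B A B A A E D B A A B C B A E E A B B B", " ")[[1]],
                       levels = lv)
## TRUE only if values and factor levels all match
same <- identical(pred_custom, pred_default)
## positions where the two models disagree, if any
disagree <- which(pred_custom != pred_default)
same
```

In the actual session, identical(DcCaPredict, DcDmPredict) would perform the same check on the real prediction objects.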

Result Submission

For each test case we need to submit a text file with a single capital letter (A, B, C, D, or E) corresponding to our prediction for the corresponding problem in the test data set.

We need to create a folder where the files will be written and set it as the working directory before running the script. The function given below creates one file for each prediction.

## write one text file per prediction: problem_id_1.txt, problem_id_2.txt, ...
pml_write_files <- function(x){
  for(i in seq_along(x)){
    filename <- paste0("problem_id_",i,".txt")
    write.table(x[i],file=filename,quote=FALSE,row.names=FALSE,col.names=FALSE)
  }
}
pml_write_files(DcCaPredict)
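
Before submitting, it may be worth reading the files back to confirm that each contains exactly one letter. The sketch below is a hedged illustration in base R: it writes the first three predictions shown above to a temporary folder (out_dir is a hypothetical location, not the working directory used in the report) and reads them back.

```r
out_dir <- file.path(tempdir(), "pml_submission")   # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)
preds <- c("B", "A", "B")                           # first three predictions from above
for (i in seq_along(preds)) {
  write.table(preds[i],
              file = file.path(out_dir, paste0("problem_id_", i, ".txt")),
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}
## read every file back as a single string
written <- vapply(seq_along(preds),
                  function(i) readLines(file.path(out_dir, paste0("problem_id_", i, ".txt"))),
                  character(1))
written
```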

End Of Report

Practical Machine Learning Project

Cyrus Lentin

Wednesday, November 19, 2014
